Implementing Machine Learning Model Using Tkinter GUI | Machine Learning GUI Project Help

realcode4you
Jun 20, 2022

Requirement Details

The purpose of the project is to create a simple analytics program with a Tkinter GUI.

This project is a GUI implementation of Case Study 15.5 from the Python textbook. Each section needs user interaction and the ability to restart at any time. The user must also be prevented from performing an invalid action; for example, the user cannot explore the data until the data is loaded.

Load the dataset (15.5.1)
Explore the data (15.5.2)
Split the data for training and testing (15.5.4)
Train the data model (15.5.5)
Test the data model (15.5.6)
Visualize the expected vs. predicted (15.5.7)
Create the regression model metrics (15.5.8)
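
One way to enforce the "no invalid action" rule is to keep the step ordering in a small state object and let the GUI read button states from it. The sketch below is only an illustration: the step names, the PREREQUISITES mapping, and the Workflow class are hypothetical and assume the ordering implied by the list above.

```python
# Hypothetical step names mirroring the seven sections listed above.
STEPS = ["load", "explore", "split", "train", "test", "visualize", "metrics"]

# Assumed ordering: each step lists the steps that must be finished first.
PREREQUISITES = {
    "load": [],
    "explore": ["load"],
    "split": ["load"],
    "train": ["split"],
    "test": ["train"],
    "visualize": ["test"],
    "metrics": ["test"],
}


class Workflow:
    """Tracks which steps are done and which may run next."""

    def __init__(self):
        self.done = set()

    def allowed(self, step):
        # A step is allowed once all of its prerequisites are done.
        return all(p in self.done for p in PREREQUISITES[step])

    def complete(self, step):
        # Refuse to record a step whose prerequisites are missing.
        if not self.allowed(step):
            raise ValueError(f"Cannot run '{step}' yet")
        self.done.add(step)

    def restart(self):
        # Forget all progress so the user starts again from loading.
        self.done.clear()

    def button_states(self):
        # Map each step to a value accepted by a tk.Button 'state' option.
        return {s: ("normal" if self.allowed(s) else "disabled") for s in STEPS}
```

In a Tkinter program, each button could be updated with button.config(state=...) from button_states() after every step, and the restart button would call restart() before refreshing the buttons.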

Problem Statement

In this task I analyse the Boston housing data and then build a basic GUI tool with Python's Tkinter. The aim is to spot real-estate trends in the Boston suburbs, or to predict the sale value of residential property there from a set of critical factors. I downloaded the dataset from Kaggle, linked here. A large number of housing societies are developed every year, and for decades real-estate investors have put money into them; some of these investments make a profit and some do not. Predicting real-estate values means settling on a price that works for both buyer and investor: the investor's main objective is to make a profit while keeping the property suitable for the buyer in both appeal and price. The predictions should become more accurate as more data becomes available.

Description of the Data

There are 13 columns and 511 records in this dataset, the details are listed in the table below.

Attribute   Data Type     Description
CRIM        Numeric       per capita crime rate by town
ZN          Numeric       proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS       Numeric       proportion of non-retail business acres per town
CHAS        Categorical   Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX         Numeric       nitric oxides concentration (parts per 10 million)
RM          Numeric       average number of rooms per dwelling
AGE         Numeric       proportion of owner-occupied units built prior to 1940
DIS         Numeric       weighted distances to five employment centres
RAD         Categorical   index of accessibility to radial highways
TAX         Numeric       full-value property-tax rate per $10,000
PTRATIO     Numeric       pupil-teacher ratio by town
LSTAT       Numeric       % lower status of the population
MEDV        Numeric       median value of occupied homes in $1000’s
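
Before any modelling, the columns described above can be inspected with a few pandas calls. The sketch below uses a tiny hand-made frame with hypothetical values (the real file has 511 records), so the numbers only illustrate which calls answer which question.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only; these are not taken from the dataset.
sample = pd.DataFrame({
    "RM": [6.5, np.nan, 7.1, 5.9],
    "LSTAT": [4.9, 9.1, 4.0, 12.4],
    "MEDV": [24.0, 21.6, 34.7, 18.9],
})

# Number of records and columns
n_rows, n_cols = sample.shape

# Count of missing values in each column
missing_per_column = sample.isnull().sum()

# Count, mean, spread, and quartiles for every numeric column
summary = sample.describe()

print(n_rows, n_cols)
print(missing_per_column)
print(summary)
```

On the real data, the same calls applied to the frame returned by pd.read_csv would show which columns need attention in the pre-processing step.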

Data Pre-Processing

Real-world data generally has issues such as noise, missing values, and inconsistent formatting, so it cannot be fed directly to machine learning algorithms. Pre-processing cleans the data and makes it suitable for an ML model, which improves the model's efficiency and accuracy.

Handling Null Values:

In our code we fill the missing values in the RM column with the column median instead of dropping the rows:

#Data Processing
#Fill Missing Values with the Median
df['RM'] = df['RM'].fillna(df['RM'].median())
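
To see what median imputation does, here is a minimal sketch on a made-up column (the values are hypothetical). It assigns the filled column back to the frame rather than calling fillna with inplace=True on a column selection, a pattern that recent pandas versions warn about.

```python
import numpy as np
import pandas as pd

# Hypothetical RM values with two gaps
df_demo = pd.DataFrame({"RM": [6.0, np.nan, 7.0, 5.0, np.nan]})

# median() skips NaN by default, so this is the median of 5.0, 6.0, 7.0
median_rm = df_demo["RM"].median()

# Fill the gaps and assign the result back to the column
df_demo["RM"] = df_demo["RM"].fillna(median_rm)

print(df_demo["RM"].tolist())
```

The median is a common choice here because, unlike the mean, it is not pulled towards a few unusually large or small values.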

Feature Selection

This is the next step after pre-processing the dataset. In machine learning, feature selection is the process of reducing the number of input variables when developing a predictive model. Here every column except MEDV is kept as an input, so the step amounts to separating the features from the target.

#Dividing the target and feature variables
X = df.drop('MEDV', axis = 1)
Y = df['MEDV']

X holds the feature variables and Y is the target variable.
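
If the number of inputs did need to be reduced, one simple approach is to rank the features by their absolute correlation with the target. This is a sketch of that technique, not what the project code does: the data is synthetic, and the column names "strong" and "noise" and the 0.5 threshold are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data: 'strong' drives the target, 'noise' does not.
strong = rng.normal(size=n)
noise = rng.normal(size=n)
target = 3.0 * strong + rng.normal(scale=0.1, size=n)
demo = pd.DataFrame({"strong": strong, "noise": noise, "MEDV": target})

# Absolute Pearson correlation of every feature with the target
corr = demo.corr()["MEDV"].drop("MEDV").abs().sort_values(ascending=False)

# Keep only the features above an (arbitrary) threshold
selected = corr[corr > 0.5].index.tolist()

X_demo = demo[selected]
Y_demo = demo["MEDV"]
print(corr)
print(selected)
```

Correlation only captures linear, one-feature-at-a-time relationships, so the threshold should be treated as a starting point rather than a rule.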

Code Implementation

The full listing below starts by importing the libraries used to build the Tkinter GUI and to do the data analytics and data visualization.

The RM column has some missing values, so they are filled with the column median. This is the data pre-processing step.

Next, the features and the target variable for the machine learning model are selected.

The sklearn train_test_split method is then used to split the dataset into training and testing sets.

The next step is to train the model. Here Linear Regression is used:

import tkinter as tk

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from sklearn import linear_model, preprocessing

# Import 'cross_val_score' and 'train_test_split'
from sklearn.model_selection import cross_val_score, train_test_split

def ml_tkinter_gui():
    global root
    # tkinter GUI
    root= tk.Tk()
    
    #Read Dataset
    df = pd.read_csv("Boston Real Est.csv")

    #Data Processing
    #Fill Missing Values with the Median
    df['RM'] = df['RM'].fillna(df['RM'].median())

    #Dividing the target and feature variables
    X = df.drop('MEDV', axis = 1)
    Y = df['MEDV']

    #Normalize the Features using MinMaxScaler
    min_max_scaler = preprocessing.MinMaxScaler()
    X_scaled = min_max_scaler.fit_transform(X)

    # Shuffle and split the data 
    X_train, X_test, y_train, y_test = train_test_split(X_scaled, Y, test_size=0.2, random_state=2)


    # Train the model on the scaled training split only,
    # so the test split stays unseen and matches what predict() receives
    regr = linear_model.LinearRegression()
    regr.fit(X_train, y_train)

    #Creating canvas to show the result
    canvas1 = tk.Canvas(root, width = 500, height = 450)
    canvas1.pack()

    #Accuracy Metrics: cross-validated R^2 scores on the training set
    def score():
        scores = cross_val_score(regr, X_train, y_train)
        # Build a plain string; passing a tuple as label text renders badly
        prediction_result = 'Scores: ' + ', '.join(f'{s:.3f}' for s in scores)
        label_prediction = tk.Label(root, text=prediction_result, bg='orange')
        canvas1.create_window(20, 200, anchor=tk.W, window=label_prediction)

    #Test the data model: R^2 score on the held-out test set
    def test_score():
        test_r2 = regr.score(X_test, y_test)
        prediction = f'Test score (R^2): {test_r2:.3f}'
        label_prediction = tk.Label(root, text=prediction, bg='orange')
        canvas1.create_window(300, 300, window=label_prediction)

    #Function to close the window
    def close_window():
        root.destroy()


    #Add the quit button
    quit_button = tk.Button(root, text = "Click and Quit", command = close_window, bg='red')
    #Place the button at window position x=900, y=90
    quit_button.place(x=900, y=90)

    #Creating 'Calculate Score' button   
    button1 = tk.Button (root, text='Calculate Score',command=score, bg='orange') # button to call the 'score' command above 
    canvas1.create_window(20, 100, anchor=tk.W, window=button1)

    #Creating 'Calculate test score' button  
    button2 = tk.Button (root, text='Calculate test score',command=test_score, bg='orange') # button to call the 'test_score' command above 
    canvas1.create_window(300, 100, window=button2)


    #Add the restart button
    restart_button = tk.Button(root, text = "Refresh & Restart", command = refresh, bg='green')
    #Place the button at window position x=200, y=400
    restart_button.place(x=200, y=400)
            
     
    #plot 1st scatter: PTRATIO on the x-axis, LSTAT on the y-axis
    figure3 = plt.Figure(figsize=(5,4), dpi=100)
    ax3 = figure3.add_subplot(111)
    ax3.scatter(df['PTRATIO'].astype(float),df['LSTAT'].astype(float), color = 'r')
    scatter3 = FigureCanvasTkAgg(figure3, root) 
    scatter3.get_tk_widget().pack(side=tk.RIGHT, fill=tk.BOTH)
    ax3.set_xlabel('PTRATIO')
    ax3.set_ylabel('LSTAT')
    ax3.set_title('PTRATIO Vs. LSTAT')


    #plot 2nd scatter: RM on the x-axis, LSTAT on the y-axis
    figure4 = plt.Figure(figsize=(5,4), dpi=100)
    ax4 = figure4.add_subplot(111)
    ax4.scatter(df['RM'].astype(float),df['LSTAT'].astype(float), color = 'g')
    scatter4 = FigureCanvasTkAgg(figure4, root) 
    scatter4.get_tk_widget().pack(side=tk.RIGHT, fill=tk.BOTH)
    ax4.set_xlabel('RM')
    ax4.set_ylabel('LSTAT')
    ax4.set_title('RM Vs. LSTAT')

    root.mainloop()


def refresh():
    #Destroy the current window and rebuild the GUI from scratch
    root.destroy()
    ml_tkinter_gui()


if __name__ == '__main__':
    ml_tkinter_gui()
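
The requirement list also asks for an expected vs. predicted plot (15.5.7) and regression model metrics (15.5.8). A minimal sketch of both follows. It runs on synthetic data rather than the Boston file, the variable names are hypothetical, and as a variation on the listing it fits the scaler on the training split only. The Figure object is built without a window, so in the GUI it could be handed to FigureCanvasTkAgg in the same way as the scatter plots above.

```python
import numpy as np
from matplotlib.figure import Figure
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: three features with a known linear relationship
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Same split settings as the listing: 20% held out for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = MinMaxScaler().fit(X_tr)
model = LinearRegression().fit(scaler.transform(X_tr), y_tr)
y_hat = model.predict(scaler.transform(X_te))

# Regression metrics on the held-out data
r2 = r2_score(y_te, y_hat)
mse = mean_squared_error(y_te, y_hat)
print(f"R^2: {r2:.3f}  MSE: {mse:.3f}")

# Expected vs. predicted: points close to the dashed diagonal are good predictions
fig = Figure(figsize=(5, 4), dpi=100)
ax = fig.add_subplot(111)
ax.scatter(y_te, y_hat, color="b")
low = min(y_te.min(), y_hat.min())
high = max(y_te.max(), y_hat.max())
ax.plot([low, high], [low, high], "k--")
ax.set_xlabel("Expected")
ax.set_ylabel("Predicted")
ax.set_title("Expected vs. Predicted")
```

R^2 close to 1 and a small mean squared error both indicate a good fit on data like this; on the real dataset the values will depend on how well a linear model suits the features.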