
Data Mining Assignment and Project Help | Energy-Classification Using BERPublicsearch.csv Dataset | Realcode4you

Data

Refer to the dataset and the associated user information guide provided by the Sustainable Energy Authority of Ireland (SEAI) at:


Each row of this dataset corresponds to an energy efficiency assessment conducted on a residential or commercial building in Ireland. Based on several factors considered during assessment, each building is assigned a Building Energy Rating (BER).


The user information guide on the above web page explains each factor/variable in the dataset. To learn more about BER, see the blog post at: https://www.zurich.ie/blog/ber-rating-guide/


Note – this dataset is huge with more than a million rows. You are allowed to truncate it to 200,000 rows to speed up your analysis.


Task

If a BER of B or above indicates high energy efficiency and a BER below B indicates low energy efficiency, then employ classification techniques to understand which factors are the most significant predictors of high energy efficiency of a building.


Implement any three classification algorithms that you deem appropriate to complete the task.


Finally, based on your findings and information available on the internet, make a cost-benefit analysis case for property owners thinking of upgrading the BER of their buildings.


Deliverables & Assessment Rubric

1. Python Code (.py) (40% Weighting)

Key Assessment Areas: Does the Python code demonstrate the functionality required to implement classification effectively on the data (i.e., data preparation, model training, and evaluation)? Are the implemented algorithms appropriate for the given task?


Is the code running without errors? Does the code come with copious explanatory comments for various steps involved?


2. Presentation and Viva

Key Assessment Areas: Knowledge demonstrated during the presentation and viva. A crisp 5-minute presentation providing critical analysis of the following points in the context of the given task:

a) Choice of algorithms and evaluation metric.

b) Factors found significant predictors of high energy efficiency.

c) Cost-benefit analysis case for property owners thinking of upgrading BER of their buildings.



Code Implementation


#Connecting to Google Drive for Loading Data


#Importing Modules


#Load the Data
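The loading code itself is not shown on this page. A minimal sketch of one way to do it, assuming pandas and using `nrows` to apply the 200,000-row truncation the assignment permits, might look like the following. The helper name `load_ber_data` and the sample column names are illustrative assumptions, and the demo reads from an in-memory buffer so it runs without the real file.

```python
import io

import pandas as pd

MAX_ROWS = 200_000  # truncation limit permitted by the assignment


def load_ber_data(source, max_rows=MAX_ROWS):
    """Read at most `max_rows` rows from a CSV path or file-like object."""
    # low_memory=False makes pandas infer each column's dtype from the
    # whole column instead of chunk by chunk, which helps on wide files.
    return pd.read_csv(source, nrows=max_rows, low_memory=False)


# Tiny in-memory stand-in so the sketch runs without the real dataset.
# These column names are assumptions, not the confirmed schema.
sample_csv = io.StringIO(
    "EnergyRating,GroundFloorArea\n"
    "A2,110.5\n"
    "C1,95.0\n"
    "G,60.2\n"
)
mydata = load_ber_data(sample_csv, max_rows=2)
print(mydata.shape)
```

Note that `nrows` keeps the first rows of the file rather than a random sample, and the real file may need a different separator or encoding than the `read_csv` defaults.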


#Explore the Data
# Set the display options to show all rows and all columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# Let's explore the first five rows of the dataset
mydata.head()

...

...



Shape of the Dataset
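The code for this step is not shown. A small sketch that would print the shape in the same style as the output here, assuming the data is a pandas DataFrame (the helper name `describe_shape` is hypothetical):

```python
import pandas as pd


def describe_shape(df):
    """Print and return the number of rows and columns of a DataFrame."""
    rows, cols = df.shape
    print(f"[$] Rows Of Dataset >> {rows}")
    print(f"[$] Columns Of Dataset >> {cols}")
    return rows, cols


# Small stand-in frame; the real call would pass the loaded dataset.
demo = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
shape = describe_shape(demo)
```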

output:

[$] Rows Of Dataset >> 200000

[$] Columns Of Dataset >> 211


Categorical and Numeric Columns
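One possible way to produce these counts is to split columns by dtype with `select_dtypes`. This is a sketch under the assumption that every non-numeric column is treated as categorical; the post may have used a different rule, and the helper name and demo columns are made up for illustration.

```python
import pandas as pd


def split_column_types(df):
    """Return (categorical_cols, numeric_cols) based on pandas dtypes."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    # Everything that is not numeric (strings, booleans, dates) lands here.
    categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
    return categorical_cols, numeric_cols


demo = pd.DataFrame(
    {
        "county": ["Cork", "Dublin"],
        "floor_area": [95.0, 110.5],
        "year_built": [1998, 2015],
    }
)
categorical_cols, numeric_cols = split_column_types(demo)
print(f"[$] Total Categorical Columns {len(categorical_cols)}")
print(f"[$] Total Numerical Columns {len(numeric_cols)}")
```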

output:

[$] Total Categorical Columns 54

[$] Total Numerical Columns 142


Print Null Values
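The original code for this step is missing. A hedged sketch of a null-value report, assuming a pandas DataFrame, could count missing entries per column and express them as a percentage of rows (the helper name `null_report` is hypothetical):

```python
import numpy as np
import pandas as pd


def null_report(df):
    """Return columns that contain nulls, with counts and percentages."""
    counts = df.isna().sum()
    report = pd.DataFrame(
        {
            "null_count": counts,
            "null_pct": (counts / len(df) * 100).round(2),
        }
    )
    # Keep only columns with at least one null, worst offenders first.
    report = report[report["null_count"] > 0]
    return report.sort_values("null_count", ascending=False)


demo = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": ["x", None, "y", "z"],
        "c": [1, 2, 3, 4],
    }
)
report = null_report(demo)
print(report)
```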

output:


#Remove Columns with Many Null Values
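The threshold used to drop columns is not stated in the post. The sketch below assumes a cut-off of 50% missing values, which is only an illustrative choice, and drops every column whose null fraction exceeds it.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_null_fraction=0.5):
    """Keep only columns whose share of nulls is <= max_null_fraction."""
    # The 0.5 default is an assumption; tune it for the real dataset.
    null_fraction = df.isna().mean()
    keep = null_fraction[null_fraction <= max_null_fraction].index
    return df[keep]


demo = pd.DataFrame(
    {
        "mostly_missing": [np.nan, np.nan, np.nan, 1.0],  # 75% null
        "half_missing": [1.0, np.nan, 2.0, np.nan],       # 50% null
        "complete": [1, 2, 3, 4],                         # 0% null
    }
)
reduced = drop_sparse_columns(demo, max_null_fraction=0.5)
print(list(reduced.columns))
```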

output:

[$] Rows Of Dataset >> 200000

[$] Columns Of Dataset >> 168


output:

[$] Total Categorical Columns 46

[$] Total Numerical Columns 107



#Preprocess Target Variable
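The task defines a BER of B or above as high energy efficiency. A minimal sketch of that rule follows, assuming the ratings are strings such as "A1", "B3", "C1" or "G"; the function name is hypothetical and the exact format of the rating column is an assumption.

```python
def is_high_efficiency(ber_grade):
    """Return 1 for a BER of B or above (A or B band), otherwise 0."""
    # Assumes grades look like "A1", "B3", "C1", "G"; only the leading
    # letter decides the band, so the sub-grade digit is ignored.
    letter = ber_grade.strip().upper()[:1]
    return 1 if letter in ("A", "B") else 0


grades = ["A1", "B3", "C1", "G", " b2 "]
labels = [is_high_efficiency(g) for g in grades]
print(labels)
```

With pandas, the same function could be applied to the rating column through `Series.map` to build a binary target column.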

output:


#Convert Energy Rating to Multiclass Classification Variable


#Exploratory Data Analysis & Data Preprocessing

output:


Count Plot

Output:


# Handling missing values
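How the post fills missing values is not shown. One common approach, sketched here as an assumption rather than the author's method, is to fill numeric columns with the median and all other columns with the most frequent value.

```python
import numpy as np
import pandas as pd


def impute_simple(df):
    """Fill numeric nulls with the median and other nulls with the mode."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:  # a column of only nulls has no mode
                out[col] = out[col].fillna(mode.iloc[0])
    return out


demo = pd.DataFrame(
    {
        "area": [100.0, np.nan, 80.0],
        "fuel": ["gas", None, "gas"],
    }
)
clean = impute_simple(demo)
print(clean)
```

To avoid leaking information from the test set, the medians and modes would ideally be computed on the training split only and then reused on the test split.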


# Encoding categorical variables
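The encoding scheme used in the post is not visible. A hedged sketch using one-hot encoding via `pd.get_dummies` is given below; label encoding would be another reasonable option, and the demo columns are invented for illustration.

```python
import pandas as pd


def one_hot_encode(df):
    """One-hot encode every non-numeric column, leaving numerics as-is."""
    cat_cols = df.select_dtypes(exclude="number").columns.tolist()
    # dtype=int yields 0/1 indicator columns instead of booleans.
    return pd.get_dummies(df, columns=cat_cols, dtype=int)


demo = pd.DataFrame(
    {
        "fuel": ["gas", "oil", "gas"],
        "area": [100.0, 90.0, 80.0],
    }
)
encoded = one_hot_encode(demo)
print(encoded)
```

With many high-cardinality categorical columns, one-hot encoding can make the feature matrix very wide, so it may be worth grouping rare categories first.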


#Data Splitting
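The split used in the post is not shown. A minimal sketch with scikit-learn's `train_test_split` follows, assuming an 80/20 split stratified on the target so both parts keep the same class balance. The split ratio, the seed and the demo columns are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in features and a balanced binary target for demonstration only.
X = pd.DataFrame({"area": range(20), "age": range(20, 40)})
y = pd.Series([0, 1] * 10, name="high_efficiency")

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,     # assumed hold-out share
    stratify=y,        # preserve the class ratio in both splits
    random_state=42,   # fixed seed for reproducibility
)
print(X_train.shape, X_test.shape)
```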


#Model Building
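The post does not show which three classifiers were trained. The sketch below picks logistic regression, a decision tree and a random forest purely as an illustration, fits them on synthetic data from `make_classification`, and reports accuracy and F1. The hyperparameters are assumptions, and the random forest's `feature_importances_` is used as one possible way to rank predictors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared BER features and binary target.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    # Scaling helps logistic regression converge; trees do not need it.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }
    print(name, results[name])

# Rank features by the random forest's impurity-based importance.
importances = models["random_forest"].feature_importances_
ranked = sorted(enumerate(importances), key=lambda p: p[1], reverse=True)
print("Top features (index, importance):", ranked[:3])
```

If the two classes are imbalanced in the real data, F1 or a similar metric is usually more informative than accuracy alone.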


output:

