
Data Mining Assignment and Project Help | Energy-Classification Using BERPublicsearch.csv Dataset | Realcode4you


Refer to the dataset and the associated user information guide provided by the Sustainable Energy Authority of Ireland (SEAI) at:

Each row of this dataset corresponds to an energy efficiency assessment conducted on a residential or commercial building in Ireland. Based on several factors considered during assessment, each building is assigned a Building Energy Rating (BER).

The user information guide on the above web page explains each factor/variable in the dataset. To learn more about BER, you can refer to the blog at:

Note – this dataset is huge with more than a million rows. You are allowed to truncate it to 200,000 rows to speed up your analysis.
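As a minimal sketch of the truncation step, pandas can stop reading after a fixed number of rows. The helper name `load_truncated`, the column names, and the in-memory sample below are assumptions made for illustration; the real file is not reproduced here.

```python
import io
import pandas as pd

def load_truncated(source, n_rows=200_000, **read_kwargs):
    """Read at most n_rows data rows from a delimited text file or buffer."""
    # nrows stops parsing early, so the rest of the file is never loaded.
    return pd.read_csv(source, nrows=n_rows, low_memory=False, **read_kwargs)

# Tiny in-memory stand-in for the real file (hypothetical columns and values).
sample = io.StringIO("EnergyRating,YearBuilt\nA2,2015\nC1,1990\nG,1950\n")
df = load_truncated(sample, n_rows=2)
print(df.shape)  # (2, 2)
```

With the actual download, the call would take the file path instead of the buffer. If the export turns out to be tab-delimited or not UTF-8 encoded, `sep` and `encoding` can be passed through `read_kwargs`. Note that `nrows` keeps the first rows rather than a random sample.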


If a BER of B or above indicates high energy efficiency and a BER below B indicates low energy efficiency, then employ classification techniques to understand which factors are the most significant predictors of high energy efficiency of a building.
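The high/low split described above can be sketched as a small labelling function. This assumes ratings are strings such as "A1", "B3" or "G", and that "B or above" covers every A and B sub-band; the function name `is_high_efficiency` is hypothetical.

```python
def is_high_efficiency(rating):
    """Return 1 for a BER in band A or B, 0 for bands C to G, None if unusable."""
    if not isinstance(rating, str) or not rating.strip():
        return None
    band = rating.strip().upper()[0]  # first letter is the band, e.g. "B" in "B3"
    if band in ("A", "B"):
        return 1
    if band in ("C", "D", "E", "F", "G"):
        return 0
    return None  # anything else is treated as missing, not guessed

labels = [is_high_efficiency(r) for r in ["A1", "B3", "C1", " g ", None]]
print(labels)  # [1, 1, 0, 0, None]
```

Returning `None` for unrecognised values keeps bad ratings visible so they can be dropped explicitly before training.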

Implement any three classification algorithms that you deem appropriate to complete the task.
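One possible choice of three algorithms is logistic regression, a decision tree, and a random forest; the brief does not prescribe them. The sketch below trains all three with scikit-learn on synthetic data standing in for the prepared BER features, and compares them with the F1 score, which is also an assumed metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared features and the binary high/low target.
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    # Scaling helps the linear model converge; the tree models do not need it.
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))
    print(f"{name}: F1 = {scores[name]:.3f}")
```

Mixing a linear model with tree-based ones gives both coefficient-style and importance-style views of which factors matter.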

Finally, based on your findings and information available over the internet, make a cost-benefit analysis case for property owners thinking of upgrading BER of their buildings.
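One simple way to frame the cost-benefit case is a payback period: upfront upgrade cost divided by the yearly saving. The figures below are hypothetical placeholders, not SEAI data, and the helper name is an assumption.

```python
def simple_payback_years(upgrade_cost, annual_saving):
    """Years needed for cumulative savings to cover the upfront cost."""
    if annual_saving <= 0:
        raise ValueError("annual_saving must be positive")
    return upgrade_cost / annual_saving

# Hypothetical figures for illustration only, not taken from SEAI data.
cost_eur = 20_000    # assumed retrofit cost after any grant
saving_eur = 1_600   # assumed yearly reduction in energy bills
print(simple_payback_years(cost_eur, saving_eur))  # 12.5
```

This ignores energy price changes, financing costs, and any effect on property value, so it is a starting point rather than a full analysis.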

Deliverables & Assessment Rubric

1. Python Code (.py) (40% Weighting)

Key Assessment Areas: Does the python code demonstrate the functionality required to implement classification effectively on data (i.e., data preparation, model training, and evaluation)? Are implemented algorithms appropriate for the given task?

Is the code running without errors? Does the code come with copious explanatory comments for various steps involved?

2. Presentation and Viva

Key assessment areas: Knowledge emergent during presentation and viva. A crisp 5-minute presentation providing critical analysis of the following points in context of the given task:

a) Choice of algorithms and evaluation metric.

b) Factors found significant predictors of high energy efficiency.

c) Cost-benefit analysis case for property owners thinking of upgrading BER of their buildings.


Code Implementation

#Connecting to Google Drive for Loading Data

#Importing Modules

#Load the Data

#Explore the Data
# Set the display options to show all rows and columns >>
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# Let's explore the first five rows of the dataset >>



Shape of the Dataset


[$] Rows Of Dataset >> 200000

[$] Columns Of Dataset >> 211

Categorical and Numeric Columns


[$] Total Categorical Columns 54

[$] Total Numerical Columns 142
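Counts like the ones above can be produced with `select_dtypes`. This is a sketch on a small hypothetical frame, and it assumes every non-numeric column counts as categorical, which may not match how the original notebook grouped the columns.

```python
import pandas as pd

# Small hypothetical frame standing in for the BER data.
df = pd.DataFrame({
    "energy_rating": ["A2", "C1", "G"],
    "dwelling_type": ["House", "Apartment", "House"],
    "year_built": [2015, 1990, 1950],
    "floor_area": [120.5, 68.0, 95.2],
})

# Numeric columns are ints and floats; everything else is treated as categorical.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("[$] Total Categorical Columns", len(categorical_cols))
print("[$] Total Numerical Columns", len(numeric_cols))
```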

Print Null Values
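A null-value report might be built as below. The frame and its column names are hypothetical; the idea is to list the count and percentage of missing values per column, worst first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in two columns.
df = pd.DataFrame({
    "energy_rating": ["A2", "C1", "G", "B3"],
    "floor_area": [120.5, np.nan, 95.2, np.nan],
    "wall_type": [None, None, None, "Cavity"],
})

null_summary = pd.DataFrame({
    "null_count": df.isna().sum(),                  # missing cells per column
    "null_pct": (df.isna().mean() * 100).round(1),  # same, as a percentage
}).sort_values("null_count", ascending=False)
print(null_summary)
```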


#Remove Columns With Many Null Values


[$] Rows Of Dataset >> 200000

[$] Columns Of Dataset >> 168


[$] Total Categorical Columns 46

[$] Total Numerical Columns 107
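The column reduction reported above could come from dropping columns that are mostly empty. The post does not state the cutoff it used, so the 50% threshold and the helper name `drop_sparse_columns` in this sketch are assumptions.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, max_null_fraction=0.5):
    """Drop columns whose share of missing values exceeds max_null_fraction."""
    null_fraction = df.isna().mean()
    keep = null_fraction[null_fraction <= max_null_fraction].index
    return df[keep]

# Hypothetical sample: wall_type is 75% missing, floor_area is 25% missing.
df = pd.DataFrame({
    "energy_rating": ["A2", "C1", "G", "B3"],
    "floor_area": [120.5, np.nan, 95.2, 80.0],
    "wall_type": [None, None, None, "Cavity"],
})
reduced = drop_sparse_columns(df, max_null_fraction=0.5)
print("[$] Rows Of Dataset >>", reduced.shape[0])
print("[$] Columns Of Dataset >>", reduced.shape[1])
```

Only columns are removed here, so the row count stays the same, as it does in the output above.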


#Preprocess Target Variable


#Convert Energy Rating to Multiclass Classification Variable

#Exploratory Data Analysis & Data Preprocessing



Count Plot


# Handling missing values
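One common way to handle the remaining gaps is to fill numeric columns with the median and other columns with the most frequent value. The post does not show which strategy it used, so this is an assumed approach on hypothetical data.

```python
import numpy as np
import pandas as pd

def impute_missing(df):
    """Fill numeric gaps with the column median and other gaps with the mode."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().sum() == 0:
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode(dropna=True)
            if not modes.empty:  # an all-missing column has no mode to use
                out[col] = out[col].fillna(modes.iloc[0])
    return out

# Hypothetical sample with one gap in each column.
df = pd.DataFrame({
    "floor_area": [100.0, np.nan, 80.0, 120.0],
    "wall_type": ["Cavity", None, "Cavity", "Solid"],
})
clean = impute_missing(df)
print(clean)
```

In a stricter workflow the median and mode would be computed on the training split only and then applied to the test split, to avoid leaking test information.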

# Encoding categorical variables
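A minimal sketch of one encoding option, one-hot encoding with `pd.get_dummies`, is shown below. The column names are hypothetical, and the original notebook may have used a different encoder such as label encoding.

```python
import pandas as pd

# Hypothetical sample with one categorical and one numeric column.
df = pd.DataFrame({
    "wall_type": ["Cavity", "Solid", "Cavity"],
    "floor_area": [100.0, 80.0, 120.0],
})

# Each category becomes its own 0/1 column; numeric columns pass through.
encoded = pd.get_dummies(df, columns=["wall_type"], dtype=int)
print(encoded.columns.tolist())
# ['floor_area', 'wall_type_Cavity', 'wall_type_Solid']
```

One-hot encoding avoids implying an order between categories, at the cost of many extra columns when a variable has many distinct values.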

#Data Splitting
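The split can be sketched with scikit-learn's `train_test_split`. The 80/20 ratio, the stratification, and the toy data are assumptions; stratifying keeps the share of high-efficiency buildings the same in both parts.

```python
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 30 labelled high efficiency (1) and 70 low (0).
X = [[i] for i in range(100)]
y = [1] * 30 + [0] * 70

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(len(X_train), len(X_test))  # 80 20
print(sum(y_test))                # 6
```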

#Model Building
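Once a model is fitted, the question of which factors predict high efficiency can be approached through feature importances. The sketch below uses a random forest on synthetic data in which the target is deliberately driven by one column; the feature names are hypothetical and this is not the original notebook's code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "wall_u_value": rng.normal(size=500),
    "floor_area": rng.normal(size=500),
    "noise": rng.normal(size=500),
})
# Synthetic target driven mostly by the first column.
y = (X["wall_u_value"] + 0.1 * X["floor_area"] < 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; larger means more influence on splits.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favour high-cardinality features, so checking them against another method, such as permutation importance, is a sensible precaution.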



