Data
Refer to the dataset and the associated user information guide provided by the Sustainable Energy Authority of Ireland (SEAI) at:
Each row of this dataset corresponds to an energy efficiency assessment conducted on a residential or commercial building in Ireland. Based on several factors considered during assessment, each building is assigned a Building Energy Rating (BER).
The user information guide on the above web page explains each factor/variable in the dataset. To learn more about BER, you can refer to the blog at: https://www.zurich.ie/blog/ber-rating-guide/
Note – this dataset is huge with more than a million rows. You are allowed to truncate it to 200,000 rows to speed up your analysis.
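As a minimal sketch of the truncation, pandas can stop reading after a fixed number of rows. The helper name load_truncated is hypothetical, and the delimiter and encoding of the actual SEAI file are not stated here, so they are left to the caller as keyword arguments. The demo reads from an in-memory string rather than the real file.

```python
import io

import pandas as pd


def load_truncated(source, n_rows=200_000, **read_kwargs):
    """Read at most n_rows data rows from a delimited text source."""
    # nrows stops parsing early, so the full file never has to fit in memory.
    return pd.read_csv(source, nrows=n_rows, **read_kwargs)


# Tiny in-memory stand-in for the real dataset (three data rows).
demo_source = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
demo = load_truncated(demo_source, n_rows=2)
```

Taking the first rows is the simplest option; if the file is ordered in some meaningful way, a random sample may be more representative.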
Task
If a BER of B or above indicates high energy efficiency and a BER below B indicates low energy efficiency, then employ classification techniques to understand which factors are the most significant predictors of high energy efficiency of a building.
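The rule above can be sketched as a small labelling function. It assumes ratings are stored as strings on the usual BER scale (A1 to G), possibly with stray whitespace; the function name is hypothetical.

```python
def is_high_efficiency(ber):
    """Return 1 for a BER of B or above (A1 to B3), 0 for anything below (C1 to G)."""
    grade = str(ber).strip().upper()
    if not grade or grade[0] not in "ABCDEFG":
        # Missing or malformed ratings are surfaced rather than silently labelled.
        raise ValueError(f"Unrecognised BER value: {ber!r}")
    return 1 if grade[0] in "AB" else 0


labels = [is_high_efficiency(r) for r in ["A1", "B3 ", "C1", "G"]]
```

Only the leading letter matters for this binary split, so the sub-grade digit is ignored.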
Implement any three classification algorithms that you deem appropriate to complete the task.
Finally, based on your findings and information available on the internet, make a cost-benefit analysis case for property owners thinking of upgrading the BER of their buildings.
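One simple way to frame such a case is the payback period: upgrade cost divided by annual savings. The sketch below assumes that framing; the figures in the demo are placeholders for illustration only, not real costs or savings, and should be replaced with researched values.

```python
def simple_payback_years(upgrade_cost, annual_saving):
    """Years needed for cumulative savings to cover the upfront cost."""
    if annual_saving <= 0:
        raise ValueError("annual_saving must be positive")
    return upgrade_cost / annual_saving


# Placeholder inputs, NOT real figures: substitute researched cost and savings.
payback = simple_payback_years(upgrade_cost=20_000, annual_saving=1_600)
```

This ignores grants, energy price changes, and any effect on property value, all of which a fuller analysis would need to consider.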
Deliverables & Assessment Rubric
1. Python Code (.py) (40% Weighting)
Key Assessment Areas: Does the Python code demonstrate the functionality required to implement classification effectively on data (i.e., data preparation, model training, and evaluation)? Are the implemented algorithms appropriate for the given task?
Is the code running without errors? Does the code come with copious explanatory comments for various steps involved?
2. Presentation and viva
Key assessment areas: Knowledge emergent during presentation and viva. A crisp 5-minute presentation providing critical analysis of the following points in context of the given task:
a) Choice of algorithms and evaluation metric.
b) Factors found significant predictors of high energy efficiency.
c) Cost-benefit analysis case for property owners thinking of upgrading BER of their buildings.
Code Implementation
#Connecting to Google Drive for Loading Data
#Importing Modules
#Load the Data
#Explore the Data
# Set the display options to show all rows and columns >>
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# Let's explore the first five rows of the dataset >>
mydata.head()
...
...
Shape of the Dataset
output:
[$] Rows Of Dataset >> 200000
[$] Columns Of Dataset >> 211
Categorical and Numeric Columns
output:
[$] Total Categorical Columns 54
[$] Total Numerical Columns 142
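Counts like these can be obtained with select_dtypes. The sketch below treats every non-numeric column as categorical, which is a simplification, and runs on a made-up three-column frame whose column names are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset.
demo = pd.DataFrame({
    "DwellingType": ["House", "Apartment"],
    "FloorArea": [120.5, 64.0],
    "YearBuilt": [1998, 2015],
})

# Numeric dtypes (ints, floats) versus everything else.
numerical_cols = demo.select_dtypes(include=["number"]).columns.tolist()
categorical_cols = demo.select_dtypes(exclude=["number"]).columns.tolist()

print("[$] Total Categorical Columns", len(categorical_cols))
print("[$] Total Numerical Columns", len(numerical_cols))
```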
Print Null Values
output:
#Remove Columns with Many Null Values
output:
[$] Rows Of Dataset >> 200000
[$] Columns Of Dataset >> 168
output:
[$] Total Categorical Columns 46
[$] Total Numerical Columns 107
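The column-removal step could look roughly like the following. The cut-off used in the original analysis is not stated, so the 50% threshold and the helper name drop_sparse_columns are assumptions.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_null_fraction=0.5):
    """Keep only columns whose share of missing values is at most the threshold."""
    null_fraction = df.isnull().mean()  # per-column fraction of nulls
    keep = null_fraction[null_fraction <= max_null_fraction].index
    return df[keep]


demo = pd.DataFrame({
    "a": [1, 2, 3, 4],                    # no nulls
    "b": [np.nan, np.nan, np.nan, 1.0],   # 75% nulls, dropped
    "c": ["x", None, "y", "z"],           # 25% nulls, kept
})
cleaned = drop_sparse_columns(demo, max_null_fraction=0.5)
```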
output:
#Preprocess Target Variable
output:
#Convert Energy Rating to Multiclass Classification Variable
#Exploratory Data Analysis & Data Preprocessing
output:
output:
Count Plot
Output:
# Handling missing values
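A common approach, shown here only as a sketch, is to fill numeric gaps with the column median and non-numeric gaps with the most frequent value. Whether the original analysis used this strategy is not stated; impute_missing is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def impute_missing(df):
    """Fill numeric nulls with the median and other nulls with the mode."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isnull().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode()
            if not modes.empty:  # an all-null column has no mode
                out[col] = out[col].fillna(modes.iloc[0])
    return out


demo = pd.DataFrame({
    "area": [100.0, np.nan, 140.0],
    "type": ["House", None, "House"],
})
filled = impute_missing(demo)
```

To avoid leaking information from the test set, the fill values are better computed on the training split and then applied to both splits.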
# Encoding categorical variables
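One-hot encoding with pd.get_dummies is one option for this step; label encoding is another. The sketch assumes one-hot encoding and uses illustrative column names.

```python
import pandas as pd

demo = pd.DataFrame({
    "type": ["House", "Apartment", "House"],
    "area": [120.0, 64.0, 95.0],
})

# Each category becomes its own 0/1 indicator column; numeric columns pass through.
encoded = pd.get_dummies(demo, columns=["type"], dtype=int)
```

With many high-cardinality columns, one-hot encoding can inflate the feature count considerably, which is worth checking before training.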
#Data Splitting
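A stratified hold-out split keeps the share of high-efficiency buildings the same in both parts. The 80/20 ratio and the synthetic arrays below are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 buildings, 3 features, 30% labelled high efficiency.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 30 + [0] * 70)

# stratify=y preserves the 30/70 class ratio in both the train and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```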
#Model Building
output:
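The model-building step can be sketched as follows. The three algorithms (logistic regression, decision tree, random forest) and the F1 metric are choices made for this illustration, not necessarily those of the original code, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem standing in for the prepared BER features.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

# Rank features by the random forest's impurity-based importances.
importances = models["random_forest"].feature_importances_
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
```

On the real data, the ranked list would point to candidate predictors of high energy efficiency. Impurity-based importances can favour high-cardinality features, so cross-checking with permutation importance is advisable.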