Project Details

“A new baby's gender, name, time of birth, and birth weight are nice information for a birth announcement, but birth weight is especially important for an obstetrician. A large size at delivery has long been associated with an increased risk of injuries to a newborn and its mom. So the better a doctor can predict birth weight, the easier the delivery may be.”

Ultrasound is a popular way of estimating it. But, aha! You are a Data Scientist (or going to be one). You can amaze people by predicting the birth weight way earlier than ultrasound, right? In this assignment, let’s do this.

Dataset

• Log in to Canvas > Assignments > Programming Assignment 1

• You will get the following 3 files:

baby-weights-dataset2.csv

▪ It has 101400 rows (samples) with 37 columns (variables). Each sample represents a case for a newborn. It contains 37 variables (just mentioned! Haha) about it. The very last column is “BWEIGHT”, the true weight of the newborn (in lbs). This needs to be considered as the target variable here.

data-description.txt

▪ You will see that the names of the 37 variables are contracted forms of some sort, and the source of the dataset did not offer me a description of every single one of them. After studying them, I could elaborate on only a few. Please pardon my laziness. Okay, this file contains descriptions for a few of the variables. The rest are mostly about the mother’s medical history. It should be no big deal, I guess, for you to work with these variables without knowing their meaning.

judge-without-label.csv

▪ This is an interesting file. It contains new samples: an additional 2001 rows with 36 columns (without the BWEIGHT target column). Note that this file should not be part of the training, as there are no ground truth target labels, right? Once the training is complete with the dataset provided above, you must apply your prediction algorithm to predict BWEIGHT for these 2001 samples, and submit the result as part of your assignment submission.

Tasks

Please read the PA1-skeleton-Ashis-Biswas.ipynb file using Jupyter Notebook to learn about the 15 mandatory tasks and the 2 additional tasks for graduate students (CSCI-5930).

TASK 1: Import all the necessary packages here

TASK 2: Load the dataset into memory so that you can play with it here

TASK 3: Compute the mean, stdev, min, max, 25th percentile, median, and 75th percentile of the BWEIGHT variable
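One possible way to get these numbers is sketched below with NumPy. The helper name `summarize` and the toy weights are made up for illustration; in the real notebook the input would be the BWEIGHT column. The sample standard deviation (`ddof=1`) is an assumption, since the task does not say which variant is expected.

```python
import numpy as np

def summarize(values):
    """Return the summary statistics requested in TASK 3 for a 1-D array."""
    v = np.asarray(values, dtype=float)
    q25, q50, q75 = np.percentile(v, [25, 50, 75])
    return {
        "mean": v.mean(),
        "stdev": v.std(ddof=1),  # assumption: sample standard deviation
        "min": v.min(),
        "25%": q25,
        "median": q50,
        "75%": q75,
        "max": v.max(),
    }

# Placeholder weights in lbs, standing in for the BWEIGHT column.
toy_weights = [6.0, 7.0, 7.5, 8.0, 9.0]
stats = summarize(toy_weights)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```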

TASK 4: Also, draw the histogram plot for the BWEIGHT variable

TASK 5: Present the skewness and kurtosis of the BWEIGHT target variable
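As a rough sketch, skewness and kurtosis can be computed directly from the central moments. The function names below are hypothetical, and the choice of the population (biased) estimators with excess kurtosis (normal distribution = 0) is an assumption. Libraries such as pandas apply a bias correction, so their values may differ slightly.

```python
import numpy as np

def skewness(values):
    """Population skewness: third central moment over variance^(3/2)."""
    v = np.asarray(values, dtype=float)
    d = v - v.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment over variance^2, minus 3."""
    v = np.asarray(values, dtype=float)
    d = v - v.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return m4 / m2 ** 2 - 3.0

# Placeholder values; a symmetric sample has zero skewness.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
print(skewness(toy), excess_kurtosis(toy))
```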

TASK 6: Do variable selection from the pool of 36 variables based on correlation score with the target variable BWEIGHT

Please report all the variables you kept for training.
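A minimal sketch of correlation-based selection follows, assuming the data sits in a pandas DataFrame. The function name, the column names FEATURE_A and FEATURE_B, the synthetic data, and the threshold are all made up for illustration; the assignment does not prescribe a cutoff, so pick and justify your own.

```python
import numpy as np
import pandas as pd

def select_by_correlation(df, target="BWEIGHT", threshold=0.1):
    """Keep numeric columns whose |Pearson r| with the target meets the threshold."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    kept = corr[corr.abs() >= threshold]
    # Strongest correlations first, regardless of sign.
    order = kept.abs().sort_values(ascending=False).index
    return kept.loc[order]

# Synthetic stand-in data: FEATURE_A drives the target, FEATURE_B is pure noise.
rng = np.random.default_rng(0)
n = 500
feature_a = rng.normal(size=n)
feature_b = rng.normal(size=n)
toy = pd.DataFrame({
    "FEATURE_A": feature_a,
    "FEATURE_B": feature_b,
    "BWEIGHT": 7.0 + 0.5 * feature_a + 0.1 * rng.normal(size=n),
})

kept = select_by_correlation(toy, threshold=0.5)
print(kept)
```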

TASK 7: Check for missing data, and apply a "good" strategy to tackle it
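What counts as a "good" strategy is left to you. One common baseline, sketched here as an assumption rather than the expected answer, is to count missing values per column and then fill numeric columns with the median and non-numeric columns with the most frequent value. The column names NUM_COL and CAT_COL are hypothetical.

```python
import numpy as np
import pandas as pd

def impute_missing(df):
    """Fill numeric gaps with the column median, other gaps with the column mode."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out

# Tiny stand-in table with one gap per column.
toy = pd.DataFrame({
    "NUM_COL": [1.0, np.nan, 3.0],
    "CAT_COL": ["a", None, "a"],
})

print(toy.isna().sum())      # missing values per column
filled = impute_missing(toy)
```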

TASK 8: Tackle the categorical variables by introducing dummy variables
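Dummy (one-hot) encoding could look like the sketch below, which uses `pandas.get_dummies` on a made-up table. CAT_COL and NUM_COL are hypothetical names. Dropping the first level is a design choice, not a requirement of the assignment: it avoids a column set that is perfectly collinear with the intercept, which matters if you later use the closed-form solution.

```python
import pandas as pd

# Tiny stand-in table with one categorical and one numeric column.
toy = pd.DataFrame({
    "CAT_COL": ["a", "b", "a", "c"],
    "NUM_COL": [1.0, 2.0, 3.0, 4.0],
})

# One 0/1 column per category level, minus the first level ("a").
encoded = pd.get_dummies(toy, columns=["CAT_COL"], drop_first=True, dtype=float)
print(encoded)
```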

TASK 9.1: Randomly split the dataset into training, Tr (80%) and testing, Te (20%)

TASK 9.2: On the training dataset, apply a normalization technique

TASK 9.3: Apply the training data statistics to normalize the testing data as well.
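Tasks 9.1 to 9.3 can be sketched together as below. The helper names are hypothetical, and z-score standardization is only one of several normalization techniques you could pick. The key point is that the mean and standard deviation come from the training split alone and are then reused on the testing split.

```python
import numpy as np

def split_train_test(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into training and testing parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def fit_standardizer(X_train):
    """Compute per-column mean and stdev on the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std

def standardize(X, mean, std):
    return (X - mean) / std

# Stand-in data: 10 samples, 2 features.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

X_tr, X_te, y_tr, y_te = split_train_test(X, y)
mean, std = fit_standardizer(X_tr)
X_tr_n = standardize(X_tr, mean, std)
X_te_n = standardize(X_te, mean, std)  # training statistics reused here
```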

TASK 10: Find the linear regression function describing the training dataset using a technique you recently learned in class: either the closed-form solution or gradient descent (batch, stochastic, or mini-batch).

PLEASE DO NOT CALL ANY LIBRARY FUNCTION THAT MIGHT DO THE TASK FOR YOU. If you do, you will most likely get a ZERO for this assignment.
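For orientation only, here is a sketch of both options on toy data. The closed form solves the normal equations (X^T X) w = X^T y, and the batch gradient descent loop minimizes the mean squared error. All function names, the learning rate, and the iteration count are assumptions. Whether a helper such as `np.linalg.solve` is allowed under the rule above is the instructor's call, so check before relying on it.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_closed_form(X, y):
    """Solve the normal equations (X^T X) w = X^T y for w."""
    Xb = add_intercept(X)
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

def fit_gradient_descent(X, y, lr=0.1, n_iters=2000):
    """Batch gradient descent on the mean squared error."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    n = len(y)
    for _ in range(n_iters):
        grad = (2.0 / n) * Xb.T @ (Xb @ w - y)
        w -= lr * grad
    return w

def predict(X, w):
    return add_intercept(X) @ w

# Toy data generated from y = 1 + 2x, so both fits should recover [1, 2].
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 1.0 + 2.0 * X[:, 0]

w_cf = fit_closed_form(X, y)
w_gd = fit_gradient_descent(X, y)
```

Gradient descent is sensitive to feature scale, which is one reason the normalization step in TASK 9 comes first.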

TASK 11: Predict the BWEIGHT target variable for each sample of the testing dataset using the regression line you learned in TASK 10, and report RMSE(testing) (Root Mean Squared Error)
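RMSE is the square root of the mean of the squared differences between the true and predicted values. A small sketch, with a hypothetical function name `rmse` and placeholder inputs:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Placeholder values: errors of 3 and 4 give sqrt((9 + 16) / 2).
print(rmse([0.0, 0.0], [3.0, 4.0]))
```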

Repeat TASK 10 an additional four times: run linear regression training again.

After each run, report RMSE(testing).

TASK 12: Finally, report RMSE(testing) = Average(RMSE_test) $\pm$ Stdev(RMSE_test)

Here Average(RMSE_test) = average of all the 5 RMSE(testing) scores you got above.

And, stdev(RMSE_test) = standard deviation of all the 5 RMSE(testing) scores above.
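The final report line could be produced along these lines. The function names are hypothetical, the scores in the usage line are placeholders rather than real results, and the use of the sample standard deviation (`ddof=1`) is an assumption because the task does not specify it.

```python
import numpy as np

def summarize_rmse(scores):
    """Return (average, standard deviation) of the per-run RMSE scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)  # assumption: sample stdev

def format_rmse(scores):
    mean, std = summarize_rmse(scores)
    return f"RMSE(testing) = {mean:.4f} +/- {std:.4f}"

# Placeholder scores only; substitute the five RMSE(testing) values from your runs.
print(format_rmse([1.0, 2.0, 3.0, 4.0, 5.0]))
```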

TASK 13: Run linear regression one last time on the whole dataset (i.e., training + testing, as preprocessed by you above).

TASK 14: Preprocess the judge-without-label.csv file according to the strategy you applied above on the whole dataset (TASK 13)

TASK 15: Predict BWEIGHT for each of the samples from the judge-without-label.csv file, and save the results in judge-submission-run-1.csv in the format below. Please change the run number for each new run, and report what changes you have made in a corresponding file run-1.txt.
