Titanic Survival Prediction Practice Set Questions

realcode4you
Aug 12, 2021
6 min read

Question 1

This is a scenario-based assignment that uses the Titanic dataset and consists of three questions. Read and understand the requirements, then answer the questions carefully.

Dataset: Titanic disaster.

Data Dictionary:

Variable | Definition | Key

survival | Survival | 0 = No, 1 = Yes
pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd
sex | Sex | M or F
Age | Age in years
sibsp | # of siblings / spouses aboard the Titanic
parch | # of parents / children aboard the Titanic
ticket | Ticket number
fare | Passenger fare
cabin | Cabin number
embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton

Variable Notes:

pclass: A proxy for socio-economic status (SES)
1st = Upper
2nd = Middle
3rd = Lower
age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5
sibsp: The dataset defines family relations in this way...
Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife (mistresses and fiancés were ignored)
parch: The dataset defines family relations in this way
Parent = mother, father
Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch=0 for them.

Dataset Path:

The dataset Titanic_train.csv is present at the location

res/Titanic_train.csv

The dataset Titanic_test.csv is present at the location

res/Titanic_test.csv

Problem Statement:

You are provided with datasets about the passengers of the Titanic disaster. Use the datasets to answer the following questions:

Q1: Find the relation of the following columns (which have discrete values) with the “Survived” column and answer the questions below:

Pclass
Sex
Embarked

1. Find the total number of survivors from the 3rd Pclass (Titanic_train.csv)

Example: If the total number of survivors from Pclass 3 is: 100

Output: 100

2. Find the total number of males who died in the accident (Titanic_train.csv)

3. Find the total number of survivors who embarked from "Southampton" (Titanic_train.csv)

Hint:

Group | Total | Survived
C | 146 | 78
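The three counts in Q1 can be sketched with pandas as below. This is a minimal sketch, not the reference solution: the function names are hypothetical, the column names `Pclass`, `Sex`, `Embarked` and `Survived` are assumed to match the CSV header, and the value `"male"` for men is an assumption (the data dictionary above lists M or F, so pass the label your file actually uses).

```python
import pandas as pd


def survival_by_group(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a Group | Total | Survived table for one discrete column."""
    # "count" gives the passengers per group, "sum" of the 0/1 flag gives survivors.
    grouped = df.groupby(column)["Survived"].agg(Total="count", Survived="sum")
    return grouped.reset_index().rename(columns={column: "Group"})


def question1_answers(df: pd.DataFrame, male_label: str = "male") -> list:
    """Answers to sub-questions 1, 2 and 3, in that order."""
    survived = df["Survived"] == 1
    # 1. Survivors who travelled in 3rd class.
    third_class_survivors = int(((df["Pclass"] == 3) & survived).sum())
    # 2. Males who did not survive. The label is an assumption, see above.
    males_died = int(((df["Sex"] == male_label) & ~survived).sum())
    # 3. Survivors who embarked at Southampton (code "S").
    southampton_survivors = int(((df["Embarked"] == "S") & survived).sum())
    return [third_class_survivors, males_died, southampton_survivors]
```

A typical call would be `question1_answers(pd.read_csv("res/Titanic_train.csv"))`, while `survival_by_group(df, "Embarked")` produces a table in the same shape as the hint.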

***Note: Write the code only in solution() function and do not pass any arguments to the function. For predefined stub refer stub.py***

Final Output Sample:

NOTE: Here, 100, 200 and 300 are the answers to the 1st, 2nd and 3rd questions respectively.

Output Format:

Perform the above operations and write your output (wherever the questions say "print") to a file named output.csv, located at output/output.csv
output.csv should contain the answer to each question on consecutive rows.
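One way to write the answers on consecutive rows is sketched below. The helper name `write_answers` is hypothetical, and writing the file without a header row or index column is an assumption about what the evaluator expects; check the provided stub.py before relying on it.

```python
import os

import pandas as pd


def write_answers(answers, path="output/output.csv"):
    """Write one answer per row, as integers, with no header and no index."""
    folder = os.path.dirname(path)
    if folder:
        # Create the output folder if it does not exist yet.
        os.makedirs(folder, exist_ok=True)
    # int() keeps only the integer part, so 200.0 is written as 200.
    rows = [int(answer) for answer in answers]
    pd.DataFrame(rows).to_csv(path, header=False, index=False)
```

Calling `write_answers([100, 200, 300])` would produce a three-line file containing 100, 200 and 300.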

Note: This question will be evaluated based on the number of test cases that your code passes.

Question 2

Dataset: Titanic disaster

Q: Some of the values in the "Age" column are missing. Use a linear regression model to fill in the missing values in the dataset.

(Hint: Use Age as the dependent variable and predict the missing values.)

1. Print the total number of cells having missing values in the Age column.

Example:

If Total number of cells with missing value is: 100

Output: 100

2. Print the sum of the index number of all the cells with missing values.

Example:

If the Index Number of cells with missing value is: (4,6,20,40)

Output: 70
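Parts 1 and 2 can be sketched as follows. The function name is hypothetical, and the sketch assumes the dataframe keeps the default 0-based row index that `pd.read_csv` assigns, so "index number" means the row position rather than a column such as PassengerId.

```python
import pandas as pd


def missing_age_summary(df: pd.DataFrame):
    """Return (number of missing Age cells, sum of their index values)."""
    missing = df["Age"].isnull()
    # Part 1: True counts as 1, so the sum is the number of missing cells.
    count = int(missing.sum())
    # Part 2: select the index labels of the missing rows and add them up.
    index_sum = int(df.index.to_numpy()[missing.to_numpy()].sum())
    return count, index_sum
```

With missing values at index 4, 6, 20 and 40, the function returns a count of 4 and an index sum of 70, matching the example above.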

3. Print the mean of all the new values filled in using linear regression. [To do this, first split the training dataset into two parts: the first contains only the rows with missing values in the 'Age' column (call this dataframe df1), and the second contains the rows with valid numbers in the 'Age' column (call this dataframe df2). Train the model on df2 and predict the ages for df1. The answer is the mean of the ages predicted for df1.]

***NOTE: Please use the features for predicting Age ['Pclass','Survived','GenderLabel']

Example:

If the new filled values are: (25.0,30.0, 30.0,35.0)

Output: 30.0

Steps to be followed:

1. Load the Titanic_train.csv file.

2. Count the missing values in the Age column. [Hint: You can use isnull() with sum()]

3. Calculate the sum of the index values where Age is missing. [Hint: You can use isnull(), collect the matching index values into a list, and then sum the list.]

4. Segregate the rows having missing Age values (say, in dataframe A) from the rows having valid Age values (say, in dataframe B).

5. Encode the string columns. Here, encode the Sex column into a “GenderLabel” column.

6. Fit a Linear Regression model on dataframe B from step 4, since only those rows have a known Age. [Hint: Use ‘Pclass’, ‘GenderLabel’, ‘Survived’ as independent features.]

7. Use the Linear Regression model from step 6 to predict the ‘Age’ in dataframe A.

8. Fill the ‘Age’ column of dataframe A with the predicted values from step 7.

9. Calculate the mean of the ‘Age’ column of dataframe A and write the integer part of the mean. This is the answer for part 3.
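The imputation can be sketched with scikit-learn as below: the model is fitted on the rows whose Age is known and used to predict the rows whose Age is missing. The function name `impute_age` is hypothetical, and encoding Sex before separating the rows (so both parts share the same labels) is a choice made for this sketch, not something the assignment prescribes.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder

FEATURES = ["Pclass", "Survived", "GenderLabel"]


def impute_age(df: pd.DataFrame):
    """Fill missing Age values; return (filled dataframe, mean of predictions)."""
    df = df.copy()
    # Encode Sex as integers. LabelEncoder sorts the labels alphabetically.
    df["GenderLabel"] = LabelEncoder().fit_transform(df["Sex"])
    missing = df["Age"].isnull()
    known = df[~missing]    # rows with a valid Age: used for fitting
    unknown = df[missing]   # rows with a missing Age: used for prediction
    # Fit on every known row; no train/test split, as the note requires.
    model = LinearRegression().fit(known[FEATURES], known["Age"])
    # Assumes at least one row has a missing Age; predict() rejects empty input.
    predicted = model.predict(unknown[FEATURES])
    df.loc[missing, "Age"] = predicted
    return df, float(predicted.mean())
```

The second return value is the mean asked for in part 3; taking `int()` of it gives the integer part.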

***Note: Do not split the data into train and test sets***

Input Dataset path:

res/Titanic_train.csv

Final Output Sample:

NOTE: Here, 100, 200, and 300 are the answers to the 1st, 2nd, and 3rd questions respectively.

Output Format:

Perform the above operations and write your output (wherever the questions say "print") to a file named output.csv, located at output/output.csv
output.csv should contain the answer to each question on consecutive rows.

***Note: Write the code only in solution() function and do not pass any arguments to the function. For predefined stub refer stub.py***

Note: This question will be evaluated based on the number of test cases that your code passes.

Question 3

Dataset: Titanic disaster.

Data Dictionary:

Variable | Definition | Key

survival | Survival | 0 = No, 1 = Yes
pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd
sex | Sex | M or F
Age | Age in years
sibsp | # of siblings / spouses aboard the Titanic
parch | # of parents / children aboard the Titanic
ticket | Ticket number
fare | Passenger fare
cabin | Cabin number
embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton

After performing the analysis from the previous question, derive a new column called “AdultOrChild” with the categorical values “Adult” or “Child”, derived from the Age column.

Hint: A person having Age >=18 is an “Adult” and the one having Age < 18 is a “Child”.

1. Find its relation with the “Survived” Column and print the total number of survivors.

Example:

If Total survived children: 100, Total survived adults: 200

Output: 300
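Deriving the column and counting survivors could look like the sketch below. The function names are hypothetical. The sketch assumes the Age column has already been filled in: a missing Age fails the `>= 18` comparison and would therefore be labelled "Child".

```python
import numpy as np
import pandas as pd


def add_adult_or_child(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with an AdultOrChild column derived from Age."""
    df = df.copy()
    # Age >= 18 is "Adult", anything else (including NaN) is "Child".
    df["AdultOrChild"] = np.where(df["Age"] >= 18, "Adult", "Child")
    return df


def survivors_by_age_group(df: pd.DataFrame) -> pd.Series:
    """Number of survivors in each AdultOrChild group."""
    return df.groupby("AdultOrChild")["Survived"].sum()
```

The answer to part 1 is then `int(survivors_by_age_group(add_adult_or_child(df)).sum())`, the survivors among children plus the survivors among adults.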

2. Use the features below to build a classification model and predict the Survived category:

Pclass
Age
Sex (Encode values using LabelEncoder)

For the above prediction, create a confusion matrix for the model you built and print the sum of all the elements of the matrix.

***NOTE: 1. You should create the confusion matrix for the test data, not the training data.

2. Write the solution only in solution() function and do not pass any arguments to the function. For predefined stub refer stub.py***

Training Data: 'res/Titanic_train.csv'

Testing Data: 'res/Titanic_test.csv'

Example: If the Confusion Matrix is

[[2 2]
 [2 2]]

(2+2+2+2)

Output: 8

Hint: Use Logistic Regression as the classification model
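A minimal sketch of part 2 with scikit-learn follows. It rests on several assumptions: the function name is hypothetical, the Age column has no missing values in either file (LogisticRegression rejects NaN), the test file contains a Survived column to compare against, and `max_iter=1000` is only a convenience to help the solver converge.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder

FEATURES = ["Pclass", "Age", "Sex"]


def confusion_on_test(train: pd.DataFrame, test: pd.DataFrame):
    """Fit on the training data and return the confusion matrix on the test data."""
    train, test = train.copy(), test.copy()
    # Learn the Sex encoding on the training data and reuse it for the test data.
    encoder = LabelEncoder().fit(train["Sex"])
    train["Sex"] = encoder.transform(train["Sex"])
    test["Sex"] = encoder.transform(test["Sex"])
    model = LogisticRegression(max_iter=1000)
    model.fit(train[FEATURES], train["Survived"])
    predictions = model.predict(test[FEATURES])
    # labels=[0, 1] keeps the matrix 2x2 even if one class is never predicted.
    return confusion_matrix(test["Survived"], predictions, labels=[0, 1])
```

The sum of all elements, `int(confusion_on_test(train, test).sum())`, always equals the number of rows in the test data, which is a quick sanity check on the result.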

3. Use the confusion matrix to print the accuracy of the model

Example: (2+2)/8*100

Output: 50

***NOTE: You should check the accuracy for the test data not the training data.
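The accuracy in the example is the sum of the diagonal of the confusion matrix (the correct predictions) divided by the sum of all its elements, times 100. A small sketch, with a hypothetical function name:

```python
import numpy as np


def accuracy_from_confusion(matrix) -> float:
    """Accuracy in percent: correct predictions (diagonal) over all predictions."""
    matrix = np.asarray(matrix)
    total = matrix.sum()
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # np.trace adds the diagonal: true negatives plus true positives.
    return float(np.trace(matrix)) / float(total) * 100
```

For the example matrix with every entry equal to 2, this gives (2+2)/8*100 = 50.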

Steps to be followed:

Step 1: In this question, you are supposed to read the CSV file using pandas.

Step 2: Print the total number of cells having missing values in the Age column. Hint: Using .isnull().sum()

Step 3: Find the sum of all the index numbers of the missing values.

Step 4: Derive a new column called “AdultOrChild” having categorical values as “Adult” or “Child” derived from Age column. Hint: A person having Age >=18 is an “Adult” and the one having Age < 18 is a “Child”.

Step 5: Find its relation with the “Survived” Column and print the total number of survivors. Obtain the complete dataset by combining it with the target attribute.

Step 6: Use the features mentioned above to create a classification model and predict the Survived category. For this prediction, create a confusion matrix for the model you built and print the sum of all the elements of the matrix. Hint: Use confusion_matrix(Y_test, Y_pred), since the matrix must be built on the test data.

Step 7: Use logistic regression to predict on the Titanic_test.csv data and calculate the accuracy score using: round(accuracy_score(Y_test, Y_pred)*100, 2)

Step 8: Finally, create a dataframe of the final output and write it to output.csv, located at output/output.csv

Final Output Sample:

NOTE: Here, 100, 200 and 300 are the answers to the 1st, 2nd and 3rd questions respectively.

Output Format:

Perform the above operations and write your output (wherever the questions say "print") to a file named output.csv, located at output/output.csv
output.csv should contain the answer to each question on consecutive rows.

***NOTE: For all the questions the numerical values saved in output.csv file should be in integer format with no decimals.

Note: This question will be evaluated based on the number of test cases that your code passes.

