
Titanic Survival Prediction Practice Set Questions



Question 1

This is a scenario-based assignment that uses the Titanic dataset and consists of 3 questions. Read the requirements carefully before answering.


Dataset: Titanic disaster.


Data Dictionary:

Variable | Definition | Key

  • survival | Survival | 0 = No, 1 = Yes

  • pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd

  • sex | Sex | M or F

  • Age | Age in years

  • sibsp | # of siblings / spouses aboard the Titanic

  • parch | # of parents / children aboard the Titanic

  • ticket | Ticket number

  • fare | Passenger fare

  • cabin | Cabin number

  • embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton


Variable Notes:

  • pclass: A proxy for socio-economic status (SES)

  • 1st = Upper

  • 2nd = Middle

  • 3rd = Lower

  • age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5

  • sibsp: The dataset defines family relations in this way...

  • Sibling = brother, sister, stepbrother, stepsister

  • Spouse = husband, wife (mistresses and fiancés were ignored)

  • parch: The dataset defines family relations in this way

  • Parent = mother, father

  • Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch=0 for them.


Dataset Path:

The dataset Titanic_train.csv is present at the location


res/Titanic_train.csv 

The dataset Titanic_test.csv is present at the location


res/Titanic_test.csv 

Problem Statement:

You are provided with datasets about passengers of the Titanic disaster. Use the datasets to answer the following questions:


Q1: Find the relation of the following columns (having discrete values) with the “Survived” column and answer the questions below:

  • Pclass

  • Sex

  • Embarked


1. Find the total number of survivors from the 3rd PClass (Titanic_train.csv)


Example: If the total number of survivors from Pclass(3) is: 100

Output: 100


2. Find the total number of males who died in the accident (Titanic_train.csv)

3. Find the total number of survivors who embarked from "Southampton" (Titanic_train.csv)


Hint:

Pclass relation with Survived Column:

Group | Total | Survived

  • 1 | 189 | 116

Sex relation with Survived Column:

Group | Total | Survived

  • female | 262 | 194

Embarked relation with Survived Column:

Group | Total | Survived

  • C | 146 | 78
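A minimal sketch of how such a Group | Total | Survived table could be built with pandas is shown here. The helper name relation_with_survived and the six-row sample are made up for illustration (they are not the real Titanic data), and the column names and the "male"/"female" labels are assumed to match the CSV.

```python
import pandas as pd

def relation_with_survived(df, column):
    """Return per-group totals and survivor counts for one discrete column."""
    grouped = df.groupby(column)["Survived"]
    # count() gives the group size; sum() counts survivors because Survived is 0/1.
    return pd.DataFrame({"Total": grouped.count(), "Survived": grouped.sum()})

# Tiny made-up sample, NOT the real Titanic data.
sample = pd.DataFrame({
    "Pclass":   [1, 3, 3, 2, 3, 1],
    "Sex":      ["female", "male", "female", "male", "male", "male"],
    "Embarked": ["C", "S", "S", "Q", "S", "S"],
    "Survived": [1, 0, 1, 0, 1, 0],
})

pclass_table = relation_with_survived(sample, "Pclass")

# Part 1: survivors in 3rd class.
survivors_pclass3 = int(pclass_table.loc[3, "Survived"])
# Part 2: males who did not survive.
males_died = int(((sample["Sex"] == "male") & (sample["Survived"] == 0)).sum())
# Part 3: survivors who embarked at Southampton ("S").
survivors_southampton = int(((sample["Embarked"] == "S") & (sample["Survived"] == 1)).sum())
```

For the actual assignment, the sample frame would presumably be replaced by pd.read_csv("res/Titanic_train.csv").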


***Note: Write the code only in the solution() function and do not pass any arguments to the function. For the predefined stub, refer to stub.py***


Final Output Sample:









NOTE: Here, 100, 200 and 300 are the answers to the 1st, 2nd and 3rd questions respectively.

Output Format:

  • Perform the above operations and write (written above as print) your output to a file named output.csv, which should be present at the location output/output.csv

  • output.csv should contain the answer to each question on consecutive rows.
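One possible way to write the answers on consecutive rows is sketched below. The helper name write_answers is hypothetical, and writing without a header or index column is an assumption, since the output sample itself is not shown here. The demo writes to a temporary folder; in the assignment the default path output/output.csv would be used.

```python
import os
import tempfile

import pandas as pd

def write_answers(answers, path="output/output.csv"):
    """Write one integer answer per row, with no header and no index column."""
    folder = os.path.dirname(path)
    if folder:
        # Create the output folder if it does not exist yet.
        os.makedirs(folder, exist_ok=True)
    pd.DataFrame([int(a) for a in answers]).to_csv(path, header=False, index=False)

# Demo in a temporary directory so the sketch does not touch real files.
demo_path = os.path.join(tempfile.mkdtemp(), "output", "output.csv")
write_answers([100, 200, 300], demo_path)
```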

Note: This question will be evaluated based on the number of test cases that your code passes.



Question 2

Dataset: Titanic disaster

Q: Some of the values in the "Age" column are missing. Use a Linear Regression model to fill the missing values in the dataset.

(Hint: Use Age as the dependent variable to fill (predict) the missing values.)


1. Print the total number of cells having missing values in the Age column.

Example:

If Total number of cells with missing value is: 100

Output: 100


2. Print the sum of the index number of all the cells with missing values.

Example:

If the Index Number of cells with missing value is: (4,6,20,40)

Output: 70
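Parts 1 and 2 can be sketched with a boolean mask, as below. The six-row sample is made up for illustration and assumes the default integer index that pd.read_csv produces.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample with missing ages at index 1, 3 and 5.
sample = pd.DataFrame({"Age": [22.0, np.nan, 38.0, np.nan, 35.0, np.nan]})

missing_mask = sample["Age"].isnull()

# Part 1: number of cells with a missing Age.
missing_count = int(missing_mask.sum())

# Part 2: pass the index labels of the missing rows to a list and sum them.
missing_index_sum = int(sum(sample[missing_mask].index.tolist()))
```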


3. Print the mean of all the new values filled using linear regression. [For this, first divide the training dataset into two parts: the first part contains only the rows that have missing values in the 'Age' column (call this dataframe df1), and the second part contains the rows that have valid numbers in the 'Age' column (call this dataframe df2). Train the model on df2 and predict the ages for df1. The answer is the mean of the ages predicted for df1.]

***NOTE: Please use the features for predicting Age ['Pclass','Survived','GenderLabel']

Example:

If the new filled values are: (25.0,30.0, 30.0,35.0)

Output: 30.0


Steps to be followed:

1. Load the Titanic_train.csv file.

2. Count the missing values in the Age column. [Hint: You can use isnull() with sum()]

3. Calculate the sum of the index values where Age is missing. [Hint: You can use isnull() and pass the index to a list. Then you can sum the list.]

4. Segregate the rows having missing Age values (say in dataframe A) from the rows having valid Age values (say in dataframe B).

5. Encode the string columns. Here, encode the Sex column into a “GenderLabel” column.

6. Fit a Linear Regression model on dataframe B from step 4. [Hint: Use ‘Pclass’, ‘GenderLabel’, ‘Survived’ as independent features.]

7. Use the Linear Regression model from step 6 to predict ‘Age’ for dataframe A.

8. Fill the ‘Age’ column of dataframe A with the predicted values from step 7.

9. Calculate the mean of the ‘Age’ column of dataframe A and write the integer part of the mean. This will be the answer for part 3.
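The regression-based fill described in part 3 (train on the rows with a known Age, predict the rows where Age is missing) can be sketched as follows. The function name mean_of_imputed_ages and the sample rows are made up for illustration; the ages in the sample were deliberately generated from an exact linear rule so the result is easy to check by hand, which real data would not satisfy.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder

def mean_of_imputed_ages(df):
    """Fit on rows with a known Age, predict the missing ones, return their mean."""
    df = df.copy()
    # LabelEncoder assigns integer codes in sorted label order (female=0, male=1).
    df["GenderLabel"] = LabelEncoder().fit_transform(df["Sex"])
    features = ["Pclass", "Survived", "GenderLabel"]

    missing = df[df["Age"].isnull()]   # rows to be filled
    known = df[df["Age"].notnull()]    # rows used for training

    # No train/test split: every row with a known Age is used for fitting.
    model = LinearRegression().fit(known[features], known["Age"])
    predicted = model.predict(missing[features])
    return float(predicted.mean()), predicted

# Made-up sample where Age = 50 - 10*Pclass + 5*Survived + 2*GenderLabel.
sample = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 1],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 1],
    "Sex":      ["female", "male", "male", "female", "male", "female", "female", "male"],
    "Age":      [45.0, 42.0, 37.0, 30.0, 22.0, 25.0, np.nan, np.nan],
})

mean_age, predicted_ages = mean_of_imputed_ages(sample)
answer_part3 = int(mean_age)  # integer part of the mean, as step 9 asks
```

In this sample the two missing rows follow the same linear rule, so the model recovers ages of about 20 and 47 and a mean of about 33.5.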

***Note: Do not split the data into train_test split***


Input Dataset path:

res/Titanic_train.csv

Final Output Sample:








NOTE: Here, 100, 200, and 300 are the answers to the 1st, 2nd, and 3rd questions respectively.

Output Format:

  • Perform the above operations and write (written above as print) your output to a file named output.csv, which should be present at the location output/output.csv

  • output.csv should contain the answer to each question on consecutive rows.

***Note: Write the code only in the solution() function and do not pass any arguments to the function. For the predefined stub, refer to stub.py***

Note: This question will be evaluated based on the number of test cases that your code passes.


Question 3

Dataset: Titanic disaster.

Data Dictionary:

Variable | Definition | Key

  • survival | Survival | 0 = No, 1 = Yes

  • pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd

  • sex | Sex | M or F

  • Age | Age in years

  • sibsp | # of siblings / spouses aboard the Titanic

  • parch | # of parents / children aboard the Titanic

  • ticket | Ticket number

  • fare | Passenger fare

  • cabin | Cabin number

  • embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton

After performing the analysis from the previous question, derive a new column called “AdultOrChild” with the categorical values “Adult” or “Child”, derived from the Age column.

Hint: A person having Age >=18 is an “Adult” and the one having Age < 18 is a “Child”.


1. Find its relation with the “Survived” Column and print the total number of survivors.

Example:

If Total survived children: 100, Total survived adults: 200

Output: 300
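A minimal sketch of deriving the column and counting survivors is given below. The five-row sample is made up, and it assumes the Age column has already been filled; with np.where, any remaining missing Age would fall into the “Child” branch because the comparison with NaN is False.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample, NOT the real Titanic data.
sample = pd.DataFrame({
    "Age":      [4.0, 17.0, 18.0, 30.0, 62.0],
    "Survived": [1,   0,    1,    0,    1],
})

# Age >= 18 is "Adult", anything else is "Child".
sample["AdultOrChild"] = np.where(sample["Age"] >= 18, "Adult", "Child")

# Survivors per category, then the total across both categories.
survivors_by_group = sample.groupby("AdultOrChild")["Survived"].sum()
total_survivors = int(survivors_by_group.sum())
```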


2. Consider the features below to create a classification model and predict the Survived category:

  • Pclass

  • Age

  • Sex (Encode values using LabelEncoder)

For the above prediction, create a confusion matrix for the model you built and print the sum of all the elements of the matrix.

***NOTE: 1. You should create the confusion matrix for the test data, not the training data.


2. Write the solution only in solution() function and do not pass any arguments to the function. For predefined stub refer stub.py***

Training Data: 'res/Titanic_train.csv'

Testing Data: 'res/Titanic_test.csv'

Example: If the Confusion Matrix is

[[2 2]
 [2 2]]

(2+2+2+2)

Output: 8

Hint: Use Logistic Regression as the classification model


3. Use the confusion matrix to print the accuracy of the model

Example: (2+2)/8*100

Output: 50

***NOTE: You should check the accuracy for the test data not the training data.
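Parts 2 and 3 can be sketched as below: fit Logistic Regression on the training frame, predict on the test frame, then read the total and the accuracy off the confusion matrix. The function name evaluate_on_test and both sample frames are made up for illustration. The sketch assumes the test file carries a Survived column and that Age has no missing values in either frame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder

FEATURES = ["Pclass", "Age", "Sex"]

def evaluate_on_test(train, test):
    """Fit on the training frame, then score predictions on the test frame."""
    train, test = train.copy(), test.copy()
    # Fit the encoder on the training labels and reuse it for the test labels.
    encoder = LabelEncoder().fit(train["Sex"])
    train["Sex"] = encoder.transform(train["Sex"])
    test["Sex"] = encoder.transform(test["Sex"])

    model = LogisticRegression(max_iter=1000).fit(train[FEATURES], train["Survived"])
    predictions = model.predict(test[FEATURES])

    # labels=[0, 1] keeps the matrix 2x2 even if one class is never predicted.
    cm = confusion_matrix(test["Survived"], predictions, labels=[0, 1])
    total = int(cm.sum())                        # sum of all elements
    accuracy = int(np.trace(cm) / total * 100)   # diagonal = correct predictions
    return cm, total, accuracy

# Made-up frames, NOT the real Titanic data.
train_df = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 1],
    "Age":      [29.0, 40.0, 24.0, 35.0, 22.0, 30.0, 19.0, 50.0],
    "Sex":      ["female", "male", "female", "male", "male", "male", "female", "female"],
    "Survived": [1, 0, 1, 0, 0, 0, 1, 1],
})
test_df = pd.DataFrame({
    "Pclass":   [1, 3, 2, 3],
    "Age":      [33.0, 25.0, 28.0, 21.0],
    "Sex":      ["female", "male", "male", "female"],
    "Survived": [1, 0, 0, 1],
})

cm, total, accuracy = evaluate_on_test(train_df, test_df)
```

The sum of the matrix always equals the number of rows in the test frame, since every test row lands in exactly one cell.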


Steps to be followed:

Step 1: In this question, you are supposed to read the CSV file using pandas.

Step 2: Print the total number of cells having missing values in the Age column. Hint: Using .isnull().sum()

Step 3: Find the sum of all the index numbers of the missing values.

Step 4: Derive a new column called “AdultOrChild” having categorical values as “Adult” or “Child” derived from Age column. Hint: A person having Age >=18 is an “Adult” and the one having Age < 18 is a “Child”.

Step 5: Find its relation with the “Survived” Column and print the total number of survivors. Obtain the complete dataset by combining it with the target attribute.

Step 6: Consider mentioned features to create a Classification model and predict the survived category. For the above prediction create a Confusion matrix for the model built by you and print the sum of all the elements of a matrix. Hint: Use confusion_matrix(Y_train, Y_pred)

Step 7: Use logistic regression on the Titanic_test.csv data and calculate the accuracy score using: round(accuracy_score(Y_train, Y_pred)*100, 2)

Step 8: Finally, create a dataframe of the final output and write the output to output.csv, which is present at output/output.csv

Final Output Sample:









NOTE: Here, 100, 200 and 300 are the answers to the 1st, 2nd and 3rd questions respectively.

Output Format:

  • Perform the above operations and write (written above as print) your output to a file named output.csv, which should be present at the location output/output.csv

  • output.csv should contain the answer to each question on consecutive rows.

***NOTE: For all the questions the numerical values saved in output.csv file should be in integer format with no decimals.

Note: This question will be evaluated based on the number of test cases that your code passes.






© 2023 IT Services provided by Realcode4you.com
