
Machine Learning Sample | Practice Set


In this post we have added some important machine learning questions that will help you improve your machine learning and data science concepts:


Question 1:

DESCRIPTION

Problem Statement: Build a tic-tac-toe game classification algorithm using the concept of supervised machine learning.


Requirements:

  • Python 3.6

  • Scikit-Learn

  • Pandas and Numpy

Dataset Used: tic-tac-toe.txt


Attribute Description:

Name | Type | Description
top_left_square | string | x, o, or b (blank)
top_middle_square | string | x, o, or b (blank)
top_right_square | string | x, o, or b (blank)
middle_left_square | string | x, o, or b (blank)
middle_middle_square | string | x, o, or b (blank)
middle_right_square | string | x, o, or b (blank)
bottom_left_square | string | x, o, or b (blank)
bottom_middle_square | string | x, o, or b (blank)
bottom_right_square | string | x, o, or b (blank)
class | string | Target class: positive (x won) or negative (x lost or tied)


Dataset Description:

This database encodes the complete set of possible board configurations at the end of tic-tac-toe games, where "x" is assumed to have played first. The target concept is "win for x" (i.e., true when "x" has one of 8 possible ways to create a "three-in-a-row").

Training dataset:

This dataset will be used to test the developer's solution. It will be available at


/data/train/tic-tac-toe.data.txt

Tasks to be performed:

1. Data Preprocessing:

Use random_state = 3 while splitting the dataset into train and test set.


Label Val | Decoded Val (features)
0 | b
1 | o
2 | x

Label Val | Decoded Val (class)
0 | negative
1 | positive


Hint: Use the concept of label encoding i.e. map the parameters manually
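The manual mapping from the hint can be sketched with pandas as follows. This is only an illustration: the helper name encode_board and the two toy rows are assumptions, while the mapping values come from the table in the task.

```python
import pandas as pd

# Mapping tables taken from the task description.
FEATURE_MAP = {"b": 0, "o": 1, "x": 2}
CLASS_MAP = {"negative": 0, "positive": 1}


def encode_board(df, class_col="class"):
    """Return a copy of df with every square and the class column label-encoded."""
    encoded = df.copy()
    feature_cols = [c for c in encoded.columns if c != class_col]
    for col in feature_cols:
        encoded[col] = encoded[col].map(FEATURE_MAP)
    encoded[class_col] = encoded[class_col].map(CLASS_MAP)
    return encoded


# Two toy rows (not taken from the real dataset) to show the effect.
toy = pd.DataFrame({
    "top_left_square": ["x", "o"],
    "top_middle_square": ["b", "x"],
    "class": ["positive", "negative"],
})
encoded_toy = encode_board(toy)
print(encoded_toy)
```

A value that is missing from the mapping becomes NaN after map, so it is worth checking the encoded frame for nulls before training.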


2. Create a Random Forest Model (random_state = 0) using the k-fold cross-validation technique.


3. Apply Ada Boost algorithm to improve the accuracy score (random_state = 0).


Hint: For the above scenario, you can choose the best value of k (from 2 to 10) for cross-validation and use n_estimators = 100, n_splits=20 (You need to understand which parameter to use and when).


Print the accuracy score before and after implementing Ada Boost Algorithm.
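One possible way to compare the two models with k-fold cross-validation is sketched below. The data is a synthetic stand-in for the encoded boards, the helper mean_cv_accuracy is a made-up name, and AdaBoost is used here with its default base learner, which is an assumption rather than something the task prescribes.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the encoded boards: 9 squares with values 0/1/2.
rng = np.random.default_rng(3)
X = rng.integers(0, 3, size=(300, 9))

# Toy label: 1 when "x" (encoded as 2) fills any of the 8 winning lines.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
y = np.array([int(any(all(row[i] == 2 for i in line) for line in LINES))
              for row in X])


def mean_cv_accuracy(model, X, y, k):
    """Mean accuracy over k shuffled folds."""
    folds = KFold(n_splits=k, shuffle=True, random_state=0)
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()


rf = RandomForestClassifier(n_estimators=100, random_state=0)
ada = AdaBoostClassifier(n_estimators=100, random_state=0)

k = 5  # one candidate from the 2..10 range mentioned in the hint
rf_acc = mean_cv_accuracy(rf, X, y, k)
ada_acc = mean_cv_accuracy(ada, X, y, k)
result = [round(rf_acc, 3), round(ada_acc, 3)]
print(result)
```

To pick k, the same helper can be called in a loop over range(2, 11) and the best mean accuracy kept. A different base learner, for example the random forest itself, can be passed as the first argument of AdaBoostClassifier.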


Output Format:

  • Perform the above operations and write your output to a file named output.csv, which should be present at the location /code/output/output.csv

  • output.csv should contain the answer to each question on consecutive rows.

NOTE: If the accuracy before implementing AdaBoost is 0.713 and after implementing it is 0.811, then create a list result = [0.713, 0.811] and convert it to a CSV file (the process for which is mentioned in the stub).

import pandas as pd
import numpy as np 
import seaborn as sns
train=pd.read_csv('/data/training/tic-tac-toe.data.txt')
#********Write your code here***************
#*******************************************
#*******************************************
result=[0.713, 0.811]
result=pd.DataFrame(result)
#writing output to output.csv
result.to_csv('/code/output/output.csv', header=False, index=False)

Question 2:

DESCRIPTION

Dataset Used: PredictionsFor4April2019.csv

Problem Statement: ABC Company has built a model to predict the daily number of units sold for different products.

You have to help the company compute the metrics at the country level.

Write Python code to compute the following metrics using the mean_squared_error function:

  1. RMSE for Country DE

  2. RMSE for Country AT

  3. RMSE for Country PL

Output Format:

  • Calculate up to 2 decimal places

  • Perform the above operations and write your output to a file named output.csv, which should be present at the location /code/output/output.csv

  • output.csv should contain the answer to each question on consecutive rows.

NOTE: If the answers to the 1st, 2nd and 3rd questions are 0.7, 0.6 and 0.8 respectively, then create a list result = [0.7, 0.6, 0.8] and convert it to a CSV file (the process for which is mentioned in the stub).

import pandas as pd
import numpy as np
forecast=pd.read_csv('/data/training/PredictionsFor4April2019.csv')
#********Write your code here***************
#******************************************
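A minimal sketch of the per-country RMSE computation is shown below. The dataset's columns are not listed in the post, so the column names "Country", "Actual" and "Prediction" and the toy values are assumptions. RMSE is taken as the square root of the value returned by mean_squared_error.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

# Toy frame; the column names "Country", "Actual" and "Prediction" are assumed.
forecast = pd.DataFrame({
    "Country": ["DE", "DE", "AT", "AT", "PL", "PL"],
    "Actual": [10, 12, 5, 7, 3, 4],
    "Prediction": [11, 14, 5, 9, 3, 4],
})


def rmse_for_country(df, country):
    """RMSE between actual and predicted units for one country, rounded to 2 places."""
    rows = df[df["Country"] == country]
    mse = mean_squared_error(rows["Actual"], rows["Prediction"])
    return round(float(np.sqrt(mse)), 2)


result = [rmse_for_country(forecast, c) for c in ["DE", "AT", "PL"]]
print(result)  # [1.58, 1.41, 0.0] for the toy values above
```

The resulting list can then be written to output.csv in the same way as in the stub for Question 1.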


Question 3:

DESCRIPTION

Dataset Used: PredictionsFor4April2019.csv

Problem Statement: ABC Company has built a model to predict the daily number of units sold for different products.

You have to help the company compute the metrics at the country level.

Write Python code to compute the following metrics:


1. Percentage of Identical Predictions for Country DE

2. Percentage of Identical Predictions for Country AT

3. Percentage of Identical Predictions for Country PL


Output Format:

  • Calculate up to 2 decimal places (example for DE it is 60.28)

  • Perform the above operations and write your output to a file named output.csv, which should be present at the location /code/output/output.csv

  • output.csv should contain the answer to each question on consecutive rows.

NOTE: If the answers to the 1st, 2nd and 3rd questions are 0.7, 0.6 and 0.8 respectively, then create a list result = [0.7, 0.6, 0.8] and convert it to a CSV file (the process for which is mentioned in the stub).

import pandas as pd
import numpy as np
forecast=pd.read_csv('/data/training/PredictionsFor4April2019.csv')
#********Write your code here***************
#*******************************************
#*******************************************
result=[0.7, 0.8,0.97]
result=pd.DataFrame(result)
#writing output to output.csv
result.to_csv('/code/output/output.csv', header=False, index=False)
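The percentage of identical predictions can be sketched as below, assuming that "identical" means the predicted value equals the actual value for a row. As before, the column names "Country", "Actual" and "Prediction" and the toy values are assumptions, not taken from the dataset.

```python
import pandas as pd

# Toy frame; the column names and values are assumed for illustration.
forecast = pd.DataFrame({
    "Country": ["DE", "DE", "DE", "AT", "AT", "PL", "PL", "PL", "PL"],
    "Actual": [10, 12, 8, 5, 7, 3, 4, 6, 2],
    "Prediction": [10, 14, 8, 5, 9, 3, 4, 6, 1],
})


def pct_identical(df, country):
    """Share of rows (in percent) where the prediction equals the actual value."""
    rows = df[df["Country"] == country]
    matches = rows["Actual"] == rows["Prediction"]
    return round(float(matches.mean() * 100), 2)


result = [pct_identical(forecast, c) for c in ["DE", "AT", "PL"]]
print(result)  # [66.67, 50.0, 75.0] for the toy values above
```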


Question 4:

DESCRIPTION

Problem Statement: The dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to diagnostically predict whether a patient is diabetic or not, based on diagnostic measurements included in the dataset. Create a classification model using AdaBoost Algorithm and XGBoost Algorithm. Use grid search to find the optimal value for the hyperparameters: learning_rate and n_estimators.

Dataset:

  • diabetes_train.csv

  • diabetes_test.csv

Dataset Parameters:

  • Pregnancies: Number of times pregnant (0-14)

  • Glucose: Glucose Level (0-198)

  • BloodPressure: Diastolic blood pressure (0-122)

  • SkinThickness: Triceps skin fold thickness (0-52)

  • Insulin: 2-Hour serum insulin (0-543)

  • BMI: Body Mass Index (0-57.3)

  • DiabetesPedigreeFunction: Diabetes Pedigree Function (0.078-2.288)

  • Age: Age of Patient (21-81)

  • Outcome: Patient is diabetic or not (0 or 1)


Tasks to be Performed:

1. What are the optimal values for learning_rate and n_estimators?

Example: If 1 and 100 are optimal values then the output should be:

Output: 1 , 100

Hint: Take hyperparameters as:

  • learning_rate: 0.1 to 1 step 0.1

  • n_estimators: 50 to 300 step 50
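The grid from the hint can be searched with scikit-learn's GridSearchCV, roughly as sketched here. The data is a synthetic stand-in for diabetes_train.csv, and the demo searches only a small subset of the grid so that it runs quickly. Both choices are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Full grid from the hint: 0.1..1.0 in steps of 0.1 and 50..300 in steps of 50.
param_grid = {
    "learning_rate": [round(0.1 * i, 1) for i in range(1, 11)],
    "n_estimators": list(range(50, 301, 50)),
}

# Synthetic stand-in for the training file (8 features, binary outcome).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# To keep this demo quick, search only the first two values of each parameter;
# pass param_grid itself to search the full grid.
demo_grid = {name: values[:2] for name, values in param_grid.items()}

search = GridSearchCV(
    AdaBoostClassifier(random_state=0),
    param_grid=demo_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)
best = search.best_params_
print(best["learning_rate"], ",", best["n_estimators"])
```

The same pattern should carry over to XGBoost by swapping in its scikit-learn style classifier, since it also exposes learning_rate and n_estimators.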

2. Calculate the following performance metrics for both models (AdaBoost and XGBoost) and report the larger value of each metric across the two models (up to 2 decimal places):

  • Accuracy

  • Sensitivity

  • Specificity


Example: If the metric values of AdaBoost are:

  • Accuracy: 80.0

  • Sensitivity: 40.12

  • Specificity: 30.34

And the metric values of XGBoost are:

  • Accuracy: 90.0

  • Sensitivity: 50.56

  • Specificity: 20.78

Then the output should be:

Output: 90.0, 50.56, 30.34


Hint: Use the confusion matrix to calculate the above values.
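The hint can be illustrated with a short sketch that derives all three values from scikit-learn's confusion_matrix. The helper name summary_metrics and the toy label vectors are made up for this example. The two prediction lists simply stand in for the outputs of the two models.

```python
from sklearn.metrics import confusion_matrix


def summary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity in percent, rounded to 2 places."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    accuracy = 100 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100 * tp / (tp + fn)  # true positive rate
    specificity = 100 * tn / (tn + fp)  # true negative rate
    return [round(float(v), 2) for v in (accuracy, sensitivity, specificity)]


# Toy labels and two sets of toy predictions standing in for the two models.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
pred_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

metrics_a = summary_metrics(y_true, pred_a)
metrics_b = summary_metrics(y_true, pred_b)

# Keep the larger value of each metric, as the task asks.
best = [max(a, b) for a, b in zip(metrics_a, metrics_b)]
print(best)  # [70.0, 75.0, 83.33] for the toy values above
```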


Final Output Sample:

1, 100, 90.0, 50.56, 30.34

NOTE: Multiple answers are separated by commas.


Input Format:

  • The first file ‘diabetes_train.csv’ contains data as mentioned in the problem to train the models. The file is in *.csv format and is present at the location /data/training/diabetes_train.csv.

  • The second file ‘diabetes_test.csv’ contains data as mentioned in the problem to test the models. The file is in *.csv format and is present at the location /data/test/diabetes_test.csv.


Output Format:

  • Perform the above operations and write answers to all queries asked in the questions to a file named output.csv.

  • Each answer should be separated by a comma.

  • Your file output.csv should be present at the location /code/output/output.csv.

# Import libraries here
# import numpy as np
# from sklearn import linear_model
import pandas as pd
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

Question 5:

DESCRIPTION

Problem Statement:


You are provided with a dataset named “Retail.csv”. Perform market basket analysis on it by applying the Apriori algorithm and association rules with appropriate parameters.

Dataset: Retail.csv

Dataset Parameters:

  • POS Txn : Transaction ID

  • Dept: Department

  • ID: Item ID

  • Sales U: Units sold


1. Remove ‘0999: UNSCANNED ITEMS’ from the ‘Dept’ column and print the number of times ‘0973:CANDY’ was sold.

Example: If ‘0973:CANDY’ was sold 100 times then the output should be:

Output: 100

Hint: We need to find the number of times ‘0973:CANDY’ was sold, not the total units sold.


2. For the frequent itemsets, keep the minimum support as 0.02 and find the maximum support (up to 5 decimal places).

Example: If the maximum support is 0.54321 then the output should be:

Output: 0.54321

Hint: Get rules using the “lift” metric with minimum_threshold as 2


3. Filter rules having lift >= 3 and confidence >= 0.1 and calculate the total number of rules and the number of filtered rules.

Example: If the total number of rules is 40 and the number of filtered rules is 20 then output should be:

Output: 40, 20
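The first two quantities can be sketched in plain pandas, without an Apriori library. The toy transactions and the two "HYPOTHETICAL" department names are made up. The column names follow the dataset description above. Counting rows is assumed to be what "number of times sold" means.

```python
import pandas as pd

# Toy transactions; the two "HYPOTHETICAL" departments are invented for the demo.
retail = pd.DataFrame({
    "POS Txn": [1, 1, 2, 2, 3, 3, 4, 4],
    "Dept": ["0973:CANDY", "HYPOTHETICAL A", "0973:CANDY", "0999: UNSCANNED ITEMS",
             "HYPOTHETICAL A", "HYPOTHETICAL B", "0973:CANDY", "HYPOTHETICAL A"],
    "ID": [101, 102, 101, 999, 102, 103, 104, 102],
    "Sales U": [2, 1, 1, 1, 1, 3, 1, 1],
})

# Drop the unscanned rows, then count rows (not units) for the candy department.
retail = retail[retail["Dept"] != "0999: UNSCANNED ITEMS"]
candy_count = int((retail["Dept"] == "0973:CANDY").sum())

# One-hot basket: one row per transaction, True where a department appears.
basket = pd.crosstab(retail["POS Txn"], retail["Dept"]) > 0

# Support of a single item = share of transactions that contain it.
support = basket.mean()
max_support = round(float(support.max()), 5)

print(candy_count, max_support)  # 3 0.75 for the toy values above
```

Because the support of an itemset can never exceed the support of any of its subsets, the maximum support over all frequent itemsets is always reached by a single item, so the single-item supports are enough for that number. The rule counts in the last task still need an association-rule step, which is not shown here.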

Final Output Sample:

100, 0.54321, 40, 20


NOTE: Multiple answers are separated by commas.


Input Format:

  • The first file ‘Retail.csv’ contains data as mentioned in the problem. The file is in *.csv format and is present at the location /data/training/Retail.csv.

Output Format:

  • Perform the above operations and write answers to all queries asked in the questions to a file named output.csv.

  • Each answer should be separated by a comma.

  • Your file output.csv should be present at the location /code/output/output.csv.


# Import libraries here
# import numpy as np
# from sklearn import linear_model
import pandas as pd
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt


Question 6:

DESCRIPTION

Dataset:










Question 1: Create a Bayesian Model with the structure (fruit -> tasty <- size) and count how often each state of each variable occurs. If a variable depends on parents, the counts are done conditionally on the parents' states, i.e., separately for each parent configuration.

Carry out the following tasks:

  1. Load and read the data from the input file data1.csv

  2. Create a Bayesian Model with the structure (fruit -> tasty <- size)

  3. Count how often each state of each variable occurs

  4. If a variable depends on parents, do the counts conditionally on the parents' states, i.e., separately for each parent configuration

  5. Print your output to a CSV file

Question 2: Using MLE, estimate the conditional probability distributions (CPDs) for the variables fruit, size, and tasty in the Bayesian Network fruit -> tasty <- size. Print all the CPDs.

Carry out the following tasks:

  1. Using MLE, estimate the CPDs for the variables fruit, size, and tasty in the Bayesian Network

  2. Print all the CPDs

  3. Write the CPD of tasty (up to 2 decimal places) to CSV

Question 3: Using the Variable Elimination algorithm, perform MAP inference on the variable 'tasty' without evidence and without an elimination order.

Carry out the following tasks:

  1. Using the Variable Elimination algorithm, perform MAP inference on the variable 'tasty' without evidence and without an elimination order

  2. Print your output to a CSV file

Input Format: Read the training data from the file data1.csv, which is present at the location /data/training/data1.csv.

Output Format:

Output 1: Perform the operations as described and write the frequency of tasty counts, under fruit and size, to output.csv, which should be present at the location /code/output/output.csv


Example: output.csv will have data looking like







Output 2: Perform the operations as described and write the CPD of tasty (up to 2 decimal places) to output.csv, which should be present at the location "/code/output/output.csv"


Example: output.csv will have data looking like





Output 3: You have to perform the operations as described and write the frequency of tasty count, under fruit and size in output.csv, which should be present at the location /code/output/output.csv


Example: output.csv will have data looking like this




Question1

#import libraries
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import ParameterEstimator
#load the dataset
data = pd.read_csv("/data/training/data1.csv", delimiter=" ")
#Write your code below
fruit_counts = 
size_counts = 
tasty_counts = 
#Expected Output
tasty_counts
#write your output to csv
tasty_counts.to_csv('/code/output/output1.csv')

Question2

#create a Bayesian Model and generate CPD using MLE
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
#Write your code
fruit_cpd = 
size_cpd = 
tasty_cpd = 
#write the CPD of tasty to csv
tasty_cpd.to_csv('/code/output/output2.csv')
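To make the idea behind state counts and MLE concrete, here is a sketch in plain pandas rather than pgmpy. The toy rows are made up. Only the variable names fruit, size and tasty come from the task. For discrete data, the maximum likelihood estimate of a CPD is the relative frequency within each parent configuration.

```python
import pandas as pd

# Toy data (made up) with the three variables from the task.
data = pd.DataFrame({
    "fruit": ["apple", "apple", "apple", "apple",
              "banana", "banana", "banana", "banana"],
    "size": ["large", "large", "small", "small",
             "large", "large", "small", "small"],
    "tasty": ["yes", "no", "yes", "yes", "yes", "yes", "no", "no"],
})

# State counts: root nodes are counted directly, tasty per parent configuration.
fruit_counts = data["fruit"].value_counts()
size_counts = data["size"].value_counts()
tasty_counts = pd.crosstab(data["tasty"], [data["fruit"], data["size"]])

# MLE: relative frequencies, normalised within each parent configuration.
fruit_cpd = fruit_counts / fruit_counts.sum()
size_cpd = size_counts / size_counts.sum()
tasty_cpd = tasty_counts.div(tasty_counts.sum(axis=0), axis=1).round(2)

print(tasty_counts)
print(tasty_cpd)
```

Each column of tasty_cpd is one parent configuration (fruit, size) and sums to 1.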

Question3

#create a Bayesian model and run variable elimination algorithm on it
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination
#Expected Output
result
#write your output to csv
result.to_csv('/code/output/output3.csv')
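For a network this small, the MAP state of tasty without evidence can be checked by hand. The sketch below uses brute-force enumeration in pandas and NumPy instead of pgmpy's Variable Elimination. It sums out fruit and size to get the same marginal that Variable Elimination would compute more efficiently. The toy data is made up.

```python
import numpy as np
import pandas as pd

# Toy data (made up) with the three variables from the task.
data = pd.DataFrame({
    "fruit": ["apple", "apple", "apple", "apple",
              "banana", "banana", "banana", "banana"],
    "size": ["large", "large", "small", "small",
             "large", "large", "small", "small"],
    "tasty": ["yes", "no", "yes", "yes", "yes", "yes", "no", "no"],
})

# CPDs estimated by relative frequency.
p_fruit = data["fruit"].value_counts(normalize=True)
p_size = data["size"].value_counts(normalize=True)
counts = pd.crosstab(data["tasty"], [data["fruit"], data["size"]])
p_tasty_given = counts.div(counts.sum(axis=0), axis=1)

# Sum out fruit and size: P(tasty) = sum over f, s of P(tasty | f, s) P(f) P(s).
weights = np.array([p_fruit[f] * p_size[s] for f, s in p_tasty_given.columns])
p_tasty = pd.Series(p_tasty_given.to_numpy() @ weights, index=p_tasty_given.index)

# MAP estimate: the most probable state of tasty.
map_state = p_tasty.idxmax()
result = pd.DataFrame({"tasty": [map_state]})
print(p_tasty)
print(result)
```

With the toy rows above, the marginal is 0.625 for "yes" and 0.375 for "no", so the MAP state is "yes".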

Question 7:

DESCRIPTION

Question 1: Create a Dataframe having 8 columns namely, A,B,C,D,E,F,G,H with the following properties:

  1. Number of rows: 2500, Number of columns: 8

  2. Fill the DataFrame with random integers ranging from 0 to 3

  3. Values of Column A = Column B + Column C

  4. Values of Column H = Column G - Column A
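The four properties above can be sketched with NumPy and pandas as follows. The range 0 to 3 is assumed to include 3, and the seed matches the one used in the stub for this question.

```python
import numpy as np
import pandas as pd

np.random.seed(2)  # same seed as the stub
columns = list("ABCDEFGH")

# randint's upper bound is exclusive, so high=4 yields 0, 1, 2 and 3.
data = pd.DataFrame(np.random.randint(0, 4, size=(2500, 8)), columns=columns)

# Overwrite the derived columns as required by properties 3 and 4.
data["A"] = data["B"] + data["C"]
data["H"] = data["G"] - data["A"]

print(data.shape)  # (2500, 8)
```

The order of the two assignments matters: H must be computed after A has been overwritten, because it depends on the new values of A.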


Question 2: For the dataframe created in Q1, create a new dataframe by choosing following columns [A, B, C, D, E, F], create a bayesian model with the following edges ('A', 'B'), ('C', 'B'), ('C', 'D'),('E', 'F'). Estimate the model parameters of node 'B' using Bayesian Estimator.


Carry out the following tasks:

  1. Create a dataframe as instructed in question 1

  2. Create a new dataframe by choosing the following columns [A, B, C, D, E, F]

  3. Create a Bayesian model with the following edges ('A', 'B'), ('C', 'B'), ('C', 'D'),('E', 'F')

  4. Estimate the model parameters of node 'B' using Bayesian Estimator

  5. Write all the output in a .csv file
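What a Bayesian estimate of node 'B' looks like can be sketched without pgmpy. The example below uses a uniform Dirichlet (K2-style) prior, which adds one pseudo-count per state. This prior is an assumption, since the task does not say which one to use. Node B has parents A and C because of the edges ('A', 'B') and ('C', 'B').

```python
import numpy as np
import pandas as pd

# Rebuild the frame from question 1 and keep columns A..F.
np.random.seed(2)
full = pd.DataFrame(np.random.randint(0, 4, size=(2500, 8)), columns=list("ABCDEFGH"))
full["A"] = full["B"] + full["C"]
full["H"] = full["G"] - full["A"]
data = full[list("ABCDEF")]

# All states of B and every (A, C) parent configuration, seen or unseen.
b_states = sorted(data["B"].unique())
parent_configs = pd.MultiIndex.from_product(
    [sorted(data["A"].unique()), sorted(data["C"].unique())], names=["A", "C"])

# Counts of B for every parent configuration (0 where a configuration never occurs).
counts = pd.crosstab(data["B"], [data["A"], data["C"]])
counts = counts.reindex(index=b_states, columns=parent_configs, fill_value=0)

# Posterior mean under the prior: (count + 1) / (column total + number of states).
pseudo = 1
node_B_cpd = (counts + pseudo).div(counts.sum(axis=0) + pseudo * len(b_states), axis=1)
print(node_B_cpd.round(3))
```

Parent configurations that never occur in the data, such as A = 0 with C = 1, end up with a uniform distribution over the states of B, which is the main difference from a pure maximum likelihood estimate.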


Question1

#To be given to students
import pandas as pd
import numpy as np
import random
np.random.seed(2)
data = pd.DataFrame()
##Write your code here

Question2

from pgmpy.models import BayesianModel
from pgmpy.estimators import BayesianEstimator
#write code here
print(node_B_cpds)
#write output to a file
np.savetxt("/code/output/output2.csv", node_B_cpds, delimiter=",")


If you need programming assignment help with machine learning coursework, a machine learning project, or machine learning homework, or need a solution to the problems above, we are ready to help.


Send your request to realcode4you@gmail.com and get instant help at an affordable price.

We always focus on delivering unique, plagiarism-free code written by our highly educated professionals, who provide well-structured code within your given time frame.


If you are looking for help with other programming languages such as C, C++, Java, Python, PHP, Asp.Net, NodeJs, ReactJs, etc., or with databases such as MySQL, MongoDB, SQL Server, Oracle, etc., please contact us as well.
