Build, Train and Evaluate a Deep Neural Network Using 20k Trip Advisor Hotel Reviews

realcode4you
Jun 27, 2022

Prepare two Python notebooks (Google Colab is recommended) to build, train and evaluate a deep neural network on the two datasets given below (the tensorflow or tensorflow.keras library is recommended). Read the instructions carefully.

Question No.1.

NLP Dataset: Dataset consisting of 20k reviews from trip advisor.

https://www.kaggle.com/andrewmvd/trip-advisor-hotel-reviews

1. Import Libraries/Dataset

Import the required libraries and the dataset (use Google Drive if required).
Check whether a GPU is available (the free GPU provided by Google Colab is recommended).
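
A minimal sketch of the GPU check, assuming TensorFlow 2.x is installed (as it is by default on Google Colab). An empty list simply means training will fall back to the CPU.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means CPU-only training.
gpus = tf.config.list_physical_devices("GPU")

print("TensorFlow version:", tf.__version__)
print("GPUs available:", len(gpus))
for gpu in gpus:
    print(" ", gpu.name)
```

On Colab, a GPU only shows up here after a GPU runtime has been selected in the notebook's runtime settings.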

2. Data Visualization

Print at least two records from each class of the dataset, for a sanity check that labels match the text.
Plot a bar graph of class distribution in the dataset. Each bar depicts the number of records belonging to a particular class in the dataset. (recommended - matplotlib/seaborn libraries)
Any other visualizations that seem appropriate for this problem are encouraged but are not required for the points.
Print the shapes of train and test data.
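
The sanity check and class-distribution plot above could be sketched as follows. The `rows` list is a made-up stand-in for the real data, and the helper names are hypothetical; the sketch assumes each record is a (review text, rating) pair. Matplotlib is imported only inside the plotting helper, so the counting code runs without it.

```python
from collections import Counter, defaultdict

# Toy stand-in for the dataset: (review text, rating) pairs. Illustrative only.
rows = [
    ("nice hotel, great location", 4),
    ("dirty room, rude staff", 1),
    ("average stay, nothing special", 3),
    ("excellent service, will return", 5),
    ("terrible experience, avoid", 1),
    ("good value for money", 4),
]

def class_counts(rows):
    """Return {label: number of records}, ordered by label."""
    return dict(sorted(Counter(label for _, label in rows).items()))

def samples_per_class(rows, n=2):
    """Return up to n example texts for each label."""
    samples = defaultdict(list)
    for text, label in rows:
        if len(samples[label]) < n:
            samples[label].append(text)
    return dict(samples)

def plot_class_counts(counts):
    """Bar chart with one bar per class (requires matplotlib)."""
    import matplotlib.pyplot as plt
    plt.bar([str(label) for label in counts], list(counts.values()))
    plt.xlabel("Rating")
    plt.ylabel("Number of reviews")
    plt.title("Class distribution")
    plt.show()

counts = class_counts(rows)
print(counts)
for label, texts in sorted(samples_per_class(rows).items()):
    print(label, texts)
```

Calling `plot_class_counts(counts)` in a notebook cell draws the bar graph; a strongly uneven plot is worth noting, since it affects how accuracy should be read later.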

3. Data Pre-processing

Why this step is needed: the models we use cannot accept raw string inputs, so the text has to be converted to a numeric format first. A discussion of the different ways to do this is outside the scope of this assignment.
Please use this pre-trained embedding layer from TensorFlow Hub for this assignment. The linked page also has a code snippet showing how to convert a sentence to a vector; refer to it for further clarity on this subject.
Bring the train and test data in the required format.
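
One possible way to bring the data into shape before the embedding step, as a sketch only. It assumes the ratings are integers from 1 to 5 and shifts them to 0..4 so they can be used as class indices; `stratified_split` is a hypothetical helper, and the review texts are left as strings on the assumption that the embedding layer converts them to vectors.

```python
import random
from collections import defaultdict

def stratified_split(texts, ratings, test_fraction=0.2, seed=42):
    """Split into train/test so each class keeps roughly the same share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, rating in enumerate(ratings):
        by_class[rating].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def take(idx):
        # Ratings 1..5 become labels 0..4 (assumption about the label range).
        return [texts[i] for i in idx], [ratings[i] - 1 for i in idx]

    return take(train_idx), take(test_idx)

# Toy data: 10 placeholder reviews for each of the 5 ratings.
texts = [f"review {i}" for i in range(50)]
ratings = [(i % 5) + 1 for i in range(50)]

(x_train, y_train), (x_test, y_test) = stratified_split(texts, ratings)
print(len(x_train), len(x_test))
```

A class with a single record would end up entirely in the test set here, so very rare classes need separate handling.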

4. Model Building

Sequential model layers: use AT LEAST 5 hidden layers, each with an appropriate input. Choose the best number of hidden units and give reasons.
Add L1 regularization to all the layers.
Add one layer of dropout at the appropriate position and give reasons.
Choose the appropriate activation function for all the layers.
Print the model summary.
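
A sketch of one architecture that satisfies the points above, not a recommended answer. The values of `EMBED_DIM`, `NUM_CLASSES`, the layer widths, the L1 factor and the dropout settings are all assumptions chosen for illustration, and the model is assumed to receive sentence vectors that have already been produced by the embedding layer.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

EMBED_DIM = 50    # assumption: length of each sentence vector
NUM_CLASSES = 5   # assumption: one class per rating, 1 to 5

def build_model(hidden_units=(256, 128, 64, 32, 16), l1=1e-5,
                dropout_rate=0.3, dropout_after=1):
    """Dense-only classifier with optional L1 penalty and one dropout layer.

    dropout_after is the index of the hidden layer the dropout follows.
    Passing l1=None builds the same network without regularization.
    """
    reg = regularizers.l1(l1) if l1 else None
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(EMBED_DIM,)))
    for i, units in enumerate(hidden_units):
        model.add(layers.Dense(units, activation="relu",
                               kernel_regularizer=reg))
        if i == dropout_after:
            model.add(layers.Dropout(dropout_rate))
    # Softmax output: one probability per class.
    model.add(layers.Dense(NUM_CLASSES, activation="softmax",
                           kernel_regularizer=reg))
    return model

model = build_model()
model.summary()
```

Keeping the regularization strength and the dropout position as arguments makes the later hyperparameter variants a one-line change, for example `build_model(l1=None)` or `build_model(dropout_rate=0.5, dropout_after=3)`.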

5. Model Compilation

Compile the model with the appropriate loss function.
Use an appropriate optimizer. Give reasons for the choice of learning rate and its value.
Use accuracy as a metric.
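
As an illustration of the compilation step, the sketch below compiles a deliberately tiny stand-in model. It assumes integer class labels (0..4), which is why `sparse_categorical_crossentropy` is used; with one-hot labels, `categorical_crossentropy` would be the matching choice. The Adam optimizer and the learning rate of 1e-3 are illustrative starting values, not values the assignment prescribes.

```python
import tensorflow as tf

# Tiny stand-in model; the input size and layer widths are assumptions.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(50,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",  # expects integer labels
    metrics=["accuracy"],
)

print("Learning rate:", float(model.optimizer.learning_rate.numpy()))
```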

6. Model Training

Train the model for an appropriate number of epochs. Print the train and validation accuracy and loss for each epoch. Use the appropriate batch size.
Plot the loss and accuracy history graphs for both train and validation set. Print the total time taken for training.
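
The training step could look roughly like this. Random NumPy arrays stand in for the embedded reviews, so the accuracy values printed here are meaningless; the epoch count, batch size and validation split are placeholder assumptions. `plot_history` is a hypothetical helper that needs matplotlib.

```python
import time

import numpy as np
import tensorflow as tf

# Random stand-in data: 200 "sentence vectors" of length 50, labels 0..4.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 50)).astype("float32")
y = rng.integers(0, 5, size=200)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(50,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# verbose=2 prints one line per epoch with train/validation loss and accuracy.
start = time.perf_counter()
history = model.fit(x, y, validation_split=0.2, epochs=3,
                    batch_size=32, verbose=2)
elapsed = time.perf_counter() - start
print(f"Total training time: {elapsed:.2f} s")

def plot_history(history):
    """Plot train and validation curves for loss and accuracy."""
    import matplotlib.pyplot as plt
    for metric in ("loss", "accuracy"):
        plt.figure()
        plt.plot(history.history[metric], label=f"train {metric}")
        plt.plot(history.history[f"val_{metric}"], label=f"validation {metric}")
        plt.xlabel("Epoch")
        plt.ylabel(metric)
        plt.legend()
    plt.show()
```

Calling `plot_history(history)` after training draws both graphs; a validation loss that rises while the training loss keeps falling is the usual sign that the epoch count is too high.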

7. Model Evaluation

Print the final train and validation loss and accuracy. Print confusion matrix and classification report for the validation dataset. Analyse and report the best and worst performing class.
Print the two most incorrectly classified records for each class in the test dataset.
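
A NumPy-only sketch of the evaluation helpers. "Most incorrectly classified" is interpreted here as the misclassified records whose wrong prediction had the highest predicted probability; that reading is an assumption, and both function names are hypothetical. The probabilities below are toy values, not model output.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def worst_errors(probs, y_true, per_class=2):
    """For each true class, indices of its most confidently wrong records."""
    y_pred = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    result = {}
    for c in np.unique(y_true):
        wrong = np.where((y_true == c) & (y_pred != c))[0]
        ranked = wrong[np.argsort(-confidence[wrong])]
        result[int(c)] = ranked[:per_class].tolist()
    return result

# Toy two-class example: each row holds the predicted class probabilities.
y_true = np.array([0, 0, 0, 1, 1, 1])
probs = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.40, 0.60],
    [0.70, 0.30],
    [0.10, 0.90],
    [0.45, 0.55],
])

cm = confusion_matrix(y_true, probs.argmax(axis=1), num_classes=2)
errors = worst_errors(probs, y_true)
print(cm)
print(errors)
```

With a trained model, `probs` would come from `model.predict` on the test inputs, and the returned indices can be used to print the corresponding review texts. A classification report is available from scikit-learn if that library is installed.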

Hyperparameter Tuning- Build two more models by changing the following hyperparameters one at a time. Write the code for Model Building, Model Compilation, Model Training and Model Evaluation as given in the instructions above for each additional model.

Regularization: Train a model without regularization.
Dropout: Change the position and value of the dropout layer.

Write a comparison between each model and give reasons for the difference in results.

Question No.2.

Dataset: Attached is the CSV

Load the attached csv file in python. Each row consists of feature 1, feature 2, class label.
Train two single/double hidden layer deep networks by varying the number of hidden nodes (4, 8, 12, 16) in each layer with 70% training and 30% validation data. Use an appropriate learning rate, activation, and loss function, and explain why you chose them. Report, compare, and explain the observed accuracy and minimum loss achieved.
Visually observe the dataset and design an appropriate feature transformation (derived feature) such that after feature transformation, the dataset can be classified using a minimal network architecture (minimum number of parameters). Design, train this minimal network, and report training and validation errors, and trained parameters of the network. Use 70% training and 30% validation data, appropriate learning rate, activation and loss functions. Explain the final results.
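
To illustrate the idea of a derived feature, the sketch below uses a purely hypothetical dataset, since the attached CSV is not shown here: two classes arranged as an inner disc and an outer ring. If the real data looked like this, the single derived feature z = x1^2 + x2^2 would make the classes separable by one sigmoid unit with two trainable parameters (one weight, one bias). The learning rate and step count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # points per class

# Hypothetical data: class 0 inside radius 1, class 1 between radius 2 and 3.
radius = np.concatenate([rng.uniform(0, 1, n), rng.uniform(2, 3, n)])
angle = rng.uniform(0, 2 * np.pi, 2 * n)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Derived feature: squared distance from the origin.
z = X[:, 0] ** 2 + X[:, 1] ** 2

# 70% training / 30% validation split.
order = rng.permutation(2 * n)
cut = int(0.7 * 2 * n)
train, val = order[:cut], order[cut:]

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Minimal network: one sigmoid unit on z, trained by gradient descent
# on the binary cross-entropy loss.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    p = sigmoid(w * z[train] + b)
    w -= lr * np.mean((p - y[train]) * z[train])
    b -= lr * np.mean(p - y[train])

def accuracy(idx):
    return float(np.mean((sigmoid(w * z[idx] + b) > 0.5) == y[idx]))

train_acc, val_acc = accuracy(train), accuracy(val)
print(f"w={w:.3f}, b={b:.3f}")
print(f"train accuracy={train_acc:.3f}, validation accuracy={val_acc:.3f}")
```

The right transformation depends entirely on what the scatter plot of the actual CSV shows; the squared radius is only one example of a feature that turns a non-linear boundary into a threshold on a single number.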

Evaluation Process -

Task Response and Task Completion- All the models should be logically sound and have decent accuracy. Models with random guessing, frozen or incorrect accuracy, exploding gradients, etc. will lead to a deduction of marks. Please do a sanity check of your model and results before submission.
There are a lot of subparts, so answer each completely and correctly, as no partial marks will be awarded for partially correct subparts.
Implementation- The model layers, parameters, hyperparameters, evaluation metrics etc. should be properly implemented.
Only fully connected or dense layers are allowed. CNNs/RNNs are strictly not allowed.
Notebooks without output will not be considered for evaluation.

Additional Tips -

Code organization- Please organize your code with correct line spacing and indentation, and add comments to make your code more readable.
Try to give explanations or cite references wherever required.
Use other combinations of hyperparameters to improve model accuracy.
