Analysing SMS Spam Data Using NLP (Natural Language Processing) | Sample Paper


Instructions:

1. Each group is expected to submit a Jupyter notebook (.ipynb).


2. Inside each Jupyter notebook, mention your name, group details, and the assignment dataset you will be working on. Organize your code in separate sections for each task, and add comments to make the code readable. Notebooks without output will not be considered for evaluation.


3. Convert the notebook to HTML format and upload it on Canvas.

 

Problem Statement

The SMS Spam Collection is a set of tagged SMS messages collected for SMS spam research. It contains one set of 5,574 English SMS messages, each tagged as ham (legitimate) or spam.

 

Link to the Dataset: https://www.kaggle.com/uciml/sms-spam-collection-dataset/download


Things to be done:

a) Download the file and set it as a Dataframe. 
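
One way to approach task a) is sketched below with pandas. It assumes the downloaded CSV keeps the label in its first column and the message text in its second, and that the file is Latin-1 encoded; check both against your copy. The helper name load_sms and the column name label are illustrative, and a two-row in-memory sample stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd


def load_sms(source):
    """Read the SMS CSV into a DataFrame with 'label' and 'sms' columns.

    Assumption: column 0 holds ham/spam and column 1 holds the message text.
    """
    df = pd.read_csv(source, encoding="latin-1", usecols=[0, 1])
    df.columns = ["label", "sms"]
    return df


# Tiny stand-in for the downloaded file (illustrative rows, not real data).
sample_bytes = (
    "v1,v2\n"
    "ham,Ok lar... Joking wif u oni...\n"
    "spam,WINNER!! Claim your £1000 prize now\n"
).encode("latin-1")

df = load_sms(io.BytesIO(sample_bytes))
print(df.head())
```

For the real dataset, pass the path of the downloaded file to load_sms instead of the in-memory sample.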

b) Remove punctuation, special characters and stopwords from the text in the ‘sms’ column. Convert the text to lower case.
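
A minimal sketch of task b) using only the standard library follows. The function name clean_sms and the short STOPWORDS set are illustrative assumptions; a fuller list, such as NLTK's English stopwords, could be swapped in. Lower-casing is done first so that stopword matching is case-insensitive.

```python
import re

# Deliberately small, illustrative stopword set (an assumption, not a standard list).
STOPWORDS = {
    "a", "an", "and", "are", "for", "have", "i", "in", "is",
    "it", "of", "on", "that", "the", "this", "to", "you",
}


def clean_sms(text, stopwords=STOPWORDS):
    """Lower-case, strip punctuation/special characters, and drop stopwords."""
    text = text.lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in stopwords]
    return " ".join(tokens)


print(clean_sms("WINNER!! You have won a £1000 prize, call now."))
# prints: winner won 1000 prize call now
```

On a DataFrame, the same function could be applied with df["sms"].apply(clean_sms).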

c) Create two objects, X and y. Create a CountVectorizer object and split the data into training and testing sets. Train a MultinomialNB model and display the confusion matrix.
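
The steps in task c) might look like the sketch below with scikit-learn. The eight messages are made-up toy data so the snippet runs without the dataset; in the notebook, X would be the cleaned ‘sms’ column and y the label column. Fitting the vectorizer on the training split only is a design choice made here, not something the task prescribes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in data (illustrative only).
X = [
    "free prize claim now", "win cash call now",
    "urgent free entry win", "claim free cash prize",
    "see you at lunch", "call me when home",
    "are we meeting today", "thanks for the lift home",
]
y = ["spam"] * 4 + ["ham"] * 4

# Hold out 25% of the messages, keeping the ham/spam ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Learn the vocabulary from the training messages, then reuse it for the test set.
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

model = MultinomialNB()
model.fit(X_train_counts, y_train)
y_pred = model.predict(X_test_counts)

# Rows are true labels, columns are predicted labels, in the order given.
cm = confusion_matrix(y_test, y_pred, labels=["ham", "spam"])
print(cm)
```

Passing labels explicitly keeps the matrix 2x2 even if one class happens to be missing from the predictions.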

d) Display the POS (part-of-speech) tags for the first 4 rows of ‘sms’.
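
Task d) could be approached as sketched below. The choice of spaCy and its en_core_web_sm model is an assumption (the task does not name a library), and both must be installed separately. The helper names pos_tag_rows and show_pos_tags are illustrative. The tagging helper takes the pipeline as an argument, so spaCy is only imported when show_pos_tags is actually called.

```python
def pos_tag_rows(texts, nlp, n=4):
    """Tag the first n texts; returns one list of (token, POS) pairs per text.

    `nlp` is any callable that turns a string into tokens exposing
    `.text` and `.pos_`, such as a loaded spaCy pipeline.
    """
    rows = []
    for text in list(texts)[:n]:
        doc = nlp(text)
        rows.append([(token.text, token.pos_) for token in doc])
    return rows


def show_pos_tags(df, column="sms"):
    """Print POS tags for the first 4 rows of df[column] using spaCy."""
    import spacy  # assumption: spaCy and en_core_web_sm are installed

    nlp = spacy.load("en_core_web_sm")
    for tagged in pos_tag_rows(df[column], nlp):
        print(tagged)
```

Calling show_pos_tags(df) in the notebook prints one list of (token, tag) pairs per message.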

e) Build and display a dependency parse tree for the sentence:

  “the series opened 17 years later, as Viserys Targaryen tried to win an eastern tribal army to his side, so he could retake the Iron Throne”
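
A possible sketch for task e) follows, again assuming spaCy with the en_core_web_sm model (neither is named in the task). The helper names dependency_edges and show_dependency_tree are illustrative. The first helper lists each token with its relation and head as plain text; the second draws the tree with spaCy's displaCy visualizer.

```python
SENTENCE = (
    "the series opened 17 years later, as Viserys Targaryen tried to win "
    "an eastern tribal army to his side, so he could retake the Iron Throne"
)


def dependency_edges(doc):
    """Return (token, relation, head) triples for every token in a parsed doc."""
    return [(token.text, token.dep_, token.head.text) for token in doc]


def show_dependency_tree(sentence=SENTENCE):
    """Parse the sentence, print its dependency edges, and render the tree."""
    import spacy  # assumption: spaCy and en_core_web_sm are installed
    from spacy import displacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp(sentence)
    for edge in dependency_edges(doc):
        print(edge)
    # In Jupyter this displays the tree inline; elsewhere it returns HTML markup.
    return displacy.render(doc, style="dep")
```

Running show_dependency_tree() in a notebook cell should display the arc diagram below the printed edges.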
