Instructions:
1. Each group is expected to submit a Jupyter notebook (.ipynb).
2. Inside each Jupyter notebook, you are required to mention your name, group details,
and the assignment dataset you will be working on. Organize your code in separate sections for each task. Add comments to make the code readable. Notebooks without output will not be considered for evaluation.
3. Convert the notebook to HTML format and upload it on Canvas.
Problem Statement
The SMS Spam Collection is a set of tagged SMS messages that have been collected for
SMS spam research. It contains one set of 5,574 SMS messages in English,
each tagged as ham (legitimate) or spam.
Link to the Dataset: https://www.kaggle.com/uciml/sms-spam-collection-dataset/download
Things to be done:
a) Download the file and load it into a DataFrame.
b) Remove punctuation, special characters, and stopwords from the text in the ‘sms’ column.
Convert the text to lower case.
c) Create two objects, X and y. Create a CountVectorizer object and split the data into training and testing sets. Train a MultinomialNB model and display the confusion matrix.
d) Display the POS tags for the first 4 rows of ‘sms’.
e) Build and display a dependency parse tree for the sentence:
“the series opened 17 years later, as Viserys Targaryen tried to win an eastern tribal
army to his side, so he could retake the Iron Throne”
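
As an illustration of step (b), the cleaning could be sketched roughly as follows. This is a minimal sketch rather than a prescribed solution: the function name clean_sms, the sample message, and the tiny STOPWORDS set are all hypothetical, and a real notebook would likely use a fuller stopword list from an NLP library. It also assumes English text, so any non-ASCII characters are treated as special characters and dropped.

```python
import re

# Hypothetical, deliberately small stopword list for illustration only;
# a fuller list from an NLP library would normally be used instead.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "for", "you", "your"}


def clean_sms(text):
    """Lowercase the text, strip punctuation/special characters, drop stopwords."""
    text = text.lower()
    # Replace every character that is not a lowercase ASCII letter,
    # a digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Split on whitespace and keep only tokens that are not stopwords.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


# Made-up sample message, not taken from the dataset.
sample = "WINNER!! You have won a FREE prize, call 0800-123 to claim."
print(clean_sms(sample))  # winner have won free prize call 0800 123 claim
```

Assuming the data has been loaded into a pandas DataFrame named df (a hypothetical name), the same function could be applied to the whole column with df['sms'].apply(clean_sms).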