In this blog post, we will look at the types of practice questions that are useful for understanding the concepts of machine learning.
Q 1. Create your own 10x3 dataframe (10 rows, 3 columns) with named rows and columns. You can give the rows and columns any names you like (make them fun!), but please fill the table with integers. (Hint: you can use lists or dictionaries to create the dataframe.)
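One possible answer to Q 1, as a sketch: build the table from a dictionary of lists, where each key becomes a column name, and pass the row names as the index. The food names and ratings below are made up purely for illustration.

```python
import pandas as pd

# Hypothetical, made-up ratings (0-10) for ten foods; any integers would do.
data = {
    "spiciness":   [2, 7, 1, 5, 9, 3, 0, 2, 4, 6],
    "cheesiness":  [9, 6, 0, 1, 2, 8, 3, 7, 1, 0],
    "crunchiness": [4, 8, 2, 1, 1, 5, 9, 2, 3, 7],
}
# The index supplies the row names; the dictionary keys become the column names.
foods = ["pizza", "tacos", "sushi", "ramen", "curry",
         "burger", "salad", "pasta", "dumplings", "falafel"]
df = pd.DataFrame(data, index=foods)

print(df.shape)  # (10, 3)
```

A list of lists also works: pass it as the first argument together with `index=` and `columns=`.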
Q 2. Calculate the mean value of the second column in your dataframe.
Q 3. Now calculate the mean value of each column in your dataframe. You should be able to do this in one line.
Q 4. Calculate the mean of all the values in your dataframe. This can also be done in one line.
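A minimal sketch for Q 2 to Q 4, using a small made-up dataframe (the same calls work on a 10x3 one). For the overall mean it flattens the values with `to_numpy()` first, which sidesteps version differences in how `DataFrame.mean(axis=None)` behaves; `df.mean().mean()` gives the same answer only when every column has the same number of non-missing values.

```python
import pandas as pd

# Small, made-up integer data for illustration.
df = pd.DataFrame(
    {"spiciness": [2, 7, 1, 6], "cheesiness": [9, 6, 0, 5], "crunchiness": [8, 10, 9, 9]},
    index=["pizza", "tacos", "sushi", "ramen"],
)

# Q 2: mean of the second column, selected by position so it works for any names.
second_col_mean = df.iloc[:, 1].mean()

# Q 3: mean of every column in one call (a Series indexed by column name).
column_means = df.mean()

# Q 4: mean of all values: flatten to a NumPy array, then average.
overall_mean = df.to_numpy().mean()

print(second_col_mean)  # 5.0
print(column_means)
print(overall_mean)     # 6.0
```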
Q 5. Select the fourth row of your dataframe using the row name.
Q 6. Select the fourth row of your dataframe using the iloc[ ] attribute.
Q 7. Select the second column of your dataframe using the column name.
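A sketch for Q 5 to Q 7 on a made-up dataframe: `.loc[]` selects rows by label, `.iloc[]` selects by zero-based position (so the fourth row is position 3), and square brackets with a column name select a column. The row and column names are hypothetical.

```python
import pandas as pd

# Made-up example data; the row names come from the index.
df = pd.DataFrame(
    {"spiciness": [2, 7, 1, 6, 9], "cheesiness": [9, 6, 0, 5, 2], "crunchiness": [8, 10, 9, 9, 1]},
    index=["pizza", "tacos", "sushi", "ramen", "curry"],
)

# Q 5: label-based selection; "ramen" is the name of the fourth row here.
fourth_by_name = df.loc["ramen"]

# Q 6: position-based selection; positions start at 0, so the fourth row is 3.
fourth_by_position = df.iloc[3]

# Q 7: select a column by its name (returns a Series).
second_column = df["cheesiness"]

print(fourth_by_name.equals(fourth_by_position))  # True
print(second_column.tolist())                     # [9, 6, 0, 5, 2]
```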
Plotting with Matplotlib
Create a new Jupyter notebook.
Create some plots using the file file1.csv, which you can find under Files/Data on Canvas.
Download it and put it in the same folder as your lab notebook.
The CSV file has no header row, so you'll need to add column names when reading it. Use the following additional parameters in your pd.read_csv() call:
import pandas as pd
anne_df = pd.read_csv('file1.csv', header=None, names=['name', 'gender', 'year', 'number', 'popularity'])
Q 8. Create a line plot that shows the popularity of the name Anne per year, with years on the x-axis and popularity on the y-axis. Use anne_df.plot.line(x, y).
Q 9. Use plt.bar(anne_df[x], anne_df[y]) to create a bar chart showing the number of births per year.
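A minimal sketch for Q 8 and Q 9, assuming the column names given to read_csv above and assuming the `number` column holds the births per year. The helper name `plot_anne` is hypothetical; it just wraps the two calls the questions ask for and returns the results so they can be inspected.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_anne(anne_df: pd.DataFrame):
    """Draw the Q 8 line plot and the Q 9 bar chart (hypothetical helper)."""
    # Q 8: pandas line plot, years on the x-axis and popularity on the y-axis.
    line_ax = anne_df.plot.line(x="year", y="popularity", title="Popularity of Anne per year")

    # Q 9: matplotlib bar chart on a new figure; ASSUMPTION: 'number' = births.
    plt.figure()
    bars = plt.bar(anne_df["year"], anne_df["number"])
    plt.xlabel("year")
    plt.ylabel("number of births")
    plt.title("Births of babies named Anne per year")
    return line_ax, bars


# Usage in the notebook, with file1.csv in the same folder:
# anne_df = pd.read_csv("file1.csv", header=None,
#                       names=["name", "gender", "year", "number", "popularity"])
# plot_anne(anne_df)
# plt.show()
```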
Q 10. What are two observations you can make from the graphs you have created?
Q 11. Now use df.plot.scatter to create a scatter plot of the presidents' heights, with order on the x-axis and height on the y-axis, using this data: https://classes.cs.uoregon.edu/19F/cis199ids/data/presidents/president_heights.csv
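A sketch for Q 11. It assumes the CSV has an `order` column and a `height(cm)` column; that is an assumption about the file, so check `df.columns` after loading and pass a different `height_col` if needed. The helper name `plot_heights` is hypothetical.

```python
import pandas as pd

URL = "https://classes.cs.uoregon.edu/19F/cis199ids/data/presidents/president_heights.csv"


def plot_heights(df: pd.DataFrame, height_col: str = "height(cm)"):
    """Scatter plot of presidential order (x) against height (y)."""
    # ASSUMPTION: column names 'order' and 'height(cm)'; adjust to match df.columns.
    return df.plot.scatter(x="order", y=height_col, title="Presidents' heights")


# Usage (needs network access):
# presidents_df = pd.read_csv(URL)
# print(presidents_df.columns)
# plot_heights(presidents_df)
```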
Q 12. Is this a good set of sample data for drawing conclusions about how heights of people