Exploratory Data Analysis In R Using Penguins Dataset | Sample Paper

Palmer Penguins

The Palmer Station, located on Anvers Island in the Palmer Archipelago, Antarctica, has been monitoring the ecology of the Palmer Long-Term Ecological Research (LTER) study area for over 50 years. Being in Antarctica, the station naturally takes a keen interest in the local penguin population, recording data on it in order to understand its population dynamics, its responses to a changing climate, and so on.

The Data

The palmerpenguins dataset contains measurements on 333 penguins from the Palmer Archipelago once rows with missing values have been removed. The variables observed are:

  • species: The species of the penguin (Adelie, Chinstrap or Gentoo)

  • island: The island on which the penguin lives (Biscoe, Dream or Torgersen)

  • bill_length_mm: The length of the penguin’s bill (in millimetres)

  • bill_depth_mm: The depth of the penguin’s bill (in millimetres)

  • flipper_length_mm: The length of the penguin’s flipper (in millimetres)

  • body_mass_g: The penguin’s body mass (in grams)

  • sex: The sex of the penguin (male or female)

  • year: The year the measurements were taken

Installing the Data

Install the palmerpenguins package and load the data:

install.packages("palmerpenguins") # You only need to do this once
library(palmerpenguins)
penguins = na.omit(penguins) # Removes rows with missing values

Run the following code to access your unique subset of the penguin dataset:

my.student.number = 123456789 # Replace this with your student number
set.seed(my.student.number)
my.penguins = penguins[sample(nrow(penguins), 100), ]

The object my.penguins now contains the data on your 100 penguins.

The Task

You are to produce a report comprising an exploratory data analysis of the data on your sample of 100 penguins. In this analysis, consider the most appropriate graphical and numerical summaries for your data, along with appropriate measures of uncertainty for those numerical summaries.
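As one possible starting point for a numerical summary with a measure of uncertainty, the sketch below computes a mean, its standard error, and a t-based 95% confidence interval. The helper name mean.ci is hypothetical (it is not part of any package), and the t-interval assumes the measurements are roughly normal or the group is reasonably large. Other summaries, such as medians with bootstrap intervals, may suit some variables better.

```r
# Mean, standard error, and t-based confidence interval for a numeric vector.
# `mean.ci` is a hypothetical helper name used only for illustration.
mean.ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                      # drop any missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                  # standard error of the mean
  half.width <- qt(1 - (1 - level) / 2, df = n - 1) * se
  c(n = n, mean = mean(x), se = se,
    lower = mean(x) - half.width, upper = mean(x) + half.width)
}

# Applied per species, if the sample from the brief is in the workspace.
if (exists("my.penguins")) {
  by.species <- split(my.penguins$body_mass_g, my.penguins$species, drop = TRUE)
  print(sapply(by.species, mean.ci))
}
```

The same helper can be applied to any of the four numeric measurements, or split by sex or island instead of species.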

Sexing (i.e. determining the sex of) a penguin can often be very difficult without causing distress to the penguin. Researchers at the Palmer Station would like to be able to estimate the sex of a penguin from measurement data, thereby avoiding the need for invasive procedures. From your data, which variables appear to be the best at distinguishing between male and female penguins? How reliable do you think they would be at identifying the sex of a penguin?
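One rough way to rank the measurements by how well they separate the sexes is a standardised mean difference (Cohen's d with a pooled standard deviation). This is only a sketch of one option, not a prescribed method: cohens.d is a hypothetical helper name, and the comparison ignores species, which may matter because the species differ in size.

```r
# Standardised mean difference (Cohen's d, pooled SD) between two groups.
# `cohens.d` is a hypothetical helper name used only for illustration.
cohens.d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled.sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled.sd
}

# Male-minus-female difference for each measurement, if the sample exists.
if (exists("my.penguins")) {
  vars <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g")
  d <- sapply(vars, function(v) {
    cohens.d(my.penguins[[v]][my.penguins$sex == "male"],
             my.penguins[[v]][my.penguins$sex == "female"])
  })
  print(round(d, 2))
}
```

Larger absolute values suggest better separation. Repeating the calculation within each species is a natural check on whether the ranking holds up.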

One scientist wishes to compare the weights of male and female penguins. To do this they propose a statistic p*, which is the probability that a randomly selected male from a sample will be heavier than a randomly selected female from the sample. In your report, include the calculated value of p* for your sample of penguins. What does this mean for the weights of males vs females? What are the good/bad properties of using a statistic like p* to compare two groups?
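Taken literally, the definition above can be computed by comparing every male in the sample with every female and taking the proportion of pairs in which the male is heavier. The sketch below does this with outer(); the function name p.star is hypothetical, and counting ties as "not heavier" is an assumption about how the definition should be read.

```r
# Proportion of (male, female) pairs in which the male is strictly heavier.
# `p.star` is a hypothetical name; ties count as "not heavier" here.
p.star <- function(male, female) {
  mean(outer(male, female, ">"))
}

# Applied to body mass in the sample from the brief, if it exists.
if (exists("my.penguins")) {
  males   <- my.penguins$body_mass_g[my.penguins$sex == "male"]
  females <- my.penguins$body_mass_g[my.penguins$sex == "female"]
  print(p.star(males, females))
}
```

This quantity is closely related to the Mann-Whitney U statistic divided by the number of pairs, which may be a useful connection when discussing its properties.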

The scientists also have several questions they’d like to investigate:

  • Are Gentoo penguins heavier than 5kg (=5000g) on average?

  • Are male penguins’ flippers 200mm in length on average?

  • Do less than 5% of all penguins have a bill depth of at least 20mm?

From your data, statistically evaluate each of these questions and state your conclusions as part of your report.
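A minimal sketch of how the three questions might be set up with base R tests is given below. Several choices here are assumptions rather than requirements of the brief: the function name evaluate.questions is hypothetical, the first and third questions are treated as one-sided and the second as two-sided, the flipper value is taken as 200 in the dataset's millimetre units, and the t-tests assume the group means are approximately normally distributed.

```r
# Runs one test per question on a data frame with the penguins columns.
# `evaluate.questions` is a hypothetical helper name used for illustration.
evaluate.questions <- function(d) {
  gentoo.mass  <- d$body_mass_g[d$species == "Gentoo"]
  male.flipper <- d$flipper_length_mm[d$sex == "male"]
  deep.bills   <- sum(d$bill_depth_mm >= 20)   # count with depth of at least 20 mm
  list(
    # H0: mean Gentoo mass = 5000 g  vs  H1: mean > 5000 g
    gentoo  = t.test(gentoo.mass, mu = 5000, alternative = "greater"),
    # H0: mean male flipper length = 200 mm  vs  H1: mean != 200 mm
    flipper = t.test(male.flipper, mu = 200),
    # H0: proportion with deep bills = 0.05  vs  H1: proportion < 0.05
    bill    = binom.test(deep.bills, nrow(d), p = 0.05, alternative = "less")
  )
}

# Applied to the sample from the brief, if it exists in the workspace.
if (exists("my.penguins")) {
  print(evaluate.questions(my.penguins))
}
```

Each element of the returned list is a standard htest object, so p-values and confidence intervals can be read from fields such as $p.value and $conf.int. Whether these tests are appropriate for your sample (for example, whether the normality assumption is reasonable) is part of what the report should discuss.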

Marking Criteria

Reports will be marked on the university scale. Credit will be given for:

  • Mathematical accuracy – How well you carry out the statistical techniques in your report

  • Methodology – An understanding of why you have chosen the techniques that you have, and what their output means in terms of your investigation

  • Critical evaluation – A discussion of the strengths and weaknesses of your methods, how things could be improved etc

  • Report structure and presentation – How well your report is written in terms of structure, how well it flows etc (i.e. aiming for a single, coherent piece of writing, as opposed to lots of separate answers jammed together)

  • Extra credit will be given for any reading or techniques implemented from outside the scope of the module, but this is not a requirement to receive a good mark. Similarly, any investigations carried out beyond what is described in the report task will also be considered for extra credit.
