TDM 20200: Project 8 — 2023
Motivation: Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! In this project, we will create plots using the plotly
package, as well as do some data manipulation using pandas
.
Context: This is the second project focused around creating visualizations in Python.
Scope: plotly, Python, pandas
Dataset(s)
The following questions will use the following dataset(s):
-
/anvil/projects/tdm/data/disney/total.parquet
-
/anvil/projects/tdm/data/disney/metadata.csv
Questions
Interested in being a TA? Please apply: purdue.ca1.qualtrics.com/jfe/form/SV_08IIpwh19umLvbE |
Question 1
Read in the data from the parquet file /anvil/projects/tdm/data/disney/total.parquet
and store it in a variable called dat
. Do the same for /anvil/projects/tdm/data/disney/metadata.csv
and call it meta
.
Plotly express makes it really easy to create nice, clean graphics, and it integrates with pandas
superbly. You can find links to all of the plotly express functions on this page.
Let’s start out simple. Create a bar chart for the total number of observations for each ride. Make sure your plot has labels for the x axis, y axis, and overall plot.
While the default plotly plots look amazing and have great interactivity, they won’t render in your notebook well in Gradescope. For this reason, please use |
You can use |
-
Code used to solve this problem.
-
Output from running the code.
Question 2
Great! Wouldn’t it be interesting to see how the total number of observations changes over time, by ride?
Create a single plot that contains a bar plot for all of the rides, with the total number of observations for each ride by year. Okay, maybe not all of the rides — if you try, you may notice that the graphic becomes quite cluttered. Instead choose 6 rides you are interested to include on the plot.
This page has a good example of making facetted subplots. |
To convert the |
First, create a new column called Next, group by both the The x axis should be the |
-
Code used to solve this problem.
-
Output from running the code.
Question 3
Create a plot that shows the association between the average SPOSTMIN
and WDWMAXTEMP
for a ride of your choice. Some notes to help you below.
-
Create a new column in
dat
calledday
that is the date. Make sure to usepd.to_datetime
to convert the date to the correct type. -
Use the
groupby
method to group by bothride_name
andday
. Get the average by using themean
method on your grouped data. In order to makeride_name
andday
columns instead of indices, call thereset_index
method. Finally, use thequery
method to subset your data to just be data for your given ride. -
Convert the
DATE
column inmeta
to the correct type usingpd.to_datetime
. -
Use the
merge
method to merge your grouped data with the metadata onday
(from the grouped data) andDATE
(frommeta
). -
Make the scatterplot.
Is there an obvious relationship between the two variables for your chosen ride? Did you expect the results?
-
Code used to solve this problem.
-
Output from running the code.
Question 4
This is an extremely rich dataset with lots and lots of plotting potential! In addition, there are a lot of interesting questions you could ask about wait times and rides that could actually be useful if you were to visit Disney!
Create a graphic using a plot we have not yet used from this webpage. Make sure to use proper labels, and make sure the graphic shows some sort of potentially interesting relationship. Write 1-2 sentences about why you decided to create this plot.
-
Code used to solve this problem.
-
Output from running the code.
Question 5
Ask yourself a question regarding the information in the dataset. For example, maybe you think that certain events from the meta
dataframe will influence a certain ride. Perhaps you think the time the park opens is relevant to the time of year? Write down the question you would like to answer using a plot. Choose the type of plot you are going to use, and write 1-2 sentences explaining your reasoning. Create the plot. What were the results? Was the plot an effective way to answer your question?
-
Code used to solve this problem.
-
Output from running the code.
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted. In addition, please review our submission guidelines before submitting your project. |