From 1556a4812554b5c4f5d4e5bc0fd7f61c6793fe0e Mon Sep 17 00:00:00 2001 From: Christopher Brooks Date: Mon, 28 Oct 2019 12:28:57 +0000 Subject: [PATCH] Good luck everyone --- midterm.ipynb | 401 ++++++++++++++++++++++++++++++++++++++++++++ midterm_history.csv | 43 +++++ 2 files changed, 444 insertions(+) create mode 100644 midterm.ipynb create mode 100644 midterm_history.csv diff --git a/midterm.ipynb b/midterm.ipynb new file mode 100644 index 0000000..fa04b63 --- /dev/null +++ b/midterm.ipynb @@ -0,0 +1,401 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# SI 330: Midterm Examination\n", + "\n", + "Fall 2019, October 28th, 2019, Christopher Brooks\n", + "\n", + "You have 80 minutes, from 8:30am-9:50am to complete this exam. When you are finished the exam you must upload **your .ipynb notebook** to Canvas here: https://umich.instructure.com/courses/320857/assignments/868191\n", + "\n", + "You are allowed to search for API documentation or examples to refresh your understanding, as well as use regex testing sites if you would like. There is to be no communication with other individuals in the class or out of it.\n", + "\n", + "Advice: Don't over think things, do what you can, show your thinking process. Full grades are awarded for correct and well written solutions, partial grades will be awarded for partial or poorly written solutions. Use your time wisely.\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 1: Data Cleaning\n", + "Continuing my journey of having to learn something about sports, I decided to go look up some stats on the Wolverines Football history. I saved this dataframe in the file `midterm_history.csv`. But like most stuff from Wikipedia, it's in need of some data cleaning in order to be useful." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part A\n", + " I would like you to take the CSV file and demonstrate to me the techniques you learned in this course thus far by transforming it into a table with the following columns:\n", + "1. **coach_firstname**: The first name of the coach\n", + "2. **coach_lastname**: The surname of the coach\n", + "3. **overall_wins**: A number indicating how many games were won overall\n", + "4. **overall_losses**: A number indicating how many games were lost overall\n", + "5. **overall_ties**: A number indicating how many games were tied overall\n", + "6. **big10_wins**: The same thing as overall_wins but for the Big Ten Record\n", + "7. **big10_losses**: The same thing as overall_losses but for the Big Ten Record\n", + "8. **big10_ties**: The same thing as overall_ties but for the Big Ten Record\n", + "\n", + "Also, please set the index value to be the **year** the record was made.\n", + "\n", + "Note: the format of most game records in Wikipedia is *win-loss-tie*, where ties are omitted if they are 0." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Year Coach Overall record Big Ten record\n", + "0 1898 Gustave Ferbert 10–0 3–0\n", + "1 1901 † Fielding H. Yost 11–0 4–0\n", + "2 1902 Fielding H. Yost 11–0 5–0\n", + "3 1903 † Fielding H. Yost 11–0–1 3–0–1\n", + "4 1904 † Fielding H. Yost 10–0 2–0" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "df=pd.read_csv(\"midterm_history.csv\")\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# Your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question 2: Data Manipulation\n", + "Imagine this hypothetical situation, I'm sitting out in the backyard with my daughter (Katie) and we're watching the squirrels climb along the trees. We jointly name each squirrel, and identify their type (e.g. a gray squirel versus a chipmunk). Then I try and find the out which kinds of trees the squirrels live in, while Katie tries to see the speed at which the squirrels run on different species of trees. So we have two `DataFrame` objects created, one about squirrels and their living conditions, and one about squirrels and their observed speed." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " name type tree\n", + "0 carl chipmunk oak\n", + "1 suzie gray walnut\n", + "2 andy gray walnut\n", + "3 bob black oak\n", + "4 john black walnut\n", + "5 anthony eastern red pine" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " name speed tree\n", + "0 carl 2 oak\n", + "1 carl 5 walnut\n", + "2 carl 4 walnut\n", + "3 suzie 4 walnut\n", + "4 bob 6 walnut\n", + "5 bob 2 oak" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "dad_df=pd.DataFrame([[\"carl\",\"chipmunk\",\"oak\"],\n", + " [\"suzie\",\"gray\",\"walnut\"],\n", + " [\"andy\",\"gray\",\"walnut\"],\n", + " [\"bob\",\"black\",\"oak\"],\n", + " [\"john\",\"black\",\"walnut\"],\n", + " [\"anthony\",\"eastern red\",\"pine\"]], columns=[\"name\",\"type\",\"tree\"])\n", + "\n", + "katie_df=pd.DataFrame([[\"carl\",2,\"oak\"],\n", + " [\"carl\",5,\"walnut\"],\n", + " [\"carl\",4,\"walnut\"],\n", + " [\"suzie\",4,\"walnut\"],\n", + " [\"bob\",6,\"walnut\"],\n", + " [\"bob\",2, \"oak\"]], columns=[\"name\",\"speed\",\"tree\"])\n", + "\n", + "display(dad_df)\n", + "display(katie_df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part A\n", + "Write a function to return the mean and standard deviations (from `numpy`) of the average speed of a squirrel by type. Only include squirels for whom Katie has have collected some running data." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# Your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part B\n", + "On average, what is the fastest and slowest squirrel/tree combination? 
Build a dataframe where each column is labeled with a tree type and the values of the cells are the average speeds of different kinds of squirrels (so row index values should be squirrel types)." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "# Your code here" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/midterm_history.csv b/midterm_history.csv new file mode 100644 index 0000000..faf0be0 --- /dev/null +++ b/midterm_history.csv @@ -0,0 +1,43 @@ +Year,Coach,Overall record,Big Ten record +1898,Gustave Ferbert,10–0,3–0 +1901 †,Fielding H. Yost,11–0,4–0 +1902,Fielding H. Yost,11–0,5–0 +1903 †,Fielding H. Yost,11–0–1,3–0–1 +1904 †,Fielding H. Yost,10–0,2–0 +1906 †,Fielding H. Yost,4–1,1–0 +1918 †,Fielding H. Yost,5–0,2–0 +1922 †,Fielding H. Yost,6–0–1,4–0 +1923 †,Fielding H. Yost,8–0,4–0 +1925,Fielding H. Yost,7–1,5–1 +1926 †,Fielding H. 
Yost,7–1,5–0 +1930 †,Harry Kipke,8–0–1,5–0 +1931 †,Harry Kipke,8–1–1,5–1 +1932 †,Harry Kipke,8–0,6–0 +1933 †,Harry Kipke,7–0–1,5–0–1 +1943 †,Fritz Crisler,8–1,6–0 +1947,Fritz Crisler,10–0,6–0 +1948,Bennie Oosterbaan,9–0,6–0 +1949 †,Bennie Oosterbaan,6–2–1,4–1–1 +1950,Bennie Oosterbaan,6–3–1,4–1–1 +1964,Bump Elliott,9–1,6–1 +1969 †,Bo Schembechler,8–3,6–1 +1971,Bo Schembechler,11–1,8–0 +1972 †,Bo Schembechler,10–1,7–1 +1973 †,Bo Schembechler,10-0–1,7–0–1 +1974 †,Bo Schembechler,10–1,7–1 +1976 †,Bo Schembechler,10–2,7–1 +1977 †,Bo Schembechler,10–2,7–1 +1978 †,Bo Schembechler,10–2,7–1 +1980,Bo Schembechler,10–2,8–0 +1982,Bo Schembechler,8–4,8–1 +1986 †,Bo Schembechler,11–2,7–1 +1988,Bo Schembechler,9–2–1,7–0–1 +1989,Bo Schembechler,10–2,8–0 +1990 †,Gary Moeller,9–3,6–2 +1991,Gary Moeller,10–2,8–0 +1992,Gary Moeller,9–0–3,6–0–2 +1997,Lloyd Carr,12–0,8–0 +1998 †,Lloyd Carr,10–3,7–1 +2000 †,Lloyd Carr,9–3,6–2 +2003,Lloyd Carr,10–3,7–1 +2004 †,Lloyd Carr,9–3,7–1
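The cleaning described in Question 1, Part A could be approached roughly as follows. This is only a sketch of one possible approach, not the instructor's solution. The helper names `split_record` and `clean_history` are hypothetical, and the sample rows are copied from `midterm_history.csv` above. It assumes that the first whitespace-separated token of `Coach` is the first name and the last token is the surname, and that any single non-digit character (en dash or hyphen) separates the parts of a record.

```python
import io

import pandas as pd

# A few rows copied from midterm_history.csv, including the 1973 row,
# whose overall record mixes a hyphen with an en dash.
SAMPLE = """Year,Coach,Overall record,Big Ten record
1898,Gustave Ferbert,10–0,3–0
1903 †,Fielding H. Yost,11–0–1,3–0–1
1973 †,Bo Schembechler,10-0–1,7–0–1
"""


def split_record(series, prefix):
    """Split 'win-loss[-tie]' strings into three integer columns."""
    # \D matches any single non-digit, so both "–" and "-" work as separators.
    parts = series.str.extract(r"(\d+)\D(\d+)(?:\D(\d+))?")
    # A missing tie group means zero ties.
    parts = parts.fillna("0").astype(int)
    parts.columns = [f"{prefix}_wins", f"{prefix}_losses", f"{prefix}_ties"]
    return parts


def clean_history(raw):
    """Return the cleaned table, indexed by year (hypothetical helper)."""
    out = pd.DataFrame(index=raw.index)
    # Assumption: first token is the first name, last token is the surname.
    names = raw["Coach"].str.strip().str.split()
    out["coach_firstname"] = names.str[0]
    out["coach_lastname"] = names.str[-1]
    out = out.join(split_record(raw["Overall record"], "overall"))
    out = out.join(split_record(raw["Big Ten record"], "big10"))
    # Keep only the four-digit year, dropping markers such as " †".
    years = raw["Year"].astype(str).str.extract(r"(\d{4})")[0].astype(int)
    out.index = pd.Index(years, name="year")
    return out


cleaned = clean_history(pd.read_csv(io.StringIO(SAMPLE)))
print(cleaned)
```

Extracting digits with a regular expression, rather than splitting on a fixed dash character, keeps rows such as 1973 from being mis-parsed when the separator is inconsistent.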