File versions after TR2
eyrei123 authored Dec 3, 2023
1 parent 65af430 commit 150da33
Showing 8 changed files with 1,387 additions and 37 deletions.
54 changes: 17 additions & 37 deletions README.md
@@ -1,37 +1,17 @@
# Real Python Materials

Bonus materials, exercises, and example projects for Real Python's [Python tutorials](https://realpython.com).

Build Status:
[![GitHub Actions](https://img.shields.io/github/actions/workflow/status/realpython/materials/linters.yml?branch=master)](https://github.com/realpython/materials/actions)

## Got a Question?

The best way to get support for Real Python courses, articles, and code in this repository is to join one of our [weekly Office Hours calls](https://realpython.com/office-hours/) or to ask your question in the [RP Community Chat](https://realpython.com/community/).

Due to time constraints, we cannot provide 1:1 support via GitHub. See you on Slack or on the next Office Hours call 🙂

## Adding Source Code & Sample Projects to This Repo (RP Contributors)

### Running Code Style Checks

We use [flake8](http://flake8.pycqa.org/en/latest/) and [black](https://black.readthedocs.io/) to ensure a consistent code style for all of our sample code in this repository.

Run the following commands to validate your code against the linters:

```sh
$ flake8
$ black --check .
```

### Running Python Code Formatter

We're using a tool called [black](https://black.readthedocs.io/) on this repo to ensure consistent formatting. On CI it runs in "check" mode to ensure any new files added to the repo follow PEP 8. If you see linter warnings that say something like "would reformat some_file.py" it means that black disagrees with your formatting.

**The easiest way to resolve these errors is to run Black locally on the code and then commit those changes, as explained below.**

To automatically re-format your code to be consistent with our code style guidelines, run [black](https://black.readthedocs.io/) in the repository root folder:

```sh
$ black .
```
# Using Python for Data Analysis

This folder contains completed notebooks and other files used in the Real Python tutorial on [Using Python for Data Analysis](https://realpython.com/using-python-for-data-analysis/).

None of the files are mandatory to complete the tutorial; however, you may find them useful for reference as you work through it.

## Available Files:

`data analysis findings.ipynb` is a Jupyter Notebook containing all the code used in the tutorial.
`data analysis results.ipynb` is a Jupyter Notebook containing the final version of the cleansing and analysis code.
`james_bond_data.csv` contains the data to be cleansed and analyzed in its original form, in CSV format.
`james_bond_data.json` contains the data to be cleansed and analyzed in its original form, in JSON format.
`james_bond_data.parquet` contains the data to be cleansed and analyzed in its original form, in Parquet format.
`james_bond_data.xlsx` contains the data to be cleansed and analyzed in its original form, in Microsoft Excel format.
`james_bond_data_cleansed.csv` contains the cleansed data in its final form.

Although the tutorial can be completed in a range of Python environments, the use of Jupyter Notebook within JupyterLab is highly recommended.
222 changes: 222 additions & 0 deletions data analysis results.ipynb
@@ -0,0 +1,222 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ade4bd3f-543b-460b-980f-0b41aab2c8b6",
"metadata": {},
"source": [
"# Data Cleansing Code"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a360772e-7829-4c15-9af9-d4596efc7351",
"metadata": {},
"outputs": [],
"source": [
"!python -m pip install pandas"
]
},
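{
"cell_type": "code",
"execution_count": null,
"id": "c98c7640-1472-4869-9fdd-f070d665ae1d",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"james_bond_data = pd.read_csv(\"james_bond_data.csv\").convert_dtypes()\n",
"\n",
"# NOTE: new_column_names is not defined anywhere in this notebook. Before running\n",
"# this cell, define it as a mapping from the original CSV column names to the\n",
"# names used below (gross_income_usa, movie_budget, film_length, and so on).\n",
"data = (\n",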
" james_bond_data.rename(columns=new_column_names)\n",
" .combine_first(\n",
" pd.DataFrame(\n",
" {\"imdb_rating\": {10: 7.1}, \"rotten_tomatoes_rating\": {10: 6.8}}\n",
" )\n",
" )\n",
" .assign(\n",
" gross_income_usa=lambda data: (\n",
" data[\"gross_income_usa\"]\n",
" .replace(\"[$,]\", \"\", regex=True)\n",
" .astype(float)\n",
" ),\n",
" gross_income_world=lambda data: (\n",
" data[\"gross_income_world\"]\n",
" .replace(\"[$,]\", \"\", regex=True)\n",
" .astype(float)\n",
" ),\n",
" movie_budget=lambda data: (\n",
" data[\"movie_budget\"].replace(\"[$,]\", \"\", regex=True).astype(float)\n",
" * 1000\n",
" ),\n",
" film_length=lambda data: (\n",
" data[\"film_length\"]\n",
" .str.removesuffix(\"mins\")\n",
" .astype(int)\n",
" .replace(1200, 120)\n",
" ),\n",
" release_date=lambda data: pd.to_datetime(\n",
" data[\"release_date\"], format=\"%B, %Y\"\n",
" ),\n",
" release_Year=lambda data: data[\"release_date\"].dt.year,\n",
" bond_actor=lambda data: (\n",
" data[\"bond_actor\"]\n",
" .str.replace(\"Shawn\", \"Sean\")\n",
" .str.replace(\"MOORE\", \"Moore\")\n",
" ),\n",
" car_manufacturer=lambda data: data[\"car_manufacturer\"].str.replace(\n",
" \"Astin\", \"Aston\"\n",
" ),\n",
" martinis_consumed=lambda data: data[\"martinis_consumed\"].replace(\n",
" -6, 6\n",
" ),\n",
" )\n",
").drop_duplicates(ignore_index=True)\n",
"\n",
"data.to_csv(\"james_bond_data_cleansed.csv\", index=False)"
]
},
{
"cell_type": "markdown",
"id": "f50918ee-e61f-46b2-b0c2-1ffa2c62bbc0",
"metadata": {},
"source": [
"# Data Analysis Code"
]
},
{
"cell_type": "markdown",
"id": "86817f68-05a0-4235-a1c8-a5d1f6e9141e",
"metadata": {},
"source": [
"## Performing a Regression Analysis"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bee6d6cb-e418-4c1d-8b75-604b9ab2e63d",
"metadata": {},
"outputs": [],
"source": [
"!python -m pip install matplotlib scikit-learn"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "669fb9d7-d744-4e6b-899e-a69aebec53ed",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"x = data.loc[:, [\"imdb_rating\"]]\n",
"y = data.loc[:, \"rotten_tomatoes_rating\"]\n",
"\n",
"model = LinearRegression()\n",
"model.fit(x, y)\n",
"\n",
"r_squared = f\"R-Squared: {model.score(x, y):.2f}\"\n",
"best_fit = f\"y = {model.coef_[0]:.4f}x{model.intercept_:+.4f}\"\n",
"y_pred = model.predict(x)\n",
"\n",
"fig, ax = plt.subplots()\n",
"ax.scatter(x, y)\n",
"ax.plot(x, y_pred, color=\"red\")\n",
"ax.text(7.25, 5.5, r_squared, fontsize=10)\n",
"ax.text(7.25, 7, best_fit, fontsize=10)\n",
"ax.set_title(\"Scatter Plot of Ratings\")\n",
"ax.set_xlabel(\"Average IMDb Rating\")\n",
"ax.set_ylabel(\"Average Rotten Tomatoes Rating\")\n",
"# fig.show()"
]
},
{
"cell_type": "markdown",
"id": "b38df412-c320-49fb-93ae-e253405537a8",
"metadata": {},
"source": [
"## Investigating a Statistical Distribution"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "938e5942-e57f-4e41-99f1-215cfb37d0df",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# fig, ax = plt.subplots()\n",
"length = data[\"film_length\"].value_counts(bins=7).sort_index()\n",
"length.plot.bar(\n",
" title=\"Film Length Distribution\",\n",
" xlabel=\"Time Range (mins)\",\n",
" ylabel=\"Count\",\n",
")\n",
"# fig.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff4e9955-baf4-48eb-b032-fbf55f439194",
"metadata": {},
"outputs": [],
"source": [
"data[\"film_length\"].agg([\"mean\", \"max\", \"min\", \"std\"])"
]
},
{
"cell_type": "markdown",
"id": "1b14c433-c3a6-4484-bc0a-26825bd1e870",
"metadata": {},
"source": [
"## Finding No Relationship"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2bb83374-347f-4cf6-bc21-8180a003371d",
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots()\n",
"ax.scatter(data[\"imdb_rating\"], data[\"bond_kills\"])\n",
"ax.set_title(\"Scatter Plot of Kills vs Ratings\")\n",
"ax.set_xlabel(\"Average IMDb Rating\")\n",
"ax.set_ylabel(\"Kills by Bond\")\n",
"fig.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
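
The string-cleaning steps in the cleansing cell above can be tried in isolation. The sketch below is a plain-Python stand-in for the pandas calls `.replace("[$,]", "", regex=True).astype(float)` and `.str.removesuffix("mins").astype(int)`, not the notebook's own code. The helper names and the sample values are hypothetical, and `str.removesuffix` assumes Python 3.9 or later.

```python
import re


def currency_to_float(value: str) -> float:
    """Strip '$' and ',' characters from a currency string, then convert to float."""
    return float(re.sub(r"[$,]", "", value))


def minutes_to_int(value: str) -> int:
    """Drop a trailing 'mins' suffix, trim whitespace, and convert to int."""
    return int(value.removesuffix("mins").strip())


# Hypothetical raw values, not taken from the dataset.
print(currency_to_float("$1,234,567.50"))
print(minutes_to_int("110 mins"))
```

Applying the same regular expression to every currency column is why the notebook repeats the `replace` call for `gross_income_usa`, `gross_income_world`, and `movie_budget`.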