Kaggle Pandas Exercise 2

mrankitgupta · Apr 5, 2022 · 2a0cc7f
1 parent f6daf20
commit 2a0cc7f
Showing 1 changed file with 1 addition and 0 deletions.
diff --git a/pandas-2exercise-indexing-selecting-assigning.ipynb b/pandas-2exercise-indexing-selecting-assigning.ipynb
@@ -0,0 +1 @@
+{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.7.12","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"**This notebook is an exercise in the [Pandas](https://www.kaggle.com/learn/pandas) course.  You can reference the tutorial at [this link](https://www.kaggle.com/residentmario/indexing-selecting-assigning).**\n\n---\n","metadata":{}},{"cell_type":"markdown","source":"# Introduction\n\nIn this set of exercises we will work with the [Wine Reviews dataset](https://www.kaggle.com/zynicide/wine-reviews). ","metadata":{}},{"cell_type":"markdown","source":"Run the following cell to load your data and some utility functions (including code to check your answers).","metadata":{}},{"cell_type":"code","source":"import pandas as pd\n\nreviews = pd.read_csv(\"../input/wine-reviews/winemag-data-130k-v2.csv\", index_col=0)\npd.set_option(\"display.max_rows\", 5)\n\nfrom learntools.core import binder; binder.bind(globals())\nfrom learntools.pandas.indexing_selecting_and_assigning import *\nprint(\"Setup complete.\")","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:12.936486Z","iopub.execute_input":"2022-04-05T10:57:12.937314Z","iopub.status.idle":"2022-04-05T10:57:14.081492Z","shell.execute_reply.started":"2022-04-05T10:57:12.937265Z","shell.execute_reply":"2022-04-05T10:57:14.080572Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Look at an overview of your data by running the following 
line.","metadata":{}},{"cell_type":"code","source":"reviews.head()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.08455Z","iopub.execute_input":"2022-04-05T10:57:14.084768Z","iopub.status.idle":"2022-04-05T10:57:14.102906Z","shell.execute_reply.started":"2022-04-05T10:57:14.084742Z","shell.execute_reply":"2022-04-05T10:57:14.101759Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"# Exercises","metadata":{}},{"cell_type":"markdown","source":"## 1.\n\nSelect the `description` column from `reviews` and assign the result to the variable `desc`.","metadata":{}},{"cell_type":"code","source":"# Your code here\ndesc = reviews['description']\n\n# Check your answer\nq1.check()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.104443Z","iopub.execute_input":"2022-04-05T10:57:14.104747Z","iopub.status.idle":"2022-04-05T10:57:14.12082Z","shell.execute_reply.started":"2022-04-05T10:57:14.104706Z","shell.execute_reply":"2022-04-05T10:57:14.119816Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Follow-up question: what type of object is `desc`? 
If you're not sure, you can check by calling Python's `type` function: `type(desc)`.","metadata":{}},{"cell_type":"code","source":"#q1.hint()\n#q1.solution()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.124105Z","iopub.execute_input":"2022-04-05T10:57:14.124589Z","iopub.status.idle":"2022-04-05T10:57:14.13125Z","shell.execute_reply.started":"2022-04-05T10:57:14.124552Z","shell.execute_reply":"2022-04-05T10:57:14.130308Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## 2.\n\nSelect the first value from the description column of `reviews`, assigning it to variable `first_description`.","metadata":{}},{"cell_type":"code","source":"first_description = reviews.description.iloc[0]\n\n# Check your answer\nq2.check()\nfirst_description","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.132526Z","iopub.execute_input":"2022-04-05T10:57:14.132743Z","iopub.status.idle":"2022-04-05T10:57:14.151147Z","shell.execute_reply.started":"2022-04-05T10:57:14.132709Z","shell.execute_reply":"2022-04-05T10:57:14.150321Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"#q2.hint()\n#q2.solution()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.152791Z","iopub.execute_input":"2022-04-05T10:57:14.153606Z","iopub.status.idle":"2022-04-05T10:57:14.158918Z","shell.execute_reply.started":"2022-04-05T10:57:14.153558Z","shell.execute_reply":"2022-04-05T10:57:14.158052Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## 3. 
\n\nSelect the first row of data (the first record) from `reviews`, assigning it to the variable `first_row`.","metadata":{}},{"cell_type":"code","source":"first_row = reviews.iloc[0]\n\n# Check your answer\nq3.check()\nfirst_row","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.160456Z","iopub.execute_input":"2022-04-05T10:57:14.161377Z","iopub.status.idle":"2022-04-05T10:57:14.181819Z","shell.execute_reply.started":"2022-04-05T10:57:14.161334Z","shell.execute_reply":"2022-04-05T10:57:14.180927Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"#q3.hint()\n#q3.solution()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.183119Z","iopub.execute_input":"2022-04-05T10:57:14.183432Z","iopub.status.idle":"2022-04-05T10:57:14.188019Z","shell.execute_reply.started":"2022-04-05T10:57:14.183392Z","shell.execute_reply":"2022-04-05T10:57:14.186901Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## 4.\n\nSelect the first 10 values from the `description` column in `reviews`, assigning the result to variable `first_descriptions`.\n\nHint: format your output as a pandas Series.","metadata":{}},{"cell_type":"code","source":"first_descriptions = reviews.description.head(10)\n\n# Check your 
answer\nq4.check()\nfirst_descriptions","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.18958Z","iopub.execute_input":"2022-04-05T10:57:14.190443Z","iopub.status.idle":"2022-04-05T10:57:14.211322Z","shell.execute_reply.started":"2022-04-05T10:57:14.190399Z","shell.execute_reply":"2022-04-05T10:57:14.210606Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"#q4.hint()\n#q4.solution()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.213584Z","iopub.execute_input":"2022-04-05T10:57:14.214043Z","iopub.status.idle":"2022-04-05T10:57:14.217102Z","shell.execute_reply.started":"2022-04-05T10:57:14.214008Z","shell.execute_reply":"2022-04-05T10:57:14.216445Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## 5.\n\nSelect the records with index labels `1`, `2`, `3`, `5`, and `8`, assigning the result to the variable `sample_reviews`.\n\nIn other words, generate the following DataFrame:\n\n![](https://i.imgur.com/sHZvI1O.png)","metadata":{}},{"cell_type":"code","source":"a = [1, 2, 3, 5, 8]\nsample_reviews = reviews.loc[a]\n\n# Check your answer\nq5.check()\nsample_reviews","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.218294Z","iopub.execute_input":"2022-04-05T10:57:14.218511Z","iopub.status.idle":"2022-04-05T10:57:14.253469Z","shell.execute_reply.started":"2022-04-05T10:57:14.218485Z","shell.execute_reply":"2022-04-05T10:57:14.252827Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"#q5.hint()\n#q5.solution()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.272969Z","iopub.execute_input":"2022-04-05T10:57:14.273651Z","iopub.status.idle":"2022-04-05T10:57:14.277734Z","shell.execute_reply.started":"2022-04-05T10:57:14.273602Z","shell.execute_reply":"2022-04-05T10:57:14.276763Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## 6.\n\nCreate a variable 
`df` containing the `country`, `province`, `region_1`, and `region_2` columns of the records with the index labels `0`, `1`, `10`, and `100`. In other words, generate the following DataFrame:\n\n![](https://i.imgur.com/FUCGiKP.png)","metadata":{}},{"cell_type":"code","source":"a = [0, 1, 10, 100]\nb = ['country', 'province', 'region_1', 'region_2']\ndf = reviews.loc[a, b]\n\n# Check your answer\nq6.check()\ndf","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.421196Z","iopub.execute_input":"2022-04-05T10:57:14.421489Z","iopub.status.idle":"2022-04-05T10:57:14.441431Z","shell.execute_reply.started":"2022-04-05T10:57:14.421458Z","shell.execute_reply":"2022-04-05T10:57:14.440458Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"#q6.hint()\n#q6.solution()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.557653Z","iopub.execute_input":"2022-04-05T10:57:14.558546Z","iopub.status.idle":"2022-04-05T10:57:14.562047Z","shell.execute_reply.started":"2022-04-05T10:57:14.558501Z","shell.execute_reply":"2022-04-05T10:57:14.561158Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## 7.\n\nCreate a variable `df` containing the `country` and `variety` columns of the first 100 records. \n\nHint: you may use `loc` or `iloc`. When working on the answer to this question and several of the ones that follow, keep in mind the following \"gotcha\" described in the tutorial:\n\n> `iloc` uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. \n`loc`, meanwhile, indexes inclusively. \n\n> This is particularly confusing when the DataFrame index is a simple numerical list, e.g. `0,...,1000`. In this case `df.iloc[0:1000]` will return 1000 entries, while `df.loc[0:1000]` will return 1001 of them! To get 1000 elements using `loc`, you will need to go one lower and ask for `df.loc[0:999]`. 
","metadata":{}},{"cell_type":"code","source":"a = ['country','variety']\ndf = reviews.loc[:99, a]\n\n# Check your answer\nq7.check()\ndf","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.652048Z","iopub.execute_input":"2022-04-05T10:57:14.652789Z","iopub.status.idle":"2022-04-05T10:57:14.669421Z","shell.execute_reply.started":"2022-04-05T10:57:14.652741Z","shell.execute_reply":"2022-04-05T10:57:14.668671Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"#q7.hint()\n#q7.solution()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.720188Z","iopub.execute_input":"2022-04-05T10:57:14.720503Z","iopub.status.idle":"2022-04-05T10:57:14.723892Z","shell.execute_reply.started":"2022-04-05T10:57:14.720471Z","shell.execute_reply":"2022-04-05T10:57:14.72331Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## 8.\n\nCreate a DataFrame `italian_wines` containing reviews of wines made in `Italy`. 
Hint: `reviews.country` equals what?","metadata":{}},{"cell_type":"code","source":"italian_wines = reviews.loc[reviews.country == 'Italy']\n\n# Check your answer\nq8.check()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.845959Z","iopub.execute_input":"2022-04-05T10:57:14.846596Z","iopub.status.idle":"2022-04-05T10:57:14.8838Z","shell.execute_reply.started":"2022-04-05T10:57:14.846558Z","shell.execute_reply":"2022-04-05T10:57:14.883204Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"#q8.hint()\n#q8.solution()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:14.980435Z","iopub.execute_input":"2022-04-05T10:57:14.980852Z","iopub.status.idle":"2022-04-05T10:57:14.983924Z","shell.execute_reply.started":"2022-04-05T10:57:14.980814Z","shell.execute_reply":"2022-04-05T10:57:14.98328Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## 9.\n\nCreate a DataFrame `top_oceania_wines` containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand.","metadata":{}},{"cell_type":"code","source":"top_oceania_wines = reviews.loc[(reviews.country.isin(['Australia', 'New Zealand'])) & (reviews.points >= 95)]\n\n# Check your 
answer\nq9.check()\ntop_oceania_wines","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:15.119265Z","iopub.execute_input":"2022-04-05T10:57:15.119615Z","iopub.status.idle":"2022-04-05T10:57:15.153807Z","shell.execute_reply.started":"2022-04-05T10:57:15.119581Z","shell.execute_reply":"2022-04-05T10:57:15.15317Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"#q9.hint()\n#q9.solution()","metadata":{"execution":{"iopub.status.busy":"2022-04-05T10:57:15.330681Z","iopub.execute_input":"2022-04-05T10:57:15.331107Z","iopub.status.idle":"2022-04-05T10:57:15.335065Z","shell.execute_reply.started":"2022-04-05T10:57:15.331068Z","shell.execute_reply":"2022-04-05T10:57:15.334284Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"# Keep going\n\nMove on to learn about **[summary functions and maps](https://www.kaggle.com/residentmario/summary-functions-and-maps)**.","metadata":{}},{"cell_type":"markdown","source":"---\n\n\n\n\n*Have questions or comments? Visit the [course discussion forum](https://www.kaggle.com/learn/pandas/discussion) to chat with other learners.*","metadata":{}}]}
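
Note on the `loc`/`iloc` slicing gotcha quoted in exercise 7: the difference is easier to see on a small frame. The sketch below is only an illustration and is not part of the commit; `toy` is a hypothetical stand-in for `reviews`, built with a default integer index so that labels and positions coincide.

```python
import pandas as pd

# Hypothetical toy frame standing in for `reviews`; its index labels are 0..9.
toy = pd.DataFrame({"country": list("ABCDEFGHIJ"), "points": range(80, 90)})

by_position = toy.iloc[0:5]     # end-exclusive: positions 0, 1, 2, 3, 4
by_label = toy.loc[0:5]         # end-inclusive: labels 0, 1, 2, 3, 4, 5
by_label_fixed = toy.loc[0:4]   # one lower, so it matches iloc[0:5]

print(len(by_position), len(by_label), len(by_label_fixed))  # prints: 5 6 5
```

This is why the exercise 7 answer in the diff uses `reviews.loc[:99, a]` rather than `reviews.loc[:100, a]` to get the first 100 records.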