Fix old SCTs that fail assertions #351

Open
filipsch opened this issue Sep 12, 2018 · 2 comments
@filipsch

filipsch commented Sep 12, 2018

Since pythonwhat v2.16.0, several SCT functions run assertions to verify that they are being used appropriately. The details are in the CHANGELOG here.

Some of these assertions only run for courses whose course image sets the PYTHONWHAT_V2_ONLY env variable (the more recent courses). Running them everywhere would break 55 exercises whose SCTs violate these rules:

   course_id chapter_id     id number                                                                              title
1        998       2244  18929      3                                                             Loading a pickled file
2        998       2284  18950      8                                  Filtering your database records using SQL's WHERE
3        998       2284  18951     14                  The power of SQL lies in relationships between tables: INNER JOIN
4        998       2284  19468      7                                         Customizing the Hello World of SQL Queries
5        998       2284  19469      9                                            Ordering your SQL records with ORDER BY
6        998       2284  19793     11                                         Pandas and The Hello World of SQL Queries!
7        998       2372  20216     12            Turning a webpage into data using BeautifulSoup: getting the hyperlinks
8        998       2388  20437     13                                                 Load and explore your Twitter data
9        998       2388  20441     15                                              A little bit of Twitter text analysis
10       998       2388  20834      3                                                       Loading and exploring a JSON
11      1115       2469  21424      3                                         Filter data selected from a Table - Simple
12      1115       2470  21436      3                                       Calculating a Difference between Two Columns
13      1115       2470  21442     10                                    Using alias to handle same table joined queries
14      1115       2471  21450      7                                                         Loading a CSV into a Table
15      1115       2471  21452      9                                                        Updating individual records
16      1115       2472  21463      5                                                      Reading the Data from the CSV
17      1115       2472  21466      8                           Build a Query to Determine the Average Age by Population
18      1115       2472  21467      9        Build a Query to Determine the Percentage of Population by Gender and State
19      1115       2472  21468     10 Build a Query to Determine the Difference by State from the 2000 and 2008 Censuses
20      1115       2470  33053     11                          Leveraging Functions and Group_bys with Hierarchical Data
21      1531       3844  38857      8                                                                          Using zip
22      1531       3845  38874      9                                                                Dict comprehensions
23      1550       3936  40011      8                                                                 How is it optimal?
24      1550       3939  40032      8                                             Hypothesis test on Pearson correlation
25      1550       3940  40049     17                                     Is beak depth heritable at all in G. scandens?
26      1550       3937  40360      4                                                      Visualizing bootstrap samples
27      1550       3938  47093      3                                                   Visualizing permutation sampling
28      1550       3936  63252     12                                             Linear regression on all Anscombe data
29      1606       4135  42717     12            Turning a webpage into data using BeautifulSoup: getting the hyperlinks
30      1606       4136  42720      3                                                       Loading and exploring a JSON
31      1606       4140  42795      6                                              A little bit of Twitter text analysis
32      1607       4138  42757      3                                                             Loading a pickled file
33      1607       4139  42780      8                                         Customizing the Hello World of SQL Queries
34      1607       4139  42781      9                                  Filtering your database records using SQL's WHERE
35      1607       4139  42782     10                                            Ordering your SQL records with ORDER BY
36      1639       4284  59225     10                                                                    Sunny or cloudy
37      1681       4403  62438     10                                  Concatenating vertically to get MultiIndexed rows
38      1681       4403  62441     13                                               Concatenating DataFrames from a dict
39      1681       4403  92426      8                                        Reading multiple files to build a DataFrame
40      1822       4794  51256      4                                                             Finding open triangles
41      2072       5698  63534      6                                                        How many clusters of grain?
42      2533       7496  86163      9                                         Encode the labels as categorical variables
43      2533       7496  95226     10                                                             Counting unique labels
44      3629      10390 119216      9                                                       Regex with NLTK tokenization
45      3629      10390 119217     10                                                             Non-ascii tokenization
46      3629      10390 119219     12                                                                  Charting practice
47      3679      10556 121072      8                                                    Using regularization in XGBoost
48      3679      10557 121076      3                                               Tuning the number of boosting rounds
49      3679      10557 227050      6                                                                         Tuning eta
50      3679      10557 227051      7                                                                   Tuning max_depth
51      3679      10557 227052      8                                                            Tuning colsample_bytree
52      4299      13274 151880      8                                                              Filtering on a phrase
53      6221      19972 258084     11                                                                  Balancing classes
54      6221      19972 258085     12                                            Comparison of Employee attrition models
55      7032      27285 321296     14                                                                   Detect edges (2)

This issue is a reminder that we have to rewrite the SCTs for these exercises because they are smelly. As soon as that is done, we can remove this v2_only() check and do the assertions for all SCTs, independent of the PYTHONWHAT_V2_ONLY env variable.
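
For readers unfamiliar with the gating, here is a minimal sketch of how a check like v2_only() could guard an assertion. This is not the actual pythonwhat source: the convention that the variable counts as set when its value is "1", the `environ` parameter, and the `assert_sct_usage` helper are all assumptions made for illustration.

```python
import os


def v2_only(environ=None):
    """Return True when the course image opts in to the strict checks.

    Assumption: the variable counts as set when its value is "1".
    """
    env = os.environ if environ is None else environ
    return env.get("PYTHONWHAT_V2_ONLY", "") == "1"


def assert_sct_usage(condition, message, environ=None):
    """Hypothetical helper: enforce a usage rule only for opted-in images."""
    if v2_only(environ) and not condition:
        raise AssertionError(message)


# Older image (variable not set): the violation is silently tolerated.
assert_sct_usage(False, "SCT function used incorrectly", environ={})
```

Removing the check, as proposed above, would amount to dropping the `v2_only(environ) and` part of the condition so the assertion fires for every course image.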

How I came up with this list

  • Make a branch on pythonwhat that does not do the if v2_only() check
  • Install this branch of pythonwhat in docker-python-shared (using pip install git+...) and tag it (with a release candidate format, as it is never the intention to deploy)
  • This makes the validator go through all exercises with this hypothetical shared image, and spit out the exercises where it goes wrong.
  • Finally, you can use this script to programmatically get a list of exercise ids from the validator endpoint, together with their corresponding chapters and courses.
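
The last step boils down to a lookup of failing exercise ids against exercise metadata. The sketch below shows that idea in Python. It is not the linked script: the `Exercise` record, the `lookup_failing` function, and the in-memory `catalog` are hypothetical stand-ins for whatever the validator endpoint and the metadata files provide. The sample rows are copied from the table above.

```python
from collections import namedtuple

# Hypothetical record mirroring the columns of the table above.
Exercise = namedtuple("Exercise", "course_id chapter_id id number title")


def lookup_failing(failing_ids, exercises):
    """Attach chapter and course info to each failing exercise id.

    Ids without metadata are skipped. The result is sorted by course id,
    then exercise id, which appears to be the ordering of the table above.
    """
    by_id = {ex.id: ex for ex in exercises}
    found = [by_id[i] for i in failing_ids if i in by_id]
    return sorted(found, key=lambda ex: (ex.course_id, ex.id))


# Stand-in for the exercise metadata (rows taken from the table above).
catalog = [
    Exercise(1531, 3845, 38874, 9, "Dict comprehensions"),
    Exercise(1531, 3844, 38857, 8, "Using zip"),
    Exercise(1822, 4794, 51256, 4, "Finding open triangles"),
]

# Stand-in for the ids reported by the validator.
rows = lookup_failing([51256, 38857], catalog)
for row in rows:
    print(row.course_id, row.chapter_id, row.id, row.number, row.title)
```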
@klmedeiros

@hermansje Can you re-run the script Filip used to generate this list just so I can be sure it's up to date, if you have access to it? I did try re-running the script he linked to, but I don't have the required .rds files.

A lot of these courses have been very much updated since he filed this issue, and I don't want to duplicate work/accidentally break things further. Thanks!

@hermansje

@klmedeiros It's down to 24:

   course_id chapter_id     id number                                                                   title
1       1531       3844  38857      8                                                               Using zip
2       1531       3845  38874      9                                                     Dict comprehensions
3       1606       4135  42717     12 Turning a webpage into data using BeautifulSoup: getting the hyperlinks
4       1606       4136  42720      3                                            Loading and exploring a JSON
5       1606       4140  42795      6                                   A little bit of Twitter text analysis
6       1607       4138  42757      3                                                  Loading a pickled file
7       1607       4138  42763      9                                                     Importing SAS files
8       1607       4139  42780      8                              Customizing the Hello World of SQL Queries
9       1607       4139  42781      9                       Filtering your database records using SQL's WHERE
10      1607       4139  42782     10                                 Ordering your SQL records with ORDER BY
11      1681       4405  62462      2                                       Loading Olympic edition DataFrame
12      1681       4403  92426      8                             Reading multiple files to build a DataFrame
13      1681       4405  95335      3                                             Loading IOC codes DataFrame
14      1822       4794  51256      4                                                  Finding open triangles
15      2072       5698  63534      6                                             How many clusters of grain?
16      3629      10390 119216      9                                            Regex with NLTK tokenization
17      3629      10390 119217     10                                                  Non-ascii tokenization
18      3629      10390 119219     12                                                       Charting practice
19      3679      10556 121072      8                                         Using regularization in XGBoost
20      3679      10557 121076      3                                    Tuning the number of boosting rounds
21      3679      10557 227050      6                                                              Tuning eta
22      3679      10557 227051      7                                                        Tuning max_depth
23      3679      10557 227052      8                                                 Tuning colsample_bytree
24      4299      13274 151880      8                                                   Filtering on a phrase

pythonwhat commit
shared image commit
