This repository contains an implementation of the paper "Clustering Recurrent and Semantically-Cohesive Program Statements in Introductory Programming Assignments".
We provide the dataset used in our evaluation as serialized program dependence graphs in the 'assignments.zip' file. We cannot provide the original source code because of CodeChef's policies. However, each serialized submission is named after its original identifier, and we provide links to each of the assignments.
- Unzip 'assignments.zip', which contains the serialized program dependence graphs for submissions in the different assignments.
- Run `edu.rit.goal.Exp1` to perform a single iteration of the core statement mining with fixed µ and ε.
- Run `edu.rit.goal.Exp2` to perform an iterative process for core statement mining with µ set to a percentage of the submissions and fixed ε.
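The unzip step can also be done programmatically. The following is a minimal sketch using only the standard `java.util.zip` API; the class name `UnzipAssignments` and the default target directory `assignments` are assumptions made for illustration and are not part of this repository.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of the repository.
public class UnzipAssignments {

    // Extracts every entry of zipFile into targetDir and returns
    // the number of regular files written.
    public static int unzip(Path zipFile, Path targetDir) throws IOException {
        int files = 0;
        Path root = targetDir.toAbsolutePath().normalize();
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Reject entries that would be written outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Unsafe zip entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the bytes of the current entry only.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Default paths are assumptions; pass others as arguments if needed.
        Path zipFile = Paths.get(args.length > 0 ? args[0] : "assignments.zip");
        Path targetDir = Paths.get(args.length > 1 ? args[1] : "assignments");
        int count = unzip(zipFile, targetDir);
        System.out.println("Extracted " + count + " files into " + targetDir);
    }
}
```

Where the experiment classes expect the extracted files to live is not stated here, so the target directory may need to be adjusted.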
Below is a table with links to the different assignments as well as the identifier used in the experiments for each of them.
Assignment | id |
---|---|
JOHNY | 0 |
CARVANS | 1 |
BUYING2 | 2 |
MUFFINS3 | 3 |
CLEANUP | 4 |
CONFLIP | 5 |
LAPIN | 6 |
PERMUT2 | 7 |
STONES | 8 |
SUMTRIAN | 9 |
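
When scripting over several assignments, the table above can be mirrored as a small lookup. This is only an illustrative sketch: the class `AssignmentIds` and the method `idOf` are hypothetical and not part of this repository, and how `Exp1` and `Exp2` receive the identifier is not specified here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the repository.
public class AssignmentIds {

    // Assignment name -> identifier, copied from the table above.
    public static final Map<String, Integer> IDS = new LinkedHashMap<>();

    static {
        String[] names = {
            "JOHNY", "CARVANS", "BUYING2", "MUFFINS3", "CLEANUP",
            "CONFLIP", "LAPIN", "PERMUT2", "STONES", "SUMTRIAN"
        };
        // Identifiers are consecutive and start at 0, in table order.
        for (int i = 0; i < names.length; i++) {
            IDS.put(names[i], i);
        }
    }

    // Returns the identifier for an assignment name (case-insensitive).
    public static int idOf(String name) {
        Integer id = IDS.get(name.toUpperCase(Locale.ROOT));
        if (id == null) {
            throw new IllegalArgumentException("Unknown assignment: " + name);
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(idOf("LAPIN")); // prints 6
    }
}
```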