Command Line Options

SWAN can be executed in two main modes: cross-validation and prediction. In cross-validation mode, the tool performs cross-validation using the default training set and outputs the metrics. In prediction mode, the user provides a path to a Java project (JAR or .class files) and SWAN predicts the categories for the methods it finds in the project. In both modes, numerous SWAN settings can be changed, such as the training dataset, the machine learning toolkit, and the flags for the feature and model selection phases.

The command line options available in SWAN are described below and can also be seen on the command line by running SWAN with the -help option.

| Parameter | Description |
| --- | --- |
| `-p`, `--phase` | Mode in which SWAN should be executed. By default, SWAN runs in `predict` mode, where it predicts the categories for the methods in the test set. The `validate` mode can be used instead for cross-validation. |
| `-test`, `-test-data` | Path to the Java project (JAR or .class files) that should be used as the test set for which SWAN will predict SRM/CWE categories. |
| `-d`, `-dataset` | Path to the JSON file that contains the training examples. Each example includes the method signature (fully qualified name, return type, and parameters), method classification (SRM and CWE), data flow information, doc comments, etc. SWAN uses `swan-dataset.json` as the default training dataset. A user can also provide their own training dataset; in that case, they should also provide the path to the Java project that contains the methods in the JSON file using the `-train` option. |
| `-train`, `-train-data` | Path to the Java project that contains the methods that belong to the training dataset. |
| `-s`, `-srm` | List of security-relevant method (SRM) types that should be evaluated. By default, all SRMs are processed, but a list of SRMs can also be provided. Options: `source`, `sink`, `sanitizer`, `authentication`. |
| `-c`, `-cwe` | List of CWE types that should be evaluated. As with SRMs, all CWEs are used by default, but a list of CWEs can also be provided. Options: `cwe078`, `cwe079`, `cwe089`, `cwe306`, `cwe601`, `cwe862` and `cwe863`. |
| `-in`, `--train-instances` | Instead of computing the features for the training set on every run, SWAN stores this information in an ARFF file. Based on the settings provided to SWAN, the corresponding ARFF file is automatically loaded and used to train the ML model. The user can also provide the path to an external ARFF file containing the training instances. |
| `-o`, `--output` | Directory to which the ARFF data, the JSON file containing the predicted categories, and other files are exported. |
| `-f`, `--feature` | Feature sets that SWAN should use. By default, SWAN uses all feature sets, but a list of feature sets can be provided. Options: `code` (features based on source code), `doc-auto` (embedded ML features) and `doc-manual` (manual features based on CoreNLP). |
| `-t`, `--toolkit` | Machine learning toolkit used for model selection. By default, `meka` is used, but the user can also specify `weka` or `ml-plan`. |
| `-arff`, `--arff-data` | Flag to export the training and test instances to ARFF files. |
| `-doc`, `--documented` | Flag to use only methods from the dataset that have software documentation. |
| `-i`, `--iterations` | Number of iterations for cross-validation, by default `10`. |
| `-sp`, `--training-split` | Percentage split for training and test data, by default `0.7` (70% training, 30% test). |
| `-pt`, `--prediction-threshold` | Threshold for predicting categories, by default `0.5`. Methods that have a value equal to or above this threshold will be classified into the category. |
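
To illustrate how the prediction threshold behaves, here is a minimal Java sketch. It is not SWAN's actual implementation: the class `ThresholdSketch`, the method `predictCategories`, and the example scores are hypothetical and only demonstrate the "equal to or above" rule described for `--prediction-threshold`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the threshold rule; not part of SWAN's code base.
public class ThresholdSketch {

    // Returns every category whose score is equal to or above the threshold.
    static List<String> predictCategories(Map<String, Double> scores, double threshold) {
        List<String> predicted = new ArrayList<>();
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() >= threshold) {
                predicted.add(entry.getKey());
            }
        }
        return predicted;
    }

    public static void main(String[] args) {
        // Made-up per-category scores for a single method (assumed values).
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("source", 0.82);
        scores.put("sink", 0.50);      // exactly at the threshold: still classified
        scores.put("sanitizer", 0.31);

        double threshold = 0.5;        // the documented default for --prediction-threshold
        System.out.println(predictCategories(scores, threshold)); // prints [source, sink]
    }
}
```

Under this reading, raising the threshold yields fewer but more confident classifications, while lowering it assigns methods to more categories.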
