Command Line Options

Oshando Johnson edited this page Mar 11, 2022 · 1 revision

SWAN can be executed in two main modes: cross-validation and prediction. In cross-validation mode, the tool performs cross-validation using the default training set and outputs the metrics. In prediction mode, the user provides a path to a Java project (JAR or .class files) and SWAN predicts the categories for the methods it finds in the project. In both modes, numerous settings can be changed, such as the training dataset, the machine learning toolkit, and the flags for the feature and model selection phases.

The command line options available in SWAN are described below and can also be seen on the command line by running SWAN with the -help option.
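As a rough illustration of how the two modes map onto the options listed on this page, the following Python sketch assembles command lines for a prediction run and a cross-validation run. The JAR name `swan.jar`, the project path, the output directory, and the helper `build_swan_command` are assumptions made for this example; only the flags themselves come from this page.

```python
import shlex

# Hypothetical JAR name: substitute the artifact produced by your SWAN build.
SWAN_JAR = "swan.jar"


def build_swan_command(phase, options):
    """Return the argument list for a SWAN run.

    `phase` is the value passed to -p ("predict" or "validate").
    `options` maps a flag to its value; use None for a flag without a value.
    """
    argv = ["java", "-jar", SWAN_JAR, "-p", phase]
    for flag, value in options.items():
        argv.append(flag)
        if value is not None:
            argv.append(str(value))
    return argv


# Prediction: categorize the methods of a project (placeholder paths).
predict_cmd = build_swan_command("predict", {
    "-test": "path/to/project",
    "-o": "swan-output",
    "-pt": 0.5,
})

# Cross-validation on the default training set with the weka toolkit.
validate_cmd = build_swan_command("validate", {"-i": 10, "-t": "weka"})

print(shlex.join(predict_cmd))
print(shlex.join(validate_cmd))
```

Building the command as a list keeps each flag and value a separate argument, so paths containing spaces survive if the list is later passed to `subprocess.run`.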

Parameter Description
-p, --phase Mode in which SWAN should be executed. By default, SWAN runs in predict mode, where it predicts the categories for the methods in the test set. The validate mode can be used instead to perform cross-validation.
-test, -test-data Path to Java project (JAR or .class files) that should be used as the test set for which SWAN will predict SRM/CWE categories.
-d, -dataset Path to the JSON file that contains the training examples: method signatures (fully qualified name, return type, and parameters), method classifications (SRM and CWE), data flow information, doc comments, etc. SWAN uses swan-dataset.json as the default training dataset. Users can also provide their own training dataset; in that case, they should also provide the path to the Java project that contains the methods in the JSON file using the -train option.
-train, -train-data Path to Java project that contains the methods that belong to the training dataset.
-s, -srm List of security-relevant method (SRM) types that should be evaluated. By default, all SRMs are processed, but a list of SRMs can also be provided. Options: source, sink, sanitizer, authentication.
-c, -cwe List of CWE types that should be evaluated. Similar to SRMs, all CWEs are used by default but a list of CWEs can also be provided. Options: cwe078, cwe079, cwe089, cwe306, cwe601, cwe862 and cwe863.
-in, --train-instances Path to an ARFF file containing the training instances. Rather than recomputing the features for the training set on every run, SWAN stores them in an ARFF file. Based on the settings provided to SWAN, the corresponding ARFF file is loaded automatically and used to train the ML model. Users can also provide the path to an external ARFF file.
-o, --output Directory where the ARFF data, the JSON file containing the predicted categories, and other files are exported.
-f, --feature Feature sets that SWAN should use. By default, SWAN uses all feature sets, but a list of feature sets can be provided. Options: code (features based on source code), doc-auto (embedded ML features) and doc-manual (manual features based on CoreNLP).
-t, --toolkit Machine learning toolkit used for model selection. By default, meka is used, but the user can also specify weka or ml-plan.
-arff, --arff-data Flag to export the training and test instances to ARFF files.
-doc, --documented Flag to use only methods from the dataset that have software documentation.
-i, --iterations Number of iterations for cross-validation, by default 10.
-sp, --training-split Split between training and test data, given as the training fraction; by default 0.7 (70% training, 30% test).
-pt, --prediction-threshold Threshold for predicting categories, by default 0.5. Methods whose predicted value for a category is equal to or above this threshold are classified into that category.
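
The effect of the last two options can be sketched in a few lines of Python. This is an illustration of the documented semantics only, not SWAN's implementation: the function names and the example scores are hypothetical, and the rounding used for the split is an assumption.

```python
def assign_categories(scores, threshold=0.5):
    """Return the categories whose score is equal to or above the threshold."""
    return sorted(cat for cat, score in scores.items() if score >= threshold)


def split_sizes(n_instances, training_split=0.7):
    """Return (training, test) instance counts for a given split fraction."""
    n_train = round(n_instances * training_split)  # rounding is an assumption
    return n_train, n_instances - n_train


# Hypothetical per-category scores for a single method.
scores = {"source": 0.82, "sink": 0.5, "sanitizer": 0.31}

print(assign_categories(scores))        # default threshold of 0.5
print(assign_categories(scores, 0.9))   # stricter threshold
print(split_sizes(200))                 # default split of 0.7
```

Because the comparison is inclusive, a method scoring exactly 0.5 for a category is still assigned to it under the default threshold; raising the value passed to -pt yields fewer predicted categories.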