SWAN is a machine-learning approach that detects security-relevant methods (SRMs) in Java programs. SWAN is intended to be used in combination with other static analysis tools: it helps users create the set of relevant methods that static analyses, e.g. taint and typestate analysis, require as input. The tool currently detects four types of security-relevant methods: source, sink, sanitizer, and authentication methods. SWAN also labels methods as relevant for seven Common Weakness Enumeration (CWE) entries: CWE78 OS Command Injection, CWE79 Cross-site Scripting, CWE89 SQL Injection, CWE306 Missing Authentication, CWE601 Open Redirect, CWE862 Missing Authorisation, and CWE863 Incorrect Authorisation.
The project is divided into two main components: the command line tool swan-cmd and the IntelliJ plugin dev-assist, which provides a GUI for SWAN.
- swan-cmd is the command line implementation of SWAN, with components for data collection, feature engineering, model selection, and SRM prediction. The command line tool uses the following Maven modules:
  - training-data-jars contains dependencies from which the training examples are extracted.
  - Java Doclets to process and export software documentation:
    - coverage-doclet calculates the software documentation coverage of Java programs based on the presence of doc comments for classes, methods, and other objects.
    - xml-doclet exports doc comments to XML files so that they can be analyzed by the Natural Language Processing (NLP) module.
- dev-assist provides GUI support for SWAN and enables active machine learning.
To run SWAN, you need to provide the path to the Java project to be analyzed (JAR files or compiled classes) as well as an output directory to which SWAN will export its results. The easiest way to get started is to use the pre-built binary from the latest release. After downloading the necessary files from that release, you can run SWAN from the command line with the following command:
java -jar swan-cmd-3.x.x.jar -test /path/to/project/files -o /output/directory
This command runs the application and exports the detected security-relevant methods to a JSON file in the provided output directory. It uses the following default settings: training dataset (-in dataset), code features (-f code), and the MEKA toolkit (-t meka). The remaining default options are found in CLIRunner. The available command line options can be found in the Wiki or by using the -help command line option.
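To make the default settings visible, the sketch below writes them out next to the required options. It assumes that -in dataset, -f code, and -t meka can be passed explicitly in exactly the form given above; the helper name swan_with_defaults and the SWAN_DRY_RUN variable are hypothetical and not part of SWAN. By default the helper only prints the command so it can be reviewed before anything is executed.

```shell
# Hypothetical helper (not part of SWAN): builds the quick-start command
# with the documented default options spelled out.
#   $1 = SWAN JAR, $2 = project to analyze, $3 = output directory
swan_with_defaults() {
  # Assumption: the defaults can be passed explicitly as "-in dataset -f code -t meka".
  set -- java -jar "$1" -test "$2" -o "$3" -in dataset -f code -t meka
  if [ "${SWAN_DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # dry run (default): only print the command
  else
    "$@"        # SWAN_DRY_RUN=0: actually run SWAN
  fi
}

# Placeholder JAR name and paths taken from the command above.
swan_with_defaults swan-cmd-3.x.x.jar /path/to/project/files /output/directory
```

Setting SWAN_DRY_RUN=0 before the call runs the command instead of printing it. Check the -help output for the authoritative option list before relying on this form.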
If you cloned the project or downloaded SWAN as a compressed release (e.g. .zip or .tar.gz), you can use mvn package to package the project. The commands provided above can then be used to run the generated JAR file. Alternatively, you can import the project directly into your IDE from the repository and package the project via the terminal or the Maven plugin in your IDE.
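The build-then-run workflow could look roughly like the sketch below. It assumes the build is started from the repository root and that Maven's default layout places the generated JAR under swan-cmd/target/; the actual location and file name may differ, so adjust the directory as needed. The helper find_swan_jar is hypothetical and only picks the first matching file.

```shell
# Hypothetical helper: print the first swan-cmd JAR in the given directory
# (prints nothing if the directory is missing or holds no matching file).
find_swan_jar() {
  find "$1" -maxdepth 1 -type f -name 'swan-cmd-*.jar' 2>/dev/null | head -n 1
}

# Package the project only when run from a Maven project root.
if [ -f pom.xml ]; then
  mvn package
fi

# Assumption: the generated JAR lands in swan-cmd/target/ (Maven's default layout).
SWAN_JAR=$(find_swan_jar swan-cmd/target)
if [ -n "$SWAN_JAR" ]; then
  java -jar "$SWAN_JAR" -test /path/to/project/files -o /output/directory
else
  echo "No swan-cmd JAR found; check where your build places its output." >&2
fi
```

If the build produces several JARs that match the pattern, pass the exact file to java -jar instead of relying on the first match.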
The following persons have contributed to SWAN: Goran Piskachev, Lisa Nguyen, Oshando Johnson, and Eric Bodden.