Anurag-Swarnim-Yadav/DatasetQuality

R+R: Security Vulnerability Dataset Quality Is Critical

ACSAC 2024: TECHNICAL PAPER - PAPER #300

TO BE PRESENTED AT THE ACSAC 2024, DECEMBER 9-13, 2024

ENVIRONMENT SETUP

CONDA SETUP [Anaconda3 2020.11 Installer for Linux-64, Conda 4.9.2]

  1. Open your terminal.
  2. Use the following command to download the Anaconda3 2020.11 installer, which provides Conda 4.9.2
    curl -O https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
  3. Once the installer has been downloaded, run it using the following command
    bash Anaconda3-2020.11-Linux-x86_64.sh
  4. Once the installation is complete, restart your terminal and run the following command to confirm that Conda 4.9.2 was installed correctly
    conda --version
  5. You should see: conda 4.9.2

CONDA ENVIRONMENT SETUP [PYTHON 3.10.12]

  1. After Conda is installed and initialized, open your terminal.
  2. Run the following command to create a new conda environment with Python 3.10.12
    conda create --name DatasetQuality python=3.10.12
  3. Once the environment is created, activate it using the following command
    conda activate DatasetQuality
  4. Confirm that the correct version of Python (3.10.12) is installed in the environment by running
    python --version
  5. You should see: Python 3.10.12

REQUIRED PACKAGES [GIT CLONE]

  1. First, install pip in your environment (if not already installed)
    conda install pip
  2. Clone our repository to your machine
    git clone https://github.com/Anurag-Swarnim-Yadav/DatasetQuality.git
    cd DatasetQuality
  3. Install the necessary packages using pip
    • For NVIDIA A100-SXM4-80GB
      pip install -r requirements.txt
    • For other setups
      pip install -r requirements-small.txt
  4. You can verify that the packages have been installed correctly by running
    pip list
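As a complement to reading the pip list output by hand, the check can be scripted. This is a minimal sketch, not part of the repository: the helper name missing_packages and the two example package names are assumptions for illustration, and the real names should be taken from requirements.txt or requirements-small.txt.

```python
from importlib import metadata


def missing_packages(names):
    """Return the subset of `names` that is not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical package names; replace them with entries from requirements.txt.
    wanted = ["torch", "transformers"]
    print("Missing packages:", missing_packages(wanted) or "none")
```

Run it inside the activated DatasetQuality environment so that it inspects the same packages pip installed.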

DATASET OVERVIEW

VULREPAIR DATASET [RQ1]

Samples Train Test
Total Samples (TS) 6,776 1,706

VULREPAIR DATASET ANALYSIS - UNIQUENESS [RQ2A AND RQ2B]

Samples Train Test
Total Samples (TS) 6,776 1,706
In-Set Duplicates (IS Dup) 1,593 91
Samples Left (SL = TS - IS Dup) 5,183 1,615
Cross-Set Duplicates (CS Dup) 796
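
The counts in the table above can be reproduced in spirit with a small script. The sketch below is illustrative only: it assumes two samples are duplicates when they are identical after collapsing whitespace, which may differ from the matching criterion used in the paper, and the helper names and toy samples are hypothetical.

```python
def normalize(sample):
    """Collapse whitespace so formatting-only differences do not hide duplicates."""
    return " ".join(sample.split())


def drop_in_set_duplicates(samples):
    """Keep the first occurrence of each sample; return (kept, number_removed)."""
    seen = set()
    kept = []
    for sample in samples:
        key = normalize(sample)
        if key not in seen:
            seen.add(key)
            kept.append(sample)
    return kept, len(samples) - len(kept)


def count_cross_set_duplicates(train, test):
    """Count test samples that also occur in the training set."""
    train_keys = {normalize(s) for s in train}
    return sum(1 for s in test if normalize(s) in train_keys)


# Toy data for illustration only; these are not samples from the dataset.
train = ["int a = 0;", "int  a = 0;", "free(p);"]
test = ["free(p);", "return x;", "return x;"]
train_kept, train_dups = drop_in_set_duplicates(train)
test_kept, test_dups = drop_in_set_duplicates(test)
cross_dups = count_cross_set_duplicates(train_kept, test_kept)
print(train_dups, test_dups, cross_dups)
```

In-set duplicates are removed first, so the cross-set count is taken over the samples left (SL) in each split, matching the order of the rows in the table.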

VULREPAIR DATASET ANALYSIS - CONSISTENCY [RQ3A AND RQ3B]

Samples Train Test
Total Samples (TS) 6,776 1,706
In-Set Duplicates (IS Dup) 1,858 111
Samples Left (SL = TS - IS Dup) 4,918 1,595
Cross-Set Duplicates (CS Dup) 923

BUG-FIX DATASET ANALYSIS [RQ5]

Samples Train Validation
Total Samples (TS) 534,858 10,000
In-Set Duplicates (IS Dup) 6,192 4
Samples Left (SL = TS - IS Dup) 528,666 9,996
Cross-Set Duplicates (CS Dup) 247

Note: The Bug-Fix dataset is available at
https://github.com/ASSERT-KTH/VRepair/releases/download/v20240223/BugFix.tar.bz2.
Thanks to the authors of VRepair.

EXPERIMENTS

To provide robust performance evaluations, each result is reported as the mean performance of six networks trained with different random seeds.
The six random seeds are: 26312, 43511, 67732, 70757, 95541, and 123456
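
The averaging over seeds can be sketched as follows. This is an illustration under stated assumptions: the function name mean_over_seeds is hypothetical, and the scores in the example are placeholders, not results from the paper.

```python
from statistics import mean

SEEDS = [26312, 43511, 67732, 70757, 95541, 123456]


def mean_over_seeds(score_by_seed):
    """Average one metric over all six seeds, failing if a seed is missing."""
    missing = [s for s in SEEDS if s not in score_by_seed]
    if missing:
        raise ValueError(f"missing results for seeds: {missing}")
    return mean(score_by_seed[s] for s in SEEDS)


# Placeholder scores for illustration only; these are not results from the paper.
example = {26312: 0.10, 43511: 0.12, 67732: 0.11,
           70757: 0.13, 95541: 0.09, 123456: 0.11}
print(f"{mean_over_seeds(example):.4f}")
```

Raising an error on a missing seed avoids silently reporting a mean over fewer than six runs.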

RQ1

DATASET

Samples Train Test Comments
Total Samples (TS) 6,776 1,706 Contains IS and CS Duplicates

Note: IS = In-Set duplicates, CS = Cross-Set duplicates

TO RUN THE RQ1 EXPERIMENT

cd RQ1

VULREPAIR

  1. To train the VulRepair model, run the following command in your terminal
    sh run_VulRepair_train.sh
  2. Once the model is trained, navigate to RQ1-Code/VulRepair/ (cd RQ1-Code/VulRepair/) to find the VulRepair_train.log file and the new VulRepair_model folder, where the model is saved.
  3. To test the VulRepair model, go back to the RQ1 folder (cd ../..) and run the following command in your terminal
    sh run_VulRepair_test.sh
  4. Once testing finishes, navigate to RQ1-Code/VulRepair/ again to find VulRepair_test.log and the new raw_predictions folder, which holds the model predictions.

CODEBERT

  1. To train the CodeBERT model, run the following command in your terminal
    sh run_CodeBERT_train.sh
  2. Once the model is trained, navigate to RQ1-Code/CodeBERT/ (cd RQ1-Code/CodeBERT/) to find the CodeBERT_train.log file and the new CodeBERT_model folder, where the model is saved.
  3. To test the CodeBERT model, go back to the RQ1 folder (cd ../..) and run the following command in your terminal
    sh run_CodeBERT_test.sh
  4. Once testing finishes, navigate to RQ1-Code/CodeBERT/ again to find CodeBERT_test.log and the new raw_predictions folder, which holds the model predictions.

GRAPHCODEBERT

  1. To train the GraphCodeBERT model, run the following command in your terminal
    sh run_GraphCodeBERT_train.sh
  2. Once the model is trained, navigate to RQ1-Code/GraphCodeBERT/ (cd RQ1-Code/GraphCodeBERT/) to find the GraphCodeBERT_train.log file and the new GraphCodeBERT_model folder, where the model is saved.
  3. To test the GraphCodeBERT model, go back to the RQ1 folder (cd ../..) and run the following command in your terminal
    sh run_GraphCodeBERT_test.sh
  4. Once testing finishes, navigate to RQ1-Code/GraphCodeBERT/ again to find GraphCodeBERT_test.log and the new raw_predictions folder, which holds the model predictions.

UNIXCODER

  1. To train the UniXcoder model, run the following command in your terminal
    sh run_UniXcoder_train.sh
  2. Once the model is trained, navigate to RQ1-Code/UniXcoder/ (cd RQ1-Code/UniXcoder/) to find the UniXcoder_train.log file and the new UniXcoder_model folder, where the model is saved.
  3. To test the UniXcoder model, go back to the RQ1 folder (cd ../..) and run the following command in your terminal
    sh run_UniXcoder_test.sh
  4. Once testing finishes, navigate to RQ1-Code/UniXcoder/ again to find UniXcoder_test.log and the new raw_predictions folder, which holds the model predictions.

REPLICATION RESULT OF VULREPAIR, CODEBERT, GRAPHCODEBERT AND UNIXCODER.

TO VERIFY OUR RESULTS, within the RQ1 folder, we have six different seed subfolders, each containing the raw prediction file for its respective model.

| Models | PP Reported | PP Replicated | Change |
| --- | --- | --- | --- |
| VulRepair/CodeT5 | 44% Fu et al. ; 44.96% Zhang et al. | 40.42% | -3.58% ; -4.54% |
| CodeBERT | 31% Fu et al. ; 32.94% Zhang et al. | 33.20% | +2.20% ; +0.74% |
| GraphCodeBERT | 37.98% Zhang et al. | 38.51% | +0.53% |
| UniXcoder | 40.62% Zhang et al. | 40.96% | +0.34% |
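
To recompute a perfect prediction (PP) score from raw predictions, something like the sketch below could be used. It rests on assumptions not stated in this README: that a prediction counts as perfect when it matches the reference repair exactly after whitespace is collapsed, and that predictions and references are available as two parallel lists. The layout of the raw prediction files in the seed subfolders is not described here, so loading them is left out.

```python
def perfect_prediction_rate(predictions, references):
    """Fraction of predictions equal to their reference, ignoring whitespace layout."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        " ".join(p.split()) == " ".join(r.split())
        for p, r in zip(predictions, references)
    )
    return hits / len(predictions)


# Toy strings for illustration only; these are not model outputs.
preds = ["x = 1 ;", "free ( p ) ;", "return 0 ;", "y = 2 ;"]
refs = ["x = 1 ;", "free ( q ) ;", "return  0 ;", "y = 3 ;"]
print(f"{perfect_prediction_rate(preds, refs):.2%}")
```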

Note: The trained models will be released separately.

RQ2A

DATASET

Samples Train Test
Unique Samples 4387 1615

TO RUN THE RQ2A EXPERIMENT

cd RQ2A

VULREPAIR

  1. To train
    sh run_VulRepair_train.sh
  2. To test
    sh run_VulRepair_test.sh

CODEBERT

  1. To train
    sh run_CodeBERT_train.sh
  2. To test
    sh run_CodeBERT_test.sh

GRAPHCODEBERT

  1. To train
    sh run_GraphCodeBERT_train.sh
  2. To test
    sh run_GraphCodeBERT_test.sh

UNIXCODER

  1. To train
    sh run_UniXcoder_train.sh
  2. To test
    sh run_UniXcoder_test.sh

RESULT

TO VERIFY OUR RESULTS, within the RQ2A folder, we have six different seed subfolders, each containing the raw prediction file for its respective model.

Models PP RQ2A % of Replication
VulRepair/CodeT5 8.91% 22.0% (8.91/40.42)
CodeBERT 5.58% 16.8% (5.58/33.20)
GraphCodeBERT 5.31% 13.7% (5.31/38.51)
UniXcoder 4.82% 11.8% (4.82/40.96)

Note: PP RQ2A is the perfect prediction (PP) score obtained on the RQ2A dataset. % of Replication expresses that score as a percentage of our replicated PP score on the original VulRepair dataset (RQ1).
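
The % of Replication column is a simple ratio, sketched below with the values from the RQ1 and RQ2A tables. The dictionary and function names are illustrative. Because the inputs here are the rounded table values, a recomputed figure may differ from the table in the last digit.

```python
# Replicated PP on the original VulRepair dataset (RQ1 table), in percent.
REPLICATED_PP = {"VulRepair/CodeT5": 40.42, "CodeBERT": 33.20,
                 "GraphCodeBERT": 38.51, "UniXcoder": 40.96}
# PP on the RQ2A dataset, in percent.
RQ2A_PP = {"VulRepair/CodeT5": 8.91, "CodeBERT": 5.58,
           "GraphCodeBERT": 5.31, "UniXcoder": 4.82}


def percent_of_replication(pp_new, pp_replicated):
    """Express a PP score as a percentage of the replicated PP score."""
    return 100.0 * pp_new / pp_replicated


for model, pp in RQ2A_PP.items():
    ratio = percent_of_replication(pp, REPLICATED_PP[model])
    print(f"{model}: {ratio:.1f}%")
```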

RQ2B

DATASET

Samples Train Test
Unique Samples 5183 819

TO RUN THE RQ2B EXPERIMENT

cd RQ2B

VULREPAIR

  1. To train
    sh run_VulRepair_train.sh
  2. To test
    sh run_VulRepair_test.sh

CODEBERT

  1. To train
    sh run_CodeBERT_train.sh
  2. To test
    sh run_CodeBERT_test.sh

GRAPHCODEBERT

  1. To train
    sh run_GraphCodeBERT_train.sh
  2. To test
    sh run_GraphCodeBERT_test.sh

UNIXCODER

  1. To train
    sh run_UniXcoder_train.sh
  2. To test
    sh run_UniXcoder_test.sh

RESULT

TO VERIFY OUR RESULTS, within the RQ2B folder, we have six different seed subfolders, each containing the raw prediction file for its respective model.

Models PP RQ2B % of Replication
VulRepair/CodeT5 13.17% 33% (13.17/40.42)
CodeBERT 8.83% 27% (8.83/33.20)
GraphCodeBERT 9.22% 24% (9.22/38.51)
UniXcoder 9.10% 22% (9.10/40.96)

RQ3A

DATASET

Samples Train Test
Unique Samples 3995 1595

TO RUN THE RQ3A EXPERIMENT

cd RQ3A

VULREPAIR

  1. To train
    sh run_VulRepair_train.sh
  2. To test
    sh run_VulRepair_test.sh

CODEBERT

  1. To train
    sh run_CodeBERT_train.sh
  2. To test
    sh run_CodeBERT_test.sh

GRAPHCODEBERT

  1. To train
    sh run_GraphCodeBERT_train.sh
  2. To test
    sh run_GraphCodeBERT_test.sh

UNIXCODER

  1. To train
    sh run_UniXcoder_train.sh
  2. To test
    sh run_UniXcoder_test.sh

RESULT

TO VERIFY OUR RESULTS, within the RQ3A folder, we have six different seed subfolders, each containing the raw prediction file for its respective model.

Models PP RQ3A % of Replication
VulRepair/CodeT5 7.14% 17.7% (7.14/40.42)
CodeBERT 3.59% 10.8% (3.59/33.20)
GraphCodeBERT 3.75% 9.7% (3.75/38.51)
UniXcoder 4.11% 10.0% (4.11/40.96)

RQ3B

DATASET

Samples Train Test
Unique Samples 4918 672

TO RUN THE RQ3B EXPERIMENT

cd RQ3B

VULREPAIR

  1. To train
    sh run_VulRepair_train.sh
  2. To test
    sh run_VulRepair_test.sh

CODEBERT

  1. To train
    sh run_CodeBERT_train.sh
  2. To test
    sh run_CodeBERT_test.sh

GRAPHCODEBERT

  1. To train
    sh run_GraphCodeBERT_train.sh
  2. To test
    sh run_GraphCodeBERT_test.sh

UNIXCODER

  1. To train
    sh run_UniXcoder_train.sh
  2. To test
    sh run_UniXcoder_test.sh

RESULT

TO VERIFY OUR RESULTS, within the RQ3B folder, we have six different seed subfolders, each containing the raw prediction file for its respective model.

Models PP RQ3B % of Replication
VulRepair/CodeT5 10.27% 25.5% (10.27/40.42)
CodeBERT 5.38% 16.2% (5.38/33.20)
GraphCodeBERT 6.25% 16.2% (6.25/38.51)
UniXcoder 6.18% 15.0% (6.18/40.96)

RQ4A

In this research question, we report the performance of each of the models studied on the top 10 CWEs, showing their performance when duplicate and inconsistent samples are removed from consideration.

[Figure: RQ4A top-10 CWE model performance]

RQ4B

In this research question, we first assessed whether the samples had the correct CWE tags. If a sample was found to have an incorrect CWE tag, we identified the correct tag through manual analysis of the sample. Additionally, we evaluated whether the corresponding fix was complete based on manual analysis.

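| Rank | CWE Type | Name | RQ2B Samples | Accurate | Complete | Accurate & Complete |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | CWE-787 | Out-of-bounds Write | 33 | 15 | 18 | 12 |
| 2 | CWE-79 | Cross-site Scripting | 0 | 0 | 0 | 0 |
| 5 | CWE-78 | OS Command Injection | 1 | 0 | 0 | 0 |
| 6 | CWE-89 | SQL Injection | 1 | 1 | 1 | 1 |
| 7 | CWE-416 | Use After Free | 29 | 11 | 18 | 7 |
| 8 | CWE-22 | Path Traversal | 2 | 1 | 0 | 0 |
| 9 | CWE-352 | Cross-Site Request Forgery | 2 | 2 | 1 | 1 |
| 10 | CWE-434 | Dangerous File Type | - | - | - | - |
| Total | | | 68 | 30 | 38 | 21 |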
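
The totals row can be cross-checked, and turned into shares of the analyzed samples, with a few lines of Python. The row data below is copied from the table above (the CWE-434 row has no counts and is omitted). The helper names are illustrative, and the percentages are derived here rather than quoted from the paper.

```python
# (rank, CWE, RQ2B samples, accurate, complete, accurate and complete)
ROWS = [
    (1, "CWE-787", 33, 15, 18, 12),
    (2, "CWE-79", 0, 0, 0, 0),
    (5, "CWE-78", 1, 0, 0, 0),
    (6, "CWE-89", 1, 1, 1, 1),
    (7, "CWE-416", 29, 11, 18, 7),
    (8, "CWE-22", 2, 1, 0, 0),
    (9, "CWE-352", 2, 2, 1, 1),
]


def column_totals(rows):
    """Sum the four count columns across all rows."""
    return tuple(sum(row[i] for row in rows) for i in range(2, 6))


def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


samples, accurate, complete, both = column_totals(ROWS)
print(samples, accurate, complete, both)
print(f"accurate: {share(accurate, samples):.1f}%")
print(f"complete: {share(complete, samples):.1f}%")
print(f"accurate and complete: {share(both, samples):.1f}%")
```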

RQ5 [TRANSFER LEARNING]

BUG-FIX DATASET

Samples Train Validation
Unique Samples 528419 9996

FOR FINE-TUNING, WE USED THE RQ3B DATASET.

RESULT

We have released all the models at https://doi.org/10.5281/zenodo.11582874
Unzip the folder using unzip filename.zip

PRE-TRAINING

Pre-training is done with seed 26312.
Download
VulRepairRQ5_Seed26312
CodeBERTRQ5_Seed26312
GraphCodeBERTRQ5_Seed26312
UniXcoderRQ5_Seed26312

VULREPAIR

  1. To train
    sh run_pretrain.sh
  2. To test
    sh run_pretrain_test.sh

CODEBERT

  1. To train
    sh run_pretrain.sh
  2. To test
    sh run_pretrain_test.sh

GRAPHCODEBERT

  1. To train
    sh run_pretrain.sh
  2. To test
    sh run_pretrain_test.sh

UNIXCODER

  1. To train
    sh run_pretrain.sh
  2. To test
    sh run_pretrain_test.sh

TRANSFER LEARNING

Transfer learning is done on six random Seeds: 26312, 43511, 67732, 70757, 95541, and 123456
Download all the folders.
Testing is done on beam sizes: 1, 3, 5, 10, 20, 30, 40, 50
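
Six seeds and eight beam sizes give 48 evaluation runs per model. The sketch below only enumerates that grid. How run_train.sh and run_test.sh receive the seed and beam size is not described here, so no command line is constructed, and the function name is hypothetical.

```python
from itertools import product

SEEDS = [26312, 43511, 67732, 70757, 95541, 123456]
BEAM_SIZES = [1, 3, 5, 10, 20, 30, 40, 50]


def evaluation_grid(seeds=SEEDS, beam_sizes=BEAM_SIZES):
    """Return every (seed, beam size) combination as a list of tuples."""
    return list(product(seeds, beam_sizes))


grid = evaluation_grid()
print(len(grid), "runs per model; first:", grid[0], "last:", grid[-1])
```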

VULREPAIR

  1. To train
    sh run_train.sh
  2. To test
    sh run_test.sh

CODEBERT

  1. To train
    sh run_train.sh
  2. To test
    sh run_test.sh

GRAPHCODEBERT

  1. To train
    sh run_train.sh
  2. To test
    sh run_test.sh

UNIXCODER

  1. To train
    sh run_train.sh
  2. To test
    sh run_test.sh

TO VERIFY OUR RESULTS, the RQ5 folder is organized into two subfolders: Bug-Fix and Transfer-Learning. The Bug-Fix subfolder contains one seed folder, while the Transfer-Learning subfolder includes six distinct seed folders, each holding the raw prediction file for the corresponding model.

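| Models | Beam = 1 BF | Beam = 1 TL | Beam = 3 BF | Beam = 3 TL | Beam = 5 BF | Beam = 5 TL | Beam = 50 BF | Beam = 50 VR | Beam = 50 TL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VulRepair | 3.6% | 13.5% | 7.4% | 19.0% | 7.6% | 20.2% | 6.55% | 10.27% | 18.67% |
| CodeBERT | 3.0% | 12.5% | 4.6% | 17.3% | 5.36% | 18.9% | 11.76% | 5.38% | 24.55% |
| GraphCodeBERT | 2.2% | 11.5% | 4.6% | 16.9% | 5.8% | 19.0% | 11.76% | 6.25% | 25.42% |
| UniXcoder | 1.9% | 12.9% | 5.2% | 18.1% | 6.6% | 19.7% | 11.31% | 6.18% | 26.07% |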

Note: BF = Bug-Fix, TL = Transfer Learning, VR = Vulnerability Repair

BF (Bug-Fix): The models are trained on the bug-fix dataset and tested on the RQ3B vulnerability dataset.
TL (Transfer Learning): The models are initially trained on the bug-fix dataset and subsequently fine-tuned on the RQ3B vulnerability dataset.

%PP as a Function of Beam Size for RQ5

RELATED WORK

VISION TRANSFORMER INSPIRED AUTOMATED VULNERABILITY REPAIR PAPER [VQM]

We conducted a brief analysis of both the pre-training bug-fix dataset and the VQM vulnerability fine-tuning dataset used in that paper. The pre-training dataset contains 21,246 training samples and 2,362 validation samples. Our review revealed 18,622 duplicated entries in the training set and 782 duplicates in the validation set. After removing those, we identified 1,579 cross-set duplicates, that is, validation samples that also appear in the training set. Additionally, our analysis uncovered a substantial overlap between the bug-fix dataset and the VQM vulnerability fine-tuning dataset. Specifically, 511 entries in the test set, 243 in the validation set, and 1,747 in the training set of the vulnerability dataset also appear in the bug-fix dataset.

CODET5 BEAM ANALYSIS [NEW WORK - NOT INCLUDED IN THE PAPER]

REPLICATION

Zhang et al. investigated the impact of varying beam size values. To verify their findings, we utilized the same dataset provided by the authors and attempted to replicate the results. Our observations indicate that as the beam size increases, the %PP value goes up.

DATASET

Samples Train Validation Test
Total Samples (TS) 5937 839 1706

RESULT

Seed Beam = 1 Beam = 2 Beam = 3 Beam = 4 Beam = 5 Beam = 10 Beam = 15 Beam = 20 Beam = 50 Beam = 100
26312 0.3130 0.3623 0.3816 0.3951 0.3992 0.4127 0.4185 0.4191 0.4220 0.4174
43511 0.2198 0.2708 0.2948 0.3019 0.3095 0.3259 0.3306 0.3271 0.3247 0.3288
67732 0.3212 0.3769 0.3992 0.4127 0.4174 0.4291 0.4332 0.4343 0.4297 0.4308
70757 0.2726 0.3359 0.3517 0.3634 0.3693 0.3851 0.3845 0.3851 0.3875 0.3845
95541 0.2655 0.3253 0.3453 0.3617 0.3664 0.3787 0.3810 0.3810 0.3834 0.3769
123456 0.2521 0.3013 0.3376 0.3470 0.3529 0.3681 0.3681 0.3681 0.3693 0.3681
Average PP 27.33% 32.83% 35.17% 36.33% 36.83% 38.33% 38.67% 38.50% 38.67% 38.50%

CodeT5: %PP as a Function of Beam Size

NO DUPLICATE SAMPLES

For this experiment, we removed in-set and cross-set duplicates from the dataset. We reran the CodeT5 model and observed that beyond a beam size of 15, %PP decreases. We were unable to find this behavior reported in any previously published paper.

DATASET

Samples Train Validation Test
Total Samples (TS) 5937 839 1706
In-Set Duplicates (IS Dup) 1418 27 111
Samples Left (SL = TS - IS Dup) 4519 812 1595
Cross-Set Duplicates (TEST) - - Tr:815,V:108
Cross-Set Duplicates (VALIDATION) - Tr:413 -
Unique Samples (US = SL - CS Dup) 4519 399 672
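
The last row follows from the rows above it, as this short check shows. It reads "Tr:815,V:108" as the number of test samples that also occur in the training and validation sets, and "Tr:413" as the number of validation samples that also occur in the training set. That reading is an interpretation of the table, and the variable names are illustrative.

```python
TOTAL = {"train": 5937, "validation": 839, "test": 1706}
IN_SET_DUP = {"train": 1418, "validation": 27, "test": 111}
# Cross-set duplicates removed from each split, keyed by the split they also occur in.
CROSS_SET_DUP = {
    "train": {},
    "validation": {"train": 413},
    "test": {"train": 815, "validation": 108},
}


def unique_samples(split):
    """US = SL - CS Dup, where SL = TS - IS Dup."""
    samples_left = TOTAL[split] - IN_SET_DUP[split]
    return samples_left - sum(CROSS_SET_DUP[split].values())


for split in TOTAL:
    print(split, unique_samples(split))
```

Only the validation and test splits lose cross-set duplicates, so the training split keeps every sample left after in-set deduplication.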

RESULT

Seed Beam = 1 Beam = 2 Beam = 3 Beam = 4 Beam = 5 Beam = 10 Beam = 15 Beam = 20 Beam = 50 Beam = 100
26312 0.0536 0.0759 0.0818 0.0863 0.0908 0.0878 0.0893 0.0878 0.0848 0.0789
43511 0.0580 0.0833 0.0908 0.0952 0.0967 0.1057 0.1057 0.1027 0.0952 0.0923
67732 0.0625 0.0893 0.0938 0.1042 0.0982 0.0982 0.1042 0.1027 0.0982 0.0923
70757 0.0714 0.0952 0.1027 0.1071 0.1101 0.1146 0.1176 0.1176 0.1057 0.1057
95541 0.0536 0.0729 0.0804 0.0893 0.0908 0.0982 0.0997 0.0982 0.0938 0.0893
123456 0.0491 0.0699 0.0878 0.0923 0.0952 0.0982 0.0952 0.0952 0.0833 0.0804
Average PP 5.83% 8.17% 9.00% 9.50% 9.67% 10.00% 10.17% 10.00% 9.33% 9.00%
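
The peak at a beam size of 15 can be read off the Average PP row programmatically. The sketch below uses the averages exactly as reported in the table, with illustrative helper names.

```python
BEAM_SIZES = [1, 2, 3, 4, 5, 10, 15, 20, 50, 100]
# Average PP in percent, copied from the last row of the table above.
AVERAGE_PP = [5.83, 8.17, 9.00, 9.50, 9.67, 10.00, 10.17, 10.00, 9.33, 9.00]


def peak_beam(beams, scores):
    """Return the (beam size, score) pair with the highest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return beams[best], scores[best]


def non_increasing_from(beams, scores, beam):
    """True if scores never rise again from the given beam size onward."""
    tail = scores[beams.index(beam):]
    return all(a >= b for a, b in zip(tail, tail[1:]))


best_beam, best_score = peak_beam(BEAM_SIZES, AVERAGE_PP)
print(best_beam, best_score, non_increasing_from(BEAM_SIZES, AVERAGE_PP, best_beam))
```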

CodeT5: %PP as a Function of Beam Size

Note: The source code, dataset, model prediction output, train log, and test log files are under CodeT5_Beam_Analysis
