Microsoft Malware detection

Problem statement: https://www.kaggle.com/c/malware-classification

Steps to start with this case study:

1. Download the data: https://www.kaggle.com/c/malware-classification/data
2. Extract the data. Install p7zip with the command below:
!sudo apt install p7zip-full

Then run the following command in the Jupyter notebook to extract the archive:
!7z x train.7z

EDA notebook -

1. Understand the problem statement, the evaluation metric, and the sample data for the byte and ASM files.
2. Examine the distribution of class labels in the train and test data.
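
The competition page lists multi-class logarithmic loss as its evaluation metric. The sketch below shows a simplified version of that metric alongside a class-distribution helper. The function names are illustrative and are not taken from the notebook, and the row renormalisation that Kaggle applies before scoring is omitted here.

```python
import math
from collections import Counter


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of 0-based integer class indices.
    y_prob: list of per-sample probability lists, one entry per class.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip so that a confident wrong prediction does not produce log(0).
        p = min(max(probs[label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


def class_distribution(labels):
    """Fraction of samples that fall into each class label."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: counts[cls] / n for cls in sorted(counts)}
```

A model that predicts a uniform distribution over k classes scores log(k) under this metric, which gives a useful baseline when reading later results.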

Byte files feature generation notebook -

1. Analyze the size of the byte files.
2. Extract unigram features using a custom vectorizer.
3. Extract bigram features using a custom vectorizer.
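
The unigram and bigram steps above can be sketched as follows. This is a minimal illustration, not the notebook's custom vectorizer: it assumes each line of a byte file starts with an address followed by hex byte tokens, and the helper names `byte_tokens` and `ngram_counts` are hypothetical.

```python
from collections import Counter


def byte_tokens(lines):
    """Yield hex byte tokens from byte-file lines, skipping the address."""
    for line in lines:
        parts = line.split()
        # Assumed layout: the first field is the address, the rest are bytes.
        yield from parts[1:]


def ngram_counts(tokens, n):
    """Count contiguous n-grams over the token stream."""
    tokens = list(tokens)
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Two made-up lines in the assumed format.
sample = ["00401000 56 8D 44 24", "00401004 08 50 8B F1"]
unigrams = ngram_counts(byte_tokens(sample), 1)
bigrams = ngram_counts(byte_tokens(sample), 2)
```

Bigrams here run across line boundaries, because the line breaks in a byte dump are a display artifact rather than part of the binary. With 256 byte values plus an unknown marker, the bigram vocabulary is much larger than the unigram one, which is the usual reason for a custom, memory-conscious vectorizer.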

ASM files feature generation notebook -

1. Count the prefixes, opcodes, keywords, and registers in each file using multiprocessing.
2. Analyze the size of the ASM files.
3. Extract graph features using multiprocessing.
4. Extract image features using multiprocessing (as suggested by 'say no to overfitting' in their video, taking the first 800 pixel values).
5. Multivariate analysis of the ASM features.
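
A rough sketch of the counting and image-feature steps above, under stated assumptions: the vocabularies are small illustrative samples (the notebook's actual lists are not shown in this README), the image features are taken to be the first 800 raw byte values of the file, and all function names are hypothetical.

```python
from collections import Counter
from multiprocessing import Pool

# Illustrative vocabularies only; not the lists used in the notebook.
PREFIXES = {".text", ".data", ".rdata", ".idata", "HEADER"}
OPCODES = {"mov", "push", "pop", "call", "jmp", "add", "sub", "xor", "retn"}
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def count_asm_lines(lines):
    """Count section prefixes, opcodes and registers in ASM text lines."""
    counts = Counter()
    for line in lines:
        # The section prefix is whatever precedes the first ':' on the line.
        prefix = line.split(":", 1)[0]
        if prefix in PREFIXES:
            counts["prefix:" + prefix] += 1
        # Whole-token matching avoids hits inside addresses or hex bytes.
        for tok in line.replace(",", " ").split():
            tok = tok.lower()
            if tok in OPCODES:
                counts["opcode:" + tok] += 1
            elif tok in REGISTERS:
                counts["register:" + tok] += 1
    return counts


def pixel_features(data, n_pixels=800):
    """First n_pixels byte values of the file content, zero-padded."""
    head = data[:n_pixels]
    return list(head) + [0] * (n_pixels - len(head))


def count_asm_file(path):
    """Per-file worker: open one ASM file and count its tokens."""
    with open(path, "r", errors="ignore") as fh:
        return count_asm_lines(fh)


def count_all(paths, workers=4):
    """Fan the per-file counting out over a pool of worker processes."""
    with Pool(workers) as pool:
        return pool.map(count_asm_file, paths)
```

Each file is processed independently, so a process pool parallelises the work without shared state. `count_all` should be called from under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork worker processes.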

Modelling notebook -

1. Multivariate analysis of the final features.
2. Train an XGBoost model on the final features.
3. Further possible improvements.

This repository contains a case study on the Microsoft malware detection challenge.

GowthamChowta/malware_casestudy
