dataset #1

Open
Divya2895 opened this issue Mar 15, 2019 · 7 comments
Labels
bug Something isn't working

Comments

@Divya2895

Can you provide the dataset for multi-class classification?

@harrytrinh2

harrytrinh2 commented Mar 19, 2019

I agree! It seems the dataset I downloaded online does not match the dataset used here. That is why I got a column mismatch error.

@harrytrinh2

Please share this file: kddresults/dnn1layer/training_set_dnnanalysis.csv

@rahulvigneswaran
Owner

Hi, @Divya2895 and @TrinhDinhPhuc,
Sorry for the delayed reply. Hope this helps.

For DNN (1000 iterations): [Link]
For DNN (100 iterations): [Link]
For classical machine learning: [Test Data] [Train Data]

Let me know if it works.

@Divya2895
Author

Divya2895 commented Mar 22, 2019 via email

@Light-City

How was the original dataset transformed into the dataset you provided?
For example, the raw data contains values such as tcp and icmp, but your data contains 1, 0, etc.

Preprocessing is important, so please help me!

rahulvigneswaran added the bug (Something isn't working) label Mar 5, 2020
@ghost

ghost commented Apr 8, 2020

I'm leaving a similar preprocessing method here; it should give the same performance as the provided preprocessed data.

The Training Dataset: http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz

The Testing Dataset: http://kdd.ics.uci.edu/databases/kddcup99/corrected.gz

The most important parts are one-hot encoding of the categorical columns ("protocol_type", "service", "flag") and binary labeling: normal traffic gets class 0 and every attack gets class 1.
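
As a toy illustration of these two steps (the three rows below are invented for illustration and are not taken from the KDD files), pandas turns each protocol_type value such as tcp into its own 0/1 indicator column, which is where the 1s and 0s in the preprocessed data come from:

```python
import pandas as pd

# Toy rows invented for illustration; the real files have 41 feature columns plus a label.
toy = pd.DataFrame({
    "duration": [0, 0, 2],
    "protocol_type": ["tcp", "icmp", "udp"],
    "label": ["normal.", "smurf.", "normal."],
})

# Binary target: 0 for "normal.", 1 for any attack label.
toy["class"] = (toy["label"] != "normal.").astype(int)
toy = toy.drop(columns="label")

# One-hot encode the string column; dtype=int yields 0/1 instead of True/False.
encoded = pd.get_dummies(toy, columns=["protocol_type"], dtype=int)
print(encoded)
```

The tcp row ends up with 1 in protocol_type_tcp and 0 in protocol_type_icmp and protocol_type_udp; "service" and "flag" are expanded the same way in the full dataset.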

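import pandas as pd
# kddcup-10.data from http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz
# kddcup.test from http://kdd.ics.uci.edu/databases/kddcup99/corrected.gz

# The downloaded files have no header row, so supply the 41 feature names
# plus "label" (names as listed in kddcup.names).
COLUMNS = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate", "label",
]

trainset = pd.read_csv('kddcup-10.data', header=None, names=COLUMNS)
testset = pd.read_csv('kddcup.test', header=None, names=COLUMNS)

# Assign binary class: 0 for normal traffic, 1 for any attack
trainset["class"] = 1
trainset.loc[trainset["label"] == "normal.", "class"] = 0
testset["class"] = 1
testset.loc[testset["label"] == "normal.", "class"] = 0

# Drop the string label, now replaced by the binary class
train = trainset.drop(columns="label")
test = testset.drop(columns="label")

# One-hot encode the categorical columns (protocol_type, service, flag)
train = pd.get_dummies(train, dtype=int)
test = pd.get_dummies(test, dtype=int)

# A category (e.g. a service) may occur in only one file, so add any
# missing dummy columns as all-zero columns...
differences = set(train.columns) ^ set(test.columns)
print("One Hot Field Differences:")
print(differences)
for different in differences:
    if different not in test.columns:
        test[different] = 0
    if different not in train.columns:
        train[different] = 0

# ...and give the test set the same column order as the training set
test = test[train.columns]

X = train.drop(columns="class")
Y = train["class"]
T = test.drop(columns="class")
C = test["class"]

# Then follow the code from the repository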
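
The code above only produces a binary target, while the original question asked about multi-class classification. A minimal sketch, assuming the commonly used grouping of the 22 KDD Cup 99 training attack types into four categories (dos, probe, r2l, u2r); the names ATTACK_CATEGORY and to_category are hypothetical, and attack types that appear only in the corrected test file are deliberately left unmapped and fall back to "unknown":

```python
import pandas as pd

# Grouping of the training-set attack names into four categories plus normal.
# Hypothetical helper; test-only attack names are not listed and map to "unknown".
ATTACK_CATEGORY = {
    "normal": "normal",
    **{a: "dos" for a in ["back", "land", "neptune", "pod", "smurf", "teardrop"]},
    **{a: "probe" for a in ["ipsweep", "nmap", "portsweep", "satan"]},
    **{a: "r2l" for a in ["ftp_write", "guess_passwd", "imap", "multihop",
                          "phf", "spy", "warezclient", "warezmaster"]},
    **{a: "u2r" for a in ["buffer_overflow", "loadmodule", "perl", "rootkit"]},
}


def to_category(labels: pd.Series) -> pd.Series:
    """Map raw labels such as 'smurf.' to a coarse category name."""
    cleaned = labels.str.rstrip(".")  # raw labels end with a trailing dot
    return cleaned.map(ATTACK_CATEGORY).fillna("unknown")


raw = pd.Series(["normal.", "smurf.", "ipsweep.", "rootkit.", "guess_passwd.", "apache2."])
print(to_category(raw).tolist())
# ['normal', 'dos', 'probe', 'u2r', 'r2l', 'unknown']
```

One could set trainset["class"] = to_category(trainset["label"]) (and likewise for testset) before dropping "label"; whether the repository's models expect string names or integer codes for this target is not stated here, so it may need a further encoding step.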

@DeCheri

DeCheri commented Sep 25, 2024

I couldn't write my paper. Damn, I'm anxious.


5 participants