Question about the train_test_spliter #22
Here is the function:

```python
import mne
import matplotlib.pyplot as plt
from sklearn import preprocessing
import numpy as np
from keras.utils.np_utils import to_categorical
from sklearn.model_selection import train_test_split

"""
PhysioNet MI-EEG dataset
64-channel EEG, 160 Hz sampling rate, 4-second MI tasks
14 runs for each of the 109 subjects
runs [1, 2] are baseline
the other runs contain markers:
T0: rest
T1/T2: left/right fist in runs [3, 4, 7, 8, 11, 12]
       both fists / both feet in runs [5, 6, 9, 10, 13, 14]
"""

data_path = r'D:\00-data\PhysioNet\ori\S001\\'
LR_fist_run = [3, 4, 7, 8, 11, 12]
fist_feet_run = [5, 6, 9, 10, 13, 14]
rename_mapping = {'Fc5.': 'FC5', 'Fc3.': 'FC3', 'Fc1.': 'FC1', 'Fcz.': 'FCZ', 'Fc2.': 'FC2', 'Fc4.': 'FC4',
                  'Fc6.': 'FC6', 'C5..': 'C5', 'C3..': 'C3', 'C1..': 'C1', 'Cz..': 'CZ', 'C2..': 'C2', 'C4..': 'C4',
                  'C6..': 'C6', 'Cp5.': 'CP5', 'Cp3.': 'CP3', 'Cp1.': 'CP1', 'Cpz.': 'CPZ', 'Cp2.': 'CP2',
                  'Cp4.': 'CP4', 'Cp6.': 'CP6', 'Fp1.': 'FP1', 'Fpz.': 'FPZ', 'Fp2.': 'FP2', 'Af7.': 'AF7',
                  'Af3.': 'AF3', 'Afz.': 'AFZ', 'Af4.': 'AF4', 'Af8.': 'AF8', 'F7..': 'F7', 'F5..': 'F5', 'F3..': 'F3',
                  'F1..': 'F1', 'Fz..': 'FZ', 'F2..': 'F2', 'F4..': 'F4', 'F6..': 'F6', 'F8..': 'F8', 'Ft7.': 'FT7',
                  'Ft8.': 'FT8', 'T7..': 'T7', 'T8..': 'T8', 'T9..': 'T9', 'T10.': 'T10', 'Tp7.': 'TP7', 'Tp8.': 'TP8',
                  'P7..': 'P7', 'P5..': 'P5', 'P3..': 'P3', 'P1..': 'P1', 'Pz..': 'PZ', 'P2..': 'P2', 'P4..': 'P4',
                  'P6..': 'P6', 'P8..': 'P8', 'Po7.': 'PO7', 'Po3.': 'PO3', 'Poz.': 'POZ', 'Po4.': 'PO4', 'Po8.': 'PO8',
                  'O1..': 'O1', 'Oz..': 'OZ', 'O2..': 'O2', 'Iz..': 'IZ'}


def get_physionet(subject: int):
    """
    :param subject: subject number, in [1, 109]
    :return: data with shape (-1, channels, 640)
    """
    # load each run from file and concatenate runs of the same task type
    for r in LR_fist_run:
        raw_new = mne.io.read_raw_edf(data_path + 'S%03d' % subject + 'R%02d.edf' % r, verbose='ERROR')
        if r == LR_fist_run[0]:
            raw_LR_fist = raw_new
        else:
            raw_LR_fist.append(raw_new)
    for r in fist_feet_run:
        raw_new = mne.io.read_raw_edf(data_path + 'S%03d' % subject + 'R%02d.edf' % r, verbose='ERROR')
        if r == fist_feet_run[0]:
            raw_fist_feet = raw_new
        else:
            raw_fist_feet.append(raw_new)
    raw_LR_fist.rename_channels(rename_mapping)
    raw_fist_feet.rename_channels(rename_mapping)
    ch_pick = ["FC1", "FC2", "FC3", "FC4", "C3", "C4", "C1", "C2",
               "CP1", "CP2", "CP3", "CP4"]
    # extract epochs and labels
    event_id_LR_fist = dict(T1=0, T2=1)
    events, _ = mne.events_from_annotations(raw_LR_fist, event_id_LR_fist, verbose='ERROR')
    epochs_LR_fist = mne.Epochs(raw_LR_fist, events, tmin=1 / 160, tmax=4, baseline=None, preload=True,
                                verbose='ERROR')
    event_id_fist_feet = dict(T1=2, T2=3)
    events, _ = mne.events_from_annotations(raw_fist_feet, event_id_fist_feet, verbose='ERROR')
    epochs_fist_feet = mne.Epochs(raw_fist_feet, events, tmin=1 / 160, tmax=4, baseline=None, preload=True,
                                  verbose='ERROR')
    data = np.concatenate((epochs_LR_fist.get_data(picks=ch_pick), epochs_fist_feet.get_data(picks=ch_pick)))
    # standardize each epoch in place
    scaler = preprocessing.StandardScaler()
    for i in range(len(data)):
        scaler.fit(data[i])
        data[i] = scaler.transform(data[i])
    labels = np.concatenate((epochs_LR_fist.events[:, 2], epochs_fist_feet.events[:, 2]))
    labels = to_categorical(labels)  # one-hot
    # split whole trials FIRST, then reshape into channel pairs
    train_data_ori, test_data_ori, train_label_ori, test_label_ori = train_test_split(data, labels, test_size=0.2,
                                                                                      random_state=42)
    train_data = np.empty((0, 2, train_data_ori.shape[2]))
    train_label = np.empty((0, 4))
    test_data = np.empty((0, 2, test_data_ori.shape[2]))
    test_label = np.empty((0, 4))
    for i in range(0, len(ch_pick), 2):
        train_data = np.concatenate((train_data, train_data_ori[:, i:i + 2, :]))
        test_data = np.concatenate((test_data, test_data_ori[:, i:i + 2, :]))
        train_label = np.concatenate((train_label, train_label_ori))
        test_label = np.concatenate((test_label, test_label_ori))
    print('data loaded.')
    return train_data, test_data, train_label, test_label


if __name__ == '__main__':
    res = get_physionet(1)
    for r in res:
        print(r.shape)
```
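As a side note, the per-epoch StandardScaler loop in the function above can be expressed as one vectorized operation. This is only a sketch, assuming the same (n_epochs, n_channels, n_times) layout; note that fitting StandardScaler on an (n_channels, n_times) epoch standardizes each time-point column across channels:

```python
import numpy as np

def standardize_epochs(data: np.ndarray) -> np.ndarray:
    """Vectorized equivalent of fitting StandardScaler on every epoch in
    a loop: each time-point column gets zero mean and unit variance
    across channels (StandardScaler uses the population std, ddof=0)."""
    mean = data.mean(axis=1, keepdims=True)  # shape (n_epochs, 1, n_times)
    std = data.std(axis=1, keepdims=True)
    return (data - mean) / std

# toy check: 3 epochs, 12 channels, 640 time points
rng = np.random.default_rng(0)
scaled = standardize_epochs(rng.normal(size=(3, 12, 640)))
print(np.allclose(scaled.mean(axis=1), 0))  # True
print(np.allclose(scaled.std(axis=1), 1))   # True
```

Whether per-time-point (rather than per-channel) standardization is the intended preprocessing is a separate question; the vectorized form just makes the current behavior explicit.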
In order to test your assertion — "in line 52, spliting reshape_x may split channel-couples in one task into train_set and test_set at the same time, that may be cause the acc rise not for the Model cause" — I used the following script, which basically tests two things:

```python
import sys
sys.path.append("/workspace")
import numpy as np
import tensorflow as tf
from data_processing.general_processor import Utils
from sklearn.model_selection import train_test_split

tf.autograph.set_verbosity(0)
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print(physical_devices)
config = tf.config.experimental.set_memory_growth(physical_devices[0], True)

# Params
source_path = "/dataset/paper/"

# Load data
channels = Utils.combinations["a"]  # ["FC1", "FC2"], ["FC3", "FC4"], ["FC5", "FC6"]
exclude = [38, 88, 89, 92, 100, 104]
subjects = [n for n in np.arange(1, 110) if n not in exclude]
x, y = Utils.load(channels, subjects, base_path=source_path)

# Transform y to one-hot encoding
y_one_hot = Utils.to_one_hot(y, by_sub=False)

# Reshape for scaling
reshaped_x = x.reshape(x.shape[0], x.shape[1] * x.shape[2])


def check_duplicate(element_list):
    for elem in element_list:
        if element_list.count(elem) > 1:
            return True
    return False


# Grab a test set before SMOTE
x_train_raw, x_valid_test_raw, y_train_raw, y_valid_test_raw = train_test_split(reshaped_x,
                                                                                y_one_hot,
                                                                                stratify=y_one_hot,
                                                                                test_size=0.20,
                                                                                random_state=42)
reshaped = reshaped_x.tolist()
x_train = x_train_raw.tolist()
x_valid = x_valid_test_raw.tolist()
print(check_duplicate(reshaped))
print(check_duplicate(x_train))
print(check_duplicate(x_valid))
for sample in reshaped_x.tolist():
    if (sample in x_train) and (sample in x_valid):
        print("Problems")
```

This simple script shows that train and test/valid instances are not present in both sets simultaneously.
Forgive me for responding so late. I took some time to check your hypothesis and unfortunately it is correct! Thank you for finding this serious bug in the code. I am actively working to see if the same accuracy can be achieved after removing this bug. At the moment it looks difficult to reach the same accuracy, as the network is heavily overfitting. If it is not possible to fix this I will personally contact the journal. I will post the resolution of this bug in the next few days. If you are working on it and want to share some thoughts, please don't hesitate. Thanks again for the support.
I'm more than sorry to hear that. |
Thank you so much for pointing out this error. 🥇 |
I tried splitting the train/test/valid sets before concatenating channels, and my accuracy on the BCI IV 2a dataset turned out to be 24.6%, which means the network didn't work at all. I read your paper, and I also read "A Simplified CNN Classification Method for MI-EEG" from your references. The author's accuracy is about 97%; maybe she made the same mistake?
I tried to replicate the code from "A Simplified CNN Classification Method for MI-EEG". Unfortunately, I was unable to achieve the accuracy they claim. Be careful: the problem is not in reshaping the data, it is in the generation. Give me a few days and I'll post the fix below.
Here are some papers that may have the same bug: |
@Kubasinska
Could you please explain the bug in the generation process? |
Hi everyone, sorry I am responding late. I integrated the fix into the main branch; you can find the new generator and loader in the fix folder. I also updated the readme and alerted the journal.

Here is the bug in detail. Take a single trial: the subject imagines moving the right fist for 4 seconds, and 64 channels record the brain activity. Of these 64 channels, we take only 4 for this example: C3, C4, CP3 and CP4. The idea described in the paper is to divide this single instance into two, one consisting of C3 and C4 and one consisting of CP3 and CP4. So now we have two arrays: one of size (2, 640) composed of C3 and C4, and one of the same size (2, 640) composed of CP3 and CP4. The label corresponding to these two examples is the same: imagined movement of the right fist. Ideally, these two examples should both go into the same set, either both into training or both into test. What happens instead is that one goes into the train set and one into the test set. The image below clarifies this example.
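The correct order of operations can be sketched in a few lines of NumPy (toy shapes and hypothetical helper names, not the repository's actual code): split whole trials first, then expand each side into channel pairs, so both pairs cut from one trial always land on the same side of the split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_then_expand(trials, labels, pair_slices, test_size=0.2, seed=42):
    """Split whole trials into train/test FIRST, then slice each trial
    into channel pairs. Both pairs of a trial stay on the same side,
    so no trial leaks across the split."""
    tr, te, y_tr, y_te = train_test_split(trials, labels,
                                          test_size=test_size,
                                          random_state=seed)
    expand = lambda x: np.concatenate([x[:, s, :] for s in pair_slices])
    k = len(pair_slices)
    # each trial yields k pair-examples, all sharing the trial's label
    return expand(tr), expand(te), np.tile(y_tr, k), np.tile(y_te, k)

# toy data: 10 trials, 4 channels (e.g. C3, C4, CP3, CP4), 640 samples
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4, 640))
y = rng.integers(0, 2, size=10)
pair_slices = [slice(0, 2), slice(2, 4)]  # (C3, C4) and (CP3, CP4)
x_tr, x_te, y_tr, y_te = split_then_expand(x, y, pair_slices)
print(x_tr.shape, x_te.shape)  # (16, 2, 640) (4, 2, 640)
```

The same idea works with any splitter, e.g. sklearn's `GroupShuffleSplit` with the trial index as the group, as long as the grouping happens before the channel-pair expansion.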
Hi, I just started a PhD thesis on EEG analysis and BCI, so I am currently looking for papers to reproduce and compare my results with. Next week I will try to integrate this kind of processing, as well as the STFT proposed in this paper, with a 1D-CNN and see whether it seems consistent to achieve such high accuracy on both the PhysioNet and BCI IV 2a datasets.
Thanks for sharing this information. I am reproducing the paper "BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to Learn From Massive Amounts of EEG Data", which claims accuracy of up to 90% in MI. Have you ever taken a look at this paper? Thanks again!
Thanks all for the useful discussion. This is exactly why I insist on publishing the code that corresponds to any article I publish. Unfortunately, mistakes in programming will happen sometimes and there's no way around it, no matter how hard you work to check your code before submission.
Hi there, inspiring method and great paper!
I have been trying to apply your work to another dataset for a couple of days, but I can't achieve good results.
Rechecking the code, I think it may be a data-split problem.
For example, in this file, MI-EEG-1D-CNN/models/train_a.py, line 45:
x is the loaded data, already shaped (events_num, 2, 640).
As we know, in one specific MI task, different channel couples in one ROI show similar behavior.
In line 52, splitting reshape_x may put channel couples from the same task into the train set and the test set at the same time, so the accuracy rise may come from this split rather than from the model.
The data-loading code of your work is a little hard for me to read, so I am trying to write my own data-loading function (a humble one, without base-type events or SMOTE), which splits the data into train and test sets first and then reshapes it from (events_num, channels_num, 640) to (events_num, 2, 640). Then I used HopefullNet to fit them; it didn't end well.
I will paste my function below, once I figure out how.
I hope to get your response; instructions on how to transfer HopefullNet to another dataset would be more than great.
Best wishes