EEG Data Analysis and Machine Learning System

This repository contains a Python script that analyzes EEG data and applies machine learning to classify game outcomes and predict completion times from EEG features. The script can be run either as a regular command-line script or as a graphical application via Streamlit.

Overview

The script processes EEG data recorded during gameplay sessions to extract meaningful features and apply machine learning algorithms for:

Classification: Predicting the outcome of a game (Win or Lose).
Regression: Predicting the completion time of a game.

The script supports several machine learning models and covers data preprocessing, feature extraction, hyperparameter tuning, and results visualization.

Features

Data Extraction: Reads EEG data, timestamps, and game annotations from CSV and JSON files.
Preprocessing: Cleans and segments the data based on specified intervals.
Feature Extraction: Computes statistical features from the EEG signals, including means, variances, correlations, and power spectral densities.
Machine Learning Models: Implements classifiers (SVC, Decision Tree, Random Forest) and regressors (Theil-Sen, Neural Network, Gradient Boosting).
Hyperparameter Tuning: Utilizes GridSearchCV for optimal parameter selection.
Results Visualization: Generates plots for metrics such as accuracy, confusion matrices, and regression errors.
User Interface: Offers both command-line and Streamlit graphical interfaces for parameter selection.

Installation

Clone the repository:

git clone https://github.com/your_username/your_repository.git

Navigate to the directory:
```
cd your_repository
```
Install the required packages:

The script requires Python 3.10 and the following packages:
- numpy
- pandas
- scipy
- scikit-learn
- imbalanced-learn
- matplotlib
- seaborn
- streamlit
Install them using:
```
pip install -r requirements.txt
```
Or install them individually:
```
pip install numpy pandas scipy scikit-learn imbalanced-learn matplotlib seaborn streamlit
```

Usage

Command-Line Interface

By default, the script is set up to run as a Streamlit application. To run it as a regular script from the command line:

Open the script in a text editor.

Set use_streamlit to False in the if __name__ == "__main__": block:

if __name__ == "__main__":
    use_streamlit = False
    # Rest of the code...

Configure Parameters:

Modify the parameters in the if_running_a_regular_script() function to suit your dataset and preferences.
Run the script:
```
python your_script_name.py
```

Streamlit Application

To run the script as a Streamlit application:

Ensure use_streamlit is set to True in the if __name__ == "__main__": block:

if __name__ == "__main__":
    use_streamlit = True
    # Rest of the code...

Run the Streamlit app:
```
streamlit run your_script_name.py
```
Interact with the app:
- Adjust the parameters using the provided widgets.
- Click the "START THE SYSTEM" button to run the analysis.
- View the results and visualizations directly in the browser.

Data Processing Pipeline

The script processes data through several stages, encapsulated in functions for modularity and clarity.

Data Extraction

Function: extract_data()
Purpose: Reads EEG data, intervals, and game annotations from the specified directory.
Process:
- Scans the directory for files matching the specified extensions.
- Reads CSV files containing EEG values and intervals.
- Reads JSON files containing game annotations.
- Stores data in lists for further processing.
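
The extraction steps above can be sketched roughly as follows. This is an illustration only, not the code of extract_data(): the function name `extract_data_sketch`, the file names, and the column names are hypothetical, and it uses only the standard library where the script itself may rely on pandas. It also assumes that sorting files by name keeps the three lists aligned per session.

```python
import csv
import json
import tempfile
from pathlib import Path


def extract_data_sketch(path, values_ext, intervals_ext, annotations_ext):
    """Collect EEG values, intervals, and annotations from one directory.

    Files are matched by suffix and sorted by name (an assumption) so that
    the three returned lists line up session by session.
    """
    folder = Path(path)

    def read_csv(file):
        with open(file, newline="") as handle:
            return list(csv.DictReader(handle))

    def read_json(file):
        with open(file) as handle:
            return json.load(handle)

    def matching(ext):
        return sorted(p for p in folder.iterdir() if p.name.endswith(ext))

    values = [read_csv(p) for p in matching(values_ext)]
    intervals = [read_csv(p) for p in matching(intervals_ext)]
    annotations = [read_json(p) for p in matching(annotations_ext)]
    return values, intervals, annotations


# Tiny demonstration with made-up file names and columns.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "s1_values.csv").write_text("Timestamp,EEG.AF3\n0.0,4100.5\n0.1,4101.0\n")
    Path(tmp, "s1_intervals.csv").write_text("Timestamp\n0.0\n0.1\n")
    Path(tmp, "s1_game.json").write_text(json.dumps({"Outcome": "Win", "Time": 42.0}))
    values, intervals, annotations = extract_data_sketch(
        tmp, "_values.csv", "_intervals.csv", "_game.json"
    )
```

Each CSV file becomes a list of row dictionaries and each JSON file a parsed object, so one session's data sits at the same index in all three lists.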

Preprocessing

Function: processing()
Purpose: Cleans and segments the EEG data based on game intervals.
Process:
- Extracts relevant columns (amplitude and power) from the EEG data.
- Filters data within the start and end times of each game.
- Applies windowing to segment the data into fixed-length windows.
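
As a rough illustration of the filtering and windowing steps, the sketch below keeps the samples that fall inside one game's interval and cuts them into fixed-length windows. The function name `segment_game`, the synthetic data, and the choice to drop a trailing partial window are assumptions made for this example, not details taken from processing().

```python
import numpy as np


def segment_game(timestamps, samples, start, end, window_length, fs):
    """Keep samples with start <= t <= end, then cut fixed-length windows.

    samples has shape (n_samples, n_channels); the result has shape
    (n_windows, window_size, n_channels). A trailing partial window is
    dropped (an assumption of this sketch).
    """
    timestamps = np.asarray(timestamps, dtype=float)
    samples = np.asarray(samples, dtype=float)
    mask = (timestamps >= start) & (timestamps <= end)
    game = samples[mask]

    window_size = int(round(window_length * fs))
    n_windows = len(game) // window_size
    trimmed = game[: n_windows * window_size]
    return trimmed.reshape(n_windows, window_size, samples.shape[1])


fs = 128  # assumed sampling frequency in Hz
t = np.arange(0, 10, 1 / fs)  # 10 s of timestamps
x = np.random.default_rng(0).normal(size=(t.size, 4))  # 4 synthetic channels
windows = segment_game(t, x, start=2.0, end=7.5, window_length=1.0, fs=fs)
```

With a 1-second window at 128 Hz, the 5.5 seconds between the start and end times yield five full windows of 128 samples each.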

Feature Extraction

Function: eeg_feature_extraction()
Purpose: Computes statistical features from the EEG signals.
Features:
- Amplitude Statistics: Mean and variance for each EEG channel.
- Correlation Metrics: Mean and variance of correlation matrices between windows.
- Cross-Covariance Metrics: Mean and variance of cross-covariance matrices.
- Power Spectral Densities: Mean power in different frequency bands.
Process:
- Calculates statistics for each windowed segment.
- Aggregates features across all windows for each game session.
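
The sketch below shows one way such features could be computed and aggregated; it is not the body of eeg_feature_extraction(). It assumes that correlations are taken between channels within each window and that aggregation means averaging over windows, and it leaves out the cross-covariance and power features. The names `window_features` and `session_features` are hypothetical.

```python
import numpy as np


def window_features(window):
    """Features for one window of shape (window_size, n_channels)."""
    means = window.mean(axis=0)
    variances = window.var(axis=0)
    # Channel-by-channel correlation matrix, summarised by the mean and
    # variance of its off-diagonal entries (the diagonal is always 1).
    corr = np.corrcoef(window, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return np.concatenate([means, variances, [off_diag.mean(), off_diag.var()]])


def session_features(windows):
    """Average the per-window features over all windows of one game."""
    per_window = np.array([window_features(w) for w in windows])
    return per_window.mean(axis=0)


rng = np.random.default_rng(0)
windows = rng.normal(size=(5, 128, 4))  # 5 windows, 128 samples, 4 channels
features = session_features(windows)
```

For four channels this gives a vector of ten values per game session: four means, four variances, and two correlation summaries.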

Machine Learning Models

Classification:
- Function: classification_prediction_and_evaluation()
- Purpose: Predicts game outcomes using selected classifiers.
- Models: Support Vector Classifier (SVC), Decision Tree Classifier (DTC), Random Forest Classifier (RFC).
- Process:
  - Splits the data into training and testing sets.
  - Optionally applies scaling and oversampling.
  - Performs hyperparameter tuning with GridSearchCV.
  - Evaluates the model using metrics like accuracy and confusion matrices.
Regression:
- Function: regression_prediction_and_evaluation()
- Purpose: Predicts game completion times using selected regressors.
- Models: Theil-Sen Regressor (TSR), Neural Network Regressor (NNR), Gradient Boosting Regressor (GBR).
- Process:
  - Splits the data into training and testing sets.
  - Optionally performs data augmentation.
  - Performs hyperparameter tuning with GridSearchCV.
  - Evaluates the model using metrics like Mean Squared Error (MSE) and R² score.
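
For orientation, the classification steps can be sketched with scikit-learn as below. The data is synthetic, the parameter grid is an arbitrary example, and SMOTE oversampling is left out to keep the example dependent on scikit-learn only; none of this is taken from classification_prediction_and_evaluation() itself. The regression path would follow the same split, tune, and evaluate pattern with a regressor and regression metrics.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix: 60 games, 10 features each.
X = rng.normal(size=(60, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(int)  # 1 = Win, 0 = Lose

# Stratified split keeps the Win/Lose ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is refit on every CV fold.
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=3)
search.fit(X_train, y_train)

# Evaluate the best estimator found by the grid search on held-out data.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
```

Placing the scaler inside the pipeline means the test folds never influence the scaling statistics during the grid search.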

Parameters Configuration

Parameters can be set either via the Streamlit interface or by modifying the code when running as a script.

System Parameters

Path: Directory containing the data files.
Number of Runs: How many times the system will run for evaluation.
Window Length: Duration of each window in seconds.
Sampling Frequency (fs): The sampling rate of the EEG data.
File Extensions:
- Extension of the CSV values files.
- Extension of the CSV intervals files.
- Extension of the JSON annotations files.

Processing Parameters

CSV Values:
- Columns to Drop: Columns in the EEG data CSV files to be excluded.
- Timestamps Column Name: Name of the column containing timestamps.
- Amplitude First Column Name: Name of the first amplitude column.
- Power First Column Name: Name of the first power column.
CSV Intervals:
- Timestamps Column Name: Name of the column containing interval timestamps.
Game Annotations:
- List of annotations to include (e.g., Time, Outcome, Difficulty).

Prediction Parameters

Random State:
- Use Fixed Random State: Ensures reproducibility across runs.
- Use Same Random State for Classifier/Regressor: Uses the same random state for model initialization.
Data Handling:
- Use Stratify: Maintains the class distribution during splitting.
- Use Scaling and Oversampling for Classification: Applies standard scaling and SMOTE oversampling.
- Use Augmentation for Regression: Augments training data by adding noise.
Model Selection:
- Classifiers: Choose one from SVC, DTC, RFC.
- Regressors: Choose one from TSR, NNR, GBR.
Hyperparameter Tuning:
- Use Grid Search for Classifier/Regressor: Enables hyperparameter optimization.
- Scoring for Grid Search: Selects the metric used for evaluating models during grid search.
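
To make the augmentation and fixed-random-state options more concrete, here is a minimal sketch of noise-based augmentation. The function `augment_with_noise`, the Gaussian noise model, and the `copies` and `noise_scale` parameters are assumptions for illustration; the README does not state how the script generates its noise.

```python
import numpy as np


def augment_with_noise(X, y, copies=2, noise_scale=0.05, random_state=None):
    """Append noisy copies of X; targets are repeated unchanged.

    Noise is Gaussian with a per-feature standard deviation equal to
    noise_scale times that feature's standard deviation in X.
    """
    rng = np.random.default_rng(random_state)
    sigma = noise_scale * X.std(axis=0)
    noisy = [X + rng.normal(scale=sigma, size=X.shape) for _ in range(copies)]
    return np.vstack([X, *noisy]), np.concatenate([y] * (copies + 1))


rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))          # synthetic training features
y_train = rng.uniform(30, 120, size=20)     # synthetic completion times

# A fixed random_state makes the augmented set identical across runs.
X_aug, y_aug = augment_with_noise(X_train, y_train, random_state=42)
X_again, _ = augment_with_noise(X_train, y_train, random_state=42)
```

Only the training set should be augmented this way, so that the test set still reflects real recordings.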

Results Visualization

Function: metrics_visualization()
Purpose: Generates plots to visualize the performance of the models.
Visualizations:
- Classification:
  - Accuracy over runs.
  - Average confusion matrix.
  - Average classification report.
- Regression:
  - MSE over runs.
  - MAE over runs.
  - R² score over runs.
Output:
- Saves the plots as PNG files.
- Displays them directly in the Streamlit app or shows them when running as a script.
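
The classification plots could be produced along the following lines. This sketch uses plain matplotlib with made-up per-run results and a hypothetical output file name; it is not the code of metrics_visualization(), which may use seaborn and a different layout.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the example runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up per-run results standing in for real evaluation output.
accuracies = [0.70, 0.75, 0.65, 0.80]
confusion_matrices = np.array([
    [[5, 2], [1, 7]],
    [[6, 1], [2, 6]],
    [[4, 3], [2, 6]],
    [[6, 1], [1, 7]],
])
mean_cm = confusion_matrices.mean(axis=0)  # element-wise average over runs

fig, (ax_acc, ax_cm) = plt.subplots(1, 2, figsize=(9, 4))

# Left panel: accuracy for each run.
runs = range(1, len(accuracies) + 1)
ax_acc.plot(runs, accuracies, marker="o")
ax_acc.set(xlabel="Run", ylabel="Accuracy", title="Accuracy over runs")

# Right panel: averaged confusion matrix with the cell values written in.
image = ax_cm.imshow(mean_cm, cmap="Blues")
ax_cm.set_xticks([0, 1])
ax_cm.set_xticklabels(["Lose", "Win"])
ax_cm.set_yticks([0, 1])
ax_cm.set_yticklabels(["Lose", "Win"])
ax_cm.set(xlabel="Predicted", ylabel="True", title="Average confusion matrix")
for (row, col), value in np.ndenumerate(mean_cm):
    ax_cm.text(col, row, f"{value:.2f}", ha="center", va="center")
fig.colorbar(image, ax=ax_cm)

# Save the figure as a PNG file in a temporary directory.
out_path = Path(tempfile.mkdtemp()) / "classification_metrics.png"
fig.tight_layout()
fig.savefig(out_path)
plt.close(fig)
```

In a Streamlit app the same figure object could be handed to `st.pyplot(fig)` instead of, or in addition to, being saved to disk.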

License

This project is licensed under the MIT License - see the LICENSE file for details.

Feel free to explore the code and adjust the parameters to suit your data and research needs. If you encounter any issues or have suggestions for improvement, please open an issue or submit a pull request.
