# DeepFlarePred

Predicting the likelihood that a flare will occur.

Using the Liu dataset, which contains SHARP parameters (not images), we train a DNN to predict the likelihood of a flare erupting from a sunspot.

## Installation

### Conda install

```sh
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```

### Git clone

```sh
git clone https://github.com/Dewald928/DeepFlarePred.git
```

### Create virtual environment

Install conda, then create the environment:

```sh
conda env create -f environment.yml
```

Activate the environment:

```sh
conda activate DeepFlarePred
```

If you get errors with the predefined validation set, install the skorch development branch (0.9):

```sh
pip install git+https://github.com/skorch-dev/skorch.git
```
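As a quick sanity check that the environment is set up (assuming PyTorch and optional CUDA support are provided by environment.yml), the following should print the installed version and whether a GPU is visible:

```python
# Quick environment check: prints the PyTorch version and GPU availability.
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```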

## Download and prepare data

The Liu data should be downloaded from the Liu dataset. For main_LSTM_Liu.py and main_TCN_Liu.py, the Liu dataset needs to be downloaded and extracted to the Data/Liu folder such that:

```
./Data/
├── Liu
│   ├── C
│   ├── M
│   └── M5
```

To fix the incorrect column names, run the following (not needed if the repo is cloned, since the included dataset is already the fixed version):

```sh
python liu_data_fix
```

To create the normalized datasets, run:

- for z_train: `python normalize_z_train.py`
- for z_minmax_train: `python normalize_z_minmax_train.py`
- for z_minmax_all: `cp Data/Liu/M5/ Data/Liu/z_minmax_all -rf`

To create the power-transformed dataset, run `python normality_check.py`.
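The normalization scripts in the repo define the exact behaviour; as a rough, non-authoritative sketch of what z-score normalization does (using synthetic stand-in data rather than the real Liu files), statistics are fitted on the training split only and then reused for the other splits:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in for the 40 SHARP-derived features; the real scripts read the Liu data files.
rng = np.random.default_rng(0)
train_X = rng.normal(loc=5.0, scale=2.0, size=(1000, 40))
test_X = rng.normal(loc=5.0, scale=2.0, size=(200, 40))

scaler = StandardScaler().fit(train_X)   # mean/std computed from the training split only
train_norm = scaler.transform(train_X)   # training data becomes zero-mean, unit-variance
test_norm = scaler.transform(test_X)     # test data is scaled with the *training* statistics
```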

## Run script

To run the script, you can either do a Weights & Biases sweep, or simply run:

```sh
python main_TCN_Liu.py
```

The default configuration can be changed in config-defaults.yaml.
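config-defaults.yaml follows the standard Weights & Biases layout of one `value` entry per hyperparameter; the keys and values below are placeholders for illustration, not the repository's actual defaults:

```yaml
# Illustrative only: the real parameter names and defaults live in the repo's config-defaults.yaml.
batch_size:
  desc: Mini-batch size
  value: 256
learning_rate:
  desc: Initial learning rate
  value: 0.001
seq_len:
  desc: Number of time steps per input sequence
  value: 10
```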

## Docker cloud GPU setup (not working at the moment)

### Build container

Follow the instructions here:

- Install Docker
- Install the GPU drivers
- Install NVIDIA Docker (Quickstart)
- Then build the Dockerfile and push the image to Docker Hub (useful: Link1, Link2)

### Dockerhub container

`dvd928/deep_flare_pred:latest`
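To try the published image locally (assuming Docker and the NVIDIA container toolkit are installed; drop `--gpus all` for a CPU-only run), something like the following should work:

```sh
# Pull the published image and start an interactive container with GPU access.
docker pull dvd928/deep_flare_pred:latest
docker run --gpus all -it dvd928/deep_flare_pred:latest
```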

### Example run string on Paperspace

```sh
paperspace jobs create --container dvd928/deep_flare_pred:latest --machineType P4000 \
  --command "wandb agent 5ks0xbql" --ports 5000:5000 --project Liu_pytorch
```

## Test/Analysis Scripts

| Script | Description |
| --- | --- |
| cme_svm_updated_for_pyastro.ipynb | Example notebook of Bobra's CME SVM |
| data_aquisition_pipeline.ipynb | Notebook for generating the Liu et al. data (WIP) |
| feature_selection.py | Univariate feature selection and RFE |
| inspectdata.py | Basic data analysis & pair plot generation |
| nested_crossval.py | Example script for nested cross-validation |
| plot_classifier_comparison.py | Sklearn example script |
| plot_cv_indices.py | Sklearn example script |
| regression.py | Synthetic LSTM regression testing |
| roc_test.py | ROC vs. PR for imbalanced datasets |
| skorchCV.py | Used for generating a toy unbalanced classification |
| test_tcn.py | Analysis of TCN and 1D convolution using sequences |
| Titanic_Basic_Interpret.py | Captum example |
| moving_std_protocol.py | Protocol for downloading W&B runs and model selection, based on smooth training |
| WNBtestscript.py | wandb setup script |
| workers_test.py | PyTorch optimal-workers test |

## Plans for the Project

### Preliminary tests

- Copy Liu's code to PyTorch, more or less.
- Copy Liu's architecture completely
- Cross-validation: skorch library
- Regularization: L2 + dropout
- Shuffled vs. unshuffled: shuffling is not very adverse.
- GPU integration
- GPU optimization: just use larger batch sizes
- Implement Weights & Biases
- W&B sweeps: check that they work
- W&B multi-GPU sweeps: done by using tmux
- PyTorch bottleneck test: inconclusive, revisit
- Attention models: tested, but not swept
- Understand LSTM + TCN better
- GRU, LSTM and RNN: switchable between them
- HDF5 test script: the Chen data uses HDF5, but unable to read the data
- MLP skorch test: RNNs and custom logs not well supported
- TCN networks: slightly better so far
- Early stopping and checkpointing on the best validation TSS (LSTM only, so far; see the TSS sketch after this list)
- Test data? What is the best way to test the network?
- LR scheduler
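For reference, the True Skill Statistic (TSS) used for model selection above is sensitivity plus specificity minus one, which makes it insensitive to the flare/no-flare class imbalance. A minimal sketch of the computation (not the repository's implementation):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def tss(y_true, y_pred):
    """True Skill Statistic: TP/(TP+FN) - FP/(FP+TN), i.e. recall minus false-alarm rate."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn) - fp / (fp + tn)

# Toy example: both flares caught, one false alarm out of two quiet cases -> TSS = 0.5
print(tss(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1])))
```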

### Main Objectives

- Create an MLP equivalent to the Nishizuka et al. paper
- Establish a baseline MLP
- Understand TCN operation
- Synthetic dataset
- Change sequence length with the TCN
- LSTM vs. TCN?
- TCN baseline (Liu dataset, 20/40 features?)
- ROC + precision-recall curves, with AUC (train, val & test sets; see the sketch after this list)
- Find the best features out of the 40 (Captum)
- Compare with the occlusion method.
- What do these best features mean? (Does it fit with other literature?)
- SHARP-only TCN
- Case studies
- What does TSS mean in this context?
- How to interpret W&B parameters?
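For the ROC and precision-recall objective above, the curve areas can be computed directly from predicted flare probabilities with scikit-learn; a small self-contained sketch with toy predictions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Toy labels/probabilities; in practice these come from the trained model on the train/val/test splits.
y_true = np.array([0, 0, 0, 1, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.70])

print("ROC AUC:", roc_auc_score(y_true, y_score))
print("PR AUC (average precision):", average_precision_score(y_true, y_score))
```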

### Future plans

- SHARP query and model-inference pipeline (un-normalize the data)
- Incorporate SHARP magnetogram images like the Chen article
- Use a GAN for detecting anomalies
- MLP/LSTM attention models
- Frame it as a regression problem? LSTM regression is possible.

## Questions

### Data sets