Skip to content

Latest commit

 

History

History
executable file
·
574 lines (471 loc) · 18.3 KB

README.md

File metadata and controls

executable file
·
574 lines (471 loc) · 18.3 KB

dldoc

Deep Learning based Cardiology AI to predict Atrial Fibrillation in ECG data

Authors

Florian Schroevers
Abel Oakley

Files

global_params.py

Description:
	Loads the parameters set in the 'config.json' file.

core.py

Description:
	The main script that runs all required code to load the data, pre-process the data,
    create the model, trains the network and show the metrics.

Functions:

  • main

      Description:
      	Runs the whole process as specified above
      Args:
      Returns:
    

main.py

Description:
    Loads the model and gives prediction for an ecg selected by the user

Functions:

  • main

      Description:
          Runs the whole process as specified above
      Args:
      Returns:
    

data_generator.py

Description:
	File for handling the loading of ecg data into a usable format

Functions:

  • get_ids:

      Description:
      	returns a list of all patient id's in the dataset
      Args:
      Returns:
        ids : list
            a list of all patient id's in the dataset
    
  • get_ecg_by_id:

      Description:
      	returns the contents of an ecg file and its target (rythm), given a data id.
          if the time is not given, a random ecg will be chosen.
      Args:
          patient_id : str
              the id of the patient
          t : str or Nonetype [optional, default: None]
              the time at which the ecg was taken
      Returns:
          tuple (5000x8 np.array, int): a numpy array with the data and the target
                                    (0 if sinus rythm, 1 otherwise)
    
  • get_ecg_fnames:

      Description:
      	returns a list of filenames which are ecg's for a given patient
      Args:
          patient_id : str
              the id of the patient
      Returns:
          fnames : list
              a list containing all filenames with ecg's of this patient
    
  • get_times:

      Description:
      	returns a list of times at which ecg's for a given patient are taken
      Args:
          patient_id : str
              the id of the patient
      Returns:
          times : list
              a list containing times at which ecg's of this patient are taken
    
  • get_gender:

      Description:
      	returns a the gender of the given patient
      Args:
          patient_id : str
              the id of the patient
      Returns:
          gender : str
              'M' if the patient is male, 'F' if female
    
  • get_time_rythm_mapping:

      Description:
      	returns a dict mapping the times to the rythm at that time for a given patient
      Args:
          patient_id : str
              the id of the patient
      Returns:
          time_rythm_dict : dict
              keys are times ecg's were taken, values are rythms at those times
    
  • get_time_fname_mapping:

      Description:
      	returns a dict mapping the times to the filename of the ecg taken at
          that time for a given patient
      Args:
          data_id : str
              the id of the patient
      Returns:
          times_fnames_dict : dict
              keys are times ecg's were taken, values are filenames of
              ecg's taken at those times
    
  • get_feat_data:

      Description:
      Args:
      	df: a pandas dataframe with atleast 7 columns, 6 of which are not params
      Returns:
          features: part of the dataframe that makes up the processed features
                  Shape is 6 + number of params by number of ECG's
          targets: targets that belong to those features
                  Shape is 1 by number of ECG's
    
  • get_data:

      Description:
      	returns data in the directory specified in the helpers.py file
      Args:
          n_files : (Nonetype or int) [optional, default: None]
              the number of samples to return, return all available data if set to
              None
          extension : str [optional, default: '.csv']
              the extension (filtype) of the data. can be anything, as long as
              it's readable by np.loadtxt
          split : (bool or str) [optional, default: False]
              to split data 50/50 into healthy/non-healthy or not (only works if
              target is set to None)
              if set to 'max', the function will determine what the max amount of
              files is while keeping the ration 50/50 (will override n_files)
          channels : (Nonetype or np.array) [optional, default: None]
              indices of channels to return or None for all channels
          norm : (bool) [optional, default: False]
              normalize the channels
          exclude_targets : (list) [optional, default: []]
              a list of conditions not to return (0: healthy, 1: afib, 2: afl,
              3: svt, 4: unknown)
          return_fnames : bool [optional, default: False]
              wheter to return a the filenames of the data
          randomize_order : bool [optional, default: True]
              whether to randomize the order of the data
          n_points : int [optional, default: 5000]
              the number of data points to exctract
      Returns:
          data_x : np.ndarray
              the ecg data itself as a 3D array with shape
              (n_ecgs, ecg_len, n_channels)
          data_y : np.ndarray
              an array of target variables
          files : list [optional]
              a list of all files
    

data_preprocessing.py

Description:
	Handles the preprocessing of data (smoothing etc)

Functions:

  • savitzky_golay:

      Description:
      	Smooth (and optionally differentiate) data with a Savitzky-Golay filter.
          The Savitzky-Golay filter removes high frequency noise from data.
          It has the advantage of preserving the original shape and
          features of the signal better than other types of filtering
          approaches, such as moving averages techniques.
          Parameters
      Args:
      	y : array_like, shape (N,)
     			the values of the time history of the signal.
          window_size : int
              the length of the window. Must be an odd integer number.
          order : int
              the order of the polynomial used in the filtering.
              Must be less then `window_size` - 1.
          deriv: int
              the order of derivative to compute (default = 0 means only smoothing)
      Returns:
      	ys : ndarray, shape (N)
      		the smoothed signal (or it's n-th derivative).
    
  • cn:

      Description:
      	perform fourier series analysis on a signal and return the nth coefficient
          for a function that approximate the wave
    
          assumes the data is modelled by an even function (f(x) = f(-x)).
          the function on this page is followed (the second one under cosine series)
          https://en.wikipedia.org/wiki/Fourier_sine_and_cosine_series
      Args:
          signal : np.ndarray
              a 1d array containing the signal to perform the analysis on
          n : int
              the nth coefficient to return.
      Returns:
          c : int
              the nth coefficient that approximate the given data
    
  • get_fourier_coefficients:

      Description:
      	uses the function cn (see above) to get an array of coefficients that
          approximate a given signal
          the function that reconstructs this wave is: reconstruct_wave
      Args:
          signal : np.ndarray
              a 1d array containing the signal to perform the analysis on
          resolution : int [optional, default: 600]
              the amount of coefficients to return. the higher the resolution the
              higher the fidelity of the reconstructed wave
      Returns:
          coefficients : np.ndarray
              a 1d array containing the coefficients that model the given data
    
  • reconstruct_wave:

      Description:
      	reconstructs a wave using a given set of coefficients using the function as
          seen on this page (first equation under cosine series)
          https://en.wikipedia.org/wiki/Fourier_sine_and_cosine_series
      Args:
          coefficients : np.ndarray
              a 1d array of coefficients to reconstruct wave from, such as those
              given by the function get_fourier_coefficients
          signal_length : int
              the target length of the reconstructed array
      Returns:
          reconstruction : np.ndarray
              a 1d array of length signal_length that approximates the data of
              which the coefficients were extracted
    
  • fourier_straighten:

      Description:
      	straighten an ecg using a low resolution to obtain ecg baseline, and then
      	straightening the ecg by setting the baseline to zero
      Args:
          signal : np.ndarray
              the ecg signal to straighten
          resolution : int [optional, default: 20]
              the resolution of the baseline. if set to high important details
              from the ecg will be removed, recommended to keep under 30
      Returns:
          corrected_ecg : np.ndarray
              the straightened array
    
  • preprocess_data:

      Description:
      	preprocess the data by smoothing and straightening.
      Args:
          data_x : np.ndarray
              the data to preprocess.
      Returns:
          p_data_x : np.ndarray
              preprocessed data
    
  • save_data:

      Description:
      	saves the preprocessed data to the location specified in global_params.py
          saves the files as the original filename + 'preprocessed' if the oriinal
          filenames are given, otherwise as a generic name with target variable in it
      Args:
          data_x : np.ndarray
              the ecg data to save
          data_y : np.ndarray
              the targetsof the ecg's
          fnames : list [optional, default: []
              the filenames of the original files
      Returns:
    
  • pulse_scale:

      Description:
      	scales an array to a given length, using 1d linear interpolation
      Args:
          pulse : np.ndarray
              the array to scale
          target_size : int
              the size to scale to
      Returns:
          scaled_pulse : np.ndarray
              the scaled pulse
    
  • extract_windows:

      Description:
      	extract all pulses from an ecg and scale them to a given size
      Args:
          data_x : np.ndarray
              an array of ECG's
          data_y : np.ndarray
              an array of targets of the ECG's
          pulse_size : int [optional, default: 80]
              the size to scale the pulses to
      Returns:
          pulse_data_x : np.ndarray
              an array of pulses
          pulse_data_y : np.ndarray
              an array of targets of the corresponding pulses
    

feature_extraction.py

Description:
	Handles the extraction of features from the data

Functions:

  • get_rpeaks:

      Description:
      	returns an array of indices of the r peaks in a given ecg
      Args:
          ecg : np.ndarray
              an ecg
      Returns:
          rpeaks : np.ndarray
              an array of indices of the r peaks
    
  • get_peak_offset:

      Description:
      	calculate the mean of deviations from the horizontal location
      	of r peaks of an ecg to their expected position
      Args:
          ecg : np.ndarray
              the ecg
          rpeaks : np.ndarray
              an array containing the indices of the r peaks
      Returns:
          peakdev : int
              The mean of deviations from the r peaks to their expected location
              if spread uniformly over the ecg
    
  • get_heartrate:

      Description:
      	calculate the heartrate of an ecg
      Args:
          rpeaks : np.ndarray
              an array containing the indices of the r peaks
      Returns:
          hr : int
              the heartrate of an ecg
    
  • get_ppeaks:

      Description:
      	Detects P-peaks in an ECG by looking for max values inside windows between
          the R-peaks, but takes the T-peak into regard as well as missing P-peaks.
          returns the amount of P-peaks and a number between 0 and 1 representing the
          relation between the mean of the P-peaks and the mean of the R-peaks.
      Args:
          data_x : np.ndarray
              3D array with ECG data (should be smoothed)
    
      Returns:
          mean_peak_height : 1D array
              array of the means of the p-peaks in ECGs
          n_ppeaks : 1D array
              array of the number of p-peaks in ECGs
    
  • show_correlation_heatmap:

      Description:
      	shows a correlation heatmap of a given dataframe
      Args:
          df : pd.DataFrame
              the dataframe to calculate the correlation matrix of
    
  • load_extracted_features:

      Description:
      	load the extracted features from the location specified in global_params.py
      Args:
          fname : str
              the name of the file to load
      Returns:
          df : pd.DataFrame
              a dataframe with the features
    
  • extract_features:

      Description:
      	creates a dataframe from a given collection of ecgs containing the fourier
      	series coefficients, heartrate and peak offset
      Args:
          data_x : np.ndarray
              a set of ecgs. must only contain one channel per ecg. can be 1d, 2d
              or 3d
          data_y : np.ndarray
              the targets of the ecg's
          smooth : bool [optional, default: False]
              whether to smooth the data
          fourier_resolution : int [optional, default: 600]
              the amount of coefficients to return. the higher the resolution the
              higher the fidelity of the reconstructed wave
          save_to_file : str [optional, default: '']
              saves to this file if the length of this string > 0, with this as
              filename
      Returns:
          df : pd.DataFrame
              a dataframe with the extracted features
    

neural_network.py

Description:
	Implements the neural network model as well as the fitting and evaluation of the model

Functions:

  • prepare_train_val_data:

      Description:
      	splits the data in a training, validation and test set, while maintaining a
          50/50 split of targets in all sets, so that the network won't learn to
          always predict one target
      Args:
          data_x : np.ndarray
              an array of input data
          data_y : np.ndarray
              an array of targets of the data
          feature_data : pandas.DataFrame
              a dataframe with any additional extracted features
          tvt_split : list
              a list with three floats that represent the fraction of the size of
              the training, validation and test (tvt) sets respectively
          equal_split_test : bool
              whether to split the test set 50/50
      Returns:
          train_x : dict
              a dict containing the data of this set with input name as key and
              data as value
          train_y : np.ndarray
              an array with targets for this set
          validation_x : dict
              a dict containing the data of this set with input name as key and
              data as value
          validation_y : np.ndarray
              an array with targets for this set
          test_x : dict
              a dict containing the data of this set with input name as key and
              data as value
          test_y : np.ndarray
              an array with targets for this set
    
  • ffnet:

      Description:
      	returns model
      Args:
          ecg_shape : tuple
              the shape of the input
          summarize : bool [optional, default:False]
              whether to show a summary of the model
      Returns:
          model : keras.models.Model
              the model
    
  • precision:

      Description:
      	Precision metric, only computes a batch-wise average of precision.
    
      	Computes the precision, a metric for multi-label classification of
      	how many selected items are relevant.
    
  • recall:

      Description:
      	Recall metric, only computes a batch-wise average of recall.
    
          Computes the recall, a metric for multi-label classification of
          how many relevant items are selected.
    
  • train:

      Description:
      	fit data on a model and return the trained model
      Args:
          model : keras.models.Model
              the model to evaluate
          x_train : dict
              a dictionary mapping input names to actual data
          y_train : np.ndarray
              the targets of the train data
          x_val : dict
              a dictionary mapping input names to actual data
          y_val : np.ndarray
              the targets of the validation data
          batch_size : int [optional, default: 32]
              the size of the batches to be fed into the network
          epochs : int [optional, default: 32]
              the number of epochs (times to run the network)
          save : bool [optional, default: False]
              whether to train the saved network
      Returns:
          r : list
              list of the loss and metrics specified by the model after running
              the model on the test data
    
  • eval:

      Description:
      	evaluate the model on a test set (consisting of pulses)
      Args:
          model : keras.models.Model
              the model to evaluate
          x_test : dict
              a dictionary mapping input names to actual data
          y_test : np.ndarray
              the targets of the test data
          batch_size : int [optional, default: 32]
              the size of the batches to be fed into the network
      Returns:
          r : list
              list of the loss and metrics specified by the model after running
              the model on the test data
    

helpers.py

Description:
	Helper functions to be used by multiple files in the project

Functions:

  • progress_bar:

      Description:
      	prints the current state of the progress