Merge branch '1.4.0.dev1'
tjkessler committed Jun 9, 2018
2 parents 9ba1b0c + 76603d1 commit 39617b8
Showing 21 changed files with 1,527 additions and 1,758 deletions.
122 changes: 56 additions & 66 deletions README.md

72 changes: 21 additions & 51 deletions ecnet/README.md
# Low-level usage of model, data_utils, error_utils, and limit_parameters

## model.py
#### Class: MultilayerPerceptron
Attributes:
- **layers**: list of layers; each layer stores its number of neurons and activation function in the form [num, func]
- **weights**: list of TensorFlow weight variables
- **biases**: list of TensorFlow bias variables

Methods:
- **add_layer(size, act_fn)**: appends a *Layer* to the MLP's layer list
  - supported activation functions: 'relu', 'sigmoid', 'linear'
- **connect_layers()**: initializes TensorFlow variables for weights and biases between each layer; layers are fully connected
- **fit(x_l, y_l, learning_rate, train_epochs)**: fits the MLP to the inputs (**x_l**) and outputs (**y_l**) for **train_epochs** iterations with a learning rate of **learning_rate**
- **fit_validation(x_l, x_v, y_l, y_v, learning_rate, max_epochs)**: fits the MLP, periodically checking its performance on the validation data (**x_v**, **y_v**); learning stops when validation performance stops improving, or after **max_epochs** iterations
- **use(x)**: passes data through the trained MLP to obtain a prediction; returns predicted values
- **save(filepath)**: saves the TensorFlow session (.sess) and model architecture information (.struct) to the specified filepath
- **load(filepath)**: opens a TensorFlow session (.sess) and model architecture information (.struct) from the specified filepath

## data_utils.py
#### Class: DataFrame
Methods:
- **__init__(filename)**: imports a formatted database, creates a DataPoint for each data entry, and collects string and group names and counts
- **create_sets(random = True, split = [0.7, 0.2, 0.1])**: creates learning, validation and testing sets with *split* proportions; if random = False, the set assignments found in the database are used
- **create_sorted_sets(sort_string, split = [0.7, 0.2, 0.1])**: using *sort_string*, a string contained in the given database, assigns proportion *split* of each possible string value to the learning, validation and testing sets
- **shuffle(args, split = [0.7, 0.2, 0.1])**: shuffles the data between the specified sets
  - supported args combinations:
    - 'l, v, t' (shuffles data across the learning, validation and testing sets)
    - 'l, v' (shuffles data across the learning and validation sets)
- **package_sets()**: returns a PackagedData object containing NumPy arrays for the learning, validation and testing input and target sets

Functions:
- **output_results(results, DataFrame, filename)**: outputs *results* (calculated by model.py for a specified data set) to *filename*; the *DataFrame* is required for outputting data entry names, strings, groups, etc.
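Tying the class and its packaging step together, a minimal workflow might look like this sketch; 'my_data.csv' stands in for a database following ECNet's formatted-database layout:

```python
# Hypothetical DataFrame workflow; 'my_data.csv' is a placeholder database
# in ECNet's expected format.
from ecnet.data_utils import DataFrame

df = DataFrame('my_data.csv')                       # parse the database
df.create_sets(random=True, split=[0.7, 0.2, 0.1])  # random L/V/T assignment
packaged = df.package_sets()                        # NumPy arrays per set
# `packaged` (a PackagedData object) holds the input and target arrays for
# the learning, validation and testing sets, ready for model.py
```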

## error_utils.py
Notation:
Error Functions:
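The notation and function list for this module are collapsed in the diff above. As one illustration of the kind of error function it provides, RMSE (referenced by model.py and limit_parameters.py) can be computed as follows; this is a generic sketch, not necessarily the module's exact implementation:

```python
# Generic RMSE sketch; the actual names and signatures in
# ecnet.error_utils are not shown in this diff.
import numpy as np

def rmse(y_hat, y):
    """Root-mean-squared error between predictions y_hat and targets y."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))
```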

## limit_parameters.py
Functions:
- **limit_iterative_include(DataFrame, limit_num)**: limits the input dimensionality of the data found in *DataFrame* to a dimensionality of *limit_num* using a "retain the best" algorithm (see the sketch after this list): the best performing input parameter (based on RMSE) is retained, paired with every remaining parameter until the best pair is found, and the process repeats until *limit_num* parameters have been selected
- **limit_genetic(DataFrame, limit_num, population_size, num_survivors, num_generations, print_feedback)**: limits the input dimensionality of the data found in *DataFrame* to a dimensionality of *limit_num* using a genetic algorithm; *population_size* is the number of members in each generation, *num_survivors* is how many members of each generation survive, *num_generations* is how many generations the algorithm runs for, and *print_feedback* is a boolean controlling whether the algorithm periodically prints status updates
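To make the "retain the best" process concrete, here is a schematic sketch of iterative inclusion; `score`, which would train a model on a candidate parameter subset and return its RMSE, is a hypothetical stand-in for the real evaluation step:

```python
# Schematic "retain the best" loop; `score` is a hypothetical callback that
# trains/evaluates a model on a parameter subset and returns its RMSE.
def iterative_include(all_params, limit_num, score):
    retained = []
    while len(retained) < limit_num:
        best_param, best_rmse = None, float('inf')
        for param in all_params:
            if param in retained:
                continue
            err = score(retained + [param])  # evaluate the candidate subset
            if err < best_rmse:
                best_param, best_rmse = param, err
        retained.append(best_param)          # keep the best-performing addition
    return retained
```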
2 changes: 1 addition & 1 deletion ecnet/__init__.py
@@ -3,4 +3,4 @@
 import ecnet.error_utils
 import ecnet.model
 import ecnet.limit_parameters
-import ecnet.abc
+__version__ = '1.4.0'
178 changes: 0 additions & 178 deletions ecnet/abc.py

This file was deleted.
