Merge branch '1.4.0.dev1'
tjkessler committed Jun 9, 2018
2 parents 9ba1b0c + 76603d1 commit 39617b8
Showing 21 changed files with 1,527 additions and 1,758 deletions.
122 changes: 56 additions & 66 deletions README.md

72 changes: 21 additions & 51 deletions ecnet/README.md
# Low-level usage of model, data_utils, error_utils, and limit_parameters

## model.py
#### Class: MultilayerPerceptron
Attributes:
- **layers**: list of layers; each layer stores its number of neurons and activation function in the form [num, func]
- **weights**: list of TensorFlow weight variables
- **biases**: list of TensorFlow bias variables

Methods:
- **add_layer(size, act_fn)**: appends a *Layer* to the MLP's layer list
  - supported activation functions: 'relu', 'sigmoid', 'linear'
- **connect_layers()**: initializes TensorFlow variables for weights and biases between each layer; layers are fully connected
- **fit(x_l, y_l, learning_rate, train_epochs)**: fits the MLP to the inputs (**x_l**) and outputs (**y_l**) for **train_epochs** iterations with a learning rate of **learning_rate**
- **fit_validation(x_l, x_v, y_l, y_v, learning_rate, max_epochs)**: fits the MLP, periodically checking its performance on the validation data (**x_v**, **y_v**); learning stops when validation performance stops improving, or after **max_epochs** iterations
- **use(x)**: passes data through the trained MLP to obtain a prediction; returns predicted values
- **save(filepath)**: saves the TensorFlow session (.sess) and model architecture information (.struct) to the specified filepath
- **load(filepath)**: opens a TensorFlow session (.sess) and model architecture information (.struct) from the specified filepath

## data_utils.py
#### Class: DataFrame
Methods:
- **__init__(filename)**: imports a formatted database, creates a DataPoint for each data entry, and collects string and group names and counts
- **create_sets(random = True, split = [0.7, 0.2, 0.1])**: creates learning, validation and testing sets with *split* proportions; if random = False, the set assignments found in the database are used
- **create_sorted_sets(sort_string, split = [0.7, 0.2, 0.1])**: using *sort_string*, a string contained in the given database, assigns proportion *split* of each possible string value to the learning, validation and testing sets
- **shuffle(args, split = [0.7, 0.2, 0.1])**: shuffles the data between the specified sets
  - supported args combinations:
    - 'l, v, t' (shuffles data across the learning, validation and testing sets)
    - 'l, v' (shuffles data across the learning and validation sets)
- **package_sets()**: returns a PackagedData object containing NumPy arrays for the learning, validation and testing input and target sets

Functions:
- **output_results(results, DataFrame, filename)**: outputs *results* (calculated by model.py for a specified data set) to *filename*; the *DataFrame* is required for outputting data entry names, strings, groups, etc.
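Tying the class and its packaging step together, a minimal workflow might look like this sketch; 'my_data.csv' stands in for a database following ECNet's formatted-database layout:

```python
# Hypothetical DataFrame workflow; 'my_data.csv' is a placeholder database
# in ECNet's expected format.
from ecnet.data_utils import DataFrame

df = DataFrame('my_data.csv')                       # parse the database
df.create_sets(random=True, split=[0.7, 0.2, 0.1])  # random L/V/T assignment
packaged = df.package_sets()                        # NumPy arrays per set
# `packaged` (a PackagedData object) holds the input and target arrays for
# the learning, validation and testing sets, ready for model.py
```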

## error_utils.py
Notation:
Error Functions:
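The notation and function list for this module are collapsed in the diff above. As one illustration of the kind of error function it provides, RMSE (referenced by model.py and limit_parameters.py) can be computed as follows; this is a generic sketch, not necessarily the module's exact implementation:

```python
# Generic RMSE sketch; the actual names and signatures in
# ecnet.error_utils are not shown in this diff.
import numpy as np

def rmse(y_hat, y):
    """Root-mean-squared error between predictions y_hat and targets y."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))
```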

## limit_parameters.py
Functions:
- **limit_iterative_include(DataFrame, limit_num)**: limits the input dimensionality of the data found in *DataFrame* to a dimensionality of *limit_num* using a "retain the best" algorithm (see the sketch after this list): the best performing input parameter (based on RMSE) is retained, paired with every remaining parameter until the best pair is found, and the process repeats until *limit_num* parameters have been selected
- **limit_genetic(DataFrame, limit_num, population_size, num_survivors, num_generations, print_feedback)**: limits the input dimensionality of the data found in *DataFrame* to a dimensionality of *limit_num* using a genetic algorithm; *population_size* is the number of members in each generation, *num_survivors* is how many members of each generation survive, *num_generations* is how many generations the algorithm runs for, and *print_feedback* is a boolean controlling whether the algorithm periodically prints status updates
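To make the "retain the best" process concrete, here is a schematic sketch of iterative inclusion; `score`, which would train a model on a candidate parameter subset and return its RMSE, is a hypothetical stand-in for the real evaluation step:

```python
# Schematic "retain the best" loop; `score` is a hypothetical callback that
# trains/evaluates a model on a parameter subset and returns its RMSE.
def iterative_include(all_params, limit_num, score):
    retained = []
    while len(retained) < limit_num:
        best_param, best_rmse = None, float('inf')
        for param in all_params:
            if param in retained:
                continue
            err = score(retained + [param])  # evaluate the candidate subset
            if err < best_rmse:
                best_param, best_rmse = param, err
        retained.append(best_param)          # keep the best-performing addition
    return retained
```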
2 changes: 1 addition & 1 deletion ecnet/__init__.py
@@ -3,4 +3,4 @@
 import ecnet.error_utils
 import ecnet.model
 import ecnet.limit_parameters
-import ecnet.abc
+__version__ = '1.4.0'
178 changes: 0 additions & 178 deletions ecnet/abc.py

This file was deleted.
