Merge pull request #12 from tjkessler/Hernan

abc-ecnet integration
ecrl · May 5, 2018 · 84ff4f8 · 84ff4f8
2 parents 2243f90 + 6384c12
commit 84ff4f8
Showing 12 changed files with 278 additions and 13 deletions.
diff --git a/LICENSE.txt b/LICENSE.txt
@@ -1,7 +1,7 @@
-Copyright 2017 Travis Kessler
+Copyright 2017 Travis Kessler, Hernan Gelaf-Romer, Sanskriti Sharma
 
 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 
 The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
 
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/README.md b/README.md
@@ -129,6 +129,12 @@ Here is an overview of the Server object's methods:
 		- **'valid'** (obtains results for validation set)
 		- **'train'** (obtains results for learning & validation sets)
 		- **'test'** (obtains results for test set)
+- **tune_hyperparameters(*target_score = None, iteration_amount = 50, amount_of_employers = 50*)**: optimize the hyperparameters
+	- arguments:
+		- **None** (defaults to 50 iterations, 50 employers)
+		- **iteration_amount** (specify how many iterations to run the colony)
+		- **target_score** (specify target score for program to terminate)
+		- **amount_of_employers** (specify the amount of employer bees in the colony)
 - **calc_error(*args*, *dset = None*)**: calculates various metrics for error for a specified data set
 	- arguments: 
 		- **'rmse'** (root-mean-squared error)
@@ -175,6 +181,9 @@ sv.import_data()
 # Fits model(s), shuffling learn and validate sets between trials
 sv.fit_mlp_model_validation('shuffle_lv')
 
+# Tunes hyperparameters to their optimal values
+sv.tune_hyperparameters(iteration_amount = 150)
+
 # Select best trial from each build node to predict for the node
 sv.select_best()
 
@@ -238,4 +247,4 @@ To contribute to ECNet, make a pull request. Contributions should include tests
 
 To report problems with the software or feature requests, file an issue. When reporting problems, include information such as error messages, your OS/environment and Python version.
 
-For additional support/questions, contact Travis Kessler ([email protected]) or John Hunter Mack ([email protected]).
+For additional support/questions, contact Travis Kessler ([email protected]), Hernan Gelaf-Romer ([email protected]) or John Hunter Mack ([email protected]).
diff --git a/ecnet/README.md b/ecnet/README.md
@@ -1,4 +1,4 @@
-# Low-level usage of model, data_utils, error_utils, and limit_parameters
+# Low-level usage of model, data_utils, error_utils, limit_parameters, and abc
 
 ## model.py
 #### Class: multilayer_perceptron
@@ -62,3 +62,19 @@ Functions:
 - **limit(num_params, server)**: limits the number of input parameters to an integer value specified by num_params, using a "retain the best" process, where the best performing input parameter (based on RMSE) is retained, paired with every other input parameter until a best pair is found, repeated until the limit number has been reached
   - returns a list of parameters
 - **output(data, param_list, filename)**: saves a new .csv formatted database, using a generated parameter list and an output filename
+
+## abc.py
+#### Class: ABC
+Attributes:
+- **valueRanges**: a list of tuples of value types to value range (value_type, (value_min, value_max))
+- **fitnessFunction**: fitness function to evaluate a set of values; must take one parameter, a list of values
+- **endValue**: target fitness score which will terminate the program when reached
+- **iterationAmount**: amount of iterations before terminating program
+- **amountOfEmployers**: amount of sets of values stored per iteration
+
+Methods:
+- **assignNewPositions(firstBee)**: assign a new position to a given bee
+- **getFitnessAverage()**: collect the average of all the fitness scores across all employer bees
+- **checkNewPositions(bee)**: check whether the bee's fitness score is worse (higher) than the fitness average; if it is, assign the bee a new random position
+- **checkIfDone(count)**: check whether any employer's fitness score has reached the target score or, if no endValue was given, whether the iteration count has reached iterationAmount
+- **runABC()**: run the artificial bee colony based on the arguments passed to the constructor. Must pass a fitness function and either a target fitness score or target iteration number in order to specify when the program will terminate. Must also specify value types/ranges.
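
Review note: the `valueRanges` format and the fitness-function contract described above can be illustrated with a small standalone sketch. It does not import `ecnet.abc`; `draw_values` and `fitness` are hypothetical names used only for this illustration, and only the standard library is assumed.

```python
import random

def draw_values(value_ranges, rng=random):
    """Draw one random value per (value_type, (value_min, value_max)) entry."""
    values = []
    for value_type, (low, high) in value_ranges:
        if value_type == 'int':
            values.append(rng.randint(low, high))   # inclusive on both ends
        elif value_type == 'float':
            values.append(rng.uniform(low, high))
        else:
            raise ValueError("value type must be 'int' or 'float'")
    return values

def fitness(values):
    """Toy fitness function: takes one list of values, returns a score (lower is better)."""
    return sum(values)

ranges = [('int', (0, 100)), ('float', (10, 1000))]
sample = draw_values(ranges)
score = fitness(sample)
```

Each entry in `ranges` yields exactly one element of `sample`, in the same order, which is the order a fitness function would have to rely on.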
diff --git a/ecnet/__init__.py b/ecnet/__init__.py
@@ -3,3 +3,4 @@
 import ecnet.error_utils
 import ecnet.model
 import ecnet.limit_parameters
+import ecnet.abc
diff --git a/ecnet/abc.py b/ecnet/abc.py
@@ -0,0 +1,178 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+#  ecnet/abc.py
+#  v.1.3.0.dev1
+#  Developed in 2018 by Hernan Gelaf-Romer <[email protected]>
+#
+#  This program implements an artificial bee colony to tune ecnet hyperparameters
+#
+
+# Standard library and 3rd party packages (open src.)
+from random import randint
+import numpy as np
+import sys
+
+### Artificial bee colony object, which contains multiple bee objects ###
+class ABC:
+
+    def __init__(self, valueRanges, fitnessFunction=None, endValue = None, iterationAmount = None, amountOfEmployers = 50):
+        if endValue is None and iterationAmount is None:
+            raise ValueError("must select either an iterationAmount or an endValue")
+        if fitnessFunction is None:
+            raise ValueError("must pass a fitness function")
+        print("***INITIALIZING***")
+        self.valueRanges = valueRanges
+        self.fitnessFunction = fitnessFunction
+        self.employers = []
+        self.bestValues = []                    # Store the values that are currently performing the best
+        self.onlooker = Bee('onlooker')
+        self.bestFitnessScore = None           # Store the current best Fitness Score
+        self.fitnessAverage = 0
+        self.endValue = endValue
+        self.iterationAmount = iterationAmount
+        # Initialize employer bees, assign them values/fitness scores
+        for i in range(amountOfEmployers):
+            sys.stdout.write("Creating bee number: %d \r" % (i + 1))
+            sys.stdout.flush()
+            self.employers.append(Bee('employer', generateRandomValues(self.valueRanges)))
+            self.employers[i].currFitnessScore = self.fitnessFunction(self.employers[i].values)
+        print("***DONE INITIALIZING***")
+
+    ### Assign a new position to the given bee
+    def assignNewPositions(self, firstBee):
+        valueTypes = [t[0] for t in self.valueRanges]
+        secondBee = randint(0, len(self.employers) -1)
+        # Avoid both bees being the same
+        while (secondBee == firstBee):
+            secondBee = randint(0, len(self.employers) -1)
+        self.onlooker.getPosition(self.employers, firstBee, secondBee, self.fitnessFunction, valueTypes)
+
+    ### Collect the average fitness score across all employers
+    def getFitnessAverage(self):
+        self.fitnessAverage = 0
+        for employer in self.employers:
+            self.fitnessAverage += employer.currFitnessScore
+            # While iterating through employers, look for the best fitness score/value pairing
+            if self.bestFitnessScore is None or employer.currFitnessScore < self.bestFitnessScore:
+                self.bestFitnessScore = employer.currFitnessScore
+                self.bestValues = employer.values
+        self.fitnessAverage /= len(self.employers)
+
+    ### Replace any bee that scores worse than the colony average with a random position
+    def checkNewPositions(self, bee):
+        # Lower scores are better, so a score above the average marks a poor position
+        if bee.currFitnessScore > self.fitnessAverage:
+            bee.values = generateRandomValues(self.valueRanges)
+            bee.currFitnessScore = self.fitnessFunction(bee.values)
+
+    ### If termination depends on a target value, check to see if it has been reached
+    def checkIfDone(self, count):
+        keepGoing = True
+        if self.endValue != None:
+            for employer in self.employers:
+                if employer.currFitnessScore <= self.endValue:
+                    print("Fitness score =", employer.currFitnessScore)
+                    print("Values =", employer.values)
+                    keepGoing = False
+        elif count >= self.iterationAmount:
+            keepGoing = False
+        return keepGoing
+
+    ### Run the artificial bee colony
+    def runABC(self):
+        running = True
+        count = 0
+
+        while True:
+            print("Assigning new positions")
+            for i in range(len(self.employers)):
+                sys.stdout.flush()
+                sys.stdout.write('At bee number: %d \r' % (i+1))
+                self.assignNewPositions(i)
+            print("Getting fitness average")
+            self.getFitnessAverage()
+            print("Checking if done")
+            count+=1
+            running = self.checkIfDone(count)
+            if running == False and self.endValue != None:
+                saveScore(self.bestFitnessScore, self.bestValues)
+                break
+            print("Current fitness average:", self.fitnessAverage)
+            print("Checking new positions, assigning random positions to bad ones")
+            for employer in self.employers:
+                self.checkNewPositions(employer)
+            print("Best score:", self.bestFitnessScore)
+            print("Best value:", self.bestValues)
+            if self.iterationAmount != None:
+                print("Iteration {} / {}".format(count, self.iterationAmount))
+            if running == False:
+                saveScore(self.bestFitnessScore, self.bestValues)
+                break
+            saveScore(self.bestFitnessScore, self.bestValues)
+
+        return self.bestValues
+
+
+### Bee object, employers contain value/fitness
+class Bee:
+
+    def __init__(self, beeType, values=None):
+        self.beeType = beeType
+        # Only the employer bees should store values/fitness scores
+        if beeType == "employer":
+            self.values = values if values is not None else []   # avoid a shared mutable default
+            self.currFitnessScore = None
+
+    ### Onlooker bee function, create a new set of positions
+    def getPosition(self, beeList, firstBee, secondBee, fitnessFunction, valueTypes):
+        newValues = []
+        currValue = 0
+        for i in range(len(valueTypes)):
+            currValue = valueFunction(beeList[firstBee].values[i], beeList[secondBee].values[i])
+            if valueTypes[i] == 'int':
+                currValue = int(currValue)
+            newValues.append(currValue)
+        beeList[firstBee].getFitnessScore(newValues, fitnessFunction)
+
+    #### Employer bee function, get fitness score for a given set of values
+    def getFitnessScore(self, values, fitnessFunction):
+        if self.beeType != "employer":
+            raise RuntimeError("Cannot get fitness score on a non-employer bee")
+        else:
+            # Your fitness function must take a certain set of values that you would like to optimize
+            fitnessScore = fitnessFunction(values)  
+            if self.currFitnessScore is None or fitnessScore < self.currFitnessScore:
+                self.values = values
+                self.currFitnessScore = fitnessScore
+
+### Private functions to be called by ABC
+
+### Generate a random set of values given a value range
+def generateRandomValues(value_ranges):
+    values = []
+    if value_ranges == None:
+        raise RuntimeError("must set the type/range of possible values")
+    else:
+        # t[0] contains the type of the value, t[1] contains a tuple (min_value, max_value)
+        for t in value_ranges:  
+            if t[0] == 'int':
+                values.append(randint(t[1][0], t[1][1]))
+            elif t[0] == 'float':
+                values.append(np.random.uniform(t[1][0], t[1][1]))
+            else:
+                raise RuntimeError("value type must be either an 'int' or a 'float'")
+    return values
+
+### Generate a new value by offsetting a by a random fraction of its distance from b (the result is never below a)
+def valueFunction(a, b):
+    activationNum = np.random.uniform(-1, 1)
+    return a + abs(activationNum * (a - b))
+
+### Function for appending the scores of each iteration to a file
+def saveScore(score, values, filename = 'scores.txt'):
+    # Append so earlier iterations are kept; the with block closes the file
+    with open(filename, 'a') as f:
+        string = "Score: {} Values: {}".format(score, values)
+        f.write(string)
+        f.write('\n')
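
Review note: the control flow of `runABC` can be hard to follow through the print statements, so here is a compact, self-contained sketch of the same loop as I read it from the diff. It is not the module's code: `run_colony`, `draw`, and `neighbour` are hypothetical names, the standard library `random` module stands in for NumPy, and only the iteration-count stopping rule is shown.

```python
import random

def draw(ranges, rng):
    # One random value per ('int' | 'float', (low, high)) entry
    return [rng.randint(lo, hi) if kind == 'int' else rng.uniform(lo, hi)
            for kind, (lo, hi) in ranges]

def neighbour(a, b, rng):
    # Same form as valueFunction in the diff: the result is never below a
    return a + abs(rng.uniform(-1, 1) * (a - b))

def run_colony(ranges, fitness, iterations, employers=10, seed=0):
    rng = random.Random(seed)
    bees = [draw(ranges, rng) for _ in range(employers)]
    scores = [fitness(v) for v in bees]
    best_score, best_values = min(zip(scores, bees), key=lambda p: p[0])
    for _ in range(iterations):
        # 1. Propose a neighbour of each bee; keep it only if it scores lower
        for i in range(employers):
            j = rng.choice([k for k in range(employers) if k != i])
            candidate = []
            for (kind, _), a, b in zip(ranges, bees[i], bees[j]):
                value = neighbour(a, b, rng)
                candidate.append(int(value) if kind == 'int' else value)
            candidate_score = fitness(candidate)
            if candidate_score < scores[i]:
                bees[i], scores[i] = candidate, candidate_score
        # 2. Track the best pair seen so far and the colony average
        average = sum(scores) / employers
        for values, score in zip(bees, scores):
            if score < best_score:
                best_score, best_values = score, list(values)
        # 3. Re-draw every bee that scores worse than the average
        for i in range(employers):
            if scores[i] > average:
                bees[i] = draw(ranges, rng)
                scores[i] = fitness(bees[i])
    return best_score, best_values

best_score, best_values = run_colony(
    [('int', (0, 100)), ('float', (10, 1000))], sum, iterations=20)
```

One design point worth noting: because the neighbour step only ever moves a value upward from `a`, improvement for a minimising fitness such as `sum` comes mainly from the random re-draws in step 3. The textbook ABC update, `a + phi * (a - b)` with `phi` in [-1, 1], moves in both directions and may be worth considering.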
diff --git a/ecnet/data_utils.py b/ecnet/data_utils.py
@@ -2,7 +2,7 @@
 # -*- coding: utf-8 -*-
 #
 #  ecnet/data_utils.py
-#  v.1.2.7.dev1
+#  v.1.3.0.dev1
 #  Developed in 2018 by Travis Kessler <[email protected]>
 #
 #  This program contains the data object class, and functions for manipulating/importing/outputting data

diff --git a/ecnet/error_utils.py b/ecnet/error_utils.py
@@ -2,7 +2,7 @@
 # -*- coding: utf-8 -*-
 #
 #  ecnet/error_utils.py
-#  v.1.2.7.dev1
+#  v.1.3.0.dev1
 #  Developed in 2018 by Travis Kessler <[email protected]>
 #
 #  This program contains functions for error calculations

diff --git a/ecnet/model.py b/ecnet/model.py
@@ -2,7 +2,7 @@
 # -*- coding: utf-8 -*-
 #
 #  ecnet/error_utils.py
-#  v.1.2.7.dev1
+#  v.1.3.0.dev1
 #  Developed in 2018 by Travis Kessler <[email protected]>
 #
 #  This program contains functions necessary creating, training, saving, and importing neural network models

diff --git a/ecnet/server.py b/ecnet/server.py
@@ -2,7 +2,7 @@
 # -*- coding: utf-8 -*-
 #
 #  ecnet/server.py
-#  v.1.2.7.dev1
+#  v.1.3.0.dev1
 #  Developed in 2018 by Travis Kessler <[email protected]>
 #
 #  This program contains all the necessary config parameters and network serving functions
@@ -22,6 +22,7 @@
 import ecnet.model
 import ecnet.limit_parameters
 import ecnet.error_utils
+import ecnet.abc
 
 ### Config/server object; to be referenced by most other files ###
 class Server:
@@ -290,6 +291,24 @@ def open_project(self, project_name):
 		create_folder_structure(self)
 		self.model = create_model(self)
 
+	### Optimizes and tunes the hyperparameters for ecnet
+	def tune_hyperparameters(self, target_score = None, iteration_amount = 50, amount_of_employers = 50):
+		# Check which arguments to use to terminate artificial bee colony, then create the ABC object
+		if target_score is None:
+			abc = ecnet.abc.ABC(iterationAmount = iteration_amount, fitnessFunction=runNeuralNet, valueRanges=ecnetValues, amountOfEmployers=amount_of_employers)
+		else:
+			abc = ecnet.abc.ABC(endValue = target_score, fitnessFunction=runNeuralNet, valueRanges=ecnetValues, amountOfEmployers=amount_of_employers)
+		# Run the artificial bee colony and return the resulting hyperparameter values
+		hyperparams = abc.runABC()
+		# Assign the hyperparameters generated from the artificial bee colony to ecnet
+		self.vars['learning_rate'] = hyperparams[0]
+		self.vars['valid_mdrmse_stop'] = hyperparams[1]
+		self.vars['valid_max_epochs'] = hyperparams[2]
+		self.vars['valid_mdrmse_memory'] = hyperparams[3]
+		self.vars['mlp_hidden_layers'][0][0] = hyperparams[4]
+		self.vars['mlp_hidden_layers'][1][0] = hyperparams[5]
+		return hyperparams
+
 # Creates the default folder structure, outlined in the file config by number of builds and nodes.
 def create_folder_structure(server_obj):
 	server_obj.build_dirs = []
@@ -408,5 +427,23 @@ def create_default_config():
 		'valid_mdrmse_memory' : 1000
 	}
 	yaml.dump(config_dict,stream)
+
+def runNeuralNet(values):
+    # Run the ecnet server
+    config_file = import_config()
+    sv = Server()
+    sv.vars['learning_rate'] = values[0]
+    sv.vars['valid_mdrmse_stop'] = values[1]
+    sv.vars['valid_max_epochs'] = values[2]
+    sv.vars['valid_mdrmse_memory'] = values[3]
+    sv.vars['mlp_hidden_layers'][0][0] = values[4]
+    sv.vars['mlp_hidden_layers'][1][0] = values[5]
+    sv.vars['data_filename'] = config_file['data_filename']
 
-
+    sv.import_data(sv.vars['data_filename'])
+    sv.fit_mlp_model_validation('shuffle_lv')
+    test_errors = sv.calc_error('rmse')
+    sv.publish_project()
+    return test_errors['rmse']
+
+ecnetValues = [('float', (0.001, 0.1)), ('float', (0.000001,0.01)), ('int', (1250, 2500)), ('int', (500, 2500)), ('int', (12,32)), ('int', (12,32))]
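
Review note: when tuned hidden-layer sizes are written back into the config, the two spellings of the key behave very differently. The sketch below is standalone and assumes, as a guess at the config layout, that `mlp_hidden_layers` is a list of `[size, activation]` pairs; the dictionaries are illustrative and not taken from the project.

```python
import copy

# Assumed shape: one [size, activation] pair per hidden layer
base = {'mlp_hidden_layers': [[5, 'relu'], [5, 'relu']]}

# A string key that merely looks like an index adds an unrelated top-level entry
flat = copy.deepcopy(base)
flat['mlp_hidden_layers[0][0]'] = 24

# Indexing into the nested list updates the first layer's size
nested = copy.deepcopy(base)
nested['mlp_hidden_layers'][0][0] = 24
```

With the string key, the layer size the model actually reads stays at 5, so the tuned value would have no effect on training.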
diff --git a/examples/README.md b/examples/README.md
@@ -8,3 +8,4 @@
   - **limit_db_parameters.py**: imports a database, reduces the input dimensionality using a "retain the best" algorithm, and saves the reduced database to a specified file
   - **create_static_test_set.py**: Imports a dataset, and creates two files; one containing the test data, one containing the training (learning + validation) data; set sizes are determined by 'data_split' server variable
   - **select_from_test_set_performance.py**: Select best trial from each node using static test set performance
+  - **abc_script.py**: Select an optimal set of values given a fitness function and a set of value ranges
diff --git a/examples/abc_script.py b/examples/abc_script.py
@@ -0,0 +1,23 @@
+"""
+EXAMPLE SCRIPT:
+Find optimal values for a given set of value ranges, and a fitness function
+
+Save the scores of each iteration in a text file called scores.txt
+"""
+
+from ecnet.abc import ABC
+
+# Define a fitness function
+def fitnessTest(values):
+    fit = 0
+    for val in values:
+        fit+=val
+    return fit
+
+# Define a set of value ranges with types attached to them
+values = [('int', (0,100)), ('int', (0,100)), ('int',(0,100)), ('float', (10,1000))]
+
+# Create the abc object
+abc = ABC(fitnessFunction = fitnessTest, amountOfEmployers = 100, valueRanges = values, endValue = 50)
+# Run the colony until the fitness score reaches 50 or less (the ranges above make any score below 10 unreachable)
+abc.runABC()
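
Review note: with a sum-style fitness function such as `fitnessTest`, the lowest score that values inside the given ranges can produce is the sum of the lower bounds, so a target score is only reachable if it is at or above that floor. A quick standalone check, where `lowest_sum` is a hypothetical helper and not part of `ecnet.abc`:

```python
def lowest_sum(value_ranges):
    """Smallest sum obtainable when every value sits at its lower bound."""
    return sum(low for _kind, (low, _high) in value_ranges)

values = [('int', (0, 100)), ('int', (0, 100)), ('int', (0, 100)), ('float', (10, 1000))]
floor = lowest_sum(values)
```

For the ranges used in this script the floor is 10, so a run that terminates on a target score needs a target of at least 10 to finish.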
diff --git a/setup.py b/setup.py
@@ -1,12 +1,12 @@
 from setuptools import setup
 
 setup(name = 'ecnet',
-version = "1.2.7.dev2",
+version = "1.3.0.dev1",
 description = 'UMass Lowell Energy and Combustion Research Laboratory Neural Network Software',
 url = 'http://github.com/tjkessler/ecnet',
-author = 'Travis Kessler',
-author_email = '[email protected]',
+author = 'Travis Kessler, Hernan Gelaf-Romer, Sanskriti Sharma',
+author_email = '[email protected], [email protected], [email protected]',
 license = 'MIT',
 packages = ['ecnet'],
-install_requires = ["tensorflow","pyyaml"],
+install_requires = ["tensorflow","pyyaml", "numpy"],
 zip_safe = False)