This repository contains the code for the network structure used in the paper "Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation".
| name | nodes | activation function |
| --- | --- | --- |
| LSTM Layer | 512 | N/A |
| Fully Connected Layer 1 | 1024 | ReLU |
| Fully Connected Layer 2 | 1000 | ReLU |
| Fully Connected Layer 3 | 3 | N/A |
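The table can be read as the following rough TensorFlow 1.x sketch. It is illustrative only, not the repository's actual code; `FEATURE_NUM` and `MAX_TRACE_LENGTH` are placeholders for your own data.

```python
import tensorflow as tf  # TensorFlow 1.x style, as listed in the requirements below

FEATURE_NUM = 12        # placeholder: number of state + one-hot action features
MAX_TRACE_LENGTH = 10   # placeholder: trace length used in the paper

rnn_input = tf.placeholder(tf.float32, [None, MAX_TRACE_LENGTH, FEATURE_NUM])
trace_lengths = tf.placeholder(tf.int32, [None])

# LSTM layer with 512 units over the (possibly padded) event traces
lstm_cell = tf.contrib.rnn.LSTMCell(num_units=512)
outputs, state = tf.nn.dynamic_rnn(lstm_cell, rnn_input,
                                   sequence_length=trace_lengths,
                                   dtype=tf.float32)

# Three fully connected layers: 1024 (ReLU) -> 1000 (ReLU) -> 3 (linear Q output)
fc1 = tf.layers.dense(state.h, 1024, activation=tf.nn.relu)
fc2 = tf.layers.dense(fc1, 1000, activation=tf.nn.relu)
read_out = tf.layers.dense(fc2, 3, activation=None)
```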
We use the on-policy prediction method Sarsa (State–Action–Reward–State–Action). It is a Temporal Difference (TD) learning method that estimates player performance with Q(s, a), where the state s is a sequence of game contexts and the action a is the action taken by the player.
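As a minimal sketch (not the repository's code), the Sarsa TD target that the network is fit to looks like this; the names and `gamma` are illustrative:

```python
# Minimal Sarsa sketch: the TD target uses the Q value of the action actually
# taken at the next step (on-policy).
def sarsa_td_target(reward, q_next, gamma=1.0):
    """Return r_t + gamma * Q(s_{t+1}, a_{t+1})."""
    return reward + gamma * q_next
```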
Use `python td_three_prediction_lstm.py` to train the neural network, which produces the Q values. The Goal Impact Metric (GIM) is the difference between consecutive Q values.
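A minimal sketch of that difference, assuming the trained network's Q values for one game are collected in a NumPy array of shape (num_events, 3):

```python
import numpy as np

def goal_impact(q_values):
    """Goal impact sketch: change in Q between consecutive events of a game."""
    q_values = np.asarray(q_values)
    return np.diff(q_values, axis=0)   # shape (num_events - 1, 3)
```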
The original work uses a private play-by-play dataset from Sportlogiq, which we are not allowed to publish.
If you want to run the network, please prepare your own sequential dataset and organize the data according to the network input format, stored as NumPy arrays. As shown in `td_three_prediction_lstm.py`, the neural network requires three input files:
- reward
- state_input (contains both the state features and a one-hot representation of the action)
- state_trace_length
To be specific, if you want to run this Python RNN script directly, you need to prepare the input in the following way. For each game there is a directory containing three .mat files representing reward, state_input and state_trace_length. The file names should follow the rules below (a sketch for producing such files is given after the list):
- GameDirectory_xxx
  - dynamic_rnn_reward_xxx.mat
    - A two-dimensional array named 'dynamic_rnn_reward' should be in the .mat file
    - Rows of the array: R, Columns of the array: 10
  - dynamic_rnn_input_xxx.mat
    - A three-dimensional array named 'dynamic_feature_input' should be in the .mat file
    - First dimension: R, Second dimension: 10, Third dimension: feature number
  - hybrid_trace_length_xxx.mat
    - A two-dimensional array named 'hybrid_trace_length' should be in the .mat file
    - Rows of the array: 1, Columns of the array: unknown
    - The array tells us how to split the events into plays of different lengths, so sum(array_elements) should equal R

Here xxx is a random string.
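For example, a minimal sketch of writing one such game directory with `scipy.io.savemat`. The file and array names follow the rules above; the shapes and values are dummies, and 'xxx' is whatever string you choose:

```python
import os
import numpy as np
import scipy.io

R, TRACE_LENGTH, FEATURE_NUM = 3, 10, 1   # dummy sizes for illustration
game_dir = 'GameDirectory_xxx'
os.makedirs(game_dir)

scipy.io.savemat(os.path.join(game_dir, 'dynamic_rnn_reward_xxx.mat'),
                 {'dynamic_rnn_reward': np.zeros((R, TRACE_LENGTH))})
scipy.io.savemat(os.path.join(game_dir, 'dynamic_rnn_input_xxx.mat'),
                 {'dynamic_feature_input': np.zeros((R, TRACE_LENGTH, FEATURE_NUM))})
scipy.io.savemat(os.path.join(game_dir, 'hybrid_trace_length_xxx.mat'),
                 {'hybrid_trace_length': np.array([[1, 2]])})  # play lengths sum to R
```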
Each input file must have the same number of rows R (corresponding to the number of events in a game). In our paper the trace length equals 10, so reward is an R*10 array, state_input is an R*10*feature_number array, and state_trace_length is a vector (stored as a 1*N array) that gives the length of each play in the game.
```python
# R=3, feature number=1
>>> reward['dynamic_rnn_reward']
array([[0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]])
>>> state_input['dynamic_feature_input']
array([[[-4.51194112e-02], [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00],
        [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00]],
       [[-4.51194112e-02], [ 5.43495586e-04], [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00],
        [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00]],
       [[-4.51194112e-02], [ 5.43495586e-04], [-3.46831161e-01], [ 0.00000000e+00], [ 0.00000000e+00],
        [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00], [ 0.00000000e+00]]])
>>> trace_length['hybrid_trace_length']
array([[1, 2]])
```
The data must be standardized or normalized before being fed into the neural network; we use `sklearn.preprocessing.scale`.
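A rough sketch of that preprocessing step; the array names and sizes here are illustrative, not from the repository:

```python
import numpy as np
from sklearn.preprocessing import scale

raw_features = np.random.rand(100, 12)          # e.g. 100 events, 12 features
scaled_features = scale(raw_features, axis=0)   # column-wise zero mean, unit variance
```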
Requirements:
- Python 2.7
- NumPy
- TensorFlow (1.0.0?)
- SciPy
- Matplotlib
- scikit-learn

(We may need a requirements.txt; a possible starting point is sketched below.)
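A possible requirements.txt, assuming the package list above (the exact version pins are uncertain):

```
numpy
scipy
matplotlib
scikit-learn
tensorflow==1.0.0
```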
(For Oliver's students with access to the net drive, the following steps should work on the lab's machines.)
Training:
1. Modify `save_mother_dir` in `configuration.py` to your save directory, e.g. `/cs/oschulte/Bill/` or just `/local_scratch/`.
2. `cd` into your `save_mother_dir` and make two directories, `./models/hybrid_sl_saved_NN/` and `./models/hybrid_sl_log_NN/` (see the sketch after this list).
3. Modify the global `DATA_STORE` variable in `td_three_prediction_lstm.py` to `/cs/oschulte/Galen/Hockey-data-entire/Hybrid-RNN-Hockey-Training-All-feature5-scale-neg_reward_v_correct__length-dynamic/`.
4. Check the package and Python versions mentioned above.
5. Run `python td_three_prediction_lstm.py`.
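Step 2 can be done from the shell or, equivalently, with a small Python sketch like this (the path is the example from step 1):

```python
import os

# Assumes save_mother_dir matches the value you set in configuration.py (step 1).
save_mother_dir = '/cs/oschulte/Bill/'   # or '/local_scratch/'
for sub_dir in ('models/hybrid_sl_saved_NN', 'models/hybrid_sl_log_NN'):
    os.makedirs(os.path.join(save_mother_dir, sub_dir))
```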
Evaluation:
1. Suppose you have finished steps 1-5 of the training process. To evaluate the network only, disable the AdamOptimizer: modify lines 188-192 in `td_three_prediction_lstm.py` as below.
   ```python
   [diff, read_out, cost_out, summary_train] = sess.run(
       [model.diff, model.read_out, model.cost, merge],
       feed_dict={model.y: y_batch,
                  model.trace_lengths: trace_t0_batch,
                  model.rnn_input: s_t0_batch})
   ```
2. Run `python td_three_prediction_lstm.py`.
- We have a pretrained network (for LSTM_V4 only) in `/cs/oschulte/Bill/hybrid_sl_saved_NN/Scale-three-cut_together_saved_networks_feature5_batch32_iterate30_lr0.0001_v4_v_correct__MaxTL2/`. If you want to use this network directly for evaluation, finish steps 1-4 of the training process, modify the global `SAVED_NETWORK` variable in `td_three_prediction_lstm.py` to point to that network directory, and then run the code as in step 2.
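For example (illustrative only), the existing `SAVED_NETWORK` global could be pointed at that directory like this:

```python
# Illustrative assignment of the SAVED_NETWORK global in td_three_prediction_lstm.py
# to the pretrained model directory listed above.
SAVED_NETWORK = '/cs/oschulte/Bill/hybrid_sl_saved_NN/Scale-three-cut_together_saved_networks_feature5_batch32_iterate30_lr0.0001_v4_v_correct__MaxTL2/'
```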
MIT LICENSE
We are still updating this repository.