
Training Jet Taggers for CMS Phase 2 L1 Trigger

Documentation for training an L1T Jet Tagging model for the CMS Phase-2 L1 upgrades.

Training the jet tagger involves multiple steps: creating the raw datasets, preprocessing them, training the model, synthesizing it, and making validation plots for different physics seeds. This README describes all of these steps in order.

The CI in this repository aims to build a pipeline that runs all of these steps automatically.

A summary menu of all the steps is listed below:

1. Produce Raw Training Datasets

2. Prepare the data and train the model

3. Physics Validation

4. Synthesize the model (with wrapper and CMSSW)

5. Implement the model in FPGA Firmware

Note that the instructions assume you have access to the appropriate eos data spaces. If, like me, you are not interested in reading lengthy documentation, here is an ultra-short version to get started on running the code (more details on each specific command are provided in the corresponding section below; further help can be found by looking into each script):

#Activate the environment
conda activate tagger

#Run this to add the scripts in this directory to your python path
export PYTHONPATH=$PYTHONPATH:$PWD
export CI_COMMIT_REF_NAME=local

#Prepare the data
python tagger/train/train.py --make-data

#Train the model
python tagger/train/train.py

#Make some basic validation plots
python tagger/train/train.py --plot-basic

#Make other plots for bbbb/bbtautau final state for example:
python tagger/plot/bbbb.py
python tagger/plot/bbtautau.py

#OR vbf tautau
python tagger/plot/vbf_tautau.py

#Synthesize the model (with wrapper and CMSSW)
python tagger/firmware/hls4ml_convert.py

1. Produce Raw Training Datasets

Creating the training datasets involves several steps:

  1. Take the RAW samples and prune/slim them. This can be done by running the runInputs_X_X.py scripts in FastPUPPI, which also uses the submission repo. This has currently been done for all samples, and the outputs are stored here:
/eos/cms/store/cmst3/group/l1tr/FastPUPPI/14_0_X/fpinputs_131X/v9a/
  2. These samples are then processed by the nTuplizer, which is part of the FastPUPPI repo; in particular runPerformanceNTuple.py, which calls jetNTuplizer.cc. Note that to submit jobs as part of this setup, you also need the submission repo. A hedged sketch of this step follows the list.
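For illustration only: FastPUPPI configurations are typically run with cmsRun inside a CMSSW area, so the ntuplizing step looks roughly like the command below. Treat this as a sketch, not the real recipe; follow the FastPUPPI and submission repo instructions for actual production.

#Hypothetical sketch: run the nTuplizer interactively from a CMSSW + FastPUPPI checkout
cmsRun runPerformanceNTuple.py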

2. Prepare the data and train the model

After creating the training ntuples, our setup shuffles and concatenates (hadd) them into one big file, such as this one:

/eos/cms/store/cmst3/group/l1tr/sewuchte/l1teg/fp_ntuples_v131Xv9/baselineTRK_4param_021024/All200.root
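For illustration (the file names below are placeholders, not the actual production inputs), the concatenation step can be done with ROOT's hadd utility; the shuffling is handled separately:

#Hypothetical example: merge per-job ntuples into one training file
#(-f overwrites an existing output file; input names are placeholders)
hadd -f All200.root ntuple_*.root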

We use one of the scripts in this repository to prepare the training data. First, set up the conda environment and the appropriate paths for the scripts:

#Create the environment from the yaml file
conda env create -f environment.yml

#Activate the environment
conda activate tagger

#Run this to add the scripts in this directory to your python path
export PYTHONPATH=$PYTHONPATH:$PWD

Then, to prepare the data for training:

python tagger/train/train.py --make-data 

This prepares the data using the default options (look into the script to see what they are). If you want to customize the input data path, or the data step size for uproot.iterate, use the full options:

python tagger/train/train.py --make-data -i <your-rootfile> -s <custom-step-size>
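Under the hood, the preparation step reads the ntuple in chunks. Below is a minimal sketch of that pattern with uproot; the file path, tree name, and branch handling are assumptions for illustration, and the real logic lives in tagger/train/train.py:

# Minimal sketch of chunked reading with uproot (not the repo's actual code).
# The tree name "jetntuple/Jets" and the file name are placeholder assumptions.
import uproot

for chunk in uproot.iterate(
    "All200.root:jetntuple/Jets",  # file:tree (placeholder names)
    step_size="100 MB",            # amount of data to read per chunk
    library="np",                  # return numpy arrays
):
    # chunk is a dict of branch-name -> numpy array; preprocess and save here
    print({name: array.shape for name, array in chunk.items()})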

This automatically creates a new directory, training_data (it will ask before removing an existing one), and writes the data into it. Then, to train the model:

python tagger/train/train.py

The models are defined in tagger/train/models.py; the baseline model is used by default.
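For orientation, a model in that file looks roughly like the hedged sketch below; the input shape, layer widths, and number of output classes here are illustrative assumptions, not the actual baseline:

# Hypothetical sketch of a small jet-tagging model in Keras (illustrative only).
# See tagger/train/models.py for the real baseline definition.
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def toy_baseline(input_shape=(16,), n_classes=5):
    inputs = Input(shape=input_shape)
    x = Dense(32, activation="relu")(inputs)
    x = Dense(16, activation="relu")(x)
    outputs = Dense(n_classes, activation="softmax")(x)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model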

3. Physics Validation

Various physics validation plots can be made using the tagger/plot modules. The plots are divided into different final states, such as bbbb.py. To use a script, you first need to derive the working points before evaluating the background rate/efficiency:

python tagger/plot/bbbb.py --deriveWPs -n <number of samples to use, usually ~1M>

then, evaluate the efficiency using:

python tagger/plot/bbbb.py --eff -n <number of samples to use, usually ~500k>
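Conceptually, deriving a working point means scanning the tagger score for a threshold that gives a target background rate. The sketch below illustrates the idea with numpy; the scores and target rate are made up, and this is not the script's actual procedure:

# Illustrative working-point scan (not the repo's actual code).
# background_scores would come from background events; 1e-2 is a made-up target.
import numpy as np

rng = np.random.default_rng(0)
background_scores = rng.random(1_000_000)  # placeholder tagger scores
target_rate = 1e-2                         # fraction of background allowed to pass

# Choose the threshold such that only `target_rate` of background passes
threshold = np.quantile(background_scores, 1.0 - target_rate)
print(f"working point: score > {threshold:.3f}")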

4. Synthesize the model to HDL code

To synthesize the model into HDL code, we first use hls4ml:

python tagger/firmware/hls4ml_convert.py
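For reference, the conversion script is built around the hls4ml Python API. A minimal, hedged sketch of that flow is below; the model path, FPGA part, and output directory are assumptions, and tagger/firmware/hls4ml_convert.py holds the real settings:

# Minimal sketch of a Keras -> HLS conversion with hls4ml (illustrative only).
import hls4ml
from tensorflow.keras.models import load_model

model = load_model("saved_model.h5")  # placeholder path to a trained model

config = hls4ml.utils.config_from_keras_model(model, granularity="name")
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls_project",     # placeholder output directory
    part="xcvu13p-flga2577-2-e",  # placeholder FPGA part number
)
hls_model.compile()               # builds and runs the C simulation
# hls_model.build(csim=False)     # uncomment to run full HLS synthesis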

Then, this code is synthesized again with an HLS wrapper and CMSSW.

5. Implement the model in FPGA firmware


Conda Environment Notes

To deactivate the environment:

conda deactivate

If you make any updates to the environment, please edit the environment.yml file and run:

conda env update --file environment.yml --prune

Reference on conda environment here: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

Related Materials

Talks and materials related to the project can be found here; they are ordered chronologically.