Draft: Train neural network for roman pots momentum reconstruction #10

Draft
wants to merge 31 commits into
base: master

Conversation

rahmans1

Briefly, what does this PR introduce?

Train neural network for roman pots momentum reconstruction

What kind of change does this PR introduce?

  • Bug fix (issue #__)
  • New feature (issue #__)
  • Documentation update
  • Other: __

Please check if this PR fulfills the following:

  • Tests for the changes have been added
  • Documentation has been added / updated
  • Changes have been communicated to collaborators

Does this PR introduce breaking changes? What changes might users need to make to their code?

Does this PR change default behavior?

…m reconstruction. Function creates a parametrized dense neural network and trains it with standardized input data.
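
For context on "standardized input data": a minimal sketch of per-feature standardization (zero mean, unit variance), assuming the inputs are the four Roman Pots quantities named elsewhere in this PR (x_rp, slope_xrp, y_rp, slope_yrp). The helper name `standardize` and the sample values are hypothetical; the PR's actual preprocessing code is not shown in this conversation and may differ.

```python
from statistics import fmean, pstdev


def standardize(features):
    """Return a copy of `features` with every column scaled to mean 0, std 1."""
    scaled = {}
    for name, values in features.items():
        mean = fmean(values)
        # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
        std = pstdev(values) or 1.0
        scaled[name] = [(v - mean) / std for v in values]
    return scaled


# Made-up sample values, for illustration only (not taken from the PR).
sample = {
    "x_rp": [1.0, 2.0, 3.0],
    "slope_xrp": [0.10, 0.20, 0.30],
    "y_rp": [-4.0, 0.0, 4.0],
    "slope_yrp": [0.5, 0.5, 0.5],
}
scaled = standardize(sample)
```

The same mean and standard deviation would presumably need to be stored and reapplied at inference time, so that the network sees inputs on the scale it was trained on.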
@rahmans1 rahmans1 self-assigned this Jan 10, 2024
@rahmans1 rahmans1 changed the title Train neural network for roman pots momentum reconstruction Draft: Train neural network for roman pots momentum reconstruction Jan 10, 2024
@rahmans1 rahmans1 marked this pull request as draft January 10, 2024 18:13
@rahmans1
Author

@veprbl I have a snakemake file where I use some global variables which are then used to define directory structure (DETECTOR_VERSION and DETECTOR_CONFIG).
Line 44
Line 53
Line 66

I am noticing that if I have more than one value stored as a list in these variables, snakemake fails. For example, if I set DETECTOR_VERSION=["23.11.0","23.12.0"]

MissingOutputException in rule roman_pots_generate_neural_network_configs in file /w/eic-scshelf2104/users/rahmans/RomanPotsML/detector_benchmarks/benchmarks/roman_pots/Snakefile, line 50:
Job 0 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
results/23.12.0/epic_craterlake/detector_benchmarks/roman_pots/ml/metadata/931f1764775d40f79ec4f5781f2e810ba0987e11064e7b89928373b5095f922395d301b8dfc228c696b3e5b8badfff29b43a191e1bb56caf2458f91ba634cfd3.txt
results/23.12.0/epic_craterlake/detector_benchmarks/roman_pots/ml/metadata/a94f560f93e5c337b2105bfe01d04ef354ab17e083ee609fd5fb75c69a4f49a876296d36fae5b83387b174a9f6879176884a36ff2c502acfd77543345b4d34e9.txt

However, if I don't use that variable to define the directory structure, then it runs as expected. For example, this works fine:

output:
    expand("results/{detector_config}/detector_benchmarks/roman_pots/ml/metadata/{model_version}.txt",
           detector_config=DETECTOR_CONFIG,
           model_version=MODEL_VERSION) 

Does snakemake expect all outputs from an individual rule to go to a single directory?

@rahmans1
Author

Never mind. I think I messed up my combinatorics. Most likely not a snakemake issue.
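
A possible reading of the combinatorics: Snakemake's `expand` by default forms the Cartesian product of the lists it is given, so a single rule whose output is an `expand(...)` over two detector versions and two model versions promises all four files from one job. The sketch below imitates that behaviour in plain Python. `expand_like` and the `model_a`/`model_b` names are hypothetical stand-ins, not Snakemake API and not the real model hashes.

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    # Accept a single string as a one-element list.
    value_lists = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


DETECTOR_VERSION = ["23.11.0", "23.12.0"]
DETECTOR_CONFIG = ["epic_craterlake"]
MODEL_VERSION = ["model_a", "model_b"]  # placeholders for the real hashes

outputs = expand_like(
    "results/{detector_version}/{detector_config}/detector_benchmarks/"
    "roman_pots/ml/metadata/{model_version}.txt",
    detector_version=DETECTOR_VERSION,
    detector_config=DETECTOR_CONFIG,
    model_version=MODEL_VERSION,
)

for path in outputs:
    print(path)
```

With two versions, one config, and two models, this lists 2 x 1 x 2 = 4 paths. If the job only writes files for one detector version, the remaining paths would be reported as missing outputs.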

rahmans1 and others added 4 commits January 25, 2024 18:14
…s obeyed. Execute snakemake all1/2/3 in sequence to avoid errors
…ts rules. Parallelization at generation stage through use of wildcards.
Comment on lines +49 to +56
num_epochs=MODEL_PZ["num_epochs"],
learning_rate=MODEL_PZ["learning_rate"],
size_input=MODEL_PZ["size_input"],
size_output=MODEL_PZ["size_output"],
n_layers=MODEL_PZ["n_layers"],
size_first_hidden_layer=MODEL_PZ["size_first_hidden_layer"],
multiplier=MODEL_PZ["multiplier"],
leak_rate=MODEL_PZ["leak_rate"]
Member

Suggested change
- num_epochs=MODEL_PZ["num_epochs"],
- learning_rate=MODEL_PZ["learning_rate"],
- size_input=MODEL_PZ["size_input"],
- size_output=MODEL_PZ["size_output"],
- n_layers=MODEL_PZ["n_layers"],
- size_first_hidden_layer=MODEL_PZ["size_first_hidden_layer"],
- multiplier=MODEL_PZ["multiplier"],
- leak_rate=MODEL_PZ["leak_rate"]
+ **MODEL_PZ,
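
The suggestion relies on Python's `**` unpacking, which passes every key of a dict as a keyword argument. A minimal sketch, assuming the keys of `MODEL_PZ` are exactly the eight parameter names listed above; the values and the `collect_params` function are hypothetical stand-ins for the rule's `params:` section.

```python
# Illustrative values only; the real MODEL_PZ is defined in the Snakefile.
MODEL_PZ = {
    "num_epochs": 100,
    "learning_rate": 0.01,
    "size_input": 4,
    "size_output": 1,
    "n_layers": 3,
    "size_first_hidden_layer": 128,
    "multiplier": 0.5,
    "leak_rate": 0.025,
}


def collect_params(**kwargs):
    """Stand-in for anything that accepts named parameters."""
    return kwargs


# Spelling every key out by hand ...
explicit = collect_params(
    num_epochs=MODEL_PZ["num_epochs"],
    learning_rate=MODEL_PZ["learning_rate"],
    size_input=MODEL_PZ["size_input"],
    size_output=MODEL_PZ["size_output"],
    n_layers=MODEL_PZ["n_layers"],
    size_first_hidden_layer=MODEL_PZ["size_first_hidden_layer"],
    multiplier=MODEL_PZ["multiplier"],
    leak_rate=MODEL_PZ["leak_rate"],
)

# ... yields the same named parameters as unpacking the dict.
unpacked = collect_params(**MODEL_PZ)
```

One difference to keep in mind: with unpacking, any key later added to `MODEL_PZ` is passed along automatically, which is convenient but also less explicit.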

Comment on lines +121 to +129
detector_path=DETECTOR_PATH,
nevents_per_file=NEVENTS_PER_FILE,
detector_config=DETECTOR_CONFIG
output:
"results/"+DETECTOR_VERSION+"/"+DETECTOR_CONFIG+"/detector_benchmarks/"+SUBSYSTEM+"/"+BENCHMARK+"/raw_data/"+DETECTOR_VERSION+"_"+DETECTOR_CONFIG+"_{index}.edm4hep.root"
shell:
"""
npsim --steeringFile {input.script} \
--compactFile {params.detector_path}/{params.detector_config}.xml \
Member

I would drop global variables in the leaf rules, and use wildcards where possible

Suggested change
- detector_path=DETECTOR_PATH,
- nevents_per_file=NEVENTS_PER_FILE,
- detector_config=DETECTOR_CONFIG
- output:
- "results/"+DETECTOR_VERSION+"/"+DETECTOR_CONFIG+"/detector_benchmarks/"+SUBSYSTEM+"/"+BENCHMARK+"/raw_data/"+DETECTOR_VERSION+"_"+DETECTOR_CONFIG+"_{index}.edm4hep.root"
- shell:
- """
- npsim --steeringFile {input.script} \
- --compactFile {params.detector_path}/{params.detector_config}.xml \
+ detector_path=DETECTOR_PATH,
+ nevents_per_file=NEVENTS_PER_FILE,
+ output:
+ "results/"+DETECTOR_VERSION+"/{DETECTOR_CONFIG}/detector_benchmarks/"+SUBSYSTEM+"/"+BENCHMARK+"/raw_data/"+DETECTOR_VERSION+"_{DETECTOR_CONFIG}_{index}.edm4hep.root"
+ shell:
+ """
+ npsim --steeringFile {input.script} \
+ --compactFile {params.detector_path}/{wildcards.DETECTOR_CONFIG}.xml \
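
To illustrate why a wildcard can replace the global here: Snakemake infers wildcard values by matching the requested output path against the rule's output pattern, so one rule serves any detector config that is asked for. The sketch below is only a rough imitation of that matching in plain Python, not Snakemake's implementation. `infer_wildcards` is a hypothetical helper, and the values `roman_pots` and `ml` for SUBSYSTEM and BENCHMARK are assumed from the paths in the error message earlier in this thread.

```python
import re


def infer_wildcards(pattern, target):
    """Match `target` against `pattern`; return wildcard values or None."""
    regex, pos, seen = "", 0, set()
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        name = m.group(1)
        if name in seen:
            # A repeated wildcard must take the same value everywhere.
            regex += f"(?P={name})"
        else:
            regex += f"(?P<{name}>.+)"
            seen.add(name)
        pos = m.end()
    regex += re.escape(pattern[pos:])
    hit = re.fullmatch(regex, target)
    return hit.groupdict() if hit else None


# Output pattern from the suggestion, with the globals already substituted
# (assumed values: DETECTOR_VERSION="23.12.0", SUBSYSTEM="roman_pots", BENCHMARK="ml").
pattern = (
    "results/23.12.0/{DETECTOR_CONFIG}/detector_benchmarks/roman_pots/ml/"
    "raw_data/23.12.0_{DETECTOR_CONFIG}_{index}.edm4hep.root"
)
target = (
    "results/23.12.0/epic_craterlake/detector_benchmarks/roman_pots/ml/"
    "raw_data/23.12.0_epic_craterlake_7.edm4hep.root"
)

values = infer_wildcards(pattern, target)
print(values)
```

Inside the rule, the matched value is what `{wildcards.DETECTOR_CONFIG}` refers to in the shell command, so the `detector_config` entry under `params:` is no longer needed.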

Comment on lines +1 to +12
//-------------------------
//
// Hit reader to relate hits at Roman Pots to momentum vectors from MC.
//
// Input(s): output file from npsim particle gun for RP particles.
//
// Output(s): txt file with training information with px_mc, py_mc, pz_mc, x_rp, slope_xrp, y_rp, slope_yrp
//
//
// Author: Alex Jentsch
//------------------------
//Low PT preprocessing added by David Ruth
Member

Suggested change
+ // Copyright 2023 - 2024, Alex Jentsch, David Ruth
+ // SPDX-License-Identifier: LGPL-3.0-only
  //-------------------------
  //
  // Hit reader to relate hits at Roman Pots to momentum vectors from MC.
  //
  // Input(s): output file from npsim particle gun for RP particles.
  //
  // Output(s): txt file with training information with px_mc, py_mc, pz_mc, x_rp, slope_xrp, y_rp, slope_yrp
  //
  //
  // Author: Alex Jentsch
  //------------------------
  //Low PT preprocessing added by David Ruth

@@ -0,0 +1,163 @@
import pandas as pd
Member

Suggested change
- import pandas as pd
+ # Copyright YYYY, NAME
+ # SPDX-License-Identifier: LGPL-3.0-only

2 participants