Introduce Unified Particle Transformer v2 #47173

Open: wants to merge 4 commits into base: master

AlexDeMoor

PR description:

This PR introduces the updated Unified Particle Transformer (v2), which brings a substantial improvement in flavour-tagging performance and model inference time. Link to the XPOG meeting: https://indico.cern.ch/event/1504557/#3-upart-training-updates

This PR has to be tested together with the following model PR: cms-data/RecoBTag-Combined#64

Please note that training and validation of the final model are still being finalized.

@cmsbuild

cmsbuild commented Jan 23, 2025

cms-bot internal usage

@cmsbuild

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47173/43412

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild

A new Pull Request was created by @AlexDeMoor for master.

It involves the following packages:

  • RecoBTag/FeatureTools (reconstruction)
  • RecoBTag/ONNXRuntime (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@AlexDeMoor, @Ming-Yan, @Senphy, @andrzejnovak, @castaned, @hqucms, @missirol this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@hqucms

hqucms commented Jan 24, 2025

test parameters:

@hqucms

hqucms commented Jan 24, 2025

please test

@cmsbuild

+1

Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e554d/43935/summary.html
COMMIT: 33e0cf4
CMSSW: CMSSW_15_0_X_2025-01-23-1100/el8_amd64_gcc12
Additional Tests: NANO,PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47173/43935/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e554d/43935/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e554d/43935/git-merge-result

Comparison Summary

Summary:

NANO Comparison Summary

Summary:

  • You potentially removed 70 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 1314 differences found in the comparisons
  • DQMHistoTests: Total files compared: 21
  • DQMHistoTests: Total histograms compared: 75127
  • DQMHistoTests: Total failures: 3489
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 71638
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 20 files compared)
  • Checked 105 log files, 60 edm output root files, 21 DQM output files
  • TriggerResults: no differences found

Nano size comparison Summary:

| Sample | kb/ev | ref kb/ev | diff kb/ev | ev/s/thd | ref ev/s/thd | diff rate | mem/thd | ref mem/thd |
|---|---|---|---|---|---|---|---|---|
| 2500.001 | 3.105 | 3.114 | -0.009 (-0.3%) | 6.00 | 6.40 | -6.3% | 2.577 | 2.562 |
| 2500.002 | 3.223 | 3.230 | -0.007 (-0.2%) | 5.44 | 5.72 | -5.0% | 3.011 | 2.586 |
| 2500.003 | 3.161 | 3.171 | -0.010 (-0.3%) | 5.65 | 5.95 | -5.0% | 2.982 | 2.604 |
| 2500.011 | 1.638 | 1.644 | -0.006 (-0.4%) | 10.45 | 10.15 | +3.0% | 2.663 | 2.660 |
| 2500.012 | 2.176 | 2.184 | -0.008 (-0.4%) | 5.84 | 5.97 | -2.1% | 2.839 | 2.473 |
| 2500.013 | 1.994 | 2.000 | -0.006 (-0.3%) | 8.17 | 8.45 | -3.3% | 2.767 | 2.464 |
| 2500.021 | 0.022 | 0.022 | 0.000 (+0.0%) | 1.97 | 2.06 | -4.3% | 2.641 | 2.632 |
| 2500.022 | 0.022 | 0.022 | 0.000 (+0.0%) | 1.76 | 2.00 | -11.8% | 2.639 | 2.632 |
| 2500.023 | 0.022 | 0.022 | 0.000 (+0.0%) | 1.74 | 1.90 | -8.3% | 2.500 | 2.484 |
| 2500.024 | 0.022 | 0.022 | 0.000 (+0.0%) | 1.44 | 1.55 | -7.1% | 2.733 | 2.732 |
| 2500.031 | 0.035 | 0.035 | 0.000 (+0.0%) | 1.62 | 1.76 | -7.8% | 2.693 | 2.676 |
| 2500.032 | 0.036 | 0.036 | 0.000 (+0.0%) | 1.62 | 1.77 | -8.8% | 2.642 | 2.645 |
| 2500.033 | 0.037 | 0.037 | 0.000 (+0.1%) | 1.56 | 1.70 | -8.3% | 2.734 | 2.732 |
| 2500.034 | 0.036 | 0.036 | 0.000 (+0.0%) | 1.60 | 1.70 | -5.6% | 2.706 | 2.703 |
| 2500.101 | 2.844 | 2.844 | 0.000 (+0.0%) | 13.11 | 16.45 | -20.3% | 2.681 | 2.676 |
| 2500.111 | 1.463 | 1.463 | 0.000 (+0.0%) | 26.55 | 31.87 | -16.7% | 2.363 | 2.372 |
| 2500.112 | 1.883 | 1.883 | 0.000 (+0.0%) | 19.40 | 26.28 | -26.2% | 2.440 | 2.438 |
| 2500.131 | 0.750 | 0.750 | 0.000 (+0.0%) | 31.54 | 37.72 | -16.4% | 1.509 | 1.509 |
| 2500.201 | 2.674 | 2.674 | 0.000 (+0.0%) | 10.76 | 13.87 | -22.4% | 2.250 | 2.234 |
| 2500.211 | 1.806 | 1.806 | 0.000 (+0.0%) | 21.95 | 27.23 | -19.4% | 2.427 | 2.433 |
| 2500.212 | 2.203 | 2.203 | 0.000 (+0.0%) | 17.74 | 22.68 | -21.8% | 2.532 | 2.514 |
| 2500.221 | 2.038 | 2.038 | 0.000 (+0.0%) | 11.06 | 14.47 | -23.6% | 2.166 | 2.150 |
| 2500.222 | 3.479 | 3.479 | 0.000 (+0.0%) | 9.90 | 13.13 | -24.6% | 2.251 | 2.253 |
| 2500.223 | 9.431 | 9.444 | -0.013 (-0.1%) | 4.17 | 4.27 | -2.4% | 2.274 | 2.269 |
| 2500.224 | 6.289 | 6.304 | -0.015 (-0.2%) | 1.48 | 1.39 | +5.9% | 2.261 | 2.270 |
| 2500.225 | 6.334 | 6.350 | -0.016 (-0.2%) | 1.25 | 1.30 | -3.7% | 2.465 | 2.457 |
| 2500.226 | 3.172 | 3.172 | 0.000 (+0.0%) | 10.82 | 13.79 | -21.6% | 2.192 | 2.243 |
| 2500.227 | 1.442 | 1.442 | 0.000 (+0.0%) | 19.80 | 24.12 | -17.9% | 1.484 | 1.482 |
| 2500.228 | 3.957 | 3.957 | 0.000 (+0.0%) | 7.07 | 9.20 | -23.2% | 2.333 | 2.280 |
| 2500.231 | 1.456 | 1.456 | 0.000 (+0.0%) | 18.78 | 22.73 | -17.4% | 2.334 | 2.057 |
| 2500.232 | 2.462 | 2.462 | 0.000 (+0.0%) | 17.07 | 21.65 | -21.1% | 2.434 | 2.433 |
| 2500.233 | 4.946 | 4.954 | -0.008 (-0.2%) | 6.48 | 6.23 | +4.0% | 2.514 | 2.493 |
| 2500.234 | 3.833 | 3.842 | -0.009 (-0.2%) | 1.89 | 1.80 | +5.1% | 2.455 | 2.217 |
| 2500.235 | 3.864 | 3.873 | -0.010 (-0.2%) | 1.78 | 1.69 | +5.5% | 2.666 | 2.405 |
| 2500.236 | 2.252 | 2.252 | 0.000 (+0.0%) | 17.64 | 22.36 | -21.1% | 2.425 | 2.418 |
| 2500.237 | 1.018 | 1.018 | 0.000 (+0.0%) | 29.41 | 35.70 | -17.6% | 1.503 | 1.435 |
| 2500.238 | 2.444 | 2.444 | 0.000 (+0.0%) | 14.27 | 17.43 | -18.2% | 2.508 | 2.063 |
| 2500.241 | 9.404 | 9.404 | 0.000 (+0.0%) | 6.05 | 4.20 | +44.1% | 1.956 | 1.869 |
| 2500.242 | 10.331 | 10.331 | 0.000 (+0.0%) | 1.27 | 1.61 | -21.1% | 1.755 | 1.552 |
| 2500.243 | 2.712 | 2.712 | 0.000 (+0.0%) | 13.46 | 15.84 | -15.0% | 1.092 | 1.088 |
| 2500.244 | 486.016 | 486.016 | 0.000 (+0.0%) | 0.89 | 1.16 | -22.9% | 1.723 | 1.712 |
| 2500.245 | 826.413 | 826.413 | 0.000 (+0.0%) | 1.30 | 1.50 | -13.0% | 1.735 | 1.699 |
| 2500.901 | 1.819 | 1.819 | 0.000 (+0.0%) | 33.08 | 47.17 | -29.9% | 1.477 | 1.476 |
| 2500.902 | 1.665 | 1.665 | 0.000 (+0.0%) | 38.17 | 47.77 | -20.1% | 1.370 | 1.372 |
| 2500.911 | 14.345 | 14.345 | 0.000 (+0.0%) | 6.36 | 8.42 | -24.4% | 1.122 | 1.119 |
| 2500.912 | 0.199 | 0.199 | 0.000 (+0.0%) | 2.21 | 3.83 | -42.1% | 0.882 | 0.878 |
| 2500.913 | 0.110 | 0.110 | 0.000 (+0.0%) | 1.82 | 2.70 | -32.7% | 0.881 | 0.880 |

@jfernan2

@AlexDeMoor do I understand correctly from your PR description that performance should be improved with this PR? The rate ev/s/thd is decreased for all the NANO workflows tested

@hqucms

hqucms commented Jan 24, 2025

assign xpog

@cmsbuild

New categories assigned: xpog

@ftorrresd,@hqucms you have been requested to review this Pull request/Issue and eventually sign? Thanks

@hqucms

hqucms commented Jan 27, 2025

> @AlexDeMoor do I understand correctly from your PR description that performance should be improved with this PR? The rate ev/s/thd is decreased for all the NANO workflows tested

I think there are some fluctuations in the NANO tests -- let's run again and see.

@hqucms

hqucms commented Jan 27, 2025

please test

@AlexDeMoor

Hi @jfernan2 and @hqucms, yes, we expect the model to be faster (~40%). I agree with re-running the tests :)

@nurfikri89

> @AlexDeMoor do I understand correctly from your PR description that performance should be improved with this PR? The rate ev/s/thd is decreased for all the NANO workflows tested
>
> I think there are some fluctuations in the NANO tests -- let's run again and see.

@hqucms Just to add that the changes in this PR should not directly affect the computing time of NanoAOD workflows, because the inference is done at the PAT level.

@cmsbuild

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e554d/43965/summary.html
COMMIT: 33e0cf4
CMSSW: CMSSW_15_0_X_2025-01-27-1100/el8_amd64_gcc12
Additional Tests: NANO,PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47173/43965/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e554d/43965/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e554d/43965/git-merge-result

Comparison Summary

Summary:

NANO Comparison Summary

Summary:

  • You potentially removed 363 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 1369 differences found in the comparisons
  • DQMHistoTests: Total files compared: 21
  • DQMHistoTests: Total histograms compared: 75127
  • DQMHistoTests: Total failures: 3239
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 71888
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 20 files compared)
  • Checked 105 log files, 60 edm output root files, 21 DQM output files
  • TriggerResults: no differences found

Nano size comparison Summary:

| Sample | kb/ev | ref kb/ev | diff kb/ev | ev/s/thd | ref ev/s/thd | diff rate | mem/thd | ref mem/thd |
|---|---|---|---|---|---|---|---|---|
| 2500.001 | 3.104 | 3.114 | -0.010 (-0.3%) | 7.01 | 6.42 | +9.1% | 2.578 | 2.554 |
| 2500.002 | 3.221 | 3.230 | -0.010 (-0.3%) | 6.15 | 5.73 | +7.2% | 3.011 | 2.981 |
| 2500.003 | 3.160 | 3.171 | -0.011 (-0.3%) | 6.49 | 6.00 | +8.3% | 2.992 | 2.950 |
| 2500.011 | 1.638 | 1.644 | -0.007 (-0.4%) | 11.80 | 10.25 | +15.2% | 2.669 | 2.631 |
| 2500.012 | 2.175 | 2.184 | -0.009 (-0.4%) | 6.55 | 5.98 | +9.7% | 2.847 | 2.821 |
| 2500.013 | 1.993 | 2.000 | -0.007 (-0.3%) | 9.29 | 8.42 | +10.2% | 2.748 | 2.725 |
| 2500.021 | 0.022 | 0.022 | 0.000 (+0.2%) | 1.96 | 2.04 | -3.5% | 2.657 | 2.604 |
| 2500.022 | 0.022 | 0.022 | 0.000 (+0.0%) | 1.88 | 1.95 | -3.7% | 2.662 | 2.601 |
| 2500.023 | 0.022 | 0.022 | 0.000 (+0.0%) | 1.89 | 1.91 | -1.0% | 2.520 | 2.457 |
| 2500.024 | 0.022 | 0.022 | 0.000 (+0.0%) | 1.51 | 1.57 | -3.6% | 2.758 | 2.706 |
| 2500.031 | 0.035 | 0.035 | 0.000 (+0.0%) | 1.72 | 1.76 | -1.9% | 2.713 | 2.654 |
| 2500.032 | 0.036 | 0.036 | 0.000 (+0.0%) | 1.74 | 1.77 | -1.5% | 2.670 | 2.614 |
| 2500.033 | 0.037 | 0.037 | 0.000 (+0.0%) | 1.63 | 1.70 | -4.1% | 2.751 | 2.694 |
| 2500.034 | 0.036 | 0.036 | 0.000 (+0.0%) | 1.66 | 1.70 | -2.3% | 2.739 | 2.678 |
| 2500.101 | 2.844 | 2.844 | 0.000 (+0.0%) | 16.60 | 16.68 | -0.5% | 2.644 | 2.633 |
| 2500.111 | 1.463 | 1.463 | 0.000 (+0.0%) | 31.67 | 32.00 | -1.1% | 2.334 | 2.335 |
| 2500.112 | 1.883 | 1.883 | 0.000 (+0.0%) | 26.60 | 26.53 | +0.3% | 2.408 | 2.406 |
| 2500.131 | 0.750 | 0.750 | 0.000 (+0.0%) | 37.88 | 37.50 | +1.0% | 1.501 | 1.496 |
| 2500.201 | 2.674 | 2.674 | 0.000 (+0.0%) | 13.86 | 13.85 | +0.1% | 2.214 | 2.211 |
| 2500.211 | 1.806 | 1.806 | 0.000 (+0.0%) | 27.75 | 27.70 | +0.2% | 2.399 | 2.412 |
| 2500.212 | 2.203 | 2.203 | 0.000 (+0.0%) | 22.79 | 22.93 | -0.6% | 2.493 | 2.490 |
| 2500.221 | 2.038 | 2.038 | 0.000 (+0.0%) | 14.59 | 14.55 | +0.3% | 2.120 | 2.121 |
| 2500.222 | 3.479 | 3.479 | 0.000 (+0.0%) | 13.19 | 13.17 | +0.1% | 2.221 | 2.218 |
| 2500.223 | 9.430 | 9.444 | -0.014 (-0.1%) | 4.86 | 4.31 | +12.8% | 2.348 | 2.291 |
| 2500.224 | 6.464 | 6.480 | -0.016 (-0.2%) | 1.73 | 1.42 | +21.7% | 2.315 | 2.239 |
| 2500.225 | 6.510 | 6.527 | -0.017 (-0.3%) | 1.56 | 1.30 | +20.1% | 2.533 | 2.454 |
| 2500.226 | 3.172 | 3.172 | 0.000 (+0.0%) | 13.80 | 13.89 | -0.6% | 2.216 | 2.210 |
| 2500.227 | 1.442 | 1.442 | 0.000 (+0.0%) | 24.15 | 24.20 | -0.2% | 1.454 | 1.448 |
| 2500.228 | 3.957 | 3.957 | 0.000 (+0.0%) | 9.21 | 9.23 | -0.2% | 2.310 | 2.305 |
| 2500.231 | 1.457 | 1.457 | 0.000 (+0.0%) | 23.06 | 23.20 | -0.6% | 2.293 | 2.293 |
| 2500.232 | 2.462 | 2.462 | 0.000 (+0.0%) | 21.61 | 21.41 | +0.9% | 2.394 | 2.392 |
| 2500.233 | 4.945 | 4.954 | -0.009 (-0.2%) | 7.30 | 6.26 | +16.6% | 2.518 | 2.474 |
| 2500.234 | 3.834 | 3.844 | -0.010 (-0.3%) | 2.24 | 1.81 | +23.9% | 2.451 | 2.181 |
| 2500.235 | 3.865 | 3.876 | -0.011 (-0.3%) | 2.06 | 1.69 | +21.5% | 2.657 | 2.386 |
| 2500.236 | 2.252 | 2.252 | 0.000 (+0.0%) | 22.87 | 23.06 | -0.8% | 2.399 | 2.395 |
| 2500.237 | 1.018 | 1.018 | 0.000 (+0.0%) | 35.51 | 35.49 | +0.1% | 1.459 | 1.454 |
| 2500.238 | 2.444 | 2.444 | 0.000 (+0.0%) | 17.57 | 17.61 | -0.3% | 2.486 | 2.471 |
| 2500.241 | 9.404 | 9.404 | 0.000 (+0.0%) | 7.68 | 5.18 | +48.1% | 1.927 | 1.926 |
| 2500.242 | 10.331 | 10.331 | 0.000 (+0.0%) | 1.69 | 1.64 | +3.0% | 1.725 | 1.723 |
| 2500.243 | 2.712 | 2.712 | 0.000 (+0.0%) | 15.93 | 16.01 | -0.5% | 1.060 | 1.060 |
| 2500.244 | 486.016 | 486.016 | 0.000 (+0.0%) | 1.15 | 1.15 | +0.1% | 1.700 | 1.694 |
| 2500.245 | 826.413 | 826.413 | 0.000 (+0.0%) | 1.54 | 1.55 | -0.6% | 1.670 | 1.669 |
| 2500.901 | 1.819 | 1.819 | 0.000 (+0.0%) | 45.93 | 47.49 | -3.3% | 1.448 | 1.447 |
| 2500.902 | 1.665 | 1.665 | 0.000 (+0.0%) | 49.47 | 49.44 | +0.1% | 1.340 | 1.337 |
| 2500.911 | 14.345 | 14.345 | 0.000 (+0.0%) | 7.66 | 7.69 | -0.4% | 1.086 | 1.090 |
| 2500.912 | 0.240 | 0.240 | 0.000 (+0.0%) | 2.79 | 3.00 | -7.1% | 0.848 | 0.843 |
| 2500.913 | 0.110 | 0.110 | 0.000 (+0.0%) | 2.66 | 2.69 | -1.0% | 0.851 | 0.842 |

unsigned int n_vtx = features.sv_features.size();

// Use actual sizes for dynamic axes version
n_cpf_ = std::max((unsigned int)1, n_cpf);

This part could be simplified using std::clamp, e.g.,

n_cpf_ = std::clamp<std::size_t>(features.c_pf_features.size(), 1, 29);
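
A note on types for this suggestion: std::clamp deduces one type from all three arguments, so passing the std::size_t returned by size() together with int literals does not compile without an explicit template argument or matching bound types. A minimal C++17 sketch, where `clampCount` is a hypothetical helper (not part of the PR) and the bounds 1 and 29 are taken from the comment above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the PR): bound a candidate count to [lo, hi].
// All three std::clamp arguments have the same type (std::size_t), which is
// what template argument deduction requires.
unsigned int clampCount(std::size_t n, std::size_t lo, std::size_t hi) {
  assert(lo <= hi);  // std::clamp has undefined behaviour if hi < lo
  return static_cast<unsigned int>(std::clamp(n, lo, hi));
}
```

With this helper, the assignment could read `n_cpf_ = clampCount(features.c_pf_features.size(), 1, 29);`. Unlike a bare std::max with 1, this also caps the count at the upper bound, so it is only equivalent if the count is not already limited elsewhere.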

@@ -76,6 +77,8 @@ UnifiedParticleTransformerAK4ONNXJetTagsProducer::UnifiedParticleTransformerAK4O
: src_(consumes<TagInfoCollection>(iConfig.getParameter<edm::InputTag>("src"))),
flav_names_(iConfig.getParameter<std::vector<std::string>>("flav_names")),
input_names_(iConfig.getParameter<std::vector<std::string>>("input_names")),
+use_dynamic_axes_(iConfig.getParameter<edm::FileInPath>("model_path").fullPath().find("v2.onnx") !=
+std::string::npos),

I would suggest adding a configurable parameter for use_dynamic_axes (with a default value of true); otherwise we are tied to the v2.onnx file naming.
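
To illustrate the concern, here is a minimal, framework-free C++ sketch contrasting the file-name heuristic from the diff with an explicit flag. `TaggerConfig`, `dynamicAxesFromFileName` and `dynamicAxesFromConfig` are hypothetical names used only for illustration. In the producer itself the flag would presumably be declared as a bool in fillDescriptions and read with iConfig.getParameter<bool>, but that wiring is an assumption and is not shown here.

```cpp
#include <string>

// Heuristic from the diff: infer the mode from the model file name.
// Any model file that does not contain "v2.onnx" silently disables it.
bool dynamicAxesFromFileName(const std::string& modelPath) {
  return modelPath.find("v2.onnx") != std::string::npos;
}

// Hypothetical configuration holder illustrating the suggestion:
// the flag is an explicit setting that defaults to true.
struct TaggerConfig {
  std::string modelPath;
  bool useDynamicAxes = true;
};

// The mode no longer depends on how the model file is named.
bool dynamicAxesFromConfig(const TaggerConfig& cfg) {
  return cfg.useDynamicAxes;
}
```

With the explicit flag, renaming or re-versioning the model file leaves the behaviour unchanged, and a fixed-shape model can still be selected by setting the flag to false.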

@@ -89,7 +92,7 @@ void UnifiedParticleTransformerAK4ONNXJetTagsProducer::fillDescriptions(edm::Con
desc.add<edm::InputTag>("src", edm::InputTag("pfUnifiedParticleTransformerAK4TagInfos"));
desc.add<std::vector<std::string>>(
"input_names", {"input_1", "input_2", "input_3", "input_4", "input_5", "input_6", "input_7", "input_8"});
-desc.add<edm::FileInPath>("model_path", edm::FileInPath("RecoBTag/Combined/data/UParTAK4/PUPPI/V00/UParTAK4.onnx"));
+desc.add<edm::FileInPath>("model_path", edm::FileInPath("RecoBTag/Combined/data/UParTAK4/PUPPI/V1/UParTAK4_v2.onnx"));

For consistency, I suggest renaming V1 to V01.

@hqucms

hqucms commented Jan 27, 2025

@AlexDeMoor It seems UParTAK4RegPtRawCorrNeutrino is broken (e.g., here). Could you please have a look?

@cmsbuild

Pull request #47173 was updated. @cmsbuild, @ftorrresd, @hqucms, @jfernan2, @mandrenguyen can you please check and sign again.
