CMS: 2016 MC list #156

Open
katilp opened this issue Feb 14, 2023 · 4 comments


katilp commented Feb 14, 2023

From #124

See https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmVRun2LegacyAnalysis (but note that the DAS links are wrong)
The "preVBF" production for Run2016B-F is separate and not included in the dataset lists below.

MiniAOD - RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17 19294 datasets (as of 14 Feb 2023)
dasgoclient -query="dataset=/*/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17*/MINIAODSIM" > miniaodsim_2016.txt
Tot 810 TB

NanoAOD - RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17 18779 datasets (as of 14 Feb 2023)
dasgoclient -query="dataset=/*/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17*/NANOAODSIM" > nanoaodsim_2016.txt
Tot 27 TB

Remarks:

  • mini volume is larger than expected
  • nano is about 3% of mini (27 TB vs 810 TB): are there some big mini datasets that have not been processed to nano yet, or is nano smaller than expected?
  • 18784 of the 19294 mini datasets have a counterpart in nano (5 not?); the other 510 are missing.
    • total volume (in mini) of the datasets missing from nano is 15 TB
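A comparison like the one in the remarks above can be sketched as follows. This is only a minimal sketch: the function name `missing_in_nano` is made up for illustration, and it assumes that a mini and a nano dataset correspond when they share the same primary dataset name (the first path component), which may not be the exact criterion used for the numbers above.

```shell
#!/bin/bash
# Hypothetical helper: print the MiniAOD datasets that have no NanoAOD
# dataset with the same primary dataset name.
# Assumption: matching on the primary name only (field 2 when splitting on "/").
missing_in_nano() {
  local mini_list=$1 nano_list=$2
  awk -F/ '
    FILENAME == ARGV[1] { seen[$2] = 1; next }  # first file: remember nano primary names
    !($2 in seen)                               # second file: print unmatched mini lines
  ' "$nano_list" "$mini_list"
}
```

With the two lists produced by the dasgoclient queries above, `missing_in_nano miniaodsim_2016.txt nanoaodsim_2016.txt | wc -l` would give the number of mini datasets without a nano counterpart under this matching assumption.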

For previous volume estimates for the OD pledges, see https://cms-opendata-releaseguide.docs.cern.ch/resource_pledge/volume_estimates/ (or directly the slides)
These numbers were estimated from the number of events processed (multiplied by the expected increase for the ongoing production); for the release, the volume is computed from the actual file sizes.

To compute volume, use e.g.

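# Sum the sizes (in bytes) of the datasets listed, one per line, in the file "datasets"
datasets=$(cat datasets)
totsize=0

for d in $datasets
do
  # Keep the first non-null size in case DAS returns several records for the dataset
  size=$(dasgoclient -query "dataset=$d" -json | jq '.[].dataset[].size' | grep -v null | head -n 1)
  # If error "parse error: Invalid numeric literal at line 1, column 8", do  voms-proxy-init --rfc --voms cms
  # echo $d $size
  # Count a dataset with no size as 0 so that bc does not fail
  totsize=$(echo "$totsize + ${size:-0}" | bc)
done
echo "Total size (TB):"
echo "$totsize / 1000000000000" | bc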

⚠️ This takes ages, but the previously used DAS query format, e.g.
block dataset=/*/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17*/NANOAODSIM | sum(block.size)
does not work anymore...

To-do❗ Find a better way for counting the volume


katilp commented Jun 19, 2023

PdmV monitoring

Actual volume:

  • MiniAODv2: 843 TB, 20421 datasets (16 June 2023)
  • NanoAODv9: 28 TB, 20278 datasets (19 June 2023)

GrASP:


katilp commented Oct 13, 2023

Update 13 Oct 2023:

MiniAOD list with size, alphabetic: https://cernbox.cern.ch/s/ZLKnnWldbXgk8sA
MiniAOD list sorted by size: https://cernbox.cern.ch/s/6vqZp6yxjtszzg0

Tot. volume: 979 TB
N datasets: 21879
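Once a list with sizes exists, the total can be recomputed without querying DAS again. A minimal sketch, assuming a file with one dataset per line in the form `dataset-name size-in-bytes` (the format of the size_miniaodsim_2016_12_oct_2023.out lines quoted in this thread); the function names `sum_tb` and `largest_datasets` are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helpers for a "dataset size_in_bytes" list.

# Total volume in TB (1 TB = 10^12 bytes, as in the script earlier in this issue)
sum_tb() {
  awk '{ total += $2 } END { printf "%.1f\n", total / 1000000000000 }' "$1"
}

# The N largest datasets (default 10), sorted by size in descending order
largest_datasets() {
  sort -k2,2nr "$1" | head -n "${2:-10}"
}
```

For example, `sum_tb size_miniaodsim_2016_12_oct_2023.out` prints the total volume and `largest_datasets size_miniaodsim_2016_12_oct_2023.out 20` the 20 biggest entries.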

To be checked:
whether the datasets with H2ErratumFix in the name replace those without it.

In a quick check of the 34 datasets with H2ErratumFix in the name, I find these duplicates (the same dataset name without the fix):

$ list=$(grep Erratum size_miniaodsim_2016_12_oct_2023.out | awk '{print $1}' | sed -e "s/_H2ErratumFix//" | awk -F "/" '{print $2}')
$ for l in $list; do grep $l size_miniaodsim_2016_12_oct_2023.out ; done
/DYJetsToTauTau_M-50_AtLeastOneEorMuDecay_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 1526158751468
/WminusJetsToMuNu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 7139558284395
/WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 998947456041
/WplusJetsToMuNu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 12129669176694
/WplusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 1588683076965


jmhogan commented Oct 18, 2023

@katilp -- So now for these samples there are 3 sets, and I'm quite confident we should take the newest.

I think we should keep these 6:

  • /WminusJetsToMuNu_H2ErratumFix_PDFExt_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM 26429867619960
  • /WplusJetsToMuNu_H2ErratumFix_PDFExt_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM 36496536772097
  • /WplusJetsToTauNu_TauToMu_H2ErratumFix_PDFExt_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM 6504233752727
  • /WminusJetsToTauNu_TauToMu_H2ErratumFix_PDFExt_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM 4129922709116
  • /DYJetsToTauTau_M-50_AtLeastOneEorMuDecay_H2ErratumFix_PDF_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v3/MINIAODSIM 6388140394222
  • /DYJetsToMuMu_H2ErratumFix_PDFExt_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v3/MINIAODSIM 13592981776903

And drop these 12:

  • /WplusJetsToMuNu_H2ErratumFix_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM 11876179310330
  • /WplusJetsToMuNu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 12129669176694
  • /WplusJetsToTauNu_TauToMu_H2ErratumFix_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM 9843784630901
  • /WplusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 1588683076965
  • /WminusJetsToMuNu_H2ErratumFix_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM 8787101599503
  • /WminusJetsToMuNu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 7139558284395
  • /WminusJetsToTauNu_TauToMu_H2ErratumFix_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM 1351522573255
  • /WminusJetsToTauNu_TauToMu_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 998947456041
  • /DYJetsToTauTau_M-50_AtLeastOneEorMuDecay_H2ErratumFix_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM 6396125906590
  • /DYJetsToTauTau_M-50_AtLeastOneEorMuDecay_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 1526158751468
  • /DYJetsToMuMu_H2ErratumFix_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v2/MINIAODSIM 4385522296149
  • /DYJetsToMuMu_M-50_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM 2867399570176
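A drop list like the one above could be applied to the size list along these lines. This is only a sketch: the function name `exclude_datasets` and the file name `drop.txt` are made up, and it assumes both files have one dataset per line with the dataset name in the first column (no bullet characters).

```shell
#!/bin/bash
# Hypothetical helper: remove from a dataset list every entry whose name
# appears in a drop list. Only the first column (the dataset name) is
# compared, so a trailing size column in either file is ignored.
exclude_datasets() {
  local drop_list=$1 full_list=$2
  awk '
    FILENAME == ARGV[1] { drop[$1] = 1; next }  # first file: names to drop
    !($1 in drop)                               # second file: keep the rest
  ' "$drop_list" "$full_list"
}
```

Usage would be e.g. `exclude_datasets drop.txt size_miniaodsim_2016_12_oct_2023.out > size_miniaodsim_2016_cleaned.out`. Comparing exact names rather than grepping for a substring avoids dropping the _H2ErratumFix_PDFExt datasets that are meant to be kept.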


jmhogan commented Nov 22, 2023

Following up on an email thread here -- we agreed to hold back these powhegMiNNLO samples for now, as things are not complete yet compared to the standard MadGraph samples.
