From 7c5514a69f254f06d9eaec6c87b5dfe98e142ae8 Mon Sep 17 00:00:00 2001 From: kcormi Date: Wed, 7 Feb 2024 09:13:20 +0000 Subject: [PATCH] deploy: f5d8df95c838bbbe1dba9370c97e04d0ab19d14d --- 404.html | 6 + CernVM/index.html | 6 + full-documentation/index.html | 6 + index.html | 8 +- part2/bin-wise-stats/index.html | 6 + part2/physicsmodels/index.html | 6 + part2/settinguptheanalysis/index.html | 6 + part3/commonstatsmethods/index.html | 6 + part3/debugging/index.html | 6 + part3/nonstandard/index.html | 6 + part3/regularisation/index.html | 6 + part3/runningthetool/index.html | 6 + part3/simplifiedlikelihood/index.html | 6 + part3/validation/index.html | 6 + part4/usefullinks/index.html | 6 + part5/longexercise/index.html | 6 + part5/longexerciseanswers/index.html | 6 + part5/roofit/index.html | 6 + releaseNotes/index.html | 6 + search/search_index.json | 2 +- sitemap.xml.gz | Bin 127 -> 127 bytes tutorial2020/exercise/index.html | 6 + tutorial2023/parametric_exercise/index.html | 10 +- .../figures/correlationMatrix_pois.png | Bin 0 -> 13672 bytes .../figures/impacts_zh_75_150.png | Bin 0 -> 429860 bytes .../figures/migration_matrix_zh.pdf | Bin 0 -> 13803 bytes .../figures/migration_matrix_zh.png | Bin 0 -> 14550 bytes .../figures/scan_plot_r_zh_75_150.pdf | Bin 0 -> 15934 bytes .../figures/scan_plot_r_zh_75_150.png | Bin 0 -> 14476 bytes .../figures/scan_plot_r_zh_75_150_blinded.pdf | Bin 0 -> 15891 bytes .../figures/scan_plot_r_zh_75_150_blinded.png | Bin 0 -> 14224 bytes .../figures/simplifiedXS_VH_1_2.png | Bin 0 -> 45398 bytes tutorial2023_unfolding/figures/stxs_zh.pdf | Bin 0 -> 13850 bytes tutorial2023_unfolding/figures/stxs_zh.png | Bin 0 -> 8090 bytes .../unfolding_exercise/index.html | 515 ++++++++++++++++++ tutorials-part-2/index.html | 6 + 36 files changed, 651 insertions(+), 4 deletions(-) create mode 100644 tutorial2023_unfolding/figures/correlationMatrix_pois.png create mode 100644 tutorial2023_unfolding/figures/impacts_zh_75_150.png create mode 100644 tutorial2023_unfolding/figures/migration_matrix_zh.pdf create mode 100644 tutorial2023_unfolding/figures/migration_matrix_zh.png create mode 100644 tutorial2023_unfolding/figures/scan_plot_r_zh_75_150.pdf create mode 100644 tutorial2023_unfolding/figures/scan_plot_r_zh_75_150.png create mode 100644 tutorial2023_unfolding/figures/scan_plot_r_zh_75_150_blinded.pdf create mode 100644 tutorial2023_unfolding/figures/scan_plot_r_zh_75_150_blinded.png create mode 100644 tutorial2023_unfolding/figures/simplifiedXS_VH_1_2.png create mode 100644 tutorial2023_unfolding/figures/stxs_zh.pdf create mode 100644 tutorial2023_unfolding/figures/stxs_zh.png create mode 100644 tutorial2023_unfolding/unfolding_exercise/index.html diff --git a/404.html b/404.html index 12b12d7a24b..91f942ba4de 100644 --- a/404.html +++ b/404.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/CernVM/index.html b/CernVM/index.html index 18352b9949f..524e615315d 100644 --- a/CernVM/index.html +++ b/CernVM/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/full-documentation/index.html b/full-documentation/index.html index 0faa80ceedb..0b3e89c9dda 100644 --- a/full-documentation/index.html +++ b/full-documentation/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/index.html b/index.html index 54ae1d1334a..57c1e3f0e44 100644 --- a/index.html +++ b/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + @@ -527,5 +533,5 @@ diff --git a/part2/bin-wise-stats/index.html b/part2/bin-wise-stats/index.html index e017f0e6516..7a78de68446 100644 --- a/part2/bin-wise-stats/index.html +++ b/part2/bin-wise-stats/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part2/physicsmodels/index.html b/part2/physicsmodels/index.html index 90f6b66054a..395be1904f5 100644 --- a/part2/physicsmodels/index.html +++ b/part2/physicsmodels/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part2/settinguptheanalysis/index.html b/part2/settinguptheanalysis/index.html index d9f458f4aec..e1847309fc4 100644 --- a/part2/settinguptheanalysis/index.html +++ b/part2/settinguptheanalysis/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part3/commonstatsmethods/index.html b/part3/commonstatsmethods/index.html index 96c74b259d5..b956dbbd91a 100644 --- a/part3/commonstatsmethods/index.html +++ b/part3/commonstatsmethods/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part3/debugging/index.html b/part3/debugging/index.html index 4101b9f7ddd..e3e61f44e74 100644 --- a/part3/debugging/index.html +++ b/part3/debugging/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part3/nonstandard/index.html b/part3/nonstandard/index.html index 0b7d7710429..413d30facf2 100644 --- a/part3/nonstandard/index.html +++ b/part3/nonstandard/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part3/regularisation/index.html b/part3/regularisation/index.html index 49d38da27a4..3575b7276ad 100644 --- a/part3/regularisation/index.html +++ b/part3/regularisation/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part3/runningthetool/index.html b/part3/runningthetool/index.html index 65af6e905fa..4aa430e54d1 100644 --- a/part3/runningthetool/index.html +++ b/part3/runningthetool/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part3/simplifiedlikelihood/index.html b/part3/simplifiedlikelihood/index.html index 0ec26cc1c57..53da5797bdc 100644 --- a/part3/simplifiedlikelihood/index.html +++ b/part3/simplifiedlikelihood/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part3/validation/index.html b/part3/validation/index.html index 4d348371ac8..ae3ff620fd1 100644 --- a/part3/validation/index.html +++ b/part3/validation/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part4/usefullinks/index.html b/part4/usefullinks/index.html index 45d43522e6f..8c14b35a4d2 100644 --- a/part4/usefullinks/index.html +++ b/part4/usefullinks/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part5/longexercise/index.html b/part5/longexercise/index.html index eb05f724e43..58570a0caa3 100644 --- a/part5/longexercise/index.html +++ b/part5/longexercise/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part5/longexerciseanswers/index.html b/part5/longexerciseanswers/index.html index 1bd98dbfa18..0dd43bf8de2 100644 --- a/part5/longexerciseanswers/index.html +++ b/part5/longexerciseanswers/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/part5/roofit/index.html b/part5/roofit/index.html index 9aaa0769dee..11b1ac52d56 100644 --- a/part5/roofit/index.html +++ b/part5/roofit/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/releaseNotes/index.html b/releaseNotes/index.html index ae72c80dd6e..6c083ecb33d 100644 --- a/releaseNotes/index.html +++ b/releaseNotes/index.html @@ -188,6 +188,12 @@ + +
  • + Exercise: unfolding in combine +
  • + + diff --git a/search/search_index.json b/search/search_index.json index 459ef30e8cd..4afff95f878 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Introduction These pages document the RooStats / RooFit - based software tool used for statistical analysis within the CMS experiment - Combine . Note that while this tool was originally developed in the Higgs PAG , its usage is now widespread within CMS. Combine provides a command-line interface to many different statistical techniques, available inside RooFit/RooStats, that are used widely inside CMS. The package exists on GitHub under https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit For more information about Git, GitHub and its usage in CMS, see http://cms-sw.github.io/cmssw/faq.html The code can be checked out from GitHub and compiled on top of a CMSSW release that includes a recent RooFit/RooStats Installation instructions Installation instructions and recommended versions can be found below. Since v9.0.0, the versioning follows the semantic versioning 2.0.0 standard . Earlier versions are not guaranteed to follow the standard. Within CMSSW (recommended for CMS users) The instructions below are for installation within a CMSSW environment. For end users that do not need to commit or do any development, the following recipes should be sufficient. To choose a release version, you can find the latest releases on github under https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/releases Combine v9 - recommended version The nominal installation method is inside CMSSW. The current release targets the CMSSW 11_3_X series because this release has both python2 and python3 ROOT bindings, allowing a more gradual migration of user code to python3. Combine is fully python3-compatible and, with some adaptations, can also work in 12_X releases. 
CMSSW 11_3_X runs on slc7, which can be setup using apptainer ( see detailed instructions ): cmssw-el7 cmsrel CMSSW_11_3_4 cd CMSSW_11_3_4/src cmsenv git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit Update to a recommended tag - currently the recommended tag is v9.1.0 : see release notes cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit git fetch origin git checkout v9.1.0 scramv1 b clean; scramv1 b # always make a clean build Combine v8: CMSSW_10_2_X release series Setting up the environment (once): cmssw-el7 cmsrel CMSSW_10_2_13 cd CMSSW_10_2_13/src cmsenv git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit Update to a recommended tag - currently the recommended tag is v8.2.0 : see release notes cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit git fetch origin git checkout v8.2.0 scramv1 b clean; scramv1 b # always make a clean build SLC6/CC7 release CMSSW_8_1_X Setting up OS using apptainer ( see detailed instructions ): # For CC7: cmssw-el7 # For SLC6: cmssw-el6 cmsrel CMSSW_8_1_0 cd CMSSW_8_1_0/src cmsenv git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit Update to a recommended tag - currently the recommended tag for CMSSW_8_1_X is v7.0.13 : cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit git fetch origin git checkout v7.0.13 scramv1 b clean; scramv1 b # always make a clean build Oustide of CMSSW (recommended for non-CMS users) Pre-compiled versions of the tool are available as containers from the CMS cloud pages . These containers can be downloaded and run using Docker . If you have docker running you can pull and run the latest version using, docker run --name combine -it gitlab-registry.cern.ch/cms-cloud/combine-standalone:latest You will now have the compiled combine binary available as well as the complete package of tool. The container can be re-started using docker start -i combine . Standalone compilation The standalone version can be easily compiled using cvmfs as it relies on dependencies that are already installed at /cvmfs/cms.cern.ch/ . Access to /cvmfs/cms.cern.ch/ can be obtained from lxplus machines or via CernVM . See CernVM for further details on the latter. In case you do not want to use the cvmfs area, you will need to adapt the locations of the dependencies listed in both the Makefile and env_standalone.sh files. git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit/ # git checkout . env_standalone.sh make -j 4 You will need to source env_standalone.sh each time you want to use the package, or add it to your login environment. Standalone compilation with LCG For compilation outside of CMSSW, for example to use ROOT versions not yet available in CMSSW, one can compile against LCG releases. The current default is to compile with LCG_102, which contains ROOT 6.26: git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit source env_lcg.sh make LCG=1 -j 8 To change the LCG version, edit env_lcg.sh . 
The resulting binaries can be moved for use in a batch job if the following files are included in the job tarball: tar -zcf Combine_LCG_env.tar.gz build interface src/classes.h --exclude=obj Standalone compilation with conda This recipe will work both for linux and MacOS git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit conda install --name base mamba # faster conda mamba env create -f conda_env.yml conda activate combine source set_conda_env_vars.sh # Need to reactivate conda deactivate conda activate combine make CONDA=1 -j 8 Using Combine from then on should only require sourcing the conda environment conda activate combine Note: on OS X, Combine can only accept workspaces, so run text2workspace.py first. This is due to an issue with child processes and LD_LIBRARY_PATH (see note in Makefile) Standalone compilation with CernVM combine , either standalone or not, can be compiled via CVMFS using access to /cvmfs/cms.cern.ch/ obtained using a virtual machine - CernVM . To use CernVM You should have access to CERN IT resources. If you are a CERN user you can use your account, otherwise you can request a lightweight account. If you have a CERN user account, we strongly suggest you simply run one of the other standalone installations, which are simpler and faster than using a VM. You should have a working VM on your local machine, compatible with CernVM, such as VirtualBox . All the required software can be downloaded here . At least 2GB of disk space should be reserved on the virtual machine for combine to work properly and the machine must be contextualized to add the `CMS`` group to CVMFS. A minimal working setup is described below. Download the CernVM-launcher for your operating system, following the instructions available [ here ] for your operating system (https://cernvm.readthedocs.io/en/stable/cpt-launch.html#installation Prepare a CMS context. You can use the CMS open data one already available on gitHub: wget https://raw.githubusercontent.com/cernvm/public-contexts/master/cms-opendata-2011.context) Launch the virtual machine cernvm-launch create --name combine --cpus 2 cms-opendata-2011.context In the VM, proceed with an installation of combine Installation through CernVM is maintained on a best-effort basis and these instructions may not be up to date. What has changed between tags? You can generate a diff of any two tags (eg for v9.1.0 and v9.0.0 ) by using the following url: https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/compare/v9.0.0...v9.1.0 Replace the tag names in the url to any tags you would like to compare. For developers We use the Fork and Pull model for development: each user creates a copy of the repository on GitHub, commits their requests there, and then sends pull requests for the administrators to merge. Prerequisites Register on GitHub, as needed anyway for CMSSW development: http://cms-sw.github.io/cmssw/faq.html Register your SSH key on GitHub: https://help.github.com/articles/generating-ssh-keys Fork the repository to create your copy of it: https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/fork (more documentation at https://help.github.com/articles/fork-a-repo ) You will now be able to browse your fork of the repository from https://github.com/your-github-user-name/HiggsAnalysis-CombinedLimit We strongly encourage you to contribute any developments you make back to the main repository. See contributing.md for details about contributing. 
CombineHarvester/CombineTools CombineTools is an additional tool for submitting Combine jobs to batch systems or crab, which was originally developed in the context of Higgs to tau tau analyses. Since the repository contains a certain amount of analysis-specific code, the following scripts can be used to clone it with a sparse checkout for just the core CombineHarvester/CombineTools subpackage, speeding up the checkout and compile times: git clone via ssh: bash <(curl -s https://raw.githubusercontent.com/cms-analysis/CombineHarvester/main/CombineTools/scripts/sparse-checkout-ssh.sh) git clone via https: bash <(curl -s https://raw.githubusercontent.com/cms-analysis/CombineHarvester/main/CombineTools/scripts/sparse-checkout-https.sh) make sure to run scram to compile the CombineTools package. See the CombineHarvester documentation pages for more details on using this tool and additional features available in the full package.","title":"Home"},{"location":"#introduction","text":"These pages document the RooStats / RooFit - based software tool used for statistical analysis within the CMS experiment - Combine . Note that while this tool was originally developed in the Higgs PAG , its usage is now widespread within CMS. Combine provides a command-line interface to many different statistical techniques, available inside RooFit/RooStats, that are used widely inside CMS. The package exists on GitHub under https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit For more information about Git, GitHub and its usage in CMS, see http://cms-sw.github.io/cmssw/faq.html The code can be checked out from GitHub and compiled on top of a CMSSW release that includes a recent RooFit/RooStats","title":"Introduction"},{"location":"#installation-instructions","text":"Installation instructions and recommended versions can be found below. Since v9.0.0, the versioning follows the semantic versioning 2.0.0 standard . Earlier versions are not guaranteed to follow the standard.","title":"Installation instructions"},{"location":"#within-cmssw-recommended-for-cms-users","text":"The instructions below are for installation within a CMSSW environment. For end users that do not need to commit or do any development, the following recipes should be sufficient. To choose a release version, you can find the latest releases on github under https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/releases","title":"Within CMSSW (recommended for CMS users)"},{"location":"#combine-v9-recommended-version","text":"The nominal installation method is inside CMSSW. The current release targets the CMSSW 11_3_X series because this release has both python2 and python3 ROOT bindings, allowing a more gradual migration of user code to python3. Combine is fully python3-compatible and, with some adaptations, can also work in 12_X releases. 
CMSSW 11_3_X runs on slc7, which can be setup using apptainer ( see detailed instructions ): cmssw-el7 cmsrel CMSSW_11_3_4 cd CMSSW_11_3_4/src cmsenv git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit Update to a recommended tag - currently the recommended tag is v9.1.0 : see release notes cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit git fetch origin git checkout v9.1.0 scramv1 b clean; scramv1 b # always make a clean build","title":"Combine v9 - recommended version"},{"location":"#combine-v8-cmssw_10_2_x-release-series","text":"Setting up the environment (once): cmssw-el7 cmsrel CMSSW_10_2_13 cd CMSSW_10_2_13/src cmsenv git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit Update to a recommended tag - currently the recommended tag is v8.2.0 : see release notes cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit git fetch origin git checkout v8.2.0 scramv1 b clean; scramv1 b # always make a clean build","title":"Combine v8: CMSSW_10_2_X release series"},{"location":"#slc6cc7-release-cmssw_8_1_x","text":"Setting up OS using apptainer ( see detailed instructions ): # For CC7: cmssw-el7 # For SLC6: cmssw-el6 cmsrel CMSSW_8_1_0 cd CMSSW_8_1_0/src cmsenv git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit Update to a recommended tag - currently the recommended tag for CMSSW_8_1_X is v7.0.13 : cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit git fetch origin git checkout v7.0.13 scramv1 b clean; scramv1 b # always make a clean build","title":"SLC6/CC7 release CMSSW_8_1_X"},{"location":"#oustide-of-cmssw-recommended-for-non-cms-users","text":"Pre-compiled versions of the tool are available as containers from the CMS cloud pages . These containers can be downloaded and run using Docker . If you have docker running you can pull and run the latest version using, docker run --name combine -it gitlab-registry.cern.ch/cms-cloud/combine-standalone:latest You will now have the compiled combine binary available as well as the complete package of tool. The container can be re-started using docker start -i combine .","title":"Oustide of CMSSW (recommended for non-CMS users)"},{"location":"#standalone-compilation","text":"The standalone version can be easily compiled using cvmfs as it relies on dependencies that are already installed at /cvmfs/cms.cern.ch/ . Access to /cvmfs/cms.cern.ch/ can be obtained from lxplus machines or via CernVM . See CernVM for further details on the latter. In case you do not want to use the cvmfs area, you will need to adapt the locations of the dependencies listed in both the Makefile and env_standalone.sh files. git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit/ # git checkout . env_standalone.sh make -j 4 You will need to source env_standalone.sh each time you want to use the package, or add it to your login environment.","title":"Standalone compilation"},{"location":"#standalone-compilation-with-lcg","text":"For compilation outside of CMSSW, for example to use ROOT versions not yet available in CMSSW, one can compile against LCG releases. 
The current default is to compile with LCG_102, which contains ROOT 6.26: git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit source env_lcg.sh make LCG=1 -j 8 To change the LCG version, edit env_lcg.sh . The resulting binaries can be moved for use in a batch job if the following files are included in the job tarball: tar -zcf Combine_LCG_env.tar.gz build interface src/classes.h --exclude=obj","title":"Standalone compilation with LCG"},{"location":"#standalone-compilation-with-conda","text":"This recipe will work both for linux and MacOS git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit cd HiggsAnalysis/CombinedLimit conda install --name base mamba # faster conda mamba env create -f conda_env.yml conda activate combine source set_conda_env_vars.sh # Need to reactivate conda deactivate conda activate combine make CONDA=1 -j 8 Using Combine from then on should only require sourcing the conda environment conda activate combine Note: on OS X, Combine can only accept workspaces, so run text2workspace.py first. This is due to an issue with child processes and LD_LIBRARY_PATH (see note in Makefile)","title":"Standalone compilation with conda"},{"location":"#standalone-compilation-with-cernvm","text":"combine , either standalone or not, can be compiled via CVMFS using access to /cvmfs/cms.cern.ch/ obtained using a virtual machine - CernVM . To use CernVM You should have access to CERN IT resources. If you are a CERN user you can use your account, otherwise you can request a lightweight account. If you have a CERN user account, we strongly suggest you simply run one of the other standalone installations, which are simpler and faster than using a VM. You should have a working VM on your local machine, compatible with CernVM, such as VirtualBox . All the required software can be downloaded here . At least 2GB of disk space should be reserved on the virtual machine for combine to work properly and the machine must be contextualized to add the `CMS`` group to CVMFS. A minimal working setup is described below. Download the CernVM-launcher for your operating system, following the instructions available [ here ] for your operating system (https://cernvm.readthedocs.io/en/stable/cpt-launch.html#installation Prepare a CMS context. You can use the CMS open data one already available on gitHub: wget https://raw.githubusercontent.com/cernvm/public-contexts/master/cms-opendata-2011.context) Launch the virtual machine cernvm-launch create --name combine --cpus 2 cms-opendata-2011.context In the VM, proceed with an installation of combine Installation through CernVM is maintained on a best-effort basis and these instructions may not be up to date.","title":"Standalone compilation with CernVM"},{"location":"#what-has-changed-between-tags","text":"You can generate a diff of any two tags (eg for v9.1.0 and v9.0.0 ) by using the following url: https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/compare/v9.0.0...v9.1.0 Replace the tag names in the url to any tags you would like to compare.","title":"What has changed between tags?"},{"location":"#for-developers","text":"We use the Fork and Pull model for development: each user creates a copy of the repository on GitHub, commits their requests there, and then sends pull requests for the administrators to merge. 
Prerequisites Register on GitHub, as needed anyway for CMSSW development: http://cms-sw.github.io/cmssw/faq.html Register your SSH key on GitHub: https://help.github.com/articles/generating-ssh-keys Fork the repository to create your copy of it: https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/fork (more documentation at https://help.github.com/articles/fork-a-repo ) You will now be able to browse your fork of the repository from https://github.com/your-github-user-name/HiggsAnalysis-CombinedLimit We strongly encourage you to contribute any developments you make back to the main repository. See contributing.md for details about contributing.","title":"For developers"},{"location":"#combineharvestercombinetools","text":"CombineTools is an additional tool for submitting Combine jobs to batch systems or crab, which was originally developed in the context of Higgs to tau tau analyses. Since the repository contains a certain amount of analysis-specific code, the following scripts can be used to clone it with a sparse checkout for just the core CombineHarvester/CombineTools subpackage, speeding up the checkout and compile times: git clone via ssh: bash <(curl -s https://raw.githubusercontent.com/cms-analysis/CombineHarvester/main/CombineTools/scripts/sparse-checkout-ssh.sh) git clone via https: bash <(curl -s https://raw.githubusercontent.com/cms-analysis/CombineHarvester/main/CombineTools/scripts/sparse-checkout-https.sh) make sure to run scram to compile the CombineTools package. See the CombineHarvester documentation pages for more details on using this tool and additional features available in the full package.","title":"CombineHarvester/CombineTools"},{"location":"CernVM/","text":"Standalone use inside CernVM Standalone by adding the CMS group to the CVMFS Configuration. A minimal CernVM working context setup can be found in the CernVM Marketplace under Experimental/HiggsCombine or at https://cernvm-online.cern.ch/context/view/9ee5960ce4b143f5829e72bbbb26d382 . At least 2GB of disk space should be reserved on the virtual machine for Combine to work properly. Available machines for standalone combine The standalone version can be easily compiled via CVMFS as it relies on dependencies which are already installed at /cvmfs/cms.cern.ch/ . Access to /cvmfs/cms.cern.ch/ can be obtained from lxplus machines or via CernVM . The only requirement will be to add the CMS group to the CVMFS configuration as shown in the picture At least 2GB of disk space should be reserved on the virtual machine for combine to work properly. A minimal CernVM working context setup can be found in the CernVM Marketplace under Experimental/HiggsCombine . To use this predefined context, first locally launch the CernVM (eg you can use the .ova with VirtualBox, by downloading from here and launching the downloaded file. You can click on \"pair an instance of CernVM\" from the cernvm-online dashboard, which displays a PIN. In the VirtualBox terminal, pair the virtual machine with this PIN code (enter in the terminal using #PIN eg #123456 . After this, you will be asked again for username (use user ) and then a password (use hcomb ). In case you do not want to use the cvmfs area, you will need to adapt the location of the dependencies listed in both the Makefile and env_standalone.sh files.","title":"CernVM"},{"location":"CernVM/#standalone-use-inside-cernvm","text":"Standalone by adding the CMS group to the CVMFS Configuration. 
A minimal CernVM working context setup can be found in the CernVM Marketplace under Experimental/HiggsCombine or at https://cernvm-online.cern.ch/context/view/9ee5960ce4b143f5829e72bbbb26d382 . At least 2GB of disk space should be reserved on the virtual machine for Combine to work properly.","title":"Standalone use inside CernVM"},{"location":"CernVM/#available-machines-for-standalone-combine","text":"The standalone version can be easily compiled via CVMFS as it relies on dependencies which are already installed at /cvmfs/cms.cern.ch/ . Access to /cvmfs/cms.cern.ch/ can be obtained from lxplus machines or via CernVM . The only requirement will be to add the CMS group to the CVMFS configuration as shown in the picture At least 2GB of disk space should be reserved on the virtual machine for combine to work properly. A minimal CernVM working context setup can be found in the CernVM Marketplace under Experimental/HiggsCombine . To use this predefined context, first locally launch the CernVM (eg you can use the .ova with VirtualBox, by downloading from here and launching the downloaded file. You can click on \"pair an instance of CernVM\" from the cernvm-online dashboard, which displays a PIN. In the VirtualBox terminal, pair the virtual machine with this PIN code (enter in the terminal using #PIN eg #123456 . After this, you will be asked again for username (use user ) and then a password (use hcomb ). In case you do not want to use the cvmfs area, you will need to adapt the location of the dependencies listed in both the Makefile and env_standalone.sh files.","title":"Available machines for standalone combine"},{"location":"full-documentation/","text":"Full documentation Common options Choice of the prior for Bayesian methods For methods based on Bayesian inference (BayesianSimple, MarkovChainMC) you can specify a prior for the signal strength. This is controlled through option --prior , and the values that are currently allowed are: uniform : flat prior 1/sqrt(r) : prior proportional to 1/sqrt(r) , where r is the signal strength; for a counting experiment with a single channel and no systematics, this is Jeffrey's prior. Note that you might have to put the 1/sqrt(r) within quotes because for some shells the braces have a special meaning. If no prior is specified, a flat prior is assumed. Algorithm-specific options #ProfiLe ProfileLikelihood algorithm The ProfileLikelihood algorithm has only two options besides the common ones: the choice of the minimizer algorith (Minuit2 or Minuit), and the choice of the tolerance. If you see that the limit fails with one of the minimizer, try the other. Sometimes this problem happens due to numerical instabilities even if the model itself looks perfectly well behaved. If neither of the two succeeds, another possible way of circumventing the instability is sometimes to change the range of r by a small amount, or to change slightly one number in the model (e.g. in one simple counting experiment a test was failing when assuming \u0394B/B = 0.3, but was succeeding for \u0394B/B = 0.301 and \u0394B/B = 0.299, giving almost the same result) In case you experience numerical instabilities, e.g. failures or bogus results, you could be able to work around the problem by performing the minimization multiple times and requiring the results to be consistent. This can be done using the options below. 
maxTries : how many times the program should attempt to find a minimum at most (plausible range: 20-100) tries : how many times the program should succeed in computing a limit (plausible range: 5-10) maxOutlierFraction : maximum fraction of bad results within the tries ones (plausible: 0.15-0.30; default 0.25) maxOutliers : maximum number of bogus results before giving up for this point (plausible values: 2-5; default 3) preFit : don't try to profile the likelihood if Minuit already gave errors in finding the minimum. (suggested) BayesianSimple algorithm The BayesianSimple algorithm doesn't have parameters besides the ones specified above under the \"Common options\" section. #MarkoV MarkovChainMC algorithm This algorithm has many configurable parameters, and you're encouraged to experiment with those because the default configuration might not be the best for you (or might not even work for you at all) *Iterations, burn-in, tries* Three parameters control how the MCMC integration is performed: the number of tries (option --tries ): the algorithm will run multiple times with different ransom seeds and report as result the truncated mean and rms of the different results. The default value is 10, which should be ok for a quick computation, but for something more accurate you might want to increase this number even up to ~200. the number of iterations (option -i ) determines how many points are proposed to fill a single Markov Chain. The default value is 10k, and a plausible range is between 5k (for quick checks) and 20-30k for lengthy calculations. Usually beyond 30k you get a better tradeoff in time vs accuracy by increasing the number of chains (option tries ) the number of burn-in steps (option -b ) is the number of points that are removed from the beginning of the chain before using it to compute the limit. IThe default is 200. If your chain is very long, you might want to try increase this a bit (e.g. to some hundreds). Instead going below 50 is probably dangerous. Proposal The option --proposal controls the way new points are proposed to fill in the MC chain. uniform : pick points at random. This works well if you have very few nuisance parameters (or none at all), but normally fails if you have many. gaus : Use a product of independent gaussians one for each nuisance parameter; the sigma of the gaussian for each variable is 1/5 of the range of the variable (this can be controlled using the parameter propHelperWidthRangeDivisor ). This proposal appears to work well for a reasonable number of nuisances (up to ~15), provided that the range of the nuisance parameters is reasonable, like \u00b15\u03c3. It does not work without systematics. =ortho ( default ): (previously known as test ) This proposalis similar to the multi-gaussian proposal but at every step only a single coordinate of the point is varied, so that the acceptance of the chain is high even for a large number of nuisances (i.e. more than 20). fit : Run a fit and use the uncertainty matrix from HESSE to construct a proposal (or the one from MINOS if the option --runMinos is specified). This sometimes work fine, but sometimes gives biased results, so we don't recommend it in general. If you believe there's something going wrong, e.g. if your chain remains stuck after accepting only a few events, the option debugProposal can be used to have a printout of the first N proposed points to see what's going on (e.g. 
if you have some region of the phase space with probability zero, the gaus and fit proposal can get stuck there forever) #HybridNew HybridNew algorithm Type of limit By default, HybridNew computes hybrid Bayesian-frequentist limits. If one specifies the command line option --freqentist then it will instead compute the fully frequentist limits. Rule (option --rule ) The rule defines how the distribution of the test statistics is used to compute a limit. When using the CLs+b rule the limit to the value of the signal strength for which 95% of the pseudo-experiments give a result more signal-like than the current one, P(x < xobs|r*s+b) = 1 - CL . For the more conservative CLs rule, the limit is set by the condition P(x < xobs|r*s+b) /P(x < xobs|b) = 1 - CL . The default rule is CLs . Test statistics (option --testStat ) The test statistics is the measure of how signal-like or background-like is an observation. The following test statistics are provided: Simple Likelihood Ratio (option value LEP or SLR ): The LEP-like ratio of likelihoods ( L(x|r*s+b,\u03b8) / L(x|b, \u03b8) ) where numerator and denominator are evaluated for the reference values of the nuisance parameters \u03b8. Ratio of Profiled Likelihoods (option value TEV or ROPL ): The Tevatron-like ratio of profiled likelihoods, in which before computing each of the likelihoods is maximised as function of the nuisance parameters ( max\u03b8 L(x|r*s+b,\u03b8) / max\u03b8 L(x|b, \u03b8) ). Profile Likelihood modified for upper limits (option value LHC or MPL ): The LHC-like (or Atlas-like) profiled likelihood in which the maximization of the likelihood is done also in the signal strength ( max\u03b8 L(x|r*s+b,\u03b8) / maxr', \u03b8 L(x|r'*s+b,\u03b8) ), with the constraints 0 \u2264 r' \u2264 r where the upper bound is applied to force the method to always give an upper limit and not a two-sided interval. Profile Likelihood (not modified for upper limits) (option value PL ): The traditional profiled likelihood in which the maximization of the likelihood is done also in the signal strength ( max\u03b8L(x|r*s+b,\u03b8) / maxr', \u03b8L(x|x|r'*s+b,\u03b8) ), with just the physical constraints 0 \u2264 r' This test statistics can give two-sided limits, as it starts decreasing when the number of observed events is above the expectations from the signal+background hypothesis. The default value when computing hybrid Bayesian-frequentist limits is LEP . The default value when computing frequentist limits is LHC . Accuracy The search for the limit is performed using an adaptive algorithm, terminating when the estimate of the limit value is below some limit or when the precision cannot be futher improved with the specified options. The options controlling this behaviour are: rAbsAcc , rRelAcc : define the accuracy on the limit at which the search stops. The default values are 0.1 and 0.05 respectively, meaning that the search is stopped when \u0394r < 0.1 or \u0394r/r < 0.05. clsAcc : this determines the absolute accuracy up to which the CLs values are computed when searching for the limit. The default is 0.5%. Raising the accuracy above this value will increase significantly the time to run the algorithm, as you need N2 more toys to improve the accuracy by a factor N; you can consider enlarging this value if you're computing limits with a larger CL (e.g. 90% or 68%) Note that if you're using the CLs+b rule then this parameter will control the uncertainty on CLs+b , not on CLs . 
T or toysH : controls the minimum number of toys that are generated for each point. The default value of 500 should be ok when computing the limit with 90-95% CL. You can decrease this number if you're computing limits at 68% CL, or increase it if you're using 99% CL. Computing significances When computing the significances, there is no adaptive generation. You can control the number of toys with option T or toysH= and the option iterations (shortcut -i , default 1): the default of (1 iteration)\u00d7(500 toys) is not enough to probe a significances above ~2. We suggest that you uncrease the number of iterations instead of the number of toys, since the increase in time is linear with the iterations but non-linear with the toys. In order to compute the significance in multiple jobs, proceed as follows: Run N different jobs with the same inputs but different random seed (option -s ), specifying the additional option --saveHybridResult . Use hadd to merge the output root files in a single one. The program will complain with messages like Cannot merge object type, name: HybridCalculator _result which can be safely ignored. Compute the significance from the merged file running again but with options --readHybridResults= and --toysFile= Caveat: there is no check within the program that you're using consistent inputs for the different jobs. Simple hypotesis testing : Sometimes you just want to compute the p-values of the background-only hypotesis and of the signal plus background hypotesis for a fixed value of the signal strength. This can be done specifying the option singlePoint which will set the signal strength to that value and run the hypothesis test. It will generate toys until the required accuracy is met (see above for parameter clsAcc ). You can turn off adaptive generation setting clsAcc to zero, and then it will generate the toys once (or multiple times if you set option iterations to a value larger than 1). Just like for significance, you can run multiple times with different seeds and options --saveHybridResult , combine the output files with hadd and then compute the final result with --readHybridResult --toysFile=merged.root Performance issues The current hybrid code requires a lot of cpu resources. You can speed up the processing by using multiple cores (option fork , default value is 1). Note that even with fork set to 1, toy generation is done in a separate thread to avoid memory leaks. If you want to run in a single thread, e.g. to be able to read the debug output during generation, you should set the option to zero. If running multi-threaded on the cern batch cluster, you should declare it to the bsub command when submitting the jobs: e.g. for a job that uses four cores you should use bsub -n 4 -R \"'span[hosts=1]'\" ... #HybridNewGrid HybridNew algorithm usage for complex models or expected limits: grids If your model is complex, or you need to know the limit accurately, or you want expected limits, then running the computation in a single job might not be feasible. The alternative approach is to compute a grid of distributions of the test statistics for various values of the signal strength, a task that is easy to parallelize, and then use the that grid to compute the observed limit (and also the expected ones). This requires you to have some knowledge of where the limit should be, which you can gain e.g. 
from the ProfileLikelihood method Creating the grid: manual way The procedure to do this manually would be like the procedure for significances or simple hypothesis testing described previously: for each value r_i of the cross section, you write out one file with the distribution of the test statistics using combine card.txt -M HybridNew [--freq] [other options] -s seed_i --singlePoint r_i --saveToys --saveHybridResult and then you can merge all the output files for the different r_i with hadd . The [other options] should include --clsAcc 0 to switch off adaptive sampling, and you can tune the CPU time by working on the parameters -T and --iterations . It is important that you use different seed_i values for each point; if you don't care about exact reproducibility, you can just use --seed -1 and the code will take care of randomizing itself properly. Creating the grid: automated way, using CRAB Please note that the following is intended for use with crab2. For producing the grid with crab3, please see the instructions here Once you have a sense of the time needed for each toy, and of the range to consider, you can use the script makeGridUsingCrab.py to run the toys to create the grid in parallel either on LXBATCH or on regular T2s (or anything else that CRAB can digest). The procedure is very simple: makeGridUsingCrab.py card.txt minimum maximum -n points [other options] -o name This will create a crab cfg file name.cfg and a script name.sh , and possibly a binary workspace name.workspace.root . You can then just create and submit the jobs from that cfg file, and merge the output rootfiles with hadd=(note: =hadd will complain with messages like Cannot merge object type, name: HybridCalculator _result which can be safely ignored). The other options, that you can get executing the program with --help are: -T : same as in combine -r : use a random seed in each job (suggested) -I n : run only on 1/n of the points in each job (suggested if you want to have many points) -t , -j : choose the total number of toys and of jobs (can change later from the crab cfg file) --lsf , --queue ... : use lxbatch with the specific queue (can change later from the crab cfg file) Note that you can merge also the output of multiple crab submissions, if you have used random seeds. Using the grid for observed limits combine mydatcard.txt -M HybridNew [--freq] --grid=mygrid.root All other parameters controlling toys, accuracy and so on are meaningless in this context. Note that it might still take a while if you have many points and the test statistics is slow to evaluate. Add the option --saveGrid to save the value of the observed CLs at each grid point in the output tree. Using the grid for expected limits combine mydatcard.txt -M HybridNew [--freq] --grid=mygrid.root --expectedFromGrid 0.5 0.5 gives you the median. use 0.16/0.84 to get the endpoints of 68% interval, 0.025/0.975 to get the 95% one). Add the option --saveGrid to save the value of the expected quantile CLs at each grid point in the output tree. Plotting the test-statistics distributions The distribution of the test-statistic under the signal plus background and background only hypotheses can be plotted at each value of the grid using the following; python test/plotTestStatCLs.py --input mygrid.root --poi r --val all --mass MASS The output root file will contain a plot for each point found in the grid. FeldmanCousins The F-C method is used to compute an interval with the specified confidence level. 
If you run the model without special options, it will report the upper limit to the signal strength. If you want instead the lower end of the interval, just run it with option lowerLimit . The algorithm will search for a limit with an iterative procedure until the specified absolute or relative accuracy is met, as controlled by the parameters rAbsAcc , =rRelAcc . The default values are 0.1 and 0.05 respectively, meaning that the search is stopped when \u0394r < 0.1 or \u0394r/r < 0.05. The number of toys generated is adaptive. You can increase it by a factor using option toysFactor a value of 2 or 4 is suggested if you want to compute the limit with high accuracy. Running under CRAB The instructions below are for use with crab2 . For instructions on how to use the grid for toy based studies or complicated model scans under crab3 , follow the instructions given here . Running many toy MC for the limit calculation may be conveniently split among the different available GRID resources using CRAB. Examples of how to run on the GRID via CRAB are provided in the files: [[https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/master/test/combine_crab.sh][combine_crab.sh]] [[https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/master/test/combine_crab.cfg][combine_crab.cfg]] Preparing the ROOT workspace The first thing to do is to convert the datacards and possibly the shape model into a ROOT workspace. This model will be shipped to Worker Nodes for GRID processing. This is done via the utility script text2workspace.py . For instance: ../scripts/text2workspace.py ../data/benchmarks/simple-counting/counting-B0-Obs0-StatOnly.txt -b -o model.root Shell script for GRID Worker Nodes CRAB is designed mainly to provide automatic cmsRun job splitting providing the number of jobs one wants to run, and the number of 'events' in total one wants to process.The total number of toy MC we want to run. The maximum number of events is passed to the application to be executed via the variable $MaxEvents . In our case, we will use for convenience $MaxEvents as the number of toy MC to run per job. The script [[https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/master/test/combine_crab.sh][combine_crab.sh]] runs the combiner code with the proper options, and prepares the output to be retrieved after the run completion on the Worker Nodes. It takes as argument the job indes ( $1 ), which we use as random seed. The main elements there are running the combiner and packing the output for final retrieval: echo \"job number: seed # i with n toys\" ./combine model.root -t n toys\" ./combine model.root -t n -sn -s i with n toys\" ./combine model.root -t (VHbb) final state.

    +

    The measurement is performed within the Simplified Template Cross Section (STXS) framework, which provides predictions in bins of the generator-level quantities p_{T}(V) and the number of additional jets. A maximum-likelihood-based unfolding is performed to measure the cross section in the generator-level bins defined by the STXS scheme. At the detector level we define appropriate categories that match the STXS bins as closely as possible, so that there is a good correspondence between the detector-level observable and the underlying generator-level quantity we are interested in.

    +

    +

    Note that this STXS measurement not only measures the cross section as a function of the p_{T} of the vector boson, but also includes some information on the number of additional jets, and it is performed over several different production processes. However, it is also common to focus on a single distribution (e.g. p_{T}) for a single process (e.g. t\bar{t}).

    +

    In this tutorial we will focus on ZH production, with the Z boson decaying to charged leptons and the Higgs boson reconstructed as a resolved b\bar{b} pair. We will also use only a subset of the Run 2 categories, so we will not achieve the same sensitivity as the full analysis. Note that the ggZH and ZH production modes are combined in the fit, since it is not possible to resolve them at this stage of the analysis. The STXS categories are defined independently of the Higgs decay channel, to streamline combinations of the cross section measurements.

    +

    In the first part of the tutorial, we will set up a relatively simple unfolding, where there is a single detector-level bin for every generator-level bin we are trying to measure. We will then perform a blind analysis using this setup to check the expected sensitivity.

    +

    In this simple version of the analysis, we use a series of datacards, one for each detector-level bin, each implemented as a counting experiment. We then combine the datacards for the full measurement. It is also possible to implement the same analysis as a single datacard, passing a histogram in which each bin corresponds to one of the detector-level bins. Either method can be used, depending on which is more practical for the analysis being considered.
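
    In practice, the per-bin counting datacards can be merged into a single card with the combineCards.py tool shipped with Combine. A minimal sketch, assuming the individual cards are the ones in the counting/regions directory used in this exercise and that the output is the combined card referenced later on:

    combineCards.py counting/regions/*.txt > counting/combined_ratesOnly.txt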

    +

    In the second part of the tutorial we will perform the same measurement with a more advanced setup, making use of differential distributions for each generator-level bin we are trying to measure, as well as control regions. By providing this additional information to the fit, we are able to achieve a better and more robust unfolding result. After checking the expected sensitivity, we will take a look at the impacts and pulls of the nuisance parameters. Then we will unblind and look at the results of the measurement, produce generator-level plots and provide the correlation matrix for our measured observables.

    +

    Simplified unfolding

    +

    When determining the detector-level binning for any differential analysis, the main goal is to choose a binning that distinguishes the contributions from the various generator-level bins well. In the simplest case this can be done with a cut-based approach, i.e. applying the same binning to the detector-level observables as is applied to the generator-level quantities being measured. In this case, that means binning in p_{T}(Z) and n_{\text{add. jets}}. Due to the good lepton p_{T} resolution we can follow the original STXS scheme quite closely with the detector-level selection, with one exception: it is not possible to access the very low transverse momentum bin p_{T}(Z)<75 GeV.

    +

    In the counting/regions directory you can find the datacards for five detector-level categories, each targeting a corresponding generator-level bin. Below is an example of the datacard for the detector-level bin with p_{T}(Z)>400 GeV.

    +
    imax    1 number of bins
    +jmax    9 number of processes minus 1
    +kmax    * number of nuisance parameters
    +--------------------------------------------------------------------------------
    +--------------------------------------------------------------------------------
    +bin          vhbb_Zmm_gt400_13TeV
    +observation  12.0
    +--------------------------------------------------------------------------------
    +bin                                   vhbb_Zmm_gt400_13TeV   vhbb_Zmm_gt400_13TeV vhbb_Zmm_gt400_13TeV   vhbb_Zmm_gt400_13TeV     vhbb_Zmm_gt400_13TeV vhbb_Zmm_gt400_13TeV vhbb_Zmm_gt400_13TeV vhbb_Zmm_gt400_13TeV vhbb_Zmm_gt400_13TeV vhbb_Zmm_gt400_13TeV
    +process                               ggZH_lep_PTV_GT400_hbb ZH_lep_PTV_GT400_hbb ZH_lep_PTV_250_400_hbb ggZH_lep_PTV_250_400_hbb Zj1b            Zj0b_c          Zj0b_udsg       VVLF            Zj2b            VVHF
    +process                               -3                     -2                   -1                     0                        1               2               3               4               5               6
    +rate                                  0.0907733              0.668303             0.026293               0.00434588               3.78735         2.58885         4.09457         0.413716        7.02731         0.642605
    +--------------------------------------------------------------------------------
    +
    +
    +

    You can see the contributions from various background processes, namely Z+jets, t\bar{t} and single top, as well as from the signal processes (ggZH and ZH) corresponding to the STXS scheme discussed above. Note that each generator-level bin being measured is assigned a different process in Combine, so that the signal strength of each contribution can float independently in the measurement. Also note that, due to migrations, each detector-level bin will receive contributions from multiple generator-level bins.

    +

    One of the most important stages in the analysis design is to make sure that the detector-level categories are well chosen to target the corresponding generator-level processes.

    +

    To explicitly check the correspondence between the detector and generator levels, one can plot the contribution of each generator-level bin in all of the detector-level bins. You can use the script provided on the tutorial GitLab page. This script uses CombineHarvester to loop over the detector-level bins and get the rate at which each of the signal processes (generator-level bins) contributes to that detector-level bin; these rates are then used to plot the migration matrix.

    +
    python scripts/get_migration_matrix.py counting/combined_ratesOnly.txt
    +
    +
    +

    +

    The migration matrix shows the generator-level bins on the x-axis and the corresponding detector-level bins on the y-axis. The entries are normalized such that the contributions to all detector-level bins for a given generator-level bin sum to 1. With this convention, the number in each bin represents the probability that an event from a given generator-level bin is reconstructed in a given detector-level bin, if it is reconstructed at all within the considered bins.
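
    Concretely, writing N_{ij} for the expected rate of generator-level bin j reconstructed in detector-level bin i (a notation introduced here only to restate the convention above, not one used elsewhere in the tutorial), the plotted entries are

    M_{ij} = \frac{N_{ij}}{\sum_{i'} N_{i'j}}, \qquad \text{so that} \qquad \sum_{i} M_{ij} = 1 \ \text{for each generator-level bin } j .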

    +

    Now that we have checked the migration matrix, we can attempt the maximum likelihood unfolding. We can use the multiSignalModel physics model available in Combine, which assigns a parameter of interest poi to a process p within a bin b using the syntax --PO 'map=b/p:poi[init, min, max]', linearly scaling the normalisation of this process with the parameter of interest (POI). To create the workspace we can run the following command:

    +
    text2workspace.py -m 125  counting/combined_ratesOnly.txt -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel  --PO verbose --PO 'map=.*/.*ZH_lep_PTV_75_150_hbb:r_zh_75_150[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_0J_hbb:r_zh_150_250noj[1,-5,5]'  --PO 'map=.*/.*ZH_lep_PTV_150_250_GE1J_hbb:r_zh_150_250wj[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_250_400_hbb:r_zh_250_400[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_GT400_hbb:r_zh_gt400[1,-5,5]' -o ws_counting.root
    +
    +

In the example given above, a signal POI is assigned to each generator-level bin independently of the detector-level bin. This allows the measurement to take migrations into account.

    +

To extract the measurement, let's first run the initial fit using the MultiDimFit method implemented in Combine to extract the best-fit values and uncertainties on all floating parameters:

    +
    combineTool.py -M MultiDimFit --datacard ws_counting.root --setParameters r_zh_250_400=1,r_zh_150_250noj=1,r_zh_75_150=1,r_zh_150_250wj=1,r_zh_gt400=1 --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400 -t -1 
    +
    +
    +

With the option -t -1 we set Combine to fit the Asimov dataset instead of the actual data. The option --setParameters <param>=<value> sets the initial value of the parameter named <param>. --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400 sets the POIs to the given comma-separated list, instead of the default r.

    +
    +

While the uncertainties on the parameters of interest (POIs) can be extracted in multiple ways, the most robust way is to run likelihood scans for the POI corresponding to each generator-level bin; this allows you to spot discontinuities in the likelihood shape in case of problems with the fit or the model.

    +
    combineTool.py -M MultiDimFit --datacard ws_counting.root -t -1 --setParameters r_zh_250_400=1,r_zh_150_250noj=1,r_zh_75_150=1,r_zh_150_250wj=1,r_zh_gt400=1 --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400 --algo=grid --points=100 -P r_zh_75_150 --floatOtherPOIs=1 -n scan_r_zh_75_150
    +
    +
    +

    Now we can plot the likelihood scan and extract the expected intervals.

    +
    python scripts/plot1DScan.py higgsCombinescan_r_zh_75_150.MultiDimFit.mH120.root -o r_zh_75_150 --POI r_zh_75_150
    +
    +
      +
• Repeat for all POIs (a sketch of a loop over all POIs is given below this list)
    • +
    +
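The same scan and plotting commands can simply be repeated for every POI, for example with a small shell loop. This is just a sketch reusing the exact commands given above; it is not part of the tutorial scripts:

# Sketch: run the likelihood scan and the plotting script for every generator-level POI
for p in r_zh_75_150 r_zh_150_250noj r_zh_150_250wj r_zh_250_400 r_zh_gt400
do
    combineTool.py -M MultiDimFit --datacard ws_counting.root -t -1 \
        --setParameters r_zh_250_400=1,r_zh_150_250noj=1,r_zh_75_150=1,r_zh_150_250wj=1,r_zh_gt400=1 \
        --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400 \
        --algo=grid --points=100 -P ${p} --floatOtherPOIs=1 -n scan_${p}
    python scripts/plot1DScan.py higgsCombinescan_${p}.MultiDimFit.mH120.root -o ${p} --POI ${p}
done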

    Shape analysis with control regions

    +

    One of the advantages of the maximum likelihood unfolding is the flexibility to choose the analysis observable and include more information on the event kinematics, consequently improving the analysis sensitivity. This analysis benefits from the shape information of the DNN output trained to differentiate the VH(bb) signal from the SM backgrounds.

    +

The datacards for this part of the exercise are located in full_model_datacards/, where you can find a separate datacard for each region within the full_model_datacards/regions directory, as well as a combined datacard full_model_datacards/comb_full_model.txt. In this case, each of the detector-level bins used in the unfolding above is now split into multiple bins according to the DNN output score. This provides extra discrimination power to separate the signal from the background and improve the measurement.

    +
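The combined card is already provided, but as a reminder it could be rebuilt from the per-region cards with combineCards.py. This is an optional sketch, assuming the region cards are the .txt files in that directory:

# Optional: recombine the individual region datacards into a single card
combineCards.py full_model_datacards/regions/*.txt > full_model_datacards/comb_full_model.txt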

As you will find, the datacards also contain several background processes. To control them properly, we will also add regions enriched in the respective backgrounds. We can then define a common set of rate parameters for the signal and control regions to scale the background rates, or other parameters affecting their shapes.

    +
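One way to implement such common scale factors in Combine is with rateParam lines in the datacards. The sketch below uses hypothetical bin names; the actual region names are those found in full_model_datacards/regions:

# Hypothetical example: one freely floating scale factor for the Zj1b background,
# shared between a signal region and a b-enriched control region
SF_Zj1b  rateParam  vhbb_Zmm_SR_13TeV    Zj1b  1.0
SF_Zj1b  rateParam  vhbb_Zmm_CRZb_13TeV  Zj1b  1.0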

For the shape datacards one has to specify the mapping of histograms to channels and processes, as described below:

    +
    shapes [process] [channel] [file] [nominal] [systematics_templates]
    +
    +
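For example, a typical wildcard mapping looks like the line below; the histogram file name and naming pattern are illustrative, not taken from the tutorial datacards:

# Hypothetical example: histogram file and naming pattern are illustrative
shapes * vhbb_Zmm_gt400_13TeV vhbb_shapes.root $CHANNEL/$PROCESS $CHANNEL/$PROCESS_$SYSTEMATIC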

The shape nuisance parameters can then be defined in the systematics block of the datacard; more details can be found in the Combine documentation pages.

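For instance, a shape nuisance parameter appears in the systematics block as a line of the form sketched below; the parameter name and the pattern of affected processes are purely illustrative:

# Hypothetical example: 1.0 enables the up/down templates for that process, '-' leaves it untouched
CMS_scale_j    shape    1.0    1.0    -    -    1.0    -    -    -    -    -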
    +

In many CMS analyses there are hundreds of nuisance parameters corresponding to various sources of systematic uncertainty.

    +

When we unfold to generator-level quantities we should remove the nuisance parameters affecting the rates of the generator-level bins, i.e. when measuring a given cross section such as \sigma_{\textrm{gen1}}, the nuisance parameters should not change the value of that parameter itself; they should only change the relationship between that parameter and the observations. This means that, for example, the effects of the renormalization and factorization scales on the generator-level cross section within each bin need to be removed. Only their effects on the detector-level distributions, through changes of shape within each bin as well as acceptances and efficiencies, should be considered.

    +

For this analysis, that means removing the lnN nuisance parameters THU_ZH_mig* and THU_ZH_inc, and keeping only the acceptance shape uncertainties THU_ZH_acc and THU_ggZH_acc, which by construction do not scale the inclusive cross sections. In this analysis the normalisation effects in the THU_ZH_acc and THU_ggZH_acc templates were already removed from the shape histograms. Removing the normalization effects can be achieved by removing the parameters from the datacard. Alternatively, the respective nuisance parameters can be frozen with the option --freezeParameters par_name1,par_name2. Or you can create a group, following the syntax given below, at the end of the combined datacard, and freeze the parameters with the --freezeNuisanceGroups group_name option.

    +
    [group_name] group = uncertainty_1 uncertainty_2 ... uncertainty_N
    +
    +
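For example, the group of theory normalisation uncertainties to be frozen later in this exercise could be defined as follows. The group name thy_norm and the expansion of the THU_ZH_mig* wildcard are illustrative, so check the parameter names actually present in your datacard:

# Hypothetical example: replace the mig names with the THU_ZH_mig* parameters actually in the datacard
thy_norm group = THU_ZH_inc THU_ZH_mig01 THU_ZH_mig12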

Now we can create the workspace using the same multiSignalModel as before:

    +
    text2workspace.py -m 125  full_model_datacards/comb_full_model.txt -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel  --PO verbose --PO 'map=.*/.*ZH_lep_PTV_75_150_hbb:r_zh_75_150[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_0J_hbb:r_zh_150_250noj[1,-5,5]'  --PO 'map=.*/.*ZH_lep_PTV_150_250_GE1J_hbb:r_zh_150_250wj[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_250_400_hbb:r_zh_250_400[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_GT400_hbb:r_zh_gt400[1,-5,5]' --for-fits --no-wrappers --X-pack-asympows --optimize-simpdf-constraints=cms --use-histsum -o ws_full.root
    +
    +
    +

As you might have noticed, we are using a few extra options, --for-fits --no-wrappers --X-pack-asympows --optimize-simpdf-constraints=cms --use-histsum, to create the workspace. They are needed to construct a more optimised pdf using the CMSHistSum class implemented in Combine, which significantly lowers the memory consumption.

    +
    +
      +
• Following the instructions given earlier, create the workspace and run the initial fit with -t -1 (a sketch of the fit command is given after this list).
    • +
    +
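A minimal sketch of the second step is given below; the text2workspace.py command is the one already shown above, and the -n label is just a suggestion:

# Sketch: initial Asimov fit of the full shape-based workspace
combineTool.py -M MultiDimFit -d ws_full.root -t -1 \
    --setParameters r_zh_250_400=1,r_zh_150_250noj=1,r_zh_75_150=1,r_zh_150_250wj=1,r_zh_gt400=1 \
    --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400 \
    --X-rtd FAST_VERTICAL_MORPH -n .initial_fit_blinded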

Since the datacards now include shape uncertainties as well as additional categories to improve the background description, the fit can take much longer; however, we can submit jobs to a batch system using combineTool.py and have the results ready to look at in a few minutes.

    +
    combineTool.py -M MultiDimFit -d ws_full.root --setParameters r_zh_250_400=1,r_zh_150_250noj=1,r_zh_75_150=1,r_zh_150_250wj=1,r_zh_gt400=1 --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400  -t -1 --X-rtd FAST_VERTICAL_MORPH --algo=grid --points=50 --floatOtherPOIs=1 -n .scans_blinded --job-mode condor --task-name scans_zh  --split-points 1 --generate P:n::r_zh_gt400,r_zh_gt400:r_zh_250_400,r_zh_250_400:r_zh_150_250wj,r_zh_150_250wj:r_zh_150_250noj,r_zh_150_250noj:r_zh_75_150,r_zh_75_150
    +
    +
    +

The option --X-rtd FAST_VERTICAL_MORPH is added here, and to all subsequent combineTool.py -M MultiDimFit ... calls, to speed up the minimisation.

    +

The job submission is handled by CombineHarvester; the combination of options --job-mode condor --task-name scans_zh --split-points 1 --generate P:n::r_zh_gt400,r_zh_gt400:r_zh_250_400,r_zh_250_400:r_zh_150_250wj,r_zh_150_250wj:r_zh_150_250noj,r_zh_150_250noj:r_zh_75_150,r_zh_75_150 will submit a job to HTCondor for each POI. The --generate option is used to automatically generate the jobs, attaching the options -P <POI> -n <name> for each of the pairs of values <POI>,<name> specified between the colons. You can add the --dry-run option to create the submission files first and check them, and then submit the jobs with condor_submit condor_scans_zh.sub.

    +

If you are running the tutorial on a cluster where HTCondor is not available, you can also submit the jobs to a Slurm system: just change --job-mode condor to --job-mode slurm.

    +
    +

    After all jobs are completed we can combine the files for each POI:

    +
    for p in r_zh_75_150 r_zh_150_250noj r_zh_150_250wj r_zh_250_400 r_zh_gt400
    +do
    +    hadd -k -f scan_${p}_blinded.root higgsCombine.scans_blinded.${p}.POINTS.*.MultiDimFit.mH120.root
    +done
    +
    +

And finally, plot the likelihood scans:

    +
    python scripts/plot1DScan.py scan_r_zh_75_150_blinded.root  -o scan_r_zh_75_150_blinded --POI r_zh_75_150 --json summary_zh_stxs_blinded.json
    +
    +

    +

    Impacts

    +

One of the important tests before we move to the unblinding stage is to check the impacts of the nuisance parameters on each POI. For this we can run combineTool.py with the -M Impacts method. We start with the initial fit, which should take about 20 minutes (a good time for a coffee break!):

    +
    combineTool.py -M Impacts -d ws_full.root -m 125 --robustFit 1 --doInitialFit --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400 --X-rtd FAST_VERTICAL_MORPH
    +
    +
    +

Note that it is important to add the option --redefineSignalPOIs [list of parameters] in order to produce the impacts for all of the POIs we defined when the workspace was created with the multiSignalModel.

    +
    +

After the initial fit is completed we can perform the likelihood scans for each nuisance parameter. We will submit the jobs to HTCondor to speed up the process.

    +
    combineTool.py -M Impacts -d ws_full.root -m 125 --robustFit 1 --doFits --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400 --job-mode condor --task-name impacts_zh --X-rtd FAST_VERTICAL_MORPH 
    +
    +

    Now we can combine the results into the .json format and use it to produce the impact plots.

    +
    combineTool.py -M Impacts -d ws_full.root -m 125 --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400 --output impacts.json 
    +
    +plotImpacts.py -i impacts.json -o impacts_r_zh_75_150 --POI r_zh_75_150
    +
    +

• Do you observe differences in the impact plots for different POIs? Do these differences make sense to you?

    +

    Unfolded measurements

    +

Now that we have studied the nuisance parameter impacts for each POI, we can finally perform the measurement. Note that for the purposes of the tutorial we are skipping further checks and validation that you should perform in your analysis, namely the goodness-of-fit test and the post-fit plots of the folded observables. Both of these checks were detailed in the previous exercises, which you can find under the following link.

    +
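As a reminder from the earlier exercises, a saturated-model goodness-of-fit test could be run roughly as follows. This is only a sketch; the number of toys and the -n labels are arbitrary choices:

# Sketch: goodness-of-fit test statistic on data and on toys, to be compared afterwards
combineTool.py -M GoodnessOfFit -d ws_full.root --algo=saturated -n .gof_data
combineTool.py -M GoodnessOfFit -d ws_full.root --algo=saturated -t 500 --toysFrequentist -n .gof_toys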

At this stage we'll run MultiDimFit again, scanning each POI to calculate the intervals, but this time we'll remove the -t -1 option to extract the unblinded results.

    +

Also, since we want to unfold the measurements to generator-level observables, i.e. extract the cross sections, we remove the theoretical uncertainties affecting the rates of the signal processes; we can do this by freezing them with --freezeNuisanceGroups <group_name>, using the group_name you assigned earlier in the tutorial.

    +
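Putting these pieces together, the unblinded scans can be submitted with essentially the same command as for the blinded case, dropping -t -1 and freezing the theory normalisation group. This is a sketch: <group_name>, the -n label and the task name are placeholders for your own choices:

# Sketch: unblinded likelihood scans with the theory normalisation uncertainties frozen
combineTool.py -M MultiDimFit -d ws_full.root \
    --setParameters r_zh_250_400=1,r_zh_150_250noj=1,r_zh_75_150=1,r_zh_150_250wj=1,r_zh_gt400=1 \
    --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400 \
    --freezeNuisanceGroups <group_name> \
    --X-rtd FAST_VERTICAL_MORPH --algo=grid --points=50 --floatOtherPOIs=1 -n .scans \
    --job-mode condor --task-name scans_zh_unblinded --split-points 1 \
    --generate P:n::r_zh_gt400,r_zh_gt400:r_zh_250_400,r_zh_250_400:r_zh_150_250wj,r_zh_150_250wj:r_zh_150_250noj,r_zh_150_250noj:r_zh_75_150,r_zh_75_150
# then hadd the per-point outputs into scan_<poi>.root, as shown above for the blinded scans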

    Now plot the scans and collect the measurements in the json file summary_zh_stxs.json.

    +
    python scripts/plot1DScan.py scan_r_zh_75_150.root -o r_zh_75_150 --POI r_zh_75_150 --json summary_zh_stxs.json  
    +
    +

    +

Repeat the same command for the other POIs to fill summary_zh_stxs.json, which can then be used to make the cross-section plot by multiplying the Standard Model cross sections by the best-fit values of the signal strengths, as shown below.

    +
    python scripts/make_XSplot.py summary_zh_stxs.json
    +
    +

    +

    POI correlations

    +

In addition to the cross-section measurements, it is very important to publish the covariance or correlation information of the measured cross sections. This allows the measurement to be properly interpreted and reused in combined fits.

    +

The correlation or covariance matrix can be extracted from the results after the fit. Here we can use either the FitDiagnostics or the MultiDimFit method.

    +
    combineTool.py -M FitDiagnostics --datacard ws_full.root --setParameters r_zh_250_400=1,r_zh_150_250noj=1,r_zh_75_150=1,r_zh_150_250wj=1,r_zh_gt400=1 --redefineSignalPOIs r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400  --robustHesse 1 -n .full_model --X-rtd FAST_VERTICAL_MORPH
    +
    +

The RooFitResult containing the correlation matrix can then be found in the fitDiagnostics.full_model.root file under the name fit_s. The script plotCorrelations_pois.py from the exercise GitLab repository can help plot the correlation matrix.

    +
    python scripts/plotCorrelations_pois.py -i fitDiagnostics.full_model.root:fit_s -p r_zh_75_150,r_zh_150_250noj,r_zh_150_250wj,r_zh_250_400,r_zh_gt400
    +
    +
    +

diff --git a/tutorials-part-2/index.html b/tutorials-part-2/index.html index 45b0b2953ed..bd0778dece9 100644 --- a/tutorials-part-2/index.html +++ b/tutorials-part-2/index.html @@ -188,6 +188,12 @@ +
  • + Exercise: unfolding in combine +
  • + +