cax all-run mode skips corrections (and slow/stalls) #108

Open
pdeperio opened this issue Jun 1, 2017 · 6 comments
@pdeperio
Contributor

pdeperio commented Jun 1, 2017

When running cax in full DB all-run mode (not specifying a run) for corrections, e.g. on Midway:

source activate pax_v6.6.5
HOSTNAME=midway-login1 cax --once --config  /project/lgrandi/xenon1t/cax/cax_AddCorrection.json --log DEBUG 

it appears to complete the first task but then skips the rest, proceeding straight to the next run:

root        : INFO     Executing AddElectronLifetime.
root        : DEBUG    Loading config file /project/lgrandi/xenon1t/cax/cax_AddCorrection.json
root        : DEBUG    dataset_list not specified, operating on entire DB
AddElectronLifetime: INFO     Run 10113: calculated lifetime of 575 us
AddElectronLifetime: INFO     Run 10112: calculated lifetime of 575 us

Explicitly specifying a single run works as expected, e.g.:

HOSTNAME=midway-login1 cax --once --config cax_slowcontrol.json --log DEBUG --run 10112

runs all the tasks, which is how we're currently doing all the corrections on Midway. However, it would be nice if all-run mode worked too, so that the xe1t-datamanager daemon can add corrections immediately after each run is transferred at LNGS.

Furthermore, completing the task in all-run mode takes much longer than in single-run mode. This may be related to #105, this query, and/or something in corrections.py, since cax runs fine and fast once all correction tasks are removed.

pdeperio changed the title from "cax all-run mode skips corrections (and stalls)" to "cax all-run mode skips corrections (and slow/stalls)" on Jun 1, 2017
@XeBoris
Contributor

XeBoris commented Jun 15, 2017

Very basic question, but why do you use

HOSTNAME=midway-login1 cax --once --config  /project/lgrandi/xenon1t/cax/cax_AddCorrection.json --log DEBUG 

for the all-run mode instead of massive-cax? I could imagine adding logic to massive-cax that selects only the runs that do not yet have a correction and cycles only over those. massive-ruciax works like that.

@pdeperio
Contributor Author

pdeperio commented Jun 15, 2017

I was giving an example that everybody can reproduce. It will actually be run on xe1t-datamanager, which does not have a batch queue system, so massive-cax would need to be adjusted before it can be used there (maybe @malfonsi started working on making this more flexible?). If massive-ruciax is running on datamanager, then great! Are you able to merge the functions? (I think this was the original plan, to avoid this duplicate coding.)

@XeBoris
Contributor

XeBoris commented Jun 20, 2017

Regarding duplicate code: massive-ruciax has some command-line calls that are driven more by the "pure" upload purpose of ruciax itself, but I can have a look if it is worth spending time on merging them.
I had a look at "/project/lgrandi/xenon1t/cax/cax_AddCorrection.json", and it requests the following task list:

  • AddElectronLifetime
  • AddGains
  • AddDriftVelocity
  • SetS2xyMap
  • SetLightCollectionEfficiency
  • SetFieldDistortion
  • SetNeuralNetwork
  • CopyPull
  • AddChecksum
  • SetPermission
  • ProcessBatchQueueHax
  • BufferPurger

I think these are enough for the corrections. Speeding up the cycle means testing, for each function, whether a value is already set for the corresponding raw data set in the runDB. For example, the gains are added (processor/DEFAULT). As I understand it, the key is to apply the "AddCorrection" action only to raw data sets that do not have the "processor/correction_versions" tag in the Xenon1T runDB. This should not be too complicated to do.
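
The selection described above could be sketched in Python roughly as follows. This is only an illustration, not the cax implementation: it assumes run documents come back as nested dictionaries with a "number" field, the helper names `has_correction_versions` and `runs_needing_corrections` are hypothetical, and the example documents are made up. `MISSING_CORRECTIONS_FILTER` shows what an equivalent server-side filter might look like, assuming the runDB is a MongoDB collection.

```python
# Hypothetical sketch: pick only runs that lack "processor/correction_versions".

# Assumed MongoDB-style filter that would do the same selection server-side.
MISSING_CORRECTIONS_FILTER = {"processor.correction_versions": {"$exists": False}}


def has_correction_versions(run_doc):
    """Return True if the run document carries processor/correction_versions."""
    processor = run_doc.get("processor")
    return isinstance(processor, dict) and "correction_versions" in processor


def runs_needing_corrections(run_docs):
    """Return the run numbers of documents that still lack corrections."""
    return [doc["number"] for doc in run_docs if not has_correction_versions(doc)]


# Made-up example documents; the field layout is an assumption.
example_docs = [
    {"number": 10112, "processor": {"correction_versions": {"AddGains": "1.0"}}},
    {"number": 10113, "processor": {"DEFAULT": {}}},
    {"number": 10114},
]
pending = runs_needing_corrections(example_docs)
print(pending)
```

With a filter like this, the cycle would touch only the runs that are still missing corrections instead of iterating over the entire DB.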

@XeBoris
Contributor

XeBoris commented Jun 20, 2017

@pdeperio @lucrlom
Have a look at this change to massive-cax that I would propose:
https://github.com/XENON1T/cax/tree/AddCorrectionsEff
You would need to pass an additional "--AddCorrections" option when you start massive-cax, but then it selects only runs that do not have the correct runDB entries for the corrections (https://github.com/XENON1T/cax/blob/AddCorrectionsEff/cax/main.py#L300)

Quick and easy, but we can also think about a more complicated selection of the runs.

@pdeperio
Contributor Author

pdeperio commented Jun 21, 2017

Thanks Boris. For sustainability, shouldn't that check to skip pulling from DB go into the corrections.py module? Or do we need to skip the entire run for this? (I forget where all the queries are and how much info each one pulls back, maybe #105.)

Then, for generality, the block of code you implemented in main.py is just for bypassing job submission and running locally instead (and can work with any set of tasks). So the option could be e.g. --local instead. (Of course this is just sweeping the original issue with cax under the rug, but ok since this works.)
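
A generic `--local` switch of this kind could be sketched with argparse as below. This is not the massive-cax code: `build_parser`, `dispatch`, and the two handler callables are hypothetical stand-ins, and the only point is that one flag chooses between running in-process and submitting a batch job, independent of which tasks are configured.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical massive-cax-like driver."""
    parser = argparse.ArgumentParser(description="hypothetical run driver")
    parser.add_argument("--local", action="store_true",
                        help="run tasks in this process instead of submitting jobs")
    parser.add_argument("--run", type=int, nargs="*", default=[],
                        help="run numbers to operate on")
    return parser


def dispatch(runs, local, run_locally, submit_job):
    """Send every run to the local handler or to the batch-queue handler."""
    handler = run_locally if local else submit_job
    return [handler(run) for run in runs]


# Stand-in handlers; real ones would execute cax tasks or submit to a queue.
def run_locally(run):
    return ("local", run)


def submit_job(run):
    return ("batch", run)


args = build_parser().parse_args(["--local", "--run", "10112", "10113"])
results = dispatch(args.run, args.local, run_locally, submit_job)
print(results)
```

Without `--local`, the same call would route every run to `submit_job`, so the flag stays independent of the task list.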

@XeBoris
Contributor

XeBoris commented Jun 28, 2017

A check of whether it is necessary to add "new" corrections is already implemented in corrections.py (see https://github.com/XENON1T/cax/blob/master/cax/tasks/corrections.py#L66). This test is useful, but it does not avoid pulling the database information beforehand. It gets worse when you call each correction as a separate class from the "task_list" tag in the JSON file: you then pull data from the database 7 times for a single run. I should also mention that "AddElectronLifetime" takes the longest, with the time going into parsing the sympy function at https://github.com/XENON1T/cax/blob/master/cax/tasks/corrections.py#L62.
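
To make the cost of the repeated pulls concrete, here is a small sketch that compares one database fetch per task with a single fetch shared by all tasks. It is not how cax is structured: `CountingRunDB` and both `apply_corrections_*` functions are hypothetical stand-ins, and the task callables only echo their name and the run number.

```python
class CountingRunDB:
    """Stand-in for the runDB that counts how often a run is fetched."""

    def __init__(self, docs):
        self._docs = docs
        self.fetch_count = 0

    def fetch(self, run_number):
        self.fetch_count += 1
        return self._docs[run_number]


def apply_corrections_per_task(db, run_number, tasks):
    """Each task triggers its own fetch of the run document."""
    return [task(db.fetch(run_number)) for task in tasks]


def apply_corrections_shared(db, run_number, tasks):
    """Fetch the run document once and hand the same copy to every task."""
    run_doc = db.fetch(run_number)
    return [task(run_doc) for task in tasks]


# The seven correction tasks named in cax_AddCorrection.json.
task_names = [
    "AddElectronLifetime", "AddGains", "AddDriftVelocity", "SetS2xyMap",
    "SetLightCollectionEfficiency", "SetFieldDistortion", "SetNeuralNetwork",
]
# Dummy tasks: each just reports its name and the run it was given.
tasks = [lambda doc, name=name: (name, doc["number"]) for name in task_names]

per_task_db = CountingRunDB({10112: {"number": 10112}})
apply_corrections_per_task(per_task_db, 10112, tasks)

shared_db = CountingRunDB({10112: {"number": 10112}})
shared_results = apply_corrections_shared(shared_db, 10112, tasks)

print(per_task_db.fetch_count, shared_db.fetch_count)
```

Sharing one fetched document only helps with the database round trips; the sympy parsing cost in "AddElectronLifetime" would need to be addressed separately.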

Therefore my intention is to have massive-cax running and to use it for services other than "only" job submission to batch queues. I changed --AddCorrections to --local (makes more sense).
