cax all-run mode skips corrections (and slow/stalls) #108

pdeperio · 2017-06-01T18:27:05Z

When running cax in full DB all-run mode (not specifying a run) for corrections, e.g. on Midway:

source activate pax_v6.6.5
HOSTNAME=midway-login1 cax --once --config  /project/lgrandi/xenon1t/cax/cax_AddCorrection.json --log DEBUG

it appears to complete the first task, but then skips the remaining tasks and proceeds to the next run:

root        : INFO     Executing AddElectronLifetime.
root        : DEBUG    Loading config file /project/lgrandi/xenon1t/cax/cax_AddCorrection.json
root        : DEBUG    dataset_list not specified, operating on entire DB
AddElectronLifetime: INFO     Run 10113: calculated lifetime of 575 us
AddElectronLifetime: INFO     Run 10112: calculated lifetime of 575 us

Explicitly specifying a single run works as expected, e.g.:

HOSTNAME=midway-login1 cax --once --config cax_slowcontrol.json --log DEBUG --run 10112

runs all the tasks, which is how we are currently doing all the corrections on Midway. However, it would be nice if all-run mode worked too, so that the xe1t-datamanager daemon can add corrections immediately after each run is transferred at LNGS.

Furthermore, completing the task in all-run mode takes much longer than in single-run mode. This may be related to #105, this query, and/or something in corrections.py, since with all correction tasks removed it runs fine and fast.


XeBoris · 2017-06-15T07:59:00Z

Very basic question but why do you use

HOSTNAME=midway-login1 cax --once --config  /project/lgrandi/xenon1t/cax/cax_AddCorrection.json --log DEBUG

for the all-run mode instead of massive-cax? I could imagine logic in massive-cax that selects only runs which do not have a correction and cycles only over them. massive-ruciax works like that.

pdeperio · 2017-06-15T12:50:12Z

I was giving an example that everybody can reproduce. It will actually be run on xe1t-datamanager, which does not have a batch queue system, so if you want to use massive-cax it needs to be adjusted (maybe @malfonsi started working on making this more flexible?). If massive-ruciax is running on datamanager then great! Are you able to merge the functions? (I think this was the original plan, to avoid this duplicate coding.)

XeBoris · 2017-06-20T14:14:42Z

Regarding duplicate code, massive-ruciax has some command-line calls which are driven more by the "pure" upload purpose of ruciax itself. But I can have a look at it if it is worth spending time on merging them.
I had a look at "/project/lgrandi/xenon1t/cax/cax_AddCorrection.json" and it requests the following task list:

AddElectronLifetime
AddGains
AddDriftVelocity
SetS2xyMap
SetLightCollectionEfficiency
SetFieldDistortion
SetNeuralNetwork
~~CopyPull~~
~~AddChecksum~~
~~SetPermission~~
~~ProcessBatchQueueHax~~
~~BufferPurger~~
I think these are enough for the corrections. Speeding up the cycle means testing, for each function, whether a value is already set for the corresponding raw data set in the runDB. For example, the gains are added (processor/DEFAULT). As I understand it, the key is to apply the "AddCorrection" action only to raw data sets which do not have the "processor/correction_versions" tag in the Xenon1T runDB. This should not be too complicated to do.
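To make the selection idea concrete, here is a minimal sketch, not the actual cax code. It assumes run documents are plain dictionaries with a "number" field and an optional nested "processor" entry; the function names and the sample documents are hypothetical. If the runDB is queried through MongoDB, a filter such as `{"processor.correction_versions": {"$exists": False}}` would presumably express the same condition on the server side.

```python
def needs_corrections(run_doc):
    """Return True if the run document has no processor/correction_versions entry."""
    processor = run_doc.get("processor") or {}
    return "correction_versions" not in processor


def select_runs_without_corrections(run_docs):
    """Keep only the run numbers whose documents still lack corrections."""
    return [doc["number"] for doc in run_docs if needs_corrections(doc)]


# Hypothetical, hand-written run documents (not real runDB content).
runs = [
    {"number": 10112, "processor": {"correction_versions": {"AddGains": "1.0"}}},
    {"number": 10113, "processor": {"DEFAULT": {"gains": [1.0, 2.0]}}},
    {"number": 10114},
]
print(select_runs_without_corrections(runs))
```

Only the runs returned by such a filter would then be cycled over, so already-corrected runs cost nothing beyond the selection itself.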

XeBoris · 2017-06-20T15:18:22Z

@pdeperio @lucrlom
Have a look at this change in massive-cax which I would propose:
https://github.com/XENON1T/cax/tree/AddCorrectionsEff
You would need to pass an additional "--AddCorrections" option when you start massive-cax; it then selects only runs which do not have the correct runDB entries for the corrections (https://github.com/XENON1T/cax/blob/AddCorrectionsEff/cax/main.py#L300)

Quick and easy, but we can also think about a more complicated selection of the runs.

pdeperio · 2017-06-21T21:14:09Z

Thanks Boris. For sustainability, shouldn't that check to skip pulling from DB go into the corrections.py module? Or do we need to skip the entire run for this? (I forget where all the queries are and how much info each one pulls back, maybe #105.)

Then, for generality, the block of code you implemented in main.py is just for bypassing job submission and running locally instead (and can work with any set of tasks). So the option could be e.g. --local instead. (Of course this is just sweeping the original issue with cax under the rug, but ok since this works.)
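A rough sketch of what such a generic switch could look like, assuming an argparse-based driver. The parser, the `dispatch` helper, and the two handler callables are hypothetical and are not taken from massive-cax; the point is only that the flag chooses how a run is executed, independent of which tasks are configured.

```python
import argparse


def build_parser():
    """Hypothetical driver options; not the real massive-cax command line."""
    parser = argparse.ArgumentParser(description="Hypothetical massive-cax-like driver")
    parser.add_argument("--local", action="store_true",
                        help="run tasks in this process instead of submitting batch jobs")
    parser.add_argument("--run", type=int, default=None,
                        help="restrict processing to one run number")
    return parser


def dispatch(args, runs, run_locally, submit_job):
    """Send each selected run either to the local runner or to the batch submitter."""
    selected = [r for r in runs if args.run is None or r == args.run]
    handler = run_locally if args.local else submit_job
    return [handler(r) for r in selected]


# Placeholder handlers that only record which path was taken.
args = build_parser().parse_args(["--local"])
result = dispatch(args, [10112, 10113],
                  run_locally=lambda r: ("local", r),
                  submit_job=lambda r: ("batch", r))
print(result)
```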

XeBoris · 2017-06-28T12:02:04Z

A check whether it is necessary to add "new" corrections is already implemented in corrections.py (see https://github.com/XENON1T/cax/blob/master/cax/tasks/corrections.py#L66). This test is useful but does not avoid pulling the database information beforehand. It becomes worse when you call each correction as a separate class from the "task_list" tag in the JSON file: then you pull data from the database 7 times for a single run. I should also mention that "AddElectronLifetime" takes the longest time when it comes to parsing the sympy function in https://github.com/XENON1T/cax/blob/master/cax/tasks/corrections.py#L62.
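One way to avoid the repeated pulls is to fetch the run document once and hand it to every task. The sketch below illustrates that idea only: `FakeRunDB`, `find_run`, and the dummy task bodies are hypothetical stand-ins, and the real cax task classes are not structured this way as far as this thread shows.

```python
class FakeRunDB:
    """Stand-in for the run database that counts how often it is queried."""

    def __init__(self, docs):
        self._docs = {doc["number"]: doc for doc in docs}
        self.queries = 0

    def find_run(self, number):
        self.queries += 1
        return dict(self._docs[number])


def run_tasks_sharing_one_fetch(db, number, tasks):
    """Fetch the run document once and pass the same copy to every task."""
    run_doc = db.find_run(number)
    return {name: task(run_doc) for name, task in tasks.items()}


# Seven placeholder tasks named after the task_list; the bodies are dummies.
task_names = ["AddElectronLifetime", "AddGains", "AddDriftVelocity", "SetS2xyMap",
              "SetLightCollectionEfficiency", "SetFieldDistortion", "SetNeuralNetwork"]
tasks = {name: (lambda doc, n=name: "{} saw run {}".format(n, doc["number"]))
         for name in task_names}

db = FakeRunDB([{"number": 10112}])
results = run_tasks_sharing_one_fetch(db, 10112, tasks)
print(db.queries, len(results))
```

With this layout the number of database queries per run stays at one regardless of how many correction tasks are listed.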

Therefore my intention is to have massive-cax running and use it for services other than "only" job submission to batch queues. I changed --AddCorrections to --local (makes more sense).

pdeperio assigned JelleAalbers and coderdj Jun 1, 2017

pdeperio changed the title ~~cax all-run mode skips corrections (and stalls)~~ cax all-run mode skips corrections (and slow/stalls) Jun 1, 2017

pdeperio mentioned this issue Sep 13, 2017

Add corrections eff #114

Open

pdeperio mentioned this issue Nov 8, 2017

Reduce job submission count #134

Open
