ToDos #62

Open · 24 of 45 tasks

sofia-calgaro opened this issue Mar 6, 2023 · 4 comments
Labels: help wanted (Extra attention is needed)

sofia-calgaro (Collaborator) commented Mar 6, 2023

High-level priority ToDos

  • implement spms plots: separate by barrel / fibers -> plotting.plot_per_barrel_and_position already works; need to implement plotting.plot_per_fiber_and_barrel and plot_styles.plot_heatmap (Michele WIP)

  • fix the case where a detector is ON but NOT processable (right now the code crashes) - this happens for V05267B and V05612B starting from the end of p07 (it never happened before because every time a detector was NOT processable it was also OFF)

  • add test functions to improve the Codecov coverage

  • par1 vs par2: fix AUX entries

  • fix saving of FWHM values (currently, values are constant over the selected time window; this means that if we later inspect a new time window, we have to 1) update the previous values by evaluating the FWHM over the sum of the two windows [previous + new] and 2) save these as new values, so it is wrong to merge the old dataframe with the new one - CHANGE IT!)

  • new plot ideas #92

  • quality cut flags: we could consider adding one (or more) columns to the big dataframe, saving it, and only later applying the cuts. This means that for a user plot production you would get just the entries where the QC is True, while for the Dashboard we would keep the full object to work with. We could also append more QC columns to the same dataframe and later select through the Dashboard which QC to look at (i.e. is baseline? is 0nu2b? ...). QUALITY CUT COLUMNS ARE NEEDED TO EVALUATE THE ACCEPTANCE/REJECTION RATE - see the pandas sketch after this list

  • resampled values: handle gaps and the last resampled point in a better way
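A minimal pandas sketch of the QC-column idea from the list above; the flag names (`is_baseline_qc`, `is_0nu2b_qc`) are hypothetical placeholders, not the actual columns:

```python
import pandas as pd

def apply_qc(df: pd.DataFrame, qc_columns: list) -> pd.DataFrame:
    """Return only the rows passing ALL requested quality cuts."""
    mask = pd.Series(True, index=df.index)
    for col in qc_columns:
        mask &= df[col].astype(bool)
    return df[mask]

def qc_acceptance(df: pd.DataFrame, qc_column: str) -> float:
    """Fraction of events accepted by one quality cut."""
    return df[qc_column].astype(bool).mean()

# user plot production: keep only entries where the QC is True
# plot_df = apply_qc(big_df, ["is_baseline_qc", "is_0nu2b_qc"])
# Dashboard: keep big_df whole and pick QC columns interactively
```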

```
{
  "parameter": "gain",
  "event_type": "pulser",
  "plot_structure": "...",
  "plot_style": "...",
  "include_puls": "ratio"  // or "diff"; if nothing, do not include puls01ana data in geds (spms???) plots (= case no. 1)
}
```
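A minimal sketch of how the "include_puls" option from the config above could be applied, assuming the geds and puls01ana dataframes both carry a "datetime" column; the function and column names are illustrative, not the actual implementation:

```python
import pandas as pd

def include_puls(geds, puls, parameter, mode=None):
    """mode: "ratio", "diff", or None (case no. 1: plot geds as-is)."""
    if mode is None:
        return geds
    # align each geds row with the nearest-in-time pulser value
    merged = pd.merge_asof(
        geds.sort_values("datetime"),
        puls.sort_values("datetime")[["datetime", parameter]],
        on="datetime", suffixes=("", "_puls"),
    )
    if mode == "ratio":
        merged[parameter] = merged[parameter] / merged[parameter + "_puls"]
    elif mode == "diff":
        merged[parameter] = merged[parameter] - merged[parameter + "_puls"]
    return merged.drop(columns=parameter + "_puls")
```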
  • add quality cut flag - the flag column is not saved in the output dataframe (Sofia, Mariia polished a bit)
  • add Slow Control parameters -> right now they are only retrieved to be further plotted in the dashboard (Sofia)
  • add hdf output files with new structure + created new notebook to read data stored there (Sofia)
  • AUX entries: make 'append' option available for longer runs
  • add par1 vs par2: DONE but needs some re-shaping (Mariia, Sofia)
  • add flags of FC_bsln and muon events (not only flag of pulser, phy, all events) (Sofia)
  • add muon channel monitoring (Sofia)
  • add the option to remove timestamps from the query (could be useful e.g. if we want to remove a spike from the time window and see how things look without it... of course we need to know the timestamp/datetime at which this happens, so 1) we have to save/display it the first time, and 2) then we have to query it out) (Sofia, Mariia fixes)
  • add "exposure"/TP number of events for each channel: it would be nice to display these numbers using the channel layout provided by Felix. The exposure is the product of the mass (Florian already has that entry saved) and the time, i.e. the product of the inverse of the pulser rate (retrievable from the channel maps if we want to be sure about the value that was set) and the number of pulser events (= number of rows for evt type = pulser in geds channels; this is equal for all geds channels since we acquire data with a global trigger during phy data) - see the worked sketch at the end of this section (Sofia)
  • add STD to plots (Sofia)
  • make the code work with a cron job: add a new parser in run.py and pick up newly processed files; add them to the already processed data for a given run (Sofia)
  • added re-use of the already evaluated mean from the first time we studied the run => future improvement: implement the mean over X% of data during automatic production (i.e. when there are already saved data and we want to update the mean value over the new, e.g., 10% of data once we include a new bunch of data - time-consuming? truncation problems - do we need to recalculate the initial % variations in that case?) (Sofia)
  • add an error for when you don't put "cuts": "K lines" while plotting K lines events, since otherwise you don't get anything for it (at the moment the code runs, but it returns an empty plot) (Sofia)
  • add HV and DAQ info to our dataframes + detector type (coax, icpc, ...) (Sofia)
  • implement a function to plot a parameter (e.g. FWHM) as a function of channel IDs (can be ordered by string/CC4/...) (Sofia)
  • figure saving: save final figure in plotting.py (and not in plot_styles.py) (Sofia)
  • sometimes a given channel is missing for a given timestamp: what do we do in those situations? Does DataLoader have an option to skip that timestamp without querying it beforehand? Grace:

> building the FileDB with the full "scan_tables_columns()" option. This means that it opens every file, checks what channels are available in each file, as well as what columns are available for each of those channels. This obviously takes a long time.

Jason:

> I would like data prod to make a full-scan fdb.h5 for each run (we can generate them at the cycle level until a run is closed) and make FileDB able to load in a chain of such .h5's; that would make it so that you don't have to take shortcuts to get it to work in finite time for monitoring. This is already on our to-do list; perhaps we will bump the priority.
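Going back to the "exposure" item above, a worked sketch of the calculation; all numbers are made-up placeholders for illustration:

```python
pulser_rate_hz = 0.05      # test-pulser rate, from the channel map
n_pulser_events = 18_000   # rows with evt type == pulser (same for all geds)
mass_kg = 2.1              # detector mass, from the metadata

# livetime = number of pulser events / pulser rate
livetime_s = n_pulser_events / pulser_rate_hz          # = 360000 s
# exposure = mass x time
exposure_kg_yr = mass_kg * livetime_s / (3600 * 24 * 365.25)
print(f"livetime = {livetime_s:.0f} s, exposure = {exposure_kg_yr:.4f} kg yr")
```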

Medium-level priority ToDos

  • one function to use groupby: string, CC4, HV, DAQ #77
  • check why, when we load AUX channels, the "loading data" bar is N/1, while when we load geds it is N/178
  • fix timestamp selection when using the time window (currently, the cut is applied only on filenames; to be more precise, we should take into account the timestamps stored inside the data and apply the cut over them => one workaround could be to load more files - the "nearest" ones to the selected time range - and then cut away rows whose timestamps fall outside the selected time window; see the sketch after this list)
  • check event rate for spms
  • fix x ticks for the scatter plot (problems when we deal with few events). One way to fix it is to get the minimum/maximum datetime values among all channels and use them for the x ticks.
  • fix analysis_data.channel_mean()  #72
  • function to plot saved means for each channel (can re-use the vs ch function) (Sofia)
  • add error/warning messages when something in the config file is not ok (Sofia)
  • check if the list of keys loads only the wanted timestamps or the range between the min/max datetimes => nope, it just loads the corresponding cycles (Sofia)
  • check event rate for geds (Sofia)
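A minimal pandas sketch of the timestamp-cut workaround from the list above, assuming the big dataframe stores its in-data timestamps in a "datetime" column:

```python
import pandas as pd

def cut_to_time_window(df, start, end):
    """Keep only rows whose in-data timestamps fall inside [start, end]."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = (df["datetime"] >= start) & (df["datetime"] <= end)
    return df[mask]

# load the "nearest" whole cycle files first, then trim the rows:
# df = cut_to_time_window(df, "2023-04-01 00:00", "2023-04-02 12:00")
```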

Low-level priority ToDos

  • background plot, separating for different energy ranges (summary plot for M=total mass of Ge array)
  • time difference when using containers (Mariia gets the right UTC timestamp conversion when not using the container)
  • currently, the status map works only if we plot something. What if we want to inspect only the status map? Divide the two things
  • status maps check if a given parameter is out of threshold by looking at resampled values. That helps to avoid flagging "bad" detectors when there are occasional spikes, but what if we want to get the timestamps where spikes/big fluctuations are present? E.g., what if we want to build a list of keys to later exclude from the analysis? It would be nice to save these keys
  • fix background facecolor when plotting more parameters (the first parameter has the correct settings; then the background switches from white to gray and the grid from gray to white... why? Because of the seaborn library used for maps. Indeed, backgrounds change only if status maps are created)
  • labels and colours must be fixed when plotting summary plots (per CC4, per string, ...) with all data - the problem disappears when we plot either ONLY the resampled values or ONLY the non-resampled values (this will probably be the case in the future, since no one wants to look at a multitude of overlapping lines)
  • add the possibility to plot something for the full array (e.g. energy spectrum = histogram of one variable, summing the contribution coming from each channel) -> DataLoader already has a feature for this; see the sketch after this list
  • the code does not work when given a start & end that select one cycle file only (it does not concatenate any object) -> fix it
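A minimal sketch of the full-array spectrum idea from the list above, assuming an "energy" column in the big dataframe (whether to histogram directly like this or go through the DataLoader feature is open); the bin edges are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_array_spectrum(df, parameter="energy", bins=np.arange(0, 3000, 5)):
    # no per-channel groupby: histogramming the concatenated rows of all
    # channels is the same as summing the per-channel spectra
    plt.hist(df[parameter], bins=bins, histtype="step")
    plt.xlabel(parameter)
    plt.ylabel("counts / bin")
    plt.show()
```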
sagitta42 (Contributor) commented May 8, 2023

Some more low-priority ToDo improvements:

  • pass Subsystem.channel_map to AnalysisData (instead of passing only Subsystem.data, pass the whole Subsystem object)
  • "inherit" the subsystem type, so that is_geds()-type functions don't have to rely on interpreting the subsystem type from data columns but can simply do return self.subsystem_type == 'geds' etc.
  • load detector mass in Subsystem.get_channel_map()
  • "inherit" the channel map into AnalysisData, to simplify the exposure calculation by not having to call legendmeta again (see the sketch after this list)
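A minimal sketch of what the "inherit" items above could look like; the attribute names (subsystem.type, subsystem.channel_map) are assumptions, not the current API:

```python
class AnalysisData:
    def __init__(self, subsystem):
        # take the whole Subsystem, not just Subsystem.data
        self.data = subsystem.data
        self.channel_map = subsystem.channel_map  # no second legendmeta call
        self.subsystem_type = subsystem.type      # assumed attribute name

    def is_geds(self):
        # no more interpreting the type from data columns
        return self.subsystem_type == "geds"

    def is_spms(self):
        return self.subsystem_type == "spms"
```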

Also possible but need to draw on a napkin and think about it:

  • define the "string visualization" type plot as a plot structure
  • make it so that the "status" plot is plotted by calling that function, but also exposure, so that we can then do
```
{
  "parameter": "exposure",
  "plot_structure": "string visualization",
  "plot_style": "something"
}
```

and also for any future thing that we may want to plot in that way. It's not as trivial as the plot structure-style flow, but it should be possible to make the string-detector layout a "plot structure" sort of function, and then for each detector call something that plots with the colors etc. whatever needs to be plotted, be it "status" or exposure or some other performance metric. A rough sketch follows.
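A rough sketch of that "string visualization" plot structure, under the stated napkin-level assumptions; the layout dict, the color_of callback, and all names are hypothetical:

```python
import matplotlib.pyplot as plt

def plot_string_visualization(layout, values, color_of):
    """layout: dict detector -> (string, position);
    values: dict detector -> number;
    color_of: callback mapping a value (or None) to a color."""
    fig, ax = plt.subplots()
    for det, (string, position) in layout.items():
        # one colored cell per detector, arranged by string and position
        ax.add_patch(plt.Rectangle((string, -position), 0.9, 0.9,
                                   color=color_of(values.get(det))))
        ax.text(string + 0.45, -position + 0.45, det,
                ha="center", va="center", fontsize=6)
    ax.autoscale()
    ax.set_xlabel("string")
    return fig

# the same layout function would then serve "status", exposure, or any
# other per-detector quantity by swapping `values` and `color_of`
```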

sagitta42 (Contributor) commented:
We finally arrived at the point discussed ever since we were asked to plot cal+phy... loading large data.
I can't load p04 r002 phy data; I get Killed from DataLoader.
We need to implement the next() functionality of DataLoader, but that would change many things in the structure: the plot structure function would have to take the Subsystem object before it has any data in it, create the fig & axes, then call a Subsystem.next()-type function in a loop, which will in turn call DataLoader.next() and provide the next chunk of data to plot, instead of first doing get_data() and then passing everything to the plot structure.

That means setting up the data loader has to happen first separately, saved in some Subsystem internal variable(s), and then one can call get_data() to get all, or next_data() to get chunks. It's not hard, just needs structural changes.

We need to first sort out current PRs and ongoing things, update main with latest stable stuff, then I'll do that change, push to main, and then we work with the new structure for other updates...
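A rough sketch of the chunked flow described above; Subsystem.setup_loader(), next_data(), and the plot-structure methods are all hypothetical names for functionality that does not exist yet:

```python
def plot_per_chunk(subsystem, plot_structure):
    # create fig & axes BEFORE the Subsystem holds any data
    fig, axes = plot_structure.make_figure(subsystem)
    subsystem.setup_loader()           # loader saved in internal variable(s)
    while True:
        chunk = subsystem.next_data()  # would wrap DataLoader.next()
        if chunk is None:
            break
        plot_structure.draw(axes, chunk)  # all chunks land on the same figure
    return fig
```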

sofia-calgaro (Collaborator, Author) commented May 9, 2023

> We finally arrived at the point discussed ever since we were asked to plot cal+phy... loading large data. I can't load p04 r002 phy data; I get Killed from DataLoader. We need to implement the next() functionality of DataLoader, but that would change many things in the structure: the plot structure function would have to take the Subsystem object before it has any data in it, create the fig & axes, then call a Subsystem.next()-type function in a loop, which will in turn call DataLoader.next() and provide the next chunk of data to plot, instead of first doing get_data() and then passing everything to the plot structure.
>
> That means setting up the data loader has to happen first separately, saved in some Subsystem internal variable(s), and then one can call get_data() to get all, or next_data() to get chunks. It's not hard, just needs structural changes.
>
> We need to first sort out current PRs and ongoing things, update main with latest stable stuff, then I'll do that change, push to main, and then we work with the new structure for other updates...

As a workaround we could implement a new part of the code that separately studies each run included in the BIG time range chosen by the user, and then attaches each inspected run at a final stage. Then we call all the remaining plotting functions.

Like, if you want to plot all p04 runs, there would be a first stage in which you inspect only r000 and save it, then you do it again for r001 and r002. After this you concatenate the produced dfs and switch to plotting/status.

The same can be done within a HUGE run. We could first inspect the size of the data we are trying to load; if it is above a given threshold, we cut it into multiple sub-runs, inspect them separately, concatenate them at the very end, and proceed with plotting.

There's already the "append" feature that appends new data to an already existing dataframe, but it is not automated to do this for a large time interval or a heavy run. Maybe we can exploit that - a sketch of the per-run flow follows.
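A minimal sketch of that per-run workaround, with inspect_run() as a placeholder for the existing single-run inspect-and-save stage (not an existing function):

```python
import pandas as pd

def inspect_run(run: str) -> pd.DataFrame:
    """Placeholder for the existing single-run inspect-and-save stage."""
    raise NotImplementedError

def process_period(runs):              # e.g. ["r000", "r001", "r002"]
    # first stage: each run is inspected and saved on its own
    dfs = [inspect_run(run) for run in runs]
    # final stage: concatenate the produced dfs
    full_df = pd.concat(dfs, ignore_index=True)
    return full_df                     # then switch to plotting/status
```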

sagitta42 (Contributor) commented:
> As a workaround we could implement a new part of the code that separately studies each run included in the BIG time range chosen by the user, and then attaches each inspected run at a final stage. Then we call all the remaining plotting functions.
>
> Like, if you want to plot all p04 runs, there would be a first stage in which you inspect only r000 and save it, then you do it again for r001 and r002. After this you concatenate the produced dfs and switch to plotting/status.

This would work for the Dashboard: each run is processed and its plot data saved, then concatenated for plotting - and it wouldn't be that much data, because it's already trimmed through event selection etc.

For a HUGE run, that would mean re-shuffling parts of the code anyway: we'd have to move the figure creation out somewhere so that all chunks are still plotted on the same figure (e.g. create a Subsystem for each chunk, then plot). But then that's really equivalent to the next() function (create one subsystem, load data for each chunk, then plot), and instead of moving the figure creation out, the Subsystem and get_data() (and analysis data...) would move in. Definitely needs napkin drawing :')
