ToDos #62

Open · 24 of 45 tasks

sofia-calgaro opened this issue Mar 6, 2023 · 4 comments
Labels: help wanted (Extra attention is needed)

sofia-calgaro (Collaborator) commented Mar 6, 2023

High-level priority ToDos

  • implement spms plots: separate by barrel / fibers -> plotting.plot_per_barrel_and_position already works; need to implement plotting.plot_per_fiber_and_barrel and plot_styles.plot_heatmap (Michele WIP)

  • fix the case where a detector is ON but NOT processable (right now the code crashes) - this happens for V05267B and V05612B starting from the end of p07 (it never happened before because every time a detector was NOT processable it was also OFF)

  • add test functions to improve the Codecov coverage

  • par1 vs par2: fix AUX entries

  • fix saving of FWHM values (currently, values are constant over the selected time window; this means that if we later inspect a new time window, we have to 1) update the previous values by evaluating the FWHM over the sum of the two windows [previous + new] and 2) save these as new values, so it is wrong to merge the old dataframe with the new one - CHANGE IT!)

  • new plot ideas #92

  • quality cut flags: we could consider adding one (or more) columns to the big dataframe, saving it, and only later applying the cuts. This means that for a user plot production you would get just the entries where the QC is True, while for the Dashboard we would keep the full object to work with. We could also append more QC columns to the same dataframe and later select through the Dashboard which QC to look at (i.e. is baseline? is 0nu2b? ...). QUALITY CUT COLUMNS ARE NEEDED TO EVALUATE THE ACCEPTANCE/REJECTION RATE - see the pandas sketch after this list

  • resampled values: handle gaps and the last resampled point in a better way
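A minimal pandas sketch of the QC-column idea from the list above; the flag names (`is_baseline_qc`, `is_0nu2b_qc`) are hypothetical placeholders, not the actual columns:

```python
import pandas as pd

def apply_qc(df: pd.DataFrame, qc_columns: list) -> pd.DataFrame:
    """Return only the rows passing ALL requested quality cuts."""
    mask = pd.Series(True, index=df.index)
    for col in qc_columns:
        mask &= df[col].astype(bool)
    return df[mask]

def qc_acceptance(df: pd.DataFrame, qc_column: str) -> float:
    """Fraction of events accepted by one quality cut."""
    return df[qc_column].astype(bool).mean()

# user plot production: keep only entries where the QC is True
# plot_df = apply_qc(big_df, ["is_baseline_qc", "is_0nu2b_qc"])
# Dashboard: keep big_df whole and pick QC columns interactively
```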

```
{
  "parameter": "gain",
  "event_type": "pulser",
  "plot_structure": "...",
  "plot_style": "...",
  "include_puls": "ratio"  // or "diff"; if nothing, do not include puls01ana data in geds (spms???) plots (= case no. 1)
}
```
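A minimal sketch of how the "include_puls" option from the config above could be applied, assuming the geds and puls01ana dataframes both carry a "datetime" column; the function and column names are illustrative, not the actual implementation:

```python
import pandas as pd

def include_puls(geds, puls, parameter, mode=None):
    """mode: "ratio", "diff", or None (case no. 1: plot geds as-is)."""
    if mode is None:
        return geds
    # align each geds row with the nearest-in-time pulser value
    merged = pd.merge_asof(
        geds.sort_values("datetime"),
        puls.sort_values("datetime")[["datetime", parameter]],
        on="datetime", suffixes=("", "_puls"),
    )
    if mode == "ratio":
        merged[parameter] = merged[parameter] / merged[parameter + "_puls"]
    elif mode == "diff":
        merged[parameter] = merged[parameter] - merged[parameter + "_puls"]
    return merged.drop(columns=parameter + "_puls")
```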
  • add quality cut flag - the flag column is not saved in the output dataframe (Sofia, Mariia polished a bit)
  • add Slow Control parameters -> right now they are only retrieved to be further plotted in the dashboard (Sofia)
  • add hdf output files with new structure + created new notebook to read data stored there (Sofia)
  • AUX entries: make 'append' option available for longer runs
  • add par1 vs par2: DONE but needs some re-shaping (Mariia, Sofia)
  • add flags of FC_bsln and muon events (not only flag of pulser, phy, all events) (Sofia)
  • add muon channel monitoring (Sofia)
  • add the option to remove timestamps from the query (could be useful e.g. if we want to remove a spike from the time window and see how things look without it... of course we need to know the timestamp/datetime at which this happens, so 1) we have to save/display it the first time, and 2) then we have to query it out) (Sofia, Mariia fixes)
  • add "exposure"/TP number of events for each channel: it would be nice to display these numbers using the channel layout provided by Felix. The exposure is the product of the mass (Florian already has that entry saved) and the time, i.e. the product of the inverse of the pulser rate (retrievable from the channel maps if we want to be sure about the value that was set) and the number of pulser events (= number of rows for evt type = pulser in geds channels; this is equal for all geds channels since we acquire data with a global trigger during phy data) - see the worked sketch at the end of this section (Sofia)
  • add STD to plots (Sofia)
  • make the code work with a cron job: add a new parser in run.py and pick up newly processed files; add them to the already processed data for a given run (Sofia)
  • added re-use of the already evaluated mean from the first time we studied the run => future improvement: implement the mean over X% of data during automatic production (i.e. when there are already saved data and we want to update the mean value over the new, e.g., 10% of data once we include a new bunch of data - time-consuming? truncation problems - do we need to recalculate the initial % variations in that case?) (Sofia)
  • add an error for when you don't put "cuts": "K lines" while plotting K lines events, since otherwise you don't get anything for it (at the moment the code runs, but it returns an empty plot) (Sofia)
  • add HV and DAQ info to our dataframes + detector type (coax, icpc, ...) (Sofia)
  • implement a function to plot a parameter (e.g. FWHM) as a function of channel IDs (can be ordered by string/CC4/...) (Sofia)
  • figure saving: save final figure in plotting.py (and not in plot_styles.py) (Sofia)
  • sometimes a given channel is missing for a given timestamp: what do we do in those situations? Does DataLoader have an option to skip that timestamp without querying it beforehand? Grace:

> building the FileDB with the full "scan_tables_columns()" option. This means that it opens every file, checks what channels are available in each file, as well as what columns are available for each of those channels. This obviously takes a long time.

Jason:

> I would like data prod to make a full-scan fdb.h5 for each run (we can generate them at the cycle level until a run is closed) and make FileDB able to load in a chain of such .h5's; that would make it so that you don't have to take shortcuts to get it to work in finite time for monitoring. This is already on our to-do list; perhaps we will bump the priority.
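Going back to the "exposure" item above, a worked sketch of the calculation; all numbers are made-up placeholders for illustration:

```python
pulser_rate_hz = 0.05      # test-pulser rate, from the channel map
n_pulser_events = 18_000   # rows with evt type == pulser (same for all geds)
mass_kg = 2.1              # detector mass, from the metadata

# livetime = number of pulser events / pulser rate
livetime_s = n_pulser_events / pulser_rate_hz          # = 360000 s
# exposure = mass x time
exposure_kg_yr = mass_kg * livetime_s / (3600 * 24 * 365.25)
print(f"livetime = {livetime_s:.0f} s, exposure = {exposure_kg_yr:.4f} kg yr")
```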

Medium-level priority ToDos

  • one function to use groupby: string, CC4, HV, DAQ #77
  • check why, when we load AUX channels, the "loading data" bar is N/1, while when we load geds it is N/178
  • fix timestamp selection when using the time window (currently, the cut is applied only on filenames; to be more precise, we should take into account the timestamps stored inside the data and apply the cut over them => one workaround could be to load more files - the "nearest" ones to the selected time range - and then cut away rows whose timestamps fall outside the selected time window; see the sketch after this list)
  • check event rate for spms
  • fix x ticks for the scatter plot (problems when we deal with few events). One way to fix it is to get the minimum/maximum datetime values among all channels and use them for the x ticks.
  • fix analysis_data.channel_mean()  #72
  • function to plot saved means for each channel (can re-use the vs ch function) (Sofia)
  • add error/warning messages when something in the config file is not ok (Sofia)
  • check if the list of keys loads only the wanted timestamps or the range between the min/max datetimes => nope, it just loads the corresponding cycles (Sofia)
  • check event rate for geds (Sofia)
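A minimal pandas sketch of the timestamp-cut workaround from the list above, assuming the big dataframe stores its in-data timestamps in a "datetime" column:

```python
import pandas as pd

def cut_to_time_window(df, start, end):
    """Keep only rows whose in-data timestamps fall inside [start, end]."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = (df["datetime"] >= start) & (df["datetime"] <= end)
    return df[mask]

# load the "nearest" whole cycle files first, then trim the rows:
# df = cut_to_time_window(df, "2023-04-01 00:00", "2023-04-02 12:00")
```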

Low-level priority ToDos

  • background plot, separating for different energy ranges (summary plot for M=total mass of Ge array)
  • time difference when using containers (Mariia gets the right UTC timestamp conversion when not using the container)
  • currently, the status map works only if we plot something. What if we want to inspect only the status map? Divide the two things
  • status maps check if a given parameter is out of threshold by looking at resampled values. That helps to avoid flagging "bad" detectors when there are occasional spikes, but what if we want to get the timestamps where spikes/big fluctuations are present? E.g., what if we want to build a list of keys to later exclude from the analysis? It would be nice to save these keys
  • fix background facecolor when plotting more parameters (the first parameter has the correct settings; then the background switches from white to gray and the grid from gray to white... why? Because of the seaborn library used for maps. Indeed, backgrounds change only if status maps are created)
  • labels and colours must be fixed when plotting summary plots (per CC4, per string, ...) with all data - the problem disappears when we plot either ONLY the resampled values or ONLY the non-resampled values (this will probably be the case in the future, since no one wants to look at a multitude of overlapping lines)
  • add the possibility to plot something for the full array (e.g. energy spectrum = histogram of one variable, summing the contribution coming from each channel) -> DataLoader already has a feature for this; see the sketch after this list
  • the code does not work when given a start & end that select one cycle file only (it does not concatenate any object) -> fix it
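A minimal sketch of the full-array spectrum idea from the list above, assuming an "energy" column in the big dataframe (whether to histogram directly like this or go through the DataLoader feature is open); the bin edges are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_array_spectrum(df, parameter="energy", bins=np.arange(0, 3000, 5)):
    # no per-channel groupby: histogramming the concatenated rows of all
    # channels is the same as summing the per-channel spectra
    plt.hist(df[parameter], bins=bins, histtype="step")
    plt.xlabel(parameter)
    plt.ylabel("counts / bin")
    plt.show()
```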
sagitta42 (Contributor) commented May 8, 2023

Some more low-priority ToDo improvements:

  • pass Subsystem.channel_map to AnalysisData (instead of passing only Subsystem.data, pass the whole Subsystem object)
  • "inherit" the subsystem type, so that is_geds()-type functions don't have to rely on interpreting the subsystem type from data columns but can simply do return self.subsystem_type == 'geds' etc.
  • load detector mass in Subsystem.get_channel_map()
  • "inherit" the channel map into AnalysisData, to simplify the exposure calculation by not having to call legendmeta again (see the sketch after this list)
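A minimal sketch of what the "inherit" items above could look like; the attribute names (subsystem.type, subsystem.channel_map) are assumptions, not the current API:

```python
class AnalysisData:
    def __init__(self, subsystem):
        # take the whole Subsystem, not just Subsystem.data
        self.data = subsystem.data
        self.channel_map = subsystem.channel_map  # no second legendmeta call
        self.subsystem_type = subsystem.type      # assumed attribute name

    def is_geds(self):
        # no more interpreting the type from data columns
        return self.subsystem_type == "geds"

    def is_spms(self):
        return self.subsystem_type == "spms"
```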

Also possible but need to draw on a napkin and think about it:

  • define the "string visualization" type plot as a plot structure
  • make it so that the "status" plot is plotted by calling that function, but also exposure, so that we can then do
```
{
  "parameter": "exposure",
  "plot_structure": "string visualization",
  "plot_style": "something"
}
```

and also for any future thing that we may want to plot in that way. It's not as trivial as the plot structure-style flow, but it should be possible to make the string-detector layout a "plot structure" sort of function, and then for each detector call something that plots with the colors etc. whatever needs to be plotted, be it "status" or exposure or some other performance metric. A rough sketch follows.
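A rough sketch of that "string visualization" plot structure, under the stated napkin-level assumptions; the layout dict, the color_of callback, and all names are hypothetical:

```python
import matplotlib.pyplot as plt

def plot_string_visualization(layout, values, color_of):
    """layout: dict detector -> (string, position);
    values: dict detector -> number;
    color_of: callback mapping a value (or None) to a color."""
    fig, ax = plt.subplots()
    for det, (string, position) in layout.items():
        # one colored cell per detector, arranged by string and position
        ax.add_patch(plt.Rectangle((string, -position), 0.9, 0.9,
                                   color=color_of(values.get(det))))
        ax.text(string + 0.45, -position + 0.45, det,
                ha="center", va="center", fontsize=6)
    ax.autoscale()
    ax.set_xlabel("string")
    return fig

# the same layout function would then serve "status", exposure, or any
# other per-detector quantity by swapping `values` and `color_of`
```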

sagitta42 (Contributor) commented:
We finally arrived at the point discussed ever since we were asked to plot cal+phy... loading large data.
I can't load p04 r002 phy data; I get Killed from DataLoader.
We need to implement the next() functionality of DataLoader, but that would change many things in the structure: the plot structure function would have to take the Subsystem object before it has any data in it, create the fig & axes, then call a Subsystem.next()-type function in a loop, which will in turn call DataLoader.next() and provide the next chunk of data to plot, instead of first doing get_data() and then passing everything to the plot structure.

That means setting up the data loader has to happen first separately, saved in some Subsystem internal variable(s), and then one can call get_data() to get all, or next_data() to get chunks. It's not hard, just needs structural changes.

We need to first sort out current PRs and ongoing things, update main with latest stable stuff, then I'll do that change, push to main, and then we work with the new structure for other updates...
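A rough sketch of the chunked flow described above; Subsystem.setup_loader(), next_data(), and the plot-structure methods are all hypothetical names for functionality that does not exist yet:

```python
def plot_per_chunk(subsystem, plot_structure):
    # create fig & axes BEFORE the Subsystem holds any data
    fig, axes = plot_structure.make_figure(subsystem)
    subsystem.setup_loader()           # loader saved in internal variable(s)
    while True:
        chunk = subsystem.next_data()  # would wrap DataLoader.next()
        if chunk is None:
            break
        plot_structure.draw(axes, chunk)  # all chunks land on the same figure
    return fig
```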

sofia-calgaro (Collaborator, Author) commented May 9, 2023

> We finally arrived at the point discussed ever since we were asked to plot cal+phy... loading large data. I can't load p04 r002 phy data; I get Killed from DataLoader. We need to implement the next() functionality of DataLoader, but that would change many things in the structure: the plot structure function would have to take the Subsystem object before it has any data in it, create the fig & axes, then call a Subsystem.next()-type function in a loop, which will in turn call DataLoader.next() and provide the next chunk of data to plot, instead of first doing get_data() and then passing everything to the plot structure.
>
> That means setting up the data loader has to happen first separately, saved in some Subsystem internal variable(s), and then one can call get_data() to get all, or next_data() to get chunks. It's not hard, just needs structural changes.
>
> We need to first sort out current PRs and ongoing things, update main with latest stable stuff, then I'll do that change, push to main, and then we work with the new structure for other updates...

As a workaround we could implement a new part of the code that separately studies each run included in the BIG time range chosen by the user, and then attaches each inspected run at a final stage. Then we call all the remaining plotting functions.

Like, if you want to plot all p04 runs, there would be a first stage in which you inspect only r000 and save it, then you do it again for r001 and r002. After this you concatenate the produced dfs and switch to plotting/status.

The same can be done within a HUGE run. We could first inspect the size of the data we are trying to load; if it is above a given threshold, we cut it into multiple sub-runs, inspect them separately, concatenate them at the very end, and proceed with plotting.

There's already the "append" feature that appends new data to an already existing dataframe, but it is not automated to do this for a large time interval or a heavy run. Maybe we can exploit that - a sketch of the per-run flow follows.
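A minimal sketch of that per-run workaround, with inspect_run() as a placeholder for the existing single-run inspect-and-save stage (not an existing function):

```python
import pandas as pd

def inspect_run(run: str) -> pd.DataFrame:
    """Placeholder for the existing single-run inspect-and-save stage."""
    raise NotImplementedError

def process_period(runs):              # e.g. ["r000", "r001", "r002"]
    # first stage: each run is inspected and saved on its own
    dfs = [inspect_run(run) for run in runs]
    # final stage: concatenate the produced dfs
    full_df = pd.concat(dfs, ignore_index=True)
    return full_df                     # then switch to plotting/status
```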

sagitta42 (Contributor) commented:
> As a workaround we could implement a new part of the code that separately studies each run included in the BIG time range chosen by the user, and then attaches each inspected run at a final stage. Then we call all the remaining plotting functions.
>
> Like, if you want to plot all p04 runs, there would be a first stage in which you inspect only r000 and save it, then you do it again for r001 and r002. After this you concatenate the produced dfs and switch to plotting/status.

This would work for the Dashboard: each run is processed and its plot data saved, then concatenated for plotting - and it wouldn't be that much data, because it's already trimmed through event selection etc.

For a HUGE run, that would mean re-shuffling parts of the code anyway: we'd have to move the figure creation out somewhere so that all chunks are still plotted on the same figure (e.g. create a Subsystem for each chunk, then plot). But then that's really equivalent to the next() function (create one subsystem, load data for each chunk, then plot), and instead of moving the figure creation out, the Subsystem and get_data() (and analysis data...) would move in. Definitely needs napkin drawing :')
