ADCP data handling - faster visualisation or ADCP ensemble averaging for single ping datasets #654
Hi @BecCowley, could you please provide examples/cases of the functionality? For example, why do you need to average the pings before import? Is this averaging required for modern pre-processing (conversions, adjustments, etc.)? My first impression is that this would be post-processing, no? PS: AFAIK, the ADCPy & dolfyn packages allow only simple binning. I don't know what UWA uses; it would be nice to have a peek at the code. We also need to understand the full requirements for other users, conventions, etc.
Hi @ocehugo, how the data is supplied is up for discussion - maybe we provide two QC'd versions, one single-ping and one hourly averaged. I can send you the UWA code I have, if you like?
Could you please share a very slow file example and describe the workflow/steps where it is slow? Is it slow only for displaying, or for processing too? IMO, we could try to optimise this first before changing the data workflow.

I still think this should be a post-processing task, even if we have to create it or move it from pre-processing. AFAIK we need to keep the raw pings in FV00 - that's why I consider the averaging a post-processing thing, even though some data shifting is done at import level.

If we can't optimise much, maybe we could average on the fly at plot time (via an option), via an exclusive button on the main toolbox window, or add something like a windowed plot around large variances (e.g. manual QC flags), or some similar heuristics. Another option is to load the slow thing and generate an average file that can be loaded (say FV00ba), which is similar to what the WQM people wish for.

Is the UWA code doing anything else besides binned averaging that we would need?
Hi Hugo, happy to share.
If I load up the entire mooring, it is impossible to work in the toolbox. I must admit I haven't tried this again for a while, as I was so discouraged the first time. My recollection is that when I tried to QC the ADCP data, I couldn't assess it adequately in the toolbox because of issue #636 - if I zoom into an area of the ADCP data, then decide I want to see a plot that isn't there, the toolbox slows down as it has to re-draw the entire plot. Then I have to re-zoom to where I was (again, taking forever). I think the problem is the way the toolbox renders these huge files in the plots. I don't know if having two other ADCPs and all the other instruments makes a difference.

Would you like me to share the data from an entire mooring (and the associated CSV files)?

As for UWA, I have only focussed on the binning step. Some other things to consider:
That's a long beast :)
Yep, the repeated/slow drawing is a problem - so I assume it's only the display/plots that are a nuisance, correct? I reckon we have two options here: (a) only render/draw what the user is seeing, and/or (b) cache the first draw and re-use it. But I haven't looked into it yet. A rough idea of (a) is sketched below.
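To make option (a) concrete, here's a minimal MATLAB sketch (not toolbox code; the function and variable names `plotDecimated`, `time`, `depth`, `ucur` and `maxCols` are all illustrative) of decimating a single-ping velocity matrix down to roughly on-screen resolution before drawing, so a redraw only touches a few thousand columns instead of millions:

```matlab
% Minimal sketch of option (a): decimate to ~screen resolution before
% drawing. time, depth and ucur are illustrative, not toolbox variables.
function h = plotDecimated(time, depth, ucur, maxCols)
    if nargin < 4, maxCols = 2000; end            % ~ screen width in pixels
    step = max(1, floor(numel(time) / maxCols));  % keep every step-th ping
    ti = 1:step:numel(time);
    h = imagesc(time(ti), depth, ucur(:, ti));    % cheap raster draw
    axis xy; colorbar;
end
```

Caching (option b) would then sit one level up, re-using the decimated array between redraws instead of recomputing it.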
This would be perfect! I don't have such a dense example here and it would be very helpful - I'm already thinking it would make a great example to build a template on top of, for easier testing here.
Hmm, so they are probably doing more checks, maybe because of internal wave scenarios?
@ocehugo, I will send you a link to our data for you to test, and the UWA code. We can work through the pre-processing parts together - we turn off a lot of the RDI thresholds anyway, but they should be coded in somewhere.
@ocehugo, I need to get our EAC ADCP data QC'd in the next week, so I'll make my own efforts in MATLAB to do that. If you manage to get to this issue soon, let me know. It seems that your thoughts on QC'ing first, then averaging, fit with what I've discovered below.

Reviewing the information available, here are my notes on how UWA perform QC of their single-ping data. The process matches the RDI documentation 'ADCP Coordinate Transformation: Formulas and Calculations' (adcp+coordinate+transformation_Jan10.pdf). The UWA code is located here: https://bitbucket.org/uwaoceandynamics/adcp-rdi/src/master/

UWA single-ping ADCP data processing steps:
- adcprdi.m: error velocity check; perform bin mapping and 3-beam solutions, then convert to ENU data (derives east, north, up and error velocities).
- adcp_postclean.m: the tests done now at UWA; add magnetic declination.
- adcp_avg.m: the binning process is done over multiple deployments, and time gaps are filled with NaNs onto an even grid (fill_ts_adcp.m, filldate_adcp.m and join_ts_adcp.m); a rough sketch of this gridding follows below.
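For illustration only, here is a hedged MATLAB sketch of that adcp_avg.m-style step - averaging single pings onto an even time grid and leaving NaNs in the gaps. It is not the UWA code; `ensembleAverage` and all variable names are made up:

```matlab
% Sketch of binning single pings onto an even time grid with NaN fill in
% gaps (in the spirit of adcp_avg.m / fill_ts_adcp.m, not the real code).
% time: datenum vector [1 x npings]; vel: [nbins x npings]; binHours: e.g. 1.
function [tgrid, velAvg] = ensembleAverage(time, vel, binHours)
    dt = binHours / 24;                                   % bin width in days
    tgrid = (floor(time(1)/dt)*dt : dt : time(end)).';    % even time grid
    velAvg = nan(size(vel, 1), numel(tgrid));             % NaN where no pings
    idx = discretize(time, [tgrid; tgrid(end) + dt]);     % bin index per ping
    for k = reshape(unique(idx(~isnan(idx))), 1, [])
        velAvg(:, k) = mean(vel(:, idx == k), 2, 'omitnan');
    end
end
```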
@BecCowley - thanks for the information. As you can see, we have prioritized other issues and this was postponed for later in the roadmap. The information above is super useful and we should include it in the wiki once we can actually do the entire thing in the toolbox. Thinking forward a bit, what is your opinion about what should be done? My guess is to follow the UWA steps, assuming the fish detection works as expected for beam data, as well as the other tests.
@ocehugo, I'm just working through what we should do next in my head. Here are my thoughts:

Map the process described above to what the toolbox already does, plus add anything missing, which gets us to the stage of a mostly QC'd and averaged dataset. I think it can be done prior to the visualisation in the toolbox GUI. I think the toolbox should only display the bin-averaged data, upon which we can do further QC if required.

I'd like to make this a pre-processing step, with some plots available outside the toolbox (or during the import/PP step) to help make decisions about the thresholds to use. For example, plots I find useful are mean/min/max/standard deviation plots of correlation, percent good, echo intensity and error velocity (a rough sketch follows below). There will of course be a fish threshold setting if the ADCP hasn't applied one already. I also like to see pcolor plots of the time series vs pressure/depth with the data points that have failed each threshold. I can share examples, but these will be slow to produce and review for single-ping datasets; perhaps they can be done one at a time. The user should be able to adjust the thresholds as many times as they need before continuing to the toolbox QC GUI. I wonder if we can build a separate tool, just for ADCP data, for determining the thresholds for each instrument? It would be useful for all ADCP datasets.

The process above doesn't include the vertical velocity and horizontal velocity QC tests (or all the other standard QC tests). These need to be done after the averaging step. The idea is to minimise the amount of data the toolbox displays, to reduce the time it takes to QC the large dataset. If all the thresholds are set well, the data shouldn't need any further QC. It's the setting of thresholds, and trying to retain as much good data as possible, that is time-consuming with big datasets due to the size of the plots. Reviewing and adjusting them is slow.
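As an illustration of the kind of per-bin diagnostic plot meant above (mean/min/max/std of correlation magnitude), here is a hypothetical MATLAB snippet; `cmag` and `binDist` are made-up variable names, not toolbox fields:

```matlab
% Hypothetical per-bin correlation diagnostics to help pick a threshold.
% cmag: [nbins x npings] correlation magnitude; binDist: [nbins x 1] metres.
figure;
plot(binDist, mean(cmag, 2, 'omitnan'), 'k-', ...
     binDist, min(cmag, [], 2), 'b--', ...
     binDist, max(cmag, [], 2), 'r--', ...
     binDist, std(cmag, 0, 2, 'omitnan'), 'g-');
legend('mean', 'min', 'max', 'std');
xlabel('distance along beam (m)');
ylabel('correlation magnitude (counts)');
title('Correlation diagnostics for threshold selection');
```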
Agree - we need a step that allows easy back-and-forth inspection/exploration; this is currently a weak spot. From your description above, I think the best way to implement that is a new "button" or "view" in the main toolbox window (e.g. "run diagnostics/processing"). It's more flexible, avoids re-importing workflows (and the associated slowness), and also allows saving the raw data. Although one could argue that, if the manufacturer software is doing averages, we should do the same and follow from that, I point to the actual UWA workflow above as a de facto example of why we shouldn't follow this line of thinking.
The "view/window" mentioned above would be useful for other things too (say QC exploration). This is quite some work (apart from the workflow design and metadata registry), particularly with UI elements/interaction since every UI element/interaction takes some time to be fully tested. However, If during your diagnostic/parameter exploration you could modularize the plots/analysis we could probably reuse/adapt it here to start ticking boxes towards that. Also, the workflow design is pretty much what you'll be doing and such, we just need to fence it correctly over here (ensuring order of processing, parameter range, etc).
I completely agree with reducing the space dimension for the end products, but I still stand for allowing the raw data to be stored/seen/used by whoever is interested. Again, I have some options to try to speed things up, but I cannot commit to investigating them right now.
@ocehugo, thanks for the feedback. One issue I'm unsure of is the delivery of single-ping data and hourly averaged data to the AODN. So far, I have thought of these options:

I am not clear on how to deliver non-QC'd hourly data, simply because the process we are looking at does the QC before the averaging. Any thoughts on this would be helpful. (A masking sketch of the QC-then-average order is below.)
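For what it's worth, the "QC before averaging" order can be sketched by masking flagged pings prior to the averaging step. This assumes IMOS-style flags where values above 2 are not good, and reuses the hypothetical ensembleAverage sketch from earlier; `vel` and `flags` are illustrative variables:

```matlab
% Assumed IMOS-style flags: > 2 means bad/probably-bad. Masking first
% means flagged pings simply drop out of the hourly means.
vel(flags > 2) = NaN;                              % discard flagged pings
[tgrid, velAvg] = ensembleAverage(time, vel, 1);   % hourly average, as sketched above
```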
This is great - keep me posted!
I'm not certain of your delivery requirements. AFAIK, you may just deliver the FV00 (raw, non-QC, single-ping) and the FV01 (QC'd). If you can do the QC over single-ping data, great - that is your FV01 file. However, if your workflow (for whatever reason) can only be done over hourly data, then that's it: your FV01 data is an hourly average with QC flags, and differs from FV00 in both QC content and processing/sampling length. The AODN team has created some code to generate FV02 files, like aggregations and time averages. It may be worth checking whether you really need to create those.
@ocehugo, I have constructed a 'singlePing' branch of the toolbox to enable the QC of the data. How best to hand over what I've done? I can step through it with you first if you like, then maybe you can take the branch over? For the files that need to be sent to IMOS (single-ping and averaged), I will open the conversation with @mhidas to start. I guess @petejan would like to be involved too. Anyone else you can think of who should be involved?
@BecCowley, just send us a pull request from your fork and we'll go from there. I think this discussion belongs in the ANMN internal discussions, but it's better to start with some "draft proposal" so it isn't left open for too long. @ggalibert?
Just adding a list of possible factors here that would eventually improve performance, so I don't need to go back to my notes:
After some research, I think we need a pre-processing tool to ensemble average single-ping data on import to the toolbox. Below is documentation of what I discovered (with some help from @sspagnol and David Williams). I am happy to contribute to the code development, but it would probably be best written by @ocehugo & team. And I need it soon, if possible!
Investigation of using RDI tools to average:
From David Williams (edited by me to remove irrelevant bits):
Now looking at how other groups average single-ping data.
Probably the simplest thing to do at this stage would be to use the UWA tools, which are in MATLAB, quite straightforward, and clearly described. They have tried to duplicate the on-board processing done by the RDI ADCPs when they ensemble average during deployment (including fish thresholds, EA thresholds, etc.); a rough sketch of that kind of screening follows below. The code I have from UWA was sent to me by @sspagnol, and possibly there are more recent versions we can ask them for. I also have the Scripps code, again in MATLAB.
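As a rough illustration of the fish-threshold style screening mentioned above - in the spirit of RDI's on-board false-target test, not a verified implementation; `fishScreen`, the 50-count default, and all variable names are assumptions:

```matlab
% Hedged sketch of a fish (false target) screen: if the spread of echo
% intensity across beams in a depth cell exceeds a threshold, a fish
% echo is suspected and the cell is marked bad.
% echo: [nbeams x nbins x npings] echo intensity in counts.
function bad = fishScreen(echo, eaThresh)
    if nargin < 2, eaThresh = 50; end   % counts; assumed default, tune per site
    spread = squeeze(max(echo, [], 1) - min(echo, [], 1));  % [nbins x npings]
    bad = spread > eaThresh;            % true where a fish echo is suspected
end
```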