Add streamflow #5
I believe we can get median streamflow for COMIDs from the National Water Model using a package called nwmTools.
So this might not be as easy as I thought because of a recent data hack. Waiting to hear back about this. BUT we could at least download the NetCDF files ourselves and parse them; it's just not as clean as using this package. Here's the code I was hoping to test out but am getting an error:
# Get median streamflow from NWM data
# Data release: https://www.hydroshare.org/resource/89b0952512dd4b378dc5be8d2093310f/
# R package: https://github.com/mikejohnson51/nwmTools
# Paper: https://www.nature.com/articles/s41597-023-02316-7
library(nwmTools)
nwm_output <- readNWMdata(comid = 6078267,
startDate = "1993-01-01",
endDate = "2017-12-31",
version = 1.2)
# Aggregate to yearly median, then overall median
nwm_medianFlow_yearly <- aggregate_y(nwm_output, fun = "median")
Once that is working, I plan to compare the NWM median streamflow to the NWIS median streamflow for all the sites/comids in my thesis work. Hopefully it shows a mostly 1:1 relationship :)
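A minimal sketch of how that comparison could work, assuming NWIS daily values are pulled with the dataRetrieval package; the comid-to-gage pairing and the site numbers below are placeholders, not from this thread, and the flow column name follows dataRetrieval's usual convention:
library(dataRetrieval)
# Hypothetical comid-to-NWIS-gage pairing (placeholder site numbers)
site_comid <- data.frame(comid = c(6078267, 6078329),
                         site_no = c("05427718", "05427850"))
# Daily mean discharge (parameter code 00060, in cfs) for the same period as the NWM pull
nwis_daily <- readNWISdv(siteNumbers = site_comid$site_no,
                         parameterCd = "00060",
                         startDate = "1993-01-01",
                         endDate = "2017-12-31")
# Convert cfs to m3/s so the comparison to NWM flow is apples to apples
nwis_daily$flow_cms <- nwis_daily$X_00060_00003 * 0.0283168
# Median daily flow per gage
nwis_median <- aggregate(flow_cms ~ site_no, data = nwis_daily, FUN = median)
# Join to the NWM medians by comid/site_no and check for a roughly 1:1 relationship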
OK, here is an example of accessing NWM raw output via zarr files (no downloads needed and we can use cloud resources to compute). I am going to work on formalizing this on Friday. The upside is that I don't have to download any files and can get around the problems with the package.
Setup
You need the R package reticulate:
install.packages('reticulate')
# Install python libraries (only need to do this once)
reticulate::py_install("xarray")
reticulate::py_install("fsspec")
reticulate::py_install("numpy") Running exampleHere is an example that calculates median Q for 5 COMIDs between 2008-2018. I followed along with Rich Signell's Jupyter Notebook to understand the data structure and basic commands. # Load the reticulate package in order to run python commands
library(reticulate)
# Import Python packages
xr <- import("xarray")
fsspec <- import("fsspec")
np <- import("numpy")
# Use Python's slice function to create a slice object
slice <- import_builtins()$slice
# Load streamflow from zarr file stored in the NWM's Amazon S3 bucket as an xarray
nwm_url <- 's3://noaa-nwm-retro-v2-zarr-pds' # I tried newer versions but
nwm_zarr <- xr$open_zarr(fsspec$get_mapper(nwm_url, anon=TRUE), consolidated=TRUE)
nwm_streamflow_xr <- nwm_zarr['streamflow']
# Define the time range to filter
# This version of the data is available from 1993-2018
# as declared https://registry.opendata.aws/nwm-archive/
start_time <- "2008-01-01 00:00:00"
end_time <- "2018-12-31 00:00:00"
nwm_streamflow_xr_lastDecade <- nwm_streamflow_xr$sel(time=slice(start_time, end_time))
# Define the 5 COMIDs to use and filter the streamflow xarray data to just those
desired_comids <- np$array(c(6078267L, 6078329L, 6078383L, 5867679L, 5867685L))
nwm_streamflow_comids_xr <- nwm_streamflow_xr_lastDecade$sel(feature_id=desired_comids)
# Calculate the median flow value for each COMID and convert to regular R vector
nwm_streamflow_median_xr <- nwm_streamflow_comids_xr$median(dim='time', skipna=TRUE)
nwm_streamflow_median <- py_to_r(nwm_streamflow_median_xr$data)
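To keep track of which median belongs to which COMID, the result can be paired back with the feature_id coordinate from the same xarray object; a small sketch, assuming the coordinate values convert to a plain numeric vector via py_to_r:
# Pair each median with its COMID (feature_id) in a regular R data frame
nwm_median_by_comid <- data.frame(
  comid = py_to_r(nwm_streamflow_median_xr$feature_id$values),
  median_flow_cms = py_to_r(nwm_streamflow_median_xr$data)
)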
I tried comparing what it was like to calculate with different numbers of COMIDs. I thought that maybe grouping would be faster. It is, but not when grouping too many. I think the ideal grouping for this method is 5 COMIDs at a time. So, we should definitely parallelize this or it will take a zillion years to calculate median Q. It took 101 seconds for 1 COMID. Here's my test setup:
time1 <- system.time({
desired_comids1 <- np$array(14640391L)
nwm_streamflow_comids_xr1 <- nwm_streamflow_xr_lastDecade$sel(feature_id=desired_comids1)
message('Starting median Q calculation for 1 COMID')
nwm_streamflow_median_xr1 <- nwm_streamflow_comids_xr1$median(dim='time', skipna=TRUE)
nwm_streamflow_median1 <- py_to_r(nwm_streamflow_median_xr1$data)
})
time5 <- system.time({
desired_comids5 <- np$array(c(6078267L, 6078329L, 6078383L, 5867679L, 5867685L))
nwm_streamflow_comids_xr5 <- nwm_streamflow_xr_lastDecade$sel(feature_id=desired_comids5)
message('Starting median Q calculation for 5 COMIDs')
nwm_streamflow_median_xr5 <- nwm_streamflow_comids_xr5$median(dim='time', skipna=TRUE)
nwm_streamflow_median5 <- py_to_r(nwm_streamflow_median_xr5$data)
})
time10 <- system.time({
desired_comids10 <- np$array(c(6078383L, 9441199L, 4151212L, 6128763L, 13153643L, 3715602L,
7665728L, 6250614L, 12264996L, 1777616L))
nwm_streamflow_comids_xr10 <- nwm_streamflow_xr_lastDecade$sel(feature_id=desired_comids10)
message('Starting median Q calculation for 10 COMIDs')
nwm_streamflow_median_xr10 <- nwm_streamflow_comids_xr10$median(dim='time', skipna=TRUE)
nwm_streamflow_median10 <- py_to_r(nwm_streamflow_median_xr10$data)
})
time25 <- system.time({
desired_comids25 <- np$array(c(14365060L, 22336379L, 22337975L, 22338427L, 22340329L, 3717136L,
3808745L, 3806147L, 2087785L, 3405683L, 3407001L, 1814983L, 6848607L,
12005077L, 904040051L, 15582369L, 15537883L, 21972746L, 2252949L,
4390605L, 4390909L, 4390597L, 6013072L, 7665728L, 7666946L))
nwm_streamflow_comids_xr25 <- nwm_streamflow_xr_lastDecade$sel(feature_id=desired_comids25)
message('Starting median Q calculation for 25 COMIDs')
nwm_streamflow_median_xr25 <- nwm_streamflow_comids_xr25$median(dim='time', skipna=TRUE)
nwm_streamflow_median25 <- py_to_r(nwm_streamflow_median_xr25$data)
})
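Given those timings, one way to work through a long COMID list would be to split it into batches of 5 and loop over the batches. A rough sketch under those assumptions (the full COMID vector is a placeholder, and true parallelization would need a separate reticulate/Python session per worker, so this version is serial):
# Hypothetical full list of COMIDs to process (placeholder values)
all_comids <- c(6078267L, 6078329L, 6078383L, 5867679L, 5867685L, 14640391L)
# Split into batches of 5, the fastest grouping found above
batches <- split(all_comids, ceiling(seq_along(all_comids) / 5))
# Compute median Q for each batch and combine into one named vector
median_list <- lapply(batches, function(ids) {
  batch_xr <- nwm_streamflow_xr_lastDecade$sel(feature_id = np$array(ids))
  py_to_r(batch_xr$median(dim = 'time', skipna = TRUE)$data)
})
nwm_median_all <- setNames(unlist(median_list), all_comids)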
I took a look at this, because I'd rather learn something new than work on talks. I figured out how to do it in R. I avoid python in R when I can, since it seemingly always leads to version errors. It's still slow as heck. Part of that is because the zarr files are in kind of useless chunks. It would be quicker if they were chunked by comid or time, but I'm sure there's a logical reason to chunk them in big rectangles across the entire raster. [sidebar, I was also using v2. I could get v3 streamflow to load, but not the feature_id. There must be something wrong on their end] The chunks are 672 x 30000 [hours x comid]. If we pull from the fewest chunks possible, it speeds things up a lot.
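A rough sketch of that chunk-aligned idea, expressed with the xarray approach from earlier in this thread (the commenter's R-only code isn't shown above, so this is only an illustration; the specific indices and the assumption of hourly time steps are mine):
# The store is chunked 672 hours x 30000 feature_ids, so select along chunk boundaries
chunk_hours <- 672L
start_idx <- 0L                                 # assumed to sit on a chunk boundary
end_idx <- start_idx + 10L * chunk_hours        # pull 10 whole time chunks
# isel() selects by integer position, so this stays aligned with the time chunking
nwm_chunk_aligned <- nwm_streamflow_xr$isel(time = slice(start_idx, end_idx))
# COMIDs that fall within the same 30000-wide feature_id block will also share chunks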
If this works (which I think it does), we should pull all the comids, make a median streamflow output file, and then use that in our workflow instead of running this more than once.
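A minimal sketch of what that output file step could look like, assuming the per-COMID medians end up in a data frame like the one sketched above (the file name is a placeholder):
# Write the per-COMID medians once, then reuse the file in the rest of the workflow
write.csv(nwm_median_by_comid, "nwm_median_streamflow_by_comid.csv", row.names = FALSE)
# Later steps just read it back instead of re-querying the zarr store
nwm_median_by_comid <- read.csv("nwm_median_streamflow_by_comid.csv")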
Amazing! Can't believe it only took 30 seconds to do this. Although, what is the output format for the result? Also, I assume the feature_id is the same as the COMID?
It's a raster, so we can then take the median of each column. Doesn't seem too bad.
There are a lot of zeros, and a lot of -999900 values. So we have to figure that out... but maybe it's from comids we're not using. Also, I assume feature_id = comid?
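A small sketch of how the -999900 values could be masked out before taking the column medians, assuming the pulled data sit in a plain matrix with one column per COMID (the matrix name is a placeholder, and treating -999900 as a no-data flag is an assumption worth confirming against the NWM metadata):
# flow_matrix: hours x COMIDs matrix of streamflow (placeholder name)
# Treat the -999900 values as missing rather than real flows (assumption)
flow_matrix[flow_matrix == -999900] <- NA
# Median of each column (COMID), ignoring the masked values
median_by_comid <- apply(flow_matrix, 2, median, na.rm = TRUE)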
Huh, maybe it is my computer. I just ran your first code chunk to get the data and my download time is so much longer: 4 minutes compared to your 30 sec!
The median calculation code was just as fast as yours, though. And I have also been assuming feature_id = COMID.
Tried it on my CFL desktop and it also took 30 seconds.
How can we do this? Is there a modeled discharge for CONUS using comid?
Thus far I have tried stream order, but it wasn't useful.