Q needs the quantizer scaling #24

Open
WillyChap opened this issue Mar 16, 2024 · 3 comments

@WillyChap
Collaborator

Q is pretty bad even after 1 time step (see figure below). I believe this is the major contributor to the grid artifacts (look at the pole, where the least moisture is available). The loss is clearly focusing on the tropics, largely because of the vast difference in scale between tropical and polar specific humidity. Having a single scalar z-score per level will exacerbate this issue. I think we need to implement the quantizer scaling ASAP.
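To make the scale argument concrete, here is a minimal sketch with synthetic numbers (the humidity magnitudes, sample counts, and variable names are assumptions for illustration, not values from the model or from ERA5). It applies one mean and one standard deviation to a whole level and compares how much normalized variability is left in each region.

```python
import random
import statistics

random.seed(0)

# Synthetic specific humidity (kg/kg) on one level; magnitudes are illustrative only.
tropics = [random.gauss(1.5e-2, 3e-3) for _ in range(500)]
poles = [abs(random.gauss(2e-4, 1e-4)) for _ in range(500)]
level = tropics + poles

# A single scalar z-score for the whole level.
mean = statistics.fmean(level)
std = statistics.pstdev(level)


def zscore(x):
    return (x - mean) / std


# Spread of the normalized values within each region.
spread_tropics = statistics.pstdev([zscore(x) for x in tropics])
spread_poles = statistics.pstdev([zscore(x) for x in poles])

print(f"normalized spread, tropics: {spread_tropics:.4f}")
print(f"normalized spread, poles:   {spread_poles:.4f}")
```

Under these assumptions the polar values collapse into a very narrow band after normalization, so a squared-error loss has little incentive to resolve them.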

This image shows the 2nd auto-recursive time step of the Q field, before and after the diffusion filtering. You can see that the grid artifact is already present, and the diffusion filtering has to do a lot to minimize these errors. It will cause a lot of smoothing unless we can fix it.

panel 1 (difference before and after filtering)
panel 2 (Q prior to filter)
panel 3 (Q post filter)

[image: three-panel figure of the Q field]
@WillyChap
Collaborator Author

@jsschreck @djgagne

@djgagne
Collaborator

djgagne commented Mar 17, 2024

@WillyChap I have started a PR with updates to the distributed scalers in bridgescaler, including the DQuantileTransformer. It now supports a channels-first arrangement of the data cube, which also gave about a 50% speedup compared with channels-last. I highly recommend running DQuantileTransformer with the distribution="normal" option so that the transformed values cover the full range of real numbers.
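As a rough illustration of what mapping to a normal distribution means, the sketch below builds a quantile transform from the standard library only. It is not bridgescaler's implementation (which is distributed and approximates the quantiles); `fit_quantile_to_normal` and the lognormal test data are hypothetical names and choices made for this example.

```python
import bisect
import random
import statistics
from statistics import NormalDist


def fit_quantile_to_normal(samples):
    """Return a function mapping x to a standard-normal score via the empirical CDF."""
    ref = sorted(samples)
    n = len(ref)
    std_normal = NormalDist()

    def transform(x):
        # Mid-rank empirical CDF, kept strictly inside (0, 1) so inv_cdf stays finite.
        lo = bisect.bisect_left(ref, x)
        hi = bisect.bisect_right(ref, x)
        p = (0.5 * (lo + hi) + 0.5) / (n + 1)
        return std_normal.inv_cdf(p)

    return transform


random.seed(1)
# Strongly skewed data, standing in for a humidity-like variable.
skewed = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]

to_normal = fit_quantile_to_normal(skewed)
scores = [to_normal(x) for x in skewed]

print(f"raw mean/median: {statistics.fmean(skewed):.3f} / {statistics.median(skewed):.3f}")
print(f"score mean/std:  {statistics.fmean(scores):.3f} / {statistics.pstdev(scores):.3f}")
```

Because the mapping depends only on rank, small and large values each get a comparable share of the output range, which is the property that motivates it for a variable spanning several orders of magnitude.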

@djgagne
Collaborator

djgagne commented Mar 21, 2024

I made an initial set of DQuantileTransformer scalers (1 per year) from the original zarr data being used for training, with the applications/scaler.py file in my branch currently under PR. The scaler JSON strings are saved to a parquet file, with separate scalers for the 3D and surface variables. To read the scalers from the parquet file, use the following code example.

import pandas as pd
from bridgescaler import read_scaler

scaler_file = "/glade/campaign/cisl/aiml/credit_scalers/era5_quantile_scalers_2024-02-13_07:33.parquet"
scaler_df = pd.read_parquet(scaler_file)
# Convert scaler json strings to DQuantileTransformers
scaler_3ds = scaler_df["scaler_3d"].apply(read_scaler)
scaler_surfs = scaler_df["scaler_surface"].apply(read_scaler)
# Sum yearly scaler objs to one total scaler for 3d vars and surface vars
scaler_3d = scaler_3ds.sum()
scaler_surf = scaler_surfs.sum()
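Summing per-year scalers works when each object defines addition as a merge of its fitted statistics. The sketch below shows that pattern with a hypothetical `RunningStats` class that merges mean and variance; it is an analogy only and does not reproduce how DQuantileTransformer merges its quantile summaries.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical mergeable summary: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @classmethod
    def fit(cls, values):
        values = list(values)
        n = len(values)
        mean = sum(values) / n
        m2 = sum((v - mean) ** 2 for v in values)
        return cls(n, mean, m2)

    def __add__(self, other):
        # Pairwise merge of two summaries without revisiting the raw data.
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta ** 2 * self.n * other.n / n
        return RunningStats(n, mean, m2)

    def __radd__(self, other):
        # Lets the built-in sum() start from its default value of 0.
        if other == 0:
            return self
        return NotImplemented

    @property
    def std(self):
        return math.sqrt(self.m2 / self.n)


year_a = [1.0, 2.0, 3.0]
year_b = [10.0, 20.0]

total = sum([RunningStats.fit(year_a), RunningStats.fit(year_b)])
pooled = RunningStats.fit(year_a + year_b)
print(total.mean, total.std)
print(pooled.mean, pooled.std)
```

The merged summary matches a fit on the pooled data, which is why per-year objects can be fitted independently and combined afterwards.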
