Q needs the quantizer scaling #24

WillyChap · 2024-03-16T16:04:47Z

Q is pretty bad even after 1 time step (see figure below). I believe this is the major contributor to the grid artifacts (look at the poles, where the least moisture is available). The loss is CLEARLY focusing on the tropics; this is largely a result of the vast difference in scale between tropical and polar specific humidity. Having a single scalar z-score per level will exacerbate this issue. I think we need to implement the quantizer scaling ASAP.
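A minimal sketch of the scale problem described above, using synthetic numbers rather than model output. The humidity magnitudes and the `zscore` helper are assumptions chosen for illustration only. With one mean and one standard deviation for the whole level, the polar values collapse into a narrow band in scaled space, so a squared-error loss sees almost all of its signal in the tropics.

```python
import random
import statistics

random.seed(0)

# Synthetic specific humidity (kg/kg) on one level; magnitudes are illustrative only.
tropics = [random.gauss(1.5e-2, 3e-3) for _ in range(5000)]
poles = [random.gauss(5e-4, 1e-4) for _ in range(5000)]

# One scalar mean and standard deviation for the whole level.
level = tropics + poles
level_mean = statistics.fmean(level)
level_std = statistics.pstdev(level)


def zscore(values):
    """Scale values with the single per-level mean and standard deviation."""
    return [(v - level_mean) / level_std for v in values]


# Spread of each region after scaling: the poles occupy a far narrower band.
tropics_spread = statistics.pstdev(zscore(tropics))
poles_spread = statistics.pstdev(zscore(poles))
print(f"tropics spread: {tropics_spread:.3f}, poles spread: {poles_spread:.4f}")
```

Because squared error grows with the square of the scaled spread, the gap between the two regions is even larger in the loss than in the scaled values.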

This image shows the 2nd autoregressive time step of the Q field, before and after the diffusion filtering. You can see that the grid artifact is already present, and the diffusion filtering has to do A LOT to minimize these errors. The filter will cause a lot of smoothing unless we can fix the underlying problem.

panel 1 (difference before and after filtering)
panel 2 (Q prior to filter)
panel 3 (Q post filter)

WillyChap · 2024-03-16T16:06:43Z

@jsschreck @djgagne

djgagne · 2024-03-17T23:17:24Z

@WillyChap I have started a PR with updates to the distributed scalers in bridgescaler, including the DQuantileTransformer. It now supports a channels-first arrangement of the data cube, which also provided about a 50% speedup compared with channels-last. I highly recommend running DQuantileTransformer with the distribution="normal" option so that the transformed values cover the full range of real numbers.
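To illustrate why a quantile transform with a normal output helps here, the sketch below maps synthetic values to their empirical quantile and then through the inverse standard normal CDF. This is not bridgescaler's implementation; `quantile_to_normal` and the synthetic humidity values are assumptions for illustration, and only the Python standard library is used.

```python
import bisect
import random
import statistics
from statistics import NormalDist

random.seed(0)

# Synthetic specific humidity (kg/kg); magnitudes are illustrative only.
tropics = [random.gauss(1.5e-2, 3e-3) for _ in range(5000)]
poles = [random.gauss(5e-4, 1e-4) for _ in range(5000)]

# Sorted reference sample that defines the empirical CDF.
reference = sorted(tropics + poles)
n = len(reference)
std_normal = NormalDist()


def quantile_to_normal(value):
    """Map a value to its empirical quantile, then to a standard normal score."""
    rank = bisect.bisect_right(reference, value)
    # Keep the quantile strictly inside (0, 1) so inv_cdf stays finite.
    q = min(max(rank / (n + 1), 1 / (n + 1)), n / (n + 1))
    return std_normal.inv_cdf(q)


poles_t = [quantile_to_normal(v) for v in poles]
tropics_t = [quantile_to_normal(v) for v in tropics]

# After the transform, both regions have a comparable spread.
print(statistics.pstdev(poles_t), statistics.pstdev(tropics_t))
```

The transform depends only on rank order, so the dry polar values get as much of the output range as the moist tropical ones instead of being squeezed by the tropical variance.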

djgagne · 2024-03-21T16:45:59Z

I made an initial set of DQuantileTransformer scalers (1 per year) from the original zarr data being used for training, in the applications/scaler.py file in my branch currently under PR. The scaler JSON strings are saved to a parquet file, with separate scalers for the 3D and surface variables. To read the scalers from the parquet file, use the following code:

import pandas as pd
from bridgescaler import read_scaler

scaler_file = "/glade/campaign/cisl/aiml/credit_scalers/era5_quantile_scalers_2024-02-13_07:33.parquet"
scaler_df = pd.read_parquet(scaler_file)
# Convert scaler json strings to DQuantileTransformers
scaler_3ds = scaler_df["scaler_3d"].apply(read_scaler)
scaler_surfs = scaler_df["scaler_surface"].apply(read_scaler)
# Sum yearly scaler objs to one total scaler for 3d vars and surface vars
scaler_3d = scaler_3ds.sum()
scaler_surf = scaler_surfs.sum()
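The summing step above implies that scaler objects can be combined with `+`. As a rough sketch of that pattern only, here is a hypothetical `ToyMeanScaler` that tracks a count and a mean and merges them on addition. It is not part of bridgescaler, and the real DQuantileTransformer merge logic is not shown here.

```python
from dataclasses import dataclass
from functools import reduce
import operator


@dataclass
class ToyMeanScaler:
    """Hypothetical mergeable scaler that tracks only a count and a mean."""

    count: int = 0
    mean: float = 0.0

    def fit(self, values):
        """Record the sample count and mean of one batch (e.g. one year)."""
        self.count = len(values)
        self.mean = sum(values) / len(values)
        return self

    def __add__(self, other):
        """Merge two fitted scalers using a count-weighted mean."""
        total = self.count + other.count
        merged_mean = (self.count * self.mean + other.count * other.mean) / total
        return ToyMeanScaler(total, merged_mean)


# One toy scaler per "year", then a single combined scaler.
yearly = [ToyMeanScaler().fit([1.0, 2.0, 3.0]), ToyMeanScaler().fit([10.0])]
combined = reduce(operator.add, yearly)
print(combined)
```

Fitting per year and merging afterwards means each year can be processed independently, and the combined scaler does not require a second pass over the data.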

WillyChap assigned djgagne, WillyChap and jsschreck Mar 16, 2024

WillyChap mentioned this issue Mar 22, 2024

Quantile static #32

Merged
