# Evaluator

The model evaluation is run through the following command:

```shell
srun python atmorep/core/evaluate.py
```
Please note that `evaluate.py` is a wrapper for `atmorep/core/evaluator.py`, which contains a function for each supported evaluation option.
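The mapping from mode name to evaluation function can be sketched as below. The class and function bodies are illustrative stand-ins, not the actual `evaluator.py` API:

```python
# Sketch of the dispatch pattern: each supported evaluation option maps to a
# function of the same name. Names and signatures here are hypothetical.

class Evaluator:
    @staticmethod
    def BERT(options):
        # stand-in for the real BERT-mode evaluation
        return f"BERT mode with {options}"

    @staticmethod
    def forecast(options):
        # stand-in for the real forecast-mode evaluation
        return f"forecast mode with {options}"

def evaluate(mode, options):
    # look up the function matching the requested mode
    fn = getattr(Evaluator, mode, None)
    if fn is None:
        raise ValueError(f"unsupported evaluation mode: {mode}")
    return fn(options)

print(evaluate('forecast', {'forecast_num_tokens': 1}))
```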
The option `BERT` evaluates the model in the so-called BERT mode: some of the tokens within the loaded source hypercube are masked at random, with the random choices spanning both the space and time dimensions.
Example:

```python
mode, options = 'BERT', {'years_test' : [2021], 'fields[0][2]' : [123], 'attention' : False}
```

This example runs the `BERT` mode for the year 2021 and model level 123, without storing the attention maps (generally very time- and memory-consuming).
The option `forecast` evaluates the model on the forecasting task, completely masking the last N tokens in time, with N defined by `forecast_num_tokens`.
Example:

```python
mode, options = 'forecast', {'forecast_num_tokens' : 1} #, 'fields[0][2]' : [123, 137], 'attention' : False }
```

This example corresponds to a 3h forecasting window (1 masked token, assuming a token size of 3 in time), model levels 123 and 137, and no stored attention maps. For longer time windows, e.g. a 12h forecast, set `forecast_num_tokens = 4`.
Note: Please remember that these are masked tokens within the pre-loaded source hypercube, so `forecast_num_tokens < num_tokens[0]`. The case `forecast_num_tokens = num_tokens[0]` works but is not meaningful, as the whole source cube would be masked.
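The relation between `forecast_num_tokens` and the forecast window, together with the constraint above, can be sketched with an illustrative helper (not part of AtmoRep), assuming a temporal token size of 3 hours as in the example:

```python
# Illustrative helper relating forecast_num_tokens to the forecast window in
# hours, enforcing forecast_num_tokens < num_tokens[0] (the number of tokens
# in time of the loaded source hypercube).

def forecast_window_hours(forecast_num_tokens, num_tokens_time, token_size_time=3):
    if not 0 < forecast_num_tokens < num_tokens_time:
        raise ValueError("forecast_num_tokens must satisfy 0 < N < num_tokens[0]")
    return forecast_num_tokens * token_size_time

print(forecast_window_hours(1, 12))   # 3h window
print(forecast_window_hours(4, 12))   # 12h window
```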
The option `global_forecast` runs the model in forecast mode, but tiles the globe so that the individual forecasts combine into a global forecast (latitude -90 to 90, longitude 0 to 360).
Example:

```python
mode, options = 'global_forecast', { 'fields[0][2]' : [123, 137],
                                     'dates' : [[2021, 2, 10, 12]],
                                     'token_overlap' : [0, 0],
                                     'forecast_num_tokens' : 1,
                                     'attention' : False }
```
This example runs a global 3h forecast (`'forecast_num_tokens' : 1`), with lead times specified in `dates` and for model levels 123 and 137 only. `token_overlap` specifies the spatial overlap, in grid points, between two adjacent tiles, to capture e.g. fast-moving waves and increase the forecasting accuracy. The suggested values are [0, 0] (no overlap) or [2, 6].
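The effect of the overlap on the tiling stride along one dimension can be sketched as follows; the grid sizes and tile width below are hypothetical, not AtmoRep's actual tiling code:

```python
# Illustrative sketch: adjacent tiles advance by (tile_width - overlap) grid
# points, so a non-zero overlap produces more, partially overlapping tiles.

def tile_origins(n_grid_points, tile_width, overlap):
    stride = tile_width - overlap
    origins = list(range(0, n_grid_points - tile_width + 1, stride))
    # ensure the last tile reaches the end of the domain
    if origins[-1] + tile_width < n_grid_points:
        origins.append(n_grid_points - tile_width)
    return origins

print(tile_origins(120, 24, 0))  # no overlap: a tile every 24 grid points
print(tile_origins(120, 24, 6))  # 6-point overlap: a tile every 18 grid points
```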
In the following we list less frequently used evaluation options, which might not work out of the box due to backward-compatibility issues:
The option `fixed_location` evaluates the model with the center of the tokens fixed to a specific location in space, randomising over the time dimension.
The option `temporal_interpolation` evaluates the model on the temporal interpolation task, masking the middle token of the loaded hypercube as described in the "Trainer" page.
The option `global_forecast_range` evaluates the model in global forecast mode (see above), but for N consecutive steps starting from a single lead time defined by `cur_date`. The number of steps is defined by the `num_steps` parameter.
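Assuming a 3h step as in the forecast examples above, the sequence of lead times covered by such a run could be enumerated as follows (an illustrative sketch, not the actual option parsing):

```python
# Illustrative sketch: enumerate num_steps consecutive lead times starting
# from cur_date, given as [year, month, day, hour] like the 'dates' option.
from datetime import datetime, timedelta

def forecast_dates(cur_date, num_steps, step_hours=3):
    start = datetime(*cur_date)
    return [start + timedelta(hours=i * step_hours) for i in range(num_steps)]

for d in forecast_dates([2021, 2, 10, 12], 4):
    print(d)
```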
The output of the evaluation step is a set of `.zarr` files. Example:

- `model_idc96xrbip.json` = model settings used in the evaluation phase
- `results_idc96xrbip_epoch00000_pred.zarr` = file storing all predicted tokens
- `results_idc96xrbip_epoch00000_target.zarr` = file storing the masked target tokens (i.e. the ground truth) in the same format as the predictions
- `results_idc96xrbip_epoch00000_source.zarr` = file storing all the loaded tokens (masked tokens are stored as zeros)
- `results_idc96xrbip_epoch00000_ens.zarr` = file storing the ensemble predictions
- (optional) `results_idc96xrbip_epoch00000_attention.zarr` = file storing the attention scores; written only if `attention = True` at the evaluation stage.
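Since predictions and targets share the same format, a typical post-processing step is to compare them directly. The sketch below uses synthetic numpy arrays as stand-ins; in practice the arrays would be read from the `*_pred.zarr` and `*_target.zarr` stores (e.g. with the `zarr` or `xarray` packages):

```python
# Illustrative post-processing sketch: RMSE between predicted and target
# tokens. The synthetic arrays stand in for data read from the zarr stores.
import numpy as np

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

rng = np.random.default_rng(0)
target = rng.normal(size=(8, 16, 16))                 # stand-in for targets
pred = target + 0.1 * rng.normal(size=target.shape)   # stand-in for predictions
print(f"RMSE: {rmse(pred, target):.3f}")
```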
The AtmoRep Collaboration - last update: April 2024