In many datasets, we want to visualize an attribute of the data rather than one of the custom statistics, such as polygon count or coverage, that we use for the ice-wedge polygon data. One example is the lake area time series dataset, where we visualize the areas of permanent water and seasonal water as separate layers. Both attributes are in hectares and are pulled straight from the vector data for each polygon. Their values have a minimum of 0 and an integer maximum that varies per year, say 120,025. The data contain outliers and are not normally distributed, so the range in the config for these stats cannot simply be set to `[0, 120025]`. (If the user does not set the range, the min and max for each z-level are calculated by default from `raster_summary.csv`.) If we let the range max be the max value of the attribute, the palette in the web tiles does not represent all values of the data clearly on the portal: too many polygons take the color at the low end of the palette (like light blue), and few polygons in the tileset take the color at the high end (like dark blue). For users to best understand the data, every part of the palette should be represented. By excluding outliers from the range set in the config (that is, setting the max to a value lower than the attribute's max), we also represent the middle values of the palette better. Any value beyond the config max is assigned the same color as the max of the range, so the effect is similar to winsorizing.
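To illustrate the effect described above, here is a minimal sketch of how clamping to a range changes which palette bins are used. The function `palette_index`, the five-color palette, and the sample areas are all hypothetical and only for illustration; this is not how `viz-staging` assigns colors internally.

```python
def palette_index(value, range_min, range_max, n_colors):
    """Map a value to a palette bin, clamping values outside the range."""
    # Values beyond the range get the color of the nearest range end,
    # which is the winsorizing-like behavior described above.
    clamped = min(max(value, range_min), range_max)
    fraction = (clamped - range_min) / (range_max - range_min)
    # The top of the range maps to the last bin rather than one past it.
    return min(int(fraction * n_colors), n_colors - 1)


# Hypothetical lake areas in hectares: mostly small, with one extreme outlier.
areas = [2, 5, 8, 12, 20, 35, 70, 100, 150, 120025]

# Range max equal to the attribute max: the outlier dominates the scale.
full_range = [palette_index(a, 0, 120025, 5) for a in areas]

# Range max lowered to exclude the outlier: more bins are used.
capped_range = [palette_index(a, 0, 150, 5) for a in areas]

print("bins used with full range:  ", sorted(set(full_range)))
print("bins used with capped range:", sorted(set(capped_range)))
```

With the full range, every non-outlier polygon lands in the lowest bin; lowering the max spreads the same polygons across the whole palette.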
The best value for the range max should be determined mathematically, for example with a percentile or with a more complicated, dataset-specific approach. Using a percentile was explored for the lake area time series data here. The percentile needs to be adjusted to the distribution of each particular attribute, which can be time consuming, so it would be best to integrate an approach into `viz-staging` that sets the range to values that exclude outliers.
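As a starting point for discussion, the percentile idea could be sketched roughly as follows. The helper names `percentile` and `range_without_outliers`, the default of the 95th percentile, and the sample values are assumptions for illustration, not existing `viz-staging` functions or settings; the right percentile would still depend on each attribute's distribution.

```python
import math


def percentile(values, pct):
    """Return the pct-th percentile (0-100), interpolating linearly between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def range_without_outliers(values, upper_pct=95, range_min=0):
    """Build a [min, max] range whose max is a percentile, not the attribute max."""
    return [range_min, percentile(values, upper_pct)]


# Hypothetical lake areas in hectares with one extreme outlier.
areas = [2, 5, 8, 12, 20, 35, 70, 100, 150, 120025]

print(range_without_outliers(areas, upper_pct=80))
```

A range computed this way could then be written into the config in place of a hand-picked max.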