Currently, deduplication in the visualization workflow starts after the input data has been staged and tiled. If deduplication is set to occur at any step in the workflow (staging, rasterization, and/or 3D tiling), the duplicate rows are first flagged with a boolean attribute, and the polygons flagged True are then removed at the specified step.
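The flag-then-remove pattern can be sketched with plain pandas. This is a simplified illustration only: the column names and the duplicate criterion are made up here, and the real workflow flags spatial duplicates on GeoDataFrames rather than exact-match rows.

```python
import pandas as pd

# Toy stand-in for staged polygon rows (columns are illustrative).
rows = pd.DataFrame({
    "lake_id": [1, 2, 2, 3],
    "area": [10.0, 5.0, 5.0, 7.5],
})

# Step 1: flag duplicate rows with a boolean attribute.
rows["staging_duplicated"] = rows.duplicated(subset="lake_id", keep="first")

# Step 2: at the configured workflow step, drop the rows flagged True.
deduped = rows[~rows["staging_duplicated"]].drop(columns="staging_duplicated")
```

The key point is that flagging and removal are separate: every step sees the flag, but only the step named in the config actually drops the rows.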
For some datasets, deduplicating the data before it is tiled could be beneficial. For example, Ingmar Nitze's Arctic lake change dataset is composed of UTM zones that overlap at the edges, and he prefers to have the data deduplicated before it is input into the viz-workflow. That way, whether users are interested in the viz output (tilesets of lakes) or the input data, they can have access to only the deduplicated data.
This functionality is in the exploratory phase. An example of applying the neighbor deduplication approach to non-tiled data can be found in this issue. One way this functionality could be integrated into the viz-staging package is by adding more acceptable inputs for the deduplication options in the config. For example: deduplicate_at could accept a new option like "before_tiling". In addition to new flexibility in the config, certain pre-deduplication steps would need to happen, such as adding a source_file attribute to the input data.
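A hedged sketch of what this could look like: `deduplicate_at` comes from the text above, but the `deduplicate_method` key, the file names, and the columns are illustrative assumptions, not the actual viz-staging API.

```python
import pandas as pd

# Proposed config fragment (sketch): "before_tiling" is the new option
# suggested above; "deduplicate_method" is an assumed key name.
config = {
    "deduplicate_method": "neighbor",
    "deduplicate_at": ["before_tiling"],
}

# Pre-deduplication step: tag each input row with the file it came from,
# so the deduplication can later decide which source's copy to keep.
# In-memory tables stand in for reading the real per-UTM-zone files.
zones = {
    "lake_change_utm32.gpkg": pd.DataFrame({"lake_id": [1, 2]}),
    "lake_change_utm33.gpkg": pd.DataFrame({"lake_id": [2, 3]}),
}
combined = pd.concat(
    [df.assign(source_file=path) for path, df in zones.items()],
    ignore_index=True,
)
```

With a source_file column in place, overlapping rows from adjacent UTM zones can be compared and resolved before any tiling happens.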
One more consideration for this feature is that any polygons that intersect the antimeridian in the input data will need to be split prior to deduplication, which aligns with the existing need to split them before staging the files anyway. This was identified with the lake change data (see here). I included an example of how to do this in R here, and an example in Python is here.
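One way the split could be sketched in Python with shapely (this is an illustration under simplifying assumptions, not the linked example: it assumes -180..180 longitudes and a polygon that crosses 180°, and real data needs more careful handling):

```python
from shapely.affinity import translate
from shapely.geometry import LineString, Polygon
from shapely.ops import split

def split_antimeridian(poly):
    """Split a polygon crossing the antimeridian into two pieces (sketch)."""
    # Shift negative longitudes into 0..360 space so the ring is contiguous.
    shifted = Polygon(
        [(x + 360 if x < 0 else x, y) for x, y in poly.exterior.coords]
    )
    # Cut along the antimeridian, which is x = 180 in the shifted space.
    pieces = split(shifted, LineString([(180, -90), (180, 90)]))
    # Move any piece lying east of 180 back into -180..180 space.
    return [
        translate(p, xoff=-360) if p.centroid.x > 180 else p
        for p in pieces.geoms
    ]

# A small polygon straddling the antimeridian, from 179°E to 179°W.
halves = split_antimeridian(
    Polygon([(179, 10), (-179, 10), (-179, 12), (179, 12)])
)
```

After a split like this, each half lives entirely on one side of the antimeridian, so neighbor comparisons during deduplication see valid, non-wrapping geometries.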