Currently, deduplication in the visualization workflow starts after the input data has been staged and tiled. If deduplication is set to occur at any step in the workflow (staging, rasterization, and/or 3D tiling), the duplicate rows are first flagged with a boolean attribute, and the polygons flagged True are then removed at the specified step.
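The flag-then-remove pattern can be sketched with plain pandas. This is a simplified illustration only: the column names and the duplicate criterion are made up here, and the real workflow flags spatial duplicates on GeoDataFrames rather than exact-match rows.

```python
import pandas as pd

# Toy stand-in for staged polygon rows (columns are illustrative).
rows = pd.DataFrame({
    "lake_id": [1, 2, 2, 3],
    "area": [10.0, 5.0, 5.0, 7.5],
})

# Step 1: flag duplicate rows with a boolean attribute.
rows["staging_duplicated"] = rows.duplicated(subset="lake_id", keep="first")

# Step 2: at the configured workflow step, drop the rows flagged True.
deduped = rows[~rows["staging_duplicated"]].drop(columns="staging_duplicated")
```

The key point is that flagging and removal are separate: every step sees the flag, but only the step named in the config actually drops the rows.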
For some datasets, deduplicating the data before it is tiled could be beneficial. For example, Ingmar Nitze's Arctic lake change dataset is composed of UTM zones that overlap at the edges, and he prefers to have the data deduplicated before it is input into the viz-workflow. That way, whether users are interested in the viz output (tilesets of lakes) or the input data, they can have access to only the deduplicated data.
This functionality is in the exploratory phase. An example of applying the neighbor deduplication approach to non-tiled data can be found in this issue. One way this functionality could be integrated into the viz-staging package is by adding more acceptable inputs for the deduplication options in the config. For example: deduplicate_at could accept a new option like "before_tiling". In addition to new flexibility in the config, certain pre-deduplication steps would need to happen, such as adding a source_file attribute to the input data.
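A hedged sketch of what this could look like: `deduplicate_at` comes from the text above, but the `deduplicate_method` key, the file names, and the columns are illustrative assumptions, not the actual viz-staging API.

```python
import pandas as pd

# Proposed config fragment (sketch): "before_tiling" is the new option
# suggested above; "deduplicate_method" is an assumed key name.
config = {
    "deduplicate_method": "neighbor",
    "deduplicate_at": ["before_tiling"],
}

# Pre-deduplication step: tag each input row with the file it came from,
# so the deduplication can later decide which source's copy to keep.
# In-memory tables stand in for reading the real per-UTM-zone files.
zones = {
    "lake_change_utm32.gpkg": pd.DataFrame({"lake_id": [1, 2]}),
    "lake_change_utm33.gpkg": pd.DataFrame({"lake_id": [2, 3]}),
}
combined = pd.concat(
    [df.assign(source_file=path) for path, df in zones.items()],
    ignore_index=True,
)
```

With a source_file column in place, overlapping rows from adjacent UTM zones can be compared and resolved before any tiling happens.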
One more consideration for this feature is that any polygons that intersect the antimeridian in the input data will need to be split prior to deduplication, which aligns with the existing need to split them before staging the files anyway. This was identified with the lake change data (see here). I included an example of how to do this in R here, and an example in Python is here.
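One way the split could be sketched in Python with shapely (this is an illustration under simplifying assumptions, not the linked example: it assumes -180..180 longitudes and a polygon that crosses 180°, and real data needs more careful handling):

```python
from shapely.affinity import translate
from shapely.geometry import LineString, Polygon
from shapely.ops import split

def split_antimeridian(poly):
    """Split a polygon crossing the antimeridian into two pieces (sketch)."""
    # Shift negative longitudes into 0..360 space so the ring is contiguous.
    shifted = Polygon(
        [(x + 360 if x < 0 else x, y) for x, y in poly.exterior.coords]
    )
    # Cut along the antimeridian, which is x = 180 in the shifted space.
    pieces = split(shifted, LineString([(180, -90), (180, 90)]))
    # Move any piece lying east of 180 back into -180..180 space.
    return [
        translate(p, xoff=-360) if p.centroid.x > 180 else p
        for p in pieces.geoms
    ]

# A small polygon straddling the antimeridian, from 179°E to 179°W.
halves = split_antimeridian(
    Polygon([(179, 10), (-179, 10), (-179, 12), (179, 12)])
)
```

After a split like this, each half lives entirely on one side of the antimeridian, so neighbor comparisons during deduplication see valid, non-wrapping geometries.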