Revise Lenses into special operators (like they are in Adaptive Schemas) #344

Open
okennedy opened this issue Jul 31, 2019 · 2 comments

okennedy commented Aug 1, 2019

Lens/Model Roles

  1. Tag data values as uncertain/problematic.
  2. Provide a human-readable description of the reason that the value is uncertain/problematic.
  3. Produce a "best guess" value (that is itself problematic).
  4. Provide an identifier to allow uncertain values to be acknowledged (un-tagged).
  5. Identify a procedure by which the user can address the error or repair the value.
  6. Optionally provide a facility for sampling alternative values.
  • 1-3 are the critical features.
  • 4 and 5 are useful support, although the current (heavy-weight) implementation of overriding the model value isn't the only way to implement repairs.
  • 6 hasn't been exploited in a long time. It's also generally one of the weakest points whenever I present Mimir. However, the feature is useful for research purposes, and might provide some benefits when applying Mimir to simulation workloads (as opposed to data cleaning).
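
To make the split between roles concrete, here is a minimal sketch of roles 1-5 as plain data attached to a value, rather than as methods on a model object. This is not Mimir's API: `Caveat`, `Guess`, and `null_fill_lens` are hypothetical names, Python is used only for brevity, and the null-replacement lens is just an illustrative case. Role 6 (sampling) is left out because it is optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Caveat:
    """Hypothetical tag attached to an uncertain value (role 1)."""
    key: str                           # role 4: identifier used to acknowledge (un-tag) the value
    reason: str                        # role 2: human-readable explanation
    repair_hint: Optional[str] = None  # role 5: how the user could fix the value


@dataclass(frozen=True)
class Guess:
    """A best-guess value (role 3) plus its caveat, if any."""
    value: object
    caveat: Optional[Caveat]           # None means the value is not tagged


def null_fill_lens(row_id, column, value, default, acknowledged=frozenset()):
    """Illustrative lens: replace NULL with a default and tag the result."""
    if value is not None:
        # Nothing uncertain here, so pass the value through untagged.
        return Guess(value, None)
    key = f"{column}:{row_id}"
    if key in acknowledged:
        # The user has already acknowledged this guess, so drop the tag.
        return Guess(default, None)
    return Guess(
        default,
        Caveat(
            key=key,
            reason=f"{column} was NULL in row {row_id}; substituted {default!r}",
            repair_hint=f"Supply a value for {column} in row {row_id}",
        ),
    )
```

In this reading, calling `null_fill_lens(7, "age", None, 0)` yields a guess of `0` carrying a caveat keyed `age:7`, and passing `frozenset({"age:7"})` as `acknowledged` yields the same guess with no caveat.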


okennedy commented Aug 1, 2019

Limitations of the Existing Approach

  1. A single opaque blob (the model) handles everything, so it needs to be serialized and passed around.
  2. Query structure is fixed at the time of lens creation (no option-driven rewrites).
  3. Feedback has to go through the model, when often a lighter-weight fix (e.g., manually replacing null values) exists.
  4. Sampling is not always feasible (e.g., no well-defined distribution exists, or the domain is infinite).
  5. Acknowledgements are handled by the big opaque blob, adding more junk to be serialized and passed around.
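
As one possible reading of items 3 and 5, the sketch below keeps acknowledgements and manual repairs as plain data outside the model, so that only a small JSON document needs to be serialized and passed around. All names here (`LensState`, `resolve`, the `column:row` key format) are assumptions made for illustration, not anything in Mimir.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LensState:
    """Hypothetical user feedback, stored separately from the model."""
    acknowledged: set = field(default_factory=set)   # keys the user has accepted
    overrides: dict = field(default_factory=dict)    # key -> manually repaired value

    def acknowledge(self, key):
        self.acknowledged.add(key)

    def repair(self, key, value):
        self.overrides[key] = value

    def to_json(self):
        # Plain data only, so no model object has to be serialized.
        return json.dumps(
            {"acknowledged": sorted(self.acknowledged), "overrides": self.overrides},
            sort_keys=True,
        )

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        return cls(set(raw["acknowledged"]), dict(raw["overrides"]))


def resolve(key, model_guess, state):
    """Return (value, is_tagged) without consulting or updating the model."""
    if key in state.overrides:
        # A manual repair replaces the guess and clears the tag.
        return state.overrides[key], False
    # Otherwise keep the model's guess; it stays tagged until acknowledged.
    return model_guess, key not in state.acknowledged
```

With this layout, a manual null replacement is a single `state.repair("age:7", 31)` call, and the model itself never has to learn about it.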
