Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ColumnSumCheck should treat an unexpected type as a test failure, not exception #46

Open
colindean opened this issue May 22, 2020 · 4 comments

Comments

@colindean
Copy link
Collaborator

@samratmitra-0812 pointed out:

This behaviour of throwing an exception for unsupported type [in columnSumCheck] is different from columnMaxCheck, where it is treated as a normal check failure. I think we should make both of them consistent.

@phpisciuneri
Copy link
Contributor

@colindean @samratmitra-0812 I think handling as an exception in ColumnMaxCheck is more appropriate. In both cases it is an edge that isn't covered, rather than a logical failure of the test. What do you think?

@colindean
Copy link
Collaborator Author

colindean commented May 22, 2020

I can see it either way. Think of the failure mode when DV encounters an unsupported column type:

  1. Log and fail the individual check
  2. Explode

DV expects the user to inspect failures in the report at the end of a run. Should DV expect the user to know the type of the data in a column, especially when DV's config cannot necessarily reflect that type? That is, YAML only knows JSON numeric types, which are integer and double (I think they're called "fractions" technically in JSON) and exponential.

Also, is it more user-friendly to allow a validation run to complete or should it die immediately at runtime?

Immediate exit for normally long-running processes should be focused at the start of the run. If there's a way we can detect supported types in configCheck, we should do that if DV will exit ungracefully when it encounters an unhandled case.

One of the things I liked about my expressions approach to columnSumCheck was pushing the type handling to the Spark/Hive level, so we didn't need to maintain a list of supported data types that generally ends up being an exhaustive list of subtypes of NumericType without a way to ensure that we've handled all cases exhaustively.

@phpisciuneri
Copy link
Contributor

Immediate exit for normally long-running processes should be focused at the start of the run. If there's a way we can detect supported types in configCheck, we should do that if DV will exit ungracefully when it encounters an unhandled case.

@colindean This is a good point. Maybe too general of a statement, but at level of config check we can throw exceptions, it will be a fast failure to your point. And at deeper levels (like quick check) we just log the error so that other checks still have the chance to complete.

@samratmitra-0812
Copy link
Collaborator

@colindean @phpisciuneri Datatype verification in configCheck is being done in NegativeCheck and StringRegexCheck. But even there, it is being treated as a normal check failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants