-
Notifications
You must be signed in to change notification settings - Fork 521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ingester: Validate completed blocks #4256
Ingester: Validate completed blocks #4256
Conversation
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
7b01ac7
to
3a0f62c
Compare
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
|
||
// Validate will do a basic sanity check of the state of the parquet file. This can be extended to do more checks in the future. | ||
// This method should lean towards being cost effective over complete. | ||
func (b *backendBlock) Validate(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would not be simpler trying to open the file directly with parquet-go:
o := []parquet.FileOption{
parquet.SkipBloomFilters(true),
parquet.SkipPageIndex(true),
parquet.FileSchema(parquetSchema),
parquet.FileReadMode(parquet.ReadModeAsync),
}
reader := NewBackendReaderAt(ctx, b.r, DataFileName, b.meta)
_, err := parquet.OpenFile(reader, int64(b.meta.Size_), o...)
if err != nil {
return fmt.Errorf("failed to read parquet fike: %w", err)
}
OpenFile seems to do essentially the same plus some other validations. Is this difference noticeable enough? Since this is at startup time maybe it isn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenFile also unmarshals the thrift footer which is more costly. I was trying to avoid paying those allocs on startup.
// Validate will do a basic sanity check of the state of the parquet file. This can be extended to do more checks in the future. | ||
// This method should lean towards being cost effective over complete. | ||
func (b *backendBlock) Validate(ctx context.Context) error { | ||
if b.meta == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to reassemble this information from data on disk and still avoid paying the allocs on startup like you mention above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possibly? what do you mean by "this information"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
What this PR does:
Adds a
Validate()
method to backend blocks and uses it in ingester startup to do basic confirmation of block health before reloading them. This prevents corrupt blocks from being loaded and eventually pushed to object storage.Which issue(s) this PR fixes:
Fixes #4166 as near as we can tell. At least it covers the instance seen internally.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]