Write metadata cache data to mappings _meta with refresh time update #805
base: main
…rch-project#744)
* write mock metadata cache data to mappings _meta
* Enable write to cache by default
* bugfix: _meta.latestId missing when create index
* set and unset config in test suite
* fix: use member flintSparkConf
Signed-off-by: Sean Kao <[email protected]>
add label to backport to the nexus branch.
5f3af3b to 7a8e1f3
   * Handles refresh for refresh mode AUTO, which is used exclusively by auto refresh index with
   * internal scheduler.
   */
  private def refreshIndexAuto(
Why don't we update for auto refresh?
For now we only track lastRefreshStartTime and lastRefreshCompleteTime for manual refresh and for auto refresh with the external scheduler.
For the streaming job, we use createTime to track the streaming job start time.
There's no mechanism for tracking the start/end time of each micro-batch update yet, so updating the two timestamps in the refresh could be misleading.
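The rule described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions: `RefreshTimeSketch`, `RefreshKind`, `RefreshTimes`, and `recordRefresh` are hypothetical names, not code from this PR, and the real logic lives in the Flint refresh path.

```scala
// Hypothetical sketch; none of these names come from the PR itself.
object RefreshTimeSketch {
  sealed trait RefreshKind
  case object ManualFull extends RefreshKind
  case object ManualIncremental extends RefreshKind
  case object AutoExternalScheduler extends RefreshKind
  case object AutoInternalScheduler extends RefreshKind // long-running streaming job

  // Timestamps are epoch milliseconds.
  final case class RefreshTimes(
      createTime: Long,
      lastRefreshStartTime: Option[Long],
      lastRefreshCompleteTime: Option[Long])

  def recordRefresh(
      times: RefreshTimes,
      kind: RefreshKind,
      start: Long,
      end: Long): RefreshTimes =
    kind match {
      // Streaming job: only the job start time is tracked, via createTime.
      // Per-micro-batch start/end times are not tracked, so the two
      // last-refresh timestamps are left untouched.
      case AutoInternalScheduler =>
        times.copy(createTime = start)
      // Manual refresh and external scheduler: each refresh has a clear
      // start and end, so both timestamps are updated.
      case _ =>
        times.copy(
          lastRefreshStartTime = Some(start),
          lastRefreshCompleteTime = Some(end))
    }
}
```

With this shape, a reader of the timestamps never sees a "last refresh" value that actually describes a streaming job's lifetime.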
I'll add some comment
...ration/src/main/scala/org/opensearch/flint/spark/scheduler/util/IntervalSchedulerParser.java
Note to any reviewer if curious, the force push only amended commit 2f58f56 and nothing else
Why do we call it metadata Cache? I was not quite sure what "cache" indicates here.
I do welcome a better name... I was kind of struggling to come up with one. I'm not too convinced that MetadataCache is the best one.
Description
Metadata Cache Writer
For the most part, same as
In addition to the regular metadata storage using FlintIndexMetadataService, we're dual-writing additional fields, defined by FlintMetadataCache, to the index mappings _meta field. It's intended for frontend users to access some crucial metadata for an index quickly without invoking another backend API call.

This PR adds such fields for all indexes, if the spark config spark.flint.metadataCacheWrite.enabled is set to true.
* _meta.properties.metadataCacheVersion: "1.0"
* _meta.properties.refreshInterval: Integer. Refresh interval of an index measured in seconds. This field is added only if the index refresh type is auto refresh and refresh_interval is set.
* _meta.properties.sourceTables: Array of Strings. For now, it's mocked data. Update coming in a later PR.
* _meta.properties.lastRefreshTime: Long. Timestamp in milliseconds when the last refresh happened. This field is added only if the index has been refreshed at least once.

Last Refresh Time
Added two new fields in FlintMetadataLogEntry and bumped the version of its JSON doc from 1.0 to 1.1 (because this adds new fields without changing existing fields).

These are accurate only for manual refresh (full, incremental) and external scheduler for auto refresh.
For internal scheduler, the jobStartTime (or createTime in FlintMetadataLogEntry) is used to track streaming job start time.

I'm not reusing createTime because they should be updated at different times.
* For createTime (for internal scheduler), it's updated during refreshIndex, recoverIndex, updateIndexManualToAuto
* But for lastRefreshStartTime and lastRefreshCompleteTime (for manual refresh and external scheduler), it's only updated in refreshIndex
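
For illustration, here is a minimal sketch of how the _meta.properties fields listed under Metadata Cache Writer could be assembled, with the two optional fields present only when their condition holds. MetadataCacheSketch and toProperties are hypothetical names, not the FlintMetadataCache API from this PR, and the values in the usage lines are made up.

```scala
// Hypothetical sketch; this is not the FlintMetadataCache class from the PR.
final case class MetadataCacheSketch(
    metadataCacheVersion: String,
    refreshInterval: Option[Int], // seconds; only for auto refresh with refresh_interval set
    sourceTables: Seq[String],    // mocked data in this PR
    lastRefreshTime: Option[Long] // epoch millis; only after at least one refresh
) {
  // Builds the map that would sit under _meta.properties in the index mappings.
  def toProperties: Map[String, Any] = {
    val required: Map[String, Any] = Map(
      "metadataCacheVersion" -> metadataCacheVersion,
      "sourceTables" -> sourceTables)
    // Optional fields are omitted entirely rather than written as null.
    val optional: Seq[(String, Any)] =
      refreshInterval.map(v => "refreshInterval" -> (v: Any)).toSeq ++
        lastRefreshTime.map(v => "lastRefreshTime" -> (v: Any)).toSeq
    required ++ optional
  }
}

object MetadataCacheSketchDemo {
  def main(args: Array[String]): Unit = {
    // A manual-refresh index that has never been refreshed: no optional fields.
    println(MetadataCacheSketch("1.0", None, Seq("mock.table"), None).toProperties)
    // An auto-refresh index with a 10-minute interval, refreshed once.
    println(
      MetadataCacheSketch("1.0", Some(600), Seq("mock.table"), Some(1700000000000L)).toProperties)
  }
}
```

Omitting absent fields, rather than writing placeholder values, lets a frontend reader tell "never refreshed" apart from "refreshed at time 0" by checking whether the key exists.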
Related Issues
_meta as read cache for frontend user to access #746

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.