feat(gms): store update events in a new index in ElasticSearch #135
Changes from 8 commits
fecad02
4e228e7
b16f90c
47a2191
64fea48
19b80dc
3fa5385
40b9a52
56e9cb2
94aef19
@@ -0,0 +1,23 @@
{
  "index_patterns": ["*PREFIXdatahub_update_event*"],
  "data_stream": { },
  "priority": 499,
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "type": {
          "type": "keyword"
        },

Where are you storing your document? _source?

Fields such as urn and other info are indexed automatically by ES when the document is inserted into the index, so I did not define them here. Should I define the fields here so it is more transparent?

Are you saying that it is using dynamic mapping? I think we should define the fields, since the schema is known.

        "timestamp": {
          "type": "date"
        }
      }
    },
    "settings": {
      "index.lifecycle.name": "PREFIXdatahub_usage_event_policy"
    }
  }
}
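
To make the dynamic-mapping question raised on this template concrete, here is a minimal Java sketch (not part of this PR) that lists which top-level fields of an event are not declared under `properties` and would therefore rely on dynamic mapping under the default settings. The class `MappingCoverage`, the method `undeclaredFields`, and the sample event values are hypothetical; only the three declared field names and the `urn` field come from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MappingCoverage {
    // Field names declared under "properties" in the template in this diff.
    static final Set<String> DECLARED = Set.of("@timestamp", "type", "timestamp");

    // Returns the event's top-level field names that the template does not
    // declare, in sorted order. Nested fields are not inspected.
    static List<String> undeclaredFields(Map<String, ?> event) {
        List<String> missing = new ArrayList<>();
        for (String field : new TreeSet<>(event.keySet())) {
            if (!DECLARED.contains(field)) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical event; the values are placeholders for illustration only.
        Map<String, Object> event = Map.of(
                "@timestamp", "2021-01-01T00:00:00Z",
                "type", "UpdateEvent",
                "urn", "urn:li:dataset:example");
        System.out.println(undeclaredFields(event)); // prints [urn]
    }
}
```

With the sample event, only `urn` is reported, which matches the field the author mentions as being indexed automatically.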
Why do we need to create a unique id for the ES documents? Aren't we storing each event related to the dataset?
Can we just use the _id generated by ES?

Since you are using time and content as the hash input, it is almost certain to result in a new doc. It is unlikely that there will be any update to an existing document.

The ES client in Java does not automatically assign an _id to the document, so I need to create a unique _id for each event; otherwise there is an error when uploading the document to the index.

Yes, that is the idea here: each update event is an individual document, so we can track all updates over a time period.

https://www.javadoc.io/doc/org.elasticsearch/elasticsearch/7.8.0/org/elasticsearch/action/index/IndexRequest.html#id()
What if you do not set the id for the document? Will ES generate the id for you? Let's see if we can use the autogenerated ID.

Hmm, the auto-generated id works. I have removed the relevant parts to make use of the auto-generated ID.
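
The reviewer's point about hashing time and content can be illustrated with a small Java sketch. It is a hypothetical reconstruction, not the code removed from this PR: the class `EventIdSketch`, the method `hashedId`, the `:` separator, and the choice of SHA-256 are all assumptions. It shows that identical content at two different timestamps yields two different ids, so a hash-based id gives no deduplication here and leaving the id unset, as the thread concluded, loses nothing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class EventIdSketch {
    // Assumed scheme: hex-encoded SHA-256 over "<timestamp>:<content>".
    static String hashedId(long timestampMillis, String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(
                    (timestampMillis + ":" + content).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical event payload, emitted twice one millisecond apart.
        String content = "{\"urn\":\"urn:li:dataset:example\"}";
        String first = hashedId(1_600_000_000_000L, content);
        String second = hashedId(1_600_000_000_001L, content);
        // Different timestamps produce different ids, so each event
        // becomes a new document rather than an update.
        System.out.println(first.equals(second)); // prints false
    }
}
```

Since every event ends up as a new document either way, the hash only adds code to maintain, which is consistent with the decision to drop it in favor of the auto-generated ID.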