Add log persistence for Make Data Count #156

Open
poikilotherm opened this issue Jan 24, 2020 · 3 comments
Labels
enhancement (New feature or request), integration (Everything regarding a Dataverse integration)

Comments

@poikilotherm
Member

Since upstream release 4.18, you can just switch on the logging for Make Data Count. We should persist those log files somehow, so we can settle on how to process them later.

Maybe use a sidecar to collect the logs and store them somewhere safe instead of storing them on a volume?

@poikilotherm added the enhancement and integration labels Jan 24, 2020
@poikilotherm added this to the v4.19 milestone Jan 24, 2020
@qqmyers
Member

qqmyers commented Jan 28, 2020

What's the reason for persistence (more than a volume)? Once these are processed, the results are back in Dataverse tables. Is the intent to allow reprocessing in the future?

@pdurbin
Member

pdurbin commented Jan 29, 2020

This is a guess but perhaps @poikilotherm is thinking about multiple Glassfish instances. There's a note about Make Data Count at http://guides.dataverse.org/en/4.19/installation/advanced.html#multiple-glassfish-servers

@poikilotherm
Member Author

@qqmyers and @pdurbin thanks for asking and getting in touch.

My idea behind shipping those logs out of the containers is indeed about scaling, but also about keeping persistence in the Dataverse app to a minimum. IMHO those log files are similar to access logs, and those shouldn't be part of the application's persistence (which makes things overly complex, with too many volumes to handle) but should go into a log stack ASAP.

IMHO it makes more sense to handle such logs the same way access logs are handled nowadays: use something like the ELK stack for ingest and query the index later to get at the data. We might even think of pushing things into a separate Solr core, as Solr is already present in every Dataverse installation.
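To make the Solr idea a bit more concrete, here is a minimal sketch of how parsed log records could be sent to a separate core through Solr's JSON update handler. The core name `mdc_logs`, the base URL, and the helper name `build_solr_update` are assumptions for illustration only; nothing like this exists in the project yet.

```python
import json
import urllib.request


def build_solr_update(docs, base_url="http://localhost:8983/solr", core="mdc_logs"):
    """Build (but do not send) a POST request that adds `docs` to a Solr core.

    `base_url` and `core` are hypothetical defaults, not values from Dataverse.
    """
    # Posting a JSON array of documents to /update adds them to the index.
    url = f"{base_url}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would actually send it. Committing on every request keeps the sketch simple but would probably be too expensive for real traffic.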

Feeding the index from log files written to disk/memory is really easy with sidecar containers, using tools like logstash/beats or fluentd.
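In practice Logstash, Beats or Fluentd would do this job, but the core of what such a sidecar does can be sketched in a few lines: remember a byte offset, read only the complete lines appended since then, and turn each one into a record. The tab-separated layout and the field names in `FIELDS` are assumptions for illustration; the real column layout has to be taken from the actual Make Data Count log files.

```python
# Hypothetical field names; replace with the real column layout of the log files.
FIELDS = ["event_time", "client_ip", "request_url"]


def parse_line(line, fields=FIELDS):
    """Turn one tab-separated log line into a dict; return None for blank or comment lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    return dict(zip(fields, line.split("\t")))


def read_new_records(path, offset=0):
    """Read complete lines appended since `offset`; return (records, new_offset)."""
    records = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        while True:
            raw = fh.readline()
            if not raw.endswith(b"\n"):
                break  # end of file or a partially written line: retry on the next poll
            offset = fh.tell()
            record = parse_line(raw.decode("utf-8", errors="replace"))
            if record is not None:
                records.append(record)
    return records, offset
```

A sidecar would call `read_new_records` in a loop on the shared log directory, ship the records to the log stack, and store the returned offset so a restart does not resend old lines.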

@poikilotherm removed this from the v4.19 milestone Apr 17, 2020