Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harvesting : Add granularity to ListRecords when using from parameter #11038

Conversation

luddaniel
Copy link
Contributor

What this PR does / why we need it:

It performs an Identify request before a ListRecords request with from parameter to add the correct granularity.

Which issue(s) this PR closes:

Suggestions on how to test this:

Test with YYYY-MM-DD granularity, without PR it will FAIL in 2nd run, with PR it will work :

{
  "nickName": "nakala",
  "dataverseAlias": "root",
  "type": "oai",
  "style": "default",
  "harvestUrl": "https://api.nakala.fr/oai2",
  "archiveUrl": "https://api.nakala.fr",
  "archiveDescription": "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.",
  "metadataFormat": "oai_dc",
  "set": "doi_10.34847_nkl.002f8m36",
  "schedule": "none",
  "allowHarvestingMissingCVV": true
}
curl -H "X-Dataverse-key: $API_KEY" -H "Content-Type: application/json" -X POST  "http://localhost:8080/api/harvest/clients/nakala" --upload-file "client.json"

Test with YYYY-MM-DDThh:mm:ssZ granularity, non regression test, it work still work with the PR :

{
  "nickName": "ird",
  "dataverseAlias": "root",
  "type": "oai",
  "style": "dataverse",
  "harvestUrl": "https://dataverse.ird.fr/oai",
  "archiveUrl": "https://dataverse.ird.fr",
  "archiveDescription": "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.",
  "metadataFormat": "dataverse_json",
  "schedule": "none",
  "allowHarvestingMissingCVV": true
}
curl -H "X-Dataverse-key: $API_KEY" -H "Content-Type: application/json" -X POST  "http://localhost:8080/api/harvest/clients/ird" --upload-file "client.json"

Is there a release notes update needed for this change?:
Yes

@pdurbin pdurbin added Type: Bug a defect Size: 3 A percentage of a sprint. 2.1 hours. labels Nov 20, 2024
@ofahimIQSS ofahimIQSS added the FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) label Nov 26, 2024
@cmbz cmbz added the FY25 Sprint 12 FY25 Sprint 12 (2024-12-04 - 2024-12-18) label Dec 5, 2024
@pdurbin pdurbin self-assigned this Dec 16, 2024
@pdurbin
Copy link
Member

pdurbin commented Dec 16, 2024

@luddaniel thanks for discussing this originally at https://dataverse.zulipchat.com/#narrow/channel/375707-community/topic/Harvesting.20with.20from.3D.20parameter/near/481979177 and for making this pull request.

I didn't run the code but it looks reasonable to me. I'll go ahead and request a review from @landreev in case he has time to take a look.

@luddaniel one question for you. Under "how to test" you mention "client.json". I this how you're varying the timestamp format? Would it be helpful for us to have those client.json files when we are testing?

@luddaniel
Copy link
Contributor Author

Hi @pdurbin

@luddaniel one question for you. Under "how to test" you mention "client.json". I this how you're varying the timestamp format? Would it be helpful for us to have those client.json files when we are testing?

I added the json to prove I tested and to help you to test.
Both json should be tested without the PR (develop) and with the PR, expecting to see a improve in nakala with the PR and a still working status for ird. Note that nakala will fail without the PR in the 2nd run for the very reason of this PR because on the 2nd run from string query is used.

@pdurbin
Copy link
Member

pdurbin commented Dec 17, 2024

@luddaniel sorry, I think I confused myself. I understand now that you've put the client.json content inline above the curl command.

I'd say this is ready for some QA. I'll approve it. Thanks!

@ofahimIQSS ofahimIQSS self-assigned this Dec 19, 2024
@ofahimIQSS
Copy link
Contributor

Can we please update the 6.4 to 6.5 in the dataverse/modules/dataverse-parent
/pom.xml

@pdurbin
Copy link
Member

pdurbin commented Dec 20, 2024

@ofahimIQSS sure, I went ahead and merged the latest from develop into this PR.

@coveralls
Copy link

Coverage Status

coverage: 22.687% (-0.002%) from 22.689%
when pulling 9ced7d1 on Recherche-Data-Gouv:11020-unconsidered-harvesting-granularity
into 2ae639d on IQSS:develop.

@cmbz cmbz added the FY25 Sprint 14 FY25 Sprint 14 (2025-01-02 - 2025-01-15) label Jan 2, 2025
@ofahimIQSS
Copy link
Contributor

Hello, came across issues while testing this. for the IRD - 341 failed, 1 harvested, and nakala, all 3 failed. Note, this is while testing with the PR in local.

image

server.log --- 11038.txt

@luddaniel
Copy link
Contributor Author

Hi @ofahimIQSS sorry I had to give you harvesting client that gives a lot of failed entries but I have no alternative to give you samples to help your test/review.
Harvesting errors are various, and in IRD case errors are related to #10949. With our customized citation.tsv we have less issues.

The goal and purpose of this PR is to fix FAILED global status and to enlage the capabality in term of partial harvesting.
With PR, 2nd run will work
Without PR, nakala 2nd will fail. (could be verified by switching to develop at the end of the test and restart docker while repackaging mvn -Pct clean package docker:run)

@ofahimIQSS
Copy link
Contributor

Confirmed - Merging PR

@ofahimIQSS ofahimIQSS merged commit 802df48 into IQSS:develop Jan 15, 2025
11 checks passed
@ofahimIQSS ofahimIQSS removed their assignment Jan 15, 2025
@pdurbin pdurbin added this to the 6.6 milestone Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) FY25 Sprint 12 FY25 Sprint 12 (2024-12-04 - 2024-12-18) FY25 Sprint 14 FY25 Sprint 14 (2025-01-02 - 2025-01-15) Size: 3 A percentage of a sprint. 2.1 hours. Type: Bug a defect
Projects
Status: Done 🧹
Status: 🚀 Done (Recherche Data Gouv)
Development

Successfully merging this pull request may close these issues.

Unconsidered harvesting granularity
5 participants