add new data since snapshot #4

Open
maxheld83 opened this issue Mar 30, 2021 · 0 comments

maxheld83 commented Mar 30, 2021

Perhaps it is possible to do this:

At present (2021-03-30), http://api.crossref.org/v1/works?filter=from-deposit-date:2021-03-01 yields, in the worst case, a pretty manageable 258 kB of JSON per page (see sample below).

Both the data below (which includes records created in 2020) and the docs suggest that this returns newly added as well as updated records:

In the snapshot docs:

To get the registered content that has changed since an archive was created, use OAI-PMH Plus or the REST API.

and from the REST API spec:

from-deposit-date | {date} | metadata last (re)deposited since (inclusive) {date}

Takes barely a blink to get the below.
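
A minimal sketch of how that query could be issued and paged through from Python, using only the standard library. The endpoint and the `from-deposit-date` filter are taken from above; the `rows` and `cursor` parameters and the `next-cursor` field are assumptions about the REST API's cursor-based paging, and the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://api.crossref.org/v1/works"  # endpoint as quoted in this issue


def build_url(since, rows=100, cursor="*"):
    """Build a query for works (re)deposited on or after `since` (YYYY-MM-DD)."""
    params = {
        "filter": f"from-deposit-date:{since}",
        "rows": rows,        # assumed page-size parameter
        "cursor": cursor,    # assumed deep-paging parameter; "*" starts a new cursor
    }
    # Keep ":" and "*" readable; everything else is percent-encoded as usual.
    return f"{BASE_URL}?{urlencode(params, safe=':*')}"


def fetch_since(since, rows=100, max_pages=None):
    """Yield work records page by page until the API returns an empty page."""
    cursor, page = "*", 0
    while max_pages is None or page < max_pages:
        with urlopen(build_url(since, rows, cursor)) as resp:
            message = json.load(resp)["message"]
        items = message.get("items", [])
        if not items:
            break
        yield from items
        cursor = message.get("next-cursor")
        if cursor is None:
            break
        page += 1
```

Something like `list(fetch_since("2021-03-01", max_pages=1))` would then pull a single page.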

If we wanted to put everything in BigQuery and only ever query that, the question would be how to add this to the warehouse efficiently and regularly.
It would probably have to be triggered by cron (for most of the days) and at runtime (for the last few hours).
The query is also cheap, so running it at runtime should work.
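
To make the cron vs. runtime split concrete, here is a small sketch of how the gap since the last load could be divided. It assumes the warehouse records the last day that was fully loaded; `plan_updates` and its arguments are hypothetical names, not part of any existing code.

```python
from datetime import date, timedelta


def plan_updates(last_loaded, today):
    """Split the days after `last_loaded` into completed days and the current day.

    Completed days can be loaded once by a cron job; the current day is
    still changing, so it would be fetched at runtime.
    """
    cron_days = []
    day = last_loaded + timedelta(days=1)
    while day < today:
        cron_days.append(day)
        day += timedelta(days=1)
    return cron_days, today


cron_days, runtime_day = plan_updates(date(2021, 3, 27), date(2021, 3, 30))
print([d.isoformat() for d in cron_days], runtime_day.isoformat())
```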

Perhaps something via the federated query functions in BigQuery?

This might still be a bad idea, I'm not sure.

Though if there's an elegant way, it would be great to completely abstract away both the update mechanism and the REST vs. BigQuery distinction.
Lots of complexity could be cut.
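
One way such an abstraction could work, sketched in plain Python rather than BigQuery: overlay the freshly fetched records on the snapshot and keep, per DOI, whichever record was deposited last. This is only an illustration of the idea, assuming every record carries the `DOI` and `deposited.timestamp` fields seen in the sample below; `overlay` is a made-up name.

```python
def overlay(snapshot_rows, delta_rows):
    """Merge snapshot and delta, keeping the latest deposited record per DOI."""
    latest = {}
    for row in list(snapshot_rows) + list(delta_rows):
        doi = row["DOI"].lower()  # DOIs are case-insensitive
        ts = row["deposited"]["timestamp"]
        # On a tie the later row wins, so delta rows override snapshot rows.
        if doi not in latest or ts >= latest[doi]["deposited"]["timestamp"]:
            latest[doi] = row
    return list(latest.values())
```

Callers would then only ever see the merged view and would not need to know which source a record came from.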

The diffed data, it appears, is readily available:

      {
        "indexed": {
          "date-parts": [
            [
              2021,
              3,
              25
            ]
          ],
          "date-time": "2021-03-25T06:49:24Z",
          "timestamp": 1616654964053
        },
        "publisher-location": "Cham",
        "reference-count": 0,
        "publisher": "Springer International Publishing",
        "isbn-type": [
          {
            "value": "9783030510954",
            "type": "print"
          },
          {
            "value": "9783030510961",
            "type": "electronic"
          }
        ],
        "license": [
          {
            "URL": "http:\/\/www.springer.com\/tdm",
            "start": {
              "date-parts": [
                [
                  2021,
                  1,
                  1
                ]
              ],
              "date-time": "2021-01-01T00:00:00Z",
              "timestamp": 1609459200000
            },
            "delay-in-days": 0,
            "content-version": "tdm"
          },
          {
            "URL": "http:\/\/www.springer.com\/tdm",
            "start": {
              "date-parts": [
                [
                  2021,
                  1,
                  1
                ]
              ],
              "date-time": "2021-01-01T00:00:00Z",
              "timestamp": 1609459200000
            },
            "delay-in-days": 0,
            "content-version": "vor"
          }
        ],
        "content-domain": {
          "domain": [],
          "crossmark-restriction": false
        },
        "published-print": {
          "date-parts": [
            [
              2021
            ]
          ]
        },
        "DOI": "10.1007\/978-3-030-51096-1",
        "type": "book",
        "created": {
          "date-parts": [
            [
              2020,
              10,
              5
            ]
          ],
          "date-time": "2020-10-05T17:14:27Z",
          "timestamp": 1601918067000
        },
        "source": "Crossref",
        "is-referenced-by-count": 0,
        "title": [
          "Precarity and International Relations"
        ],
        "prefix": "10.1007",
        "member": "297",
        "container-title": [
          "International Political Economy Series"
        ],
        "link": [
          {
            "URL": "http:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-51096-1.pdf",
            "content-type": "application\/pdf",
            "content-version": "vor",
            "intended-application": "text-mining"
          },
          {
            "URL": "http:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-51096-1",
            "content-type": "unspecified",
            "content-version": "vor",
            "intended-application": "similarity-checking"
          }
        ],
        "deposited": {
          "date-parts": [
            [
              2021,
              3,
              25
            ]
          ],
          "date-time": "2021-03-25T06:13:23Z",
          "timestamp": 1616652803000
        },
        "score": 1.0,
        "editor": [
          {
            "given": "Ritu",
            "family": "Vij",
            "sequence": "first",
            "affiliation": []
          },
          {
            "given": "Tahseen",
            "family": "Kazi",
            "sequence": "additional",
            "affiliation": []
          },
          {
            "given": "Elisa",
            "family": "Wynne-Hughes",
            "sequence": "additional",
            "affiliation": []
          }
        ],
        "issued": {
          "date-parts": [
            [
              2021
            ]
          ]
        },
        "ISBN": [
          "9783030510954",
          "9783030510961"
        ],
        "references-count": 0,
        "URL": "http:\/\/dx.doi.org\/10.1007\/978-3-030-51096-1",
        "ISSN": [
          "2662-2483",
          "2662-2491"
        ],
        "issn-type": [
          {
            "value": "2662-2483",
            "type": "print"
          },
          {
            "value": "2662-2491",
            "type": "electronic"
          }
        ]
      }
...
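
The sample record above was created in 2020 but deposited again on 2021-03-25, which is why it shows up for `from-deposit-date:2021-03-01`. A short sketch of how updated records could be told apart from newly added ones, using only the `created` and `deposited` fields from the sample (`is_update` and `parse_utc` are hypothetical helpers):

```python
from datetime import date, datetime, timezone


def parse_utc(value):
    """Parse timestamps of the form used in the sample, e.g. 2021-03-25T06:13:23Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_update(record, since):
    """True if the record existed before `since` and was (re)deposited on or after it."""
    created = parse_utc(record["created"]["date-time"]).date()
    deposited = parse_utc(record["deposited"]["date-time"]).date()
    return created < since <= deposited


# Fields copied from the sample record above.
record = {
    "DOI": "10.1007/978-3-030-51096-1",
    "created": {"date-time": "2020-10-05T17:14:27Z"},
    "deposited": {"date-time": "2021-03-25T06:13:23Z"},
}
print(is_update(record, date(2021, 3, 1)))  # True
```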