fix: parse and clean archive badges and markdown links to URL #243
Commits: 3187744, aa53640, dc3d627, 09c276b, 053f12d, 7dce691, 9e051cc, cea651c, 2b9db98, 03376a9, b6b9f27, b25ebbc, e1246e0, a4e9477, 4a56e10
```diff
@@ -7,6 +7,9 @@
 from datetime import datetime
 from typing import Any
 
+import doi
+import requests
+
 
 def get_clean_user(username: str) -> str:
     """Cleans a GitHub username provided in a review issue by removing any
@@ -125,3 +128,84 @@
             review_dict["date_accepted"] = value
             break
     return review_dict
+
+
+def check_url(url: str) -> bool:
+    """Test a URL. Return True if it responds with HTTP 200, False otherwise.
+
+    Parameters
+    ----------
+    url : str
+        String for a URL to a website to test.
+
+    """
+
+    try:
+        response = requests.get(url, timeout=6)
+        return response.status_code == 200
+    except Exception:  # pragma: no cover
+        return False
+
+
+def is_doi(archive: str) -> str | None:
+    """Check if the DOI is valid and return the DOI link.
+
+    Parameters
+    ----------
+    archive : str
+        The DOI string to validate, e.g., `10.1234/zenodo.12345678`
+
+    Returns
+    -------
+    str | None
+        The DOI link in the form `https://doi.org/10.1234/zenodo.12345678` or `None`
+        if the DOI is invalid.
+    """
+    try:
+        return doi.validate_doi(archive)
+    except ValueError:
+        return None
+
+
+def clean_archive(archive: str) -> str | None:
+    """Clean an archive link to ensure it is a valid DOI URL.
+
+    This utility will attempt to parse the DOI link from the various formats
+    that are commonly present in review metadata. This utility will handle:
+
+    * Markdown links in the format `[label](URL)`, e.g., `[my archive](https://doi.org/10.1234/zenodo.12345678)`
+    * Raw text in the format `DOI`, e.g., `10.1234/zenodo.12345678`
+    * URLs in the format `http(s)://...`, e.g., `https://doi.org/10.1234/zenodo.12345678`
+    * The special cases `n/a` and `tbd`, which will be returned as `None` in anticipation of future data
+
+    If the archive link is a URL, an `http://` scheme is upgraded to `https://`
+    and the URL is returned after checking that it resolves; it does not have
+    to be a valid DOI. If the archive link is a DOI, it is validated and returned
+    as a URL in the form `https://doi.org/10.1234/zenodo.12345678` using `python-doi`.
+
+    """
+    archive = archive.strip()  # Remove leading/trailing whitespace
+    if not archive:
+        # If field is empty, return None
+        return None
+    if archive.startswith("[") and archive.endswith(")"):
+        # Take the target after the last "](" (the outer link of a nested badge)
+        link = archive[archive.rfind("](") + 2 : -1]
+        # recursively clean the archive link
+        return clean_archive(link)
+    elif link := is_doi(archive):
+        # is_doi returns the DOI link if it is valid
+        return link
+    elif archive.startswith("http"):
+        if archive.startswith("http://"):
+            archive = "https://" + archive[len("http://") :]  # upgrade only the scheme
+        # Validate that the URL resolves
+        if not check_url(archive):
+            raise ValueError(f"Invalid archive URL (not resolving): {archive}")
+        return archive
+    elif archive.lower() == "n/a":
+        return None
+    elif archive.lower() == "tbd":
+        return None
+    else:
+        raise ValueError(f"Invalid archive URL: {archive}")
```

Review comments on the Markdown-links line of the `clean_archive` docstring:

It looks like the issue that failed when I ran it locally uses a Zenodo badge (pyOpenSci/software-submission#83). Honestly, I may have updated and added that (it is possible). But it would be good to parse a Markdown badge URL too. Let me know if you don't see that error; I saw it when running locally.

All good: it was the JOSS DOI field causing the failure, as it was left blank for pyOpenSci/software-submission#83. The Zenodo badge in this example is handled, and it is a recurring pattern in a lot of reviews.
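The Markdown branch of `clean_archive` depends on `str.rfind("](")` finding the *last* `](` pair. A Zenodo badge nests an image link inside a link, so that last pair belongs to the outer link and the slice yields the DOI URL rather than the badge image. The sketch below isolates only that slicing step so it can be checked offline; `extract_link_target` is a hypothetical helper name, not part of this PR.

```python
def extract_link_target(text: str) -> str:
    """Return the URL of the last ``[...](URL)`` link in ``text``.

    Uses the same slicing as ``clean_archive``: find the final ``](`` and
    drop the trailing ``)``. For a badge such as ``[![DOI](img.svg)](URL)``
    the final ``](`` belongs to the outer link, so ``URL`` is returned.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith(")")):
        # clean_archive would fall through to its other branches instead
        raise ValueError(f"Not a Markdown link: {text}")
    return text[text.rfind("](") + 2 : -1]


badge = (
    "[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8415866.svg)]"
    "(https://doi.org/10.5281/zenodo.8415866)"
)
print(extract_link_target(badge))
print(extract_link_target("[my archive](https://doi.org/10.1234/zenodo.12345678)"))
```

This slicing assumes the link target itself contains no `](` and that nothing follows the closing parenthesis; a value that does not end in `)` never reaches this branch in `clean_archive`.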
```diff
@@ -0,0 +1,28 @@
+Submitting Author: Fakename (@fakeauthor)
+All current maintainers: (@fakeauthor1, @fakeauthor2)
+Package Name: fake_package
+One-Line Description of Package: A fake python package
+Repository Link: https://example.com/fakeauthor1/fake_package
+Version submitted: v1.0.0
+EiC: @fakeeic
+Editor: @fakeeditor
+Reviewer 1: @fakereviewer1
+Reviewer 2: @fakereviewer2
+Reviews Expected By: fake date
+Archive: 10.5281/zenodo.8415866
+JOSS DOI: 10.21105/joss.01450
+Version accepted: 2.0.0 ([repo](https://example.com/fakeauthor1/fake_package/releases/tag/v2.0.0), [pypi](https://pypi.org/project/fake_project/2.0.0), [archive](https://example.com/fakearchive))
+Date accepted (month/day/year): 06/29/2024
+
+---
+
+## Scope
+
+- [x] I agree to abide by [pyOpenSci's Code of Conduct][PyOpenSciCodeOfConduct] during the review process and in maintaining my package after should it be accepted.
+- [x] I have read and will commit to package maintenance after the review as per the [pyOpenSci Policies Guidelines][Commitment].
+(etc)
+
+## Community Partnerships
+
+- [ ] etc
+- [ ] aaaaaa
```
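Each fixture is a review-issue body whose header is a run of `Key: value` lines ending at the `---` separator. The code that reads these headers is not part of this diff, so the following is only a hypothetical sketch (`parse_header` is an illustrative name) of how values such as `Archive` and `JOSS DOI` might be pulled out before being passed to `clean_archive`. Splitting on the first colon keeps URLs intact, and an empty field such as `Archive:` becomes `""`, which `clean_archive` maps to `None`.

```python
def parse_header(issue_body: str) -> dict[str, str]:
    """Hypothetical parser for the ``Key: value`` header of a review issue.

    Reads lines until the first ``---`` separator and splits each line on its
    first colon, so values that contain colons (such as URLs) stay intact.
    """
    fields: dict[str, str] = {}
    for line in issue_body.splitlines():
        if line.strip() == "---":
            break  # the header ends at the horizontal rule
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            fields[key.strip()] = value.strip()
    return fields


issue_body = """\
Package Name: fake_package
Repository Link: https://example.com/fakeauthor1/fake_package
Archive: 10.5281/zenodo.8415866
JOSS DOI:
---
## Scope
"""

fields = parse_header(issue_body)
print(fields["Archive"])
print(repr(fields["JOSS DOI"]))
```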
```diff
@@ -0,0 +1,28 @@
+Submitting Author: Fakename (@fakeauthor)
+All current maintainers: (@fakeauthor1, @fakeauthor2)
+Package Name: fake_package
+One-Line Description of Package: A fake python package
+Repository Link: https://example.com/fakeauthor1/fake_package
+Version submitted: v1.0.0
+EiC: @fakeeic
+Editor: @fakeeditor
+Reviewer 1: @fakereviewer1
+Reviewer 2: @fakereviewer2
+Reviews Expected By: fake date
+Archive: 10.1234/zenodo.12345678
+JOSS DOI: 10.21105/joss.00000
+Version accepted: 2.0.0 ([repo](https://example.com/fakeauthor1/fake_package/releases/tag/v2.0.0), [pypi](https://pypi.org/project/fake_project/2.0.0), [archive](https://example.com/fakearchive))
+Date accepted (month/day/year): 06/29/2024
+
+---
+
+## Scope
+
+- [x] I agree to abide by [pyOpenSci's Code of Conduct][PyOpenSciCodeOfConduct] during the review process and in maintaining my package after should it be accepted.
+- [x] I have read and will commit to package maintenance after the review as per the [pyOpenSci Policies Guidelines][Commitment].
+(etc)
+
+## Community Partnerships
+
+- [ ] etc
+- [ ] aaaaaa
```
```diff
@@ -0,0 +1,28 @@
+Submitting Author: Fakename (@fakeauthor)
+All current maintainers: (@fakeauthor1, @fakeauthor2)
+Package Name: fake_package
+One-Line Description of Package: A fake python package
+Repository Link: https://example.com/fakeauthor1/fake_package
+Version submitted: v1.0.0
+EiC: @fakeeic
+Editor: @fakeeditor
+Reviewer 1: @fakereviewer1
+Reviewer 2: @fakereviewer2
+Reviews Expected By: fake date
+Archive:
+JOSS DOI:
+Version accepted: 2.0.0 ([repo](https://example.com/fakeauthor1/fake_package/releases/tag/v2.0.0), [pypi](https://pypi.org/project/fake_project/2.0.0), [archive](https://example.com/fakearchive))
+Date accepted (month/day/year): 06/29/2024
+
+---
+
+## Scope
+
+- [x] I agree to abide by [pyOpenSci's Code of Conduct][PyOpenSciCodeOfConduct] during the review process and in maintaining my package after should it be accepted.
+- [x] I have read and will commit to package maintenance after the review as per the [pyOpenSci Policies Guidelines][Commitment].
+(etc)
+
+## Community Partnerships
+
+- [ ] etc
+- [ ] aaaaaa
```
```diff
@@ -0,0 +1,28 @@
+Submitting Author: Fakename (@fakeauthor)
+All current maintainers: (@fakeauthor1, @fakeauthor2)
+Package Name: fake_package
+One-Line Description of Package: A fake python package
+Repository Link: https://example.com/fakeauthor1/fake_package
+Version submitted: v1.0.0
+EiC: @fakeeic
+Editor: @fakeeditor
+Reviewer 1: @fakereviewer1
+Reviewer 2: @fakereviewer2
+Reviews Expected By: fake date
+Archive: TBD
+JOSS DOI: N/A
+Version accepted: 2.0.0 ([repo](https://example.com/fakeauthor1/fake_package/releases/tag/v2.0.0), [pypi](https://pypi.org/project/fake_project/2.0.0), [archive](https://example.com/fakearchive))
+Date accepted (month/day/year): 06/29/2024
+
+---
+
+## Scope
+
+- [x] I agree to abide by [pyOpenSci's Code of Conduct][PyOpenSciCodeOfConduct] during the review process and in maintaining my package after should it be accepted.
+- [x] I have read and will commit to package maintenance after the review as per the [pyOpenSci Policies Guidelines][Commitment].
+(etc)
+
+## Community Partnerships
+
+- [ ] etc
+- [ ] aaaaaa
```
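Together the fixtures cover DOI values, an empty field, and the `TBD`/`N/A` placeholders. Because `is_doi` and `check_url` make network requests, one way to reason offline about which branch of `clean_archive` a value reaches is a triage function that keeps the same branch order but swaps the `python-doi` lookup for a purely syntactic regex. This is a hypothetical sketch, not the PR's code: `classify_archive` and `DOI_PATTERN` are illustrative names, the pattern is adapted from Crossref's suggested regex for modern DOIs, and it cannot tell a registered DOI from a well-formed placeholder such as `10.1234/zenodo.12345678`.

```python
import re

# Syntactic DOI check adapted from Crossref's suggested pattern for modern
# DOIs. It only checks the shape of the string; unlike a python-doi lookup,
# it does not contact doi.org, so well-formed but unregistered DOIs match.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def classify_archive(archive: str) -> str:
    """Hypothetical offline triage that follows clean_archive's branch order."""
    archive = archive.strip()
    if not archive:
        return "empty"  # clean_archive returns None
    if archive.startswith("[") and archive.endswith(")"):
        return "markdown"  # clean_archive recurses on the link target
    if DOI_PATTERN.match(archive):
        return "doi"  # clean_archive relies on is_doi for this case
    if archive.startswith("http"):
        return "url"  # clean_archive calls check_url here
    if archive.lower() in {"n/a", "tbd"}:
        return "placeholder"  # clean_archive returns None
    return "invalid"  # clean_archive raises ValueError


for value in ["10.5281/zenodo.8415866", "", "TBD", "N/A", "not an archive"]:
    print(repr(value), "->", classify_archive(value))
```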
Since we've added a new dep, we should make sure that it is noted in the changelog and also document why we added it.
Further clarified this in e1246e0