New illustration of caching and splitting GOLD records PLUS routine maintenance #125

Merged (17 commits) on Jan 13, 2025

turbomam
Member

@turbomam turbomam commented Dec 25, 2024

Updated pyproject.toml, the GitHub Actions workflows, etc. so that make test would work.

@turbomam turbomam changed the title discover and cache first 10 studies New illustration of caching and splitting GOLD records Dec 27, 2024
@turbomam turbomam changed the title New illustration of caching and splitting GOLD records New illustration of caching and splitting GOLD records PLUS routine maintenance Dec 27, 2024
@turbomam turbomam changed the title New illustration of caching and splitting GOLD records PLUS routine maintenance New illustration of caching and splitting GOLD records **PLUS** routine maintenance Dec 27, 2024
@turbomam turbomam changed the title New illustration of caching and splitting GOLD records **PLUS** routine maintenance New illustration of caching and splitting GOLD records PLUS routine maintenance Dec 27, 2024
This was referenced Jan 8, 2025
sample gold records
@turbomam
Member Author

turbomam commented Jan 9, 2025

I occasionally get something like this from sample_annotator/gold_to_mongo.py, but the run always resumes when restarted from the command line.


2025-01-09 15:03:27,179 - INFO - Retrieved 0 biosamples for study Gs0032355
2025-01-09 15:03:27,180 - INFO - Processing study Gs0032356...
2025-01-09 15:03:27,180 - INFO - Fetching study: Gs0032356
2025-01-09 15:03:28,147 - INFO - STATUS=200
2025-01-09 15:03:29,349 - INFO - STATUS=200
2025-01-09 15:03:30,805 - INFO - STATUS=200
2025-01-09 15:03:30,807 - INFO - Retrieved 0 biosamples for study Gs0032356
2025-01-09 15:03:30,809 - INFO - Processing study Gs0032357...
2025-01-09 15:03:30,809 - INFO - Fetching study: Gs0032357
2025-01-09 15:03:31,777 - INFO - STATUS=200
2025-01-09 15:03:32,958 - INFO - STATUS=200

Traceback (most recent call last):
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
response = self._make_request(
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 534, in _make_request
response = conn.getresponse()
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/urllib3/connection.py", line 516, in getresponse
httplib_response = super().getresponse()
File "/home/mark/.pyenv/versions/3.10.13/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/home/mark/.pyenv/versions/3.10.13/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/home/mark/.pyenv/versions/3.10.13/lib/python3.10/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 841, in urlopen
retries = retries.increment(
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/urllib3/util/retry.py", line 474, in increment
raise reraise(type(error), error, _stacktrace)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/urllib3/util/util.py", line 38, in reraise
raise value.with_traceback(tb)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
response = self._make_request(
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 534, in _make_request
response = conn.getresponse()
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/urllib3/connection.py", line 516, in getresponse
httplib_response = super().getresponse()
File "/home/mark/.pyenv/versions/3.10.13/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/home/mark/.pyenv/versions/3.10.13/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/home/mark/.pyenv/versions/3.10.13/lib/python3.10/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/mark/gitrepos/sample-annotator/sample_annotator/gold_to_mongo.py", line 157, in <module>

File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/home/mark/gitrepos/sample-annotator/sample_annotator/gold_to_mongo.py", line 128, in main
study = gc.fetch_study(study_id, **args)
File "/home/mark/gitrepos/sample-annotator/sample_annotator/clients/gold_client.py", line 127, in fetch_biosamples_by_study
projects = self.fetch_projects_by_study(id)
File "/home/mark/gitrepos/sample-annotator/sample_annotator/clients/gold_client.py", line 108, in fetch_projects_by_study
results = self._call("projects", {"studyGoldId": id})
File "/home/mark/gitrepos/sample-annotator/sample_annotator/clients/gold_client.py", line 94, in _call
obj = _fetch_url(endpoint_url, params, user, passwd)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/diskcache/core.py", line 1875, in wrapper
result = func(*args, **kwargs)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/diskcache/core.py", line 1875, in wrapper
result = func(*args, **kwargs)
File "/home/mark/gitrepos/sample-annotator/sample_annotator/clients/gold_client.py", line 42, in _fetch_url
results = requests.get(
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/requests/adapters.py", line 682, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

make: *** [make-gold-cache.Makefile:39: load-gold-biosamples-into-mongo] Error 1

@turbomam
Member Author

turbomam commented Jan 9, 2025

ChatGPT suggests this, but that would have to go upstream in gold_cache.py, right?

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session():
    retries = Retry(
        total=5,  # Total retry attempts
        backoff_factor=0.3,  # Exponential backoff: the delay roughly doubles with each retry
        status_forcelist=[500, 502, 503, 504],  # Retry on server errors
        allowed_methods=["HEAD", "GET", "OPTIONS"],  # Retry for safe methods
    )
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retries)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

# Use the session in your requests

session = create_session()
response = session.get(url, params=params, auth=(user, passwd))
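
If changing the client upstream is not an option, the retry could instead wrap the call site in the loading script. The following is only a minimal sketch under that assumption: call_with_retries and its parameters are hypothetical names, not part of sample-annotator or requests, and the commented usage line assumes the gc.fetch_study call seen in the traceback.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    attempts: int = 5,
    base_delay: float = 0.3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exception types with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the original error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


# Hypothetical usage in the loading loop (names taken from the traceback above):
# study = call_with_retries(
#     lambda: gc.fetch_study(study_id),
#     retry_on=(requests.exceptions.ConnectionError,),
# )
```

Because the wrapper only retries the exception types it is given, other failures still stop the run immediately.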

@turbomam
Member Author

turbomam commented Jan 10, 2025

Another error, but this one can't be worked around by simply restarting.


2025-01-10 07:49:57,990 - WARNING - Duplicate key error for Gs0110165
2025-01-10 07:49:57,990 - INFO - Processing study Gs0110166...
2025-01-10 07:49:57,990 - INFO - Fetching study: Gs0110166

Traceback (most recent call last):
File "/home/mark/gitrepos/sample-annotator/sample_annotator/gold_to_mongo.py", line 160, in <module>
main()
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/mark/.cache/pypoetry/virtualenvs/sample-annotator-O-SG-H46-py3.10/lib/python3.10/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/home/mark/gitrepos/sample-annotator/sample_annotator/gold_to_mongo.py", line 128, in main
study = gc.fetch_study(study_id, **args)
File "/home/mark/gitrepos/sample-annotator/sample_annotator/clients/gold_client.py", line 162, in fetch_study
study = results[0]
IndexError: list index out of range
make: *** [make-gold-cache.Makefile:39: load-gold-biosamples-into-mongo] Error 1
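
The IndexError above is raised by results[0], which means the lookup for Gs0110166 came back as an empty list. One possible guard, sketched here with a hypothetical helper (first_result_or_none is not an existing function in gold_client.py, and the studyGoldId field in the example is only illustrative), is to check for an empty result and let the caller skip that study:

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


def first_result_or_none(
    results: List[Dict[str, Any]], study_id: str
) -> Optional[Dict[str, Any]]:
    """Return the first record, or None (with a warning) when the list is empty."""
    if not results:
        logger.warning("No study record returned for %s; skipping", study_id)
        return None
    return results[0]


# Example with made-up records:
print(first_result_or_none([], "Gs0110166"))
print(first_result_or_none([{"studyGoldId": "Gs0000001"}], "Gs0000001"))
```

With a guard like this the loading loop could log the missing study and continue, rather than aborting the whole make target.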

@turbomam
Member Author

Regarding the previous error message, see the results from this different but similar API to the same (?) backend:

[image attachment not preserved]

@turbomam turbomam marked this pull request as ready for review January 13, 2025 21:17
@turbomam turbomam merged commit 835107e into main Jan 13, 2025
1 check passed
2 participants