Allow custom proxy settings with requests sessions #501
It's really hard to say what's going on here without knowing more about your university HPC system. Based on the error, it looks like VSCode is somehow involved?
Can you provide some more detail on how VSCode is involved in your workflow?
Hi @mfisher87, here is the complete error traceback:
---------------------------------------------------------------------------
ConnectionRefusedError Traceback (most recent call last)
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/connection.py:203, in HTTPConnection._new_conn(self)
202 try:
--> 203 sock = connection.create_connection(
204 (self._dns_host, self.port),
205 self.timeout,
206 source_address=self.source_address,
207 socket_options=self.socket_options,
208 )
209 except socket.gaierror as e:
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/util/connection.py:85, in create_connection(address, timeout, source_address, socket_options)
84 try:
---> 85 raise err
86 finally:
87 # Break explicitly a reference cycle
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/util/connection.py:73, in create_connection(address, timeout, source_address, socket_options)
72 sock.bind(source_address)
---> 73 sock.connect(sa)
74 # Break explicitly a reference cycle
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
NewConnectionError Traceback (most recent call last)
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/connectionpool.py:791, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, preload_content, decode_content, **response_kw)
790 # Make the request on the HTTPConnection object
--> 791 response = self._make_request(
792 conn,
793 method,
794 url,
795 timeout=timeout_obj,
796 body=body,
797 headers=headers,
798 chunked=chunked,
799 retries=retries,
800 response_conn=response_conn,
801 preload_content=preload_content,
802 decode_content=decode_content,
803 **response_kw,
804 )
806 # Everything went great!
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/connectionpool.py:492, in HTTPConnectionPool._make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
491 new_e = _wrap_proxy_error(new_e, conn.proxy.scheme)
--> 492 raise new_e
494 # conn.request() calls http.client.*.request, not the method in
495 # urllib3.request. It also calls makefile (recv) on the socket.
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/connectionpool.py:468, in HTTPConnectionPool._make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
467 try:
--> 468 self._validate_conn(conn)
469 except (SocketTimeout, BaseSSLError) as e:
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/connectionpool.py:1097, in HTTPSConnectionPool._validate_conn(self, conn)
1096 if conn.is_closed:
-> 1097 conn.connect()
1099 if not conn.is_verified:
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/connection.py:611, in HTTPSConnection.connect(self)
610 sock: socket.socket | ssl.SSLSocket
--> 611 self.sock = sock = self._new_conn()
612 server_hostname: str = self.host
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/connection.py:218, in HTTPConnection._new_conn(self)
217 except OSError as e:
--> 218 raise NewConnectionError(
219 self, f"Failed to establish a new connection: {e}"
220 ) from e
222 # Audit hooks are only available in Python 3.8+
NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f81075a01d0>: Failed to establish a new connection: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
MaxRetryError Traceback (most recent call last)
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/requests/adapters.py:486, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
485 try:
--> 486 resp = conn.urlopen(
487 method=request.method,
488 url=url,
489 body=request.body,
490 headers=request.headers,
491 redirect=False,
492 assert_same_host=False,
493 preload_content=False,
494 decode_content=False,
495 retries=self.max_retries,
496 timeout=timeout,
497 chunked=chunked,
498 )
500 except (ProtocolError, OSError) as err:
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/connectionpool.py:845, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, preload_content, decode_content, **response_kw)
843 new_e = ProtocolError("Connection aborted.", new_e)
--> 845 retries = retries.increment(
846 method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
847 )
848 retries.sleep()
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/util/retry.py:515, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
514 reason = error or ResponseError(cause)
--> 515 raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
517 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)
MaxRetryError: HTTPSConnectionPool(host='cmr.earthdata.nasa.gov', port=443): Max retries exceeded with url: /search/granules.umm_json?short_name=GEDI02_A&bounding_box=31.52,-25.08,31.64,-24.99&temporal%5B%5D=2019-01-01T00:00:00Z,2024-01-01T00:00:00Z&page_size=0 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f81075a01d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
ConnectionError Traceback (most recent call last)
Cell In[3], line 1
----> 1 results = earthaccess.search_data(
2 short_name='GEDI02_A',
3 bounding_box=(31.52,-25.08,31.64,-24.99),
4 temporal=("2019-01-01", "2024-01-01"),
5 count=-1
6 )
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/earthaccess/api.py:120, in search_data(count, **kwargs)
118 else:
119 query = DataGranules().parameters(**kwargs)
--> 120 granules_found = query.hits()
121 print(f"Granules found: {granules_found}")
122 if count > 0:
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/earthaccess/search.py:388, in DataGranules.hits(self)
379 """Returns the number of hits the current query will return.
380 This is done by making a lightweight query to CMR and inspecting the returned headers.
381
382 Returns:
383 The number of results reported by CMR.
384 """
386 url = self._build_url()
--> 388 response = self.session.get(url, headers=self.headers, params={"page_size": 0})
390 try:
391 response.raise_for_status()
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/requests/sessions.py:602, in Session.get(self, url, **kwargs)
594 r"""Sends a GET request. Returns :class:`Response` object.
595
596 :param url: URL for the new :class:`Request` object.
597 :param \*\*kwargs: Optional arguments that ``request`` takes.
598 :rtype: requests.Response
599 """
601 kwargs.setdefault("allow_redirects", True)
--> 602 return self.request("GET", url, **kwargs)
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/requests/sessions.py:589, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
584 send_kwargs = {
585 "timeout": timeout,
586 "allow_redirects": allow_redirects,
587 }
588 send_kwargs.update(settings)
--> 589 resp = self.send(prep, **send_kwargs)
591 return resp
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/requests/sessions.py:703, in Session.send(self, request, **kwargs)
700 start = preferred_clock()
702 # Send the request
--> 703 r = adapter.send(request, **kwargs)
705 # Total elapsed time of the request (approximately)
706 elapsed = preferred_clock() - start
File ~/micromamba/envs/woody_env/lib/python3.12/site-packages/requests/adapters.py:519, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
515 if isinstance(e.reason, _SSLError):
516 # This branch is for urllib3 v1.22 and later.
517 raise SSLError(e, request=request)
--> 519 raise ConnectionError(e, request=request)
521 except ClosedPoolError as e:
522 raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='cmr.earthdata.nasa.gov', port=443): Max retries exceeded with url: /search/granules.umm_json?short_name=GEDI02_A&bounding_box=31.52,-25.08,31.64,-24.99&temporal%5B%5D=2019-01-01T00:00:00Z,2024-01-01T00:00:00Z&page_size=0 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f81075a01d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Same error also in a clean environment with Python
I also tried downgrading the package (to
>>> earthaccess.search_data(
... short_name='GEDI02_A',
... bounding_box=(31.52,-25.08,31.64,-24.99),
... temporal=("2019-01-01", "2024-01-01"),
... count=-1
... )
Granules found: 92
Traceback (most recent call last):
File "/home/du23yow/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/connection.py", line 203, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/du23yow/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/home/du23yow/micromamba/envs/woody_env/lib/python3.12/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
...
Any other ideas of what I could do?
Okay, I found the explanation in this icepyx discussion. Ping @betolink 🙂 Any suggestion on using
ConnectionError
- Proxy env variables overridden?
Hi @maawoo, I think this could be resolved if we let users pass the proxy settings to requests. In the meantime, you can manually get a session, modify it, and download the files yourself, but that defeats the purpose!

    import os
    import shutil
    from itertools import chain  # to flatten the results

    import earthaccess

    earthaccess.login()

    # Define your proxy
    proxy = {
        'http': 'http://your_proxy_address:port',
        'https': 'https://your_proxy_address:port'
    }

    results = earthaccess.search_data(
        short_name='GEDI02_A',
        bounding_box=(31.52,-25.08,31.64,-24.99),
        temporal=("2019-01-01", "2024-01-01"),
        count=-1
    )
    # Each granule can have several data links; flatten them into one list
    links = list(chain.from_iterable([r.data_links() for r in results]))

    # Get an authenticated session and route its traffic through the proxy
    session = earthaccess.get_requests_https_session()
    session.proxies.update(proxy)

    # The target directory must exist before files are written into it
    os.makedirs("temp_dir", exist_ok=True)

    for url in links:
        local_filename = url.split("/")[-1]
        path = f"temp_dir/{local_filename}"
        with session.get(
            url,
            stream=True,
            allow_redirects=True,
        ) as r:
            r.raise_for_status()
            # Stream the response body to disk in 1 MiB chunks
            with open(path, "wb") as f:
                shutil.copyfileobj(r.raw, f, length=1024 * 1024)

This is not concurrent, so there is room for improvement. As I said, we should implement the proxy here, but my guess is that it won't be ready in the next week.
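On the "not concurrent" point: one way to parallelize the loop is a thread pool from the standard library. The following is only a minimal sketch, not part of earthaccess. `download_all` and `download_one` are hypothetical names; `download_one` stands for whatever function wraps the body of the loop above (the `session.get` call plus the copy to disk). Whether a single `requests` session can safely be shared between threads is an assumption you would need to check; creating one session per worker is the more cautious choice.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable


def download_all(
    urls: Iterable[str],
    download_one: Callable[[str], Any],
    max_workers: int = 4,
) -> Dict[str, Any]:
    """Call download_one(url) for every URL using a pool of worker threads.

    Returns a dict mapping each URL to the return value of download_one,
    or to the exception it raised, so one failed file does not stop the rest.
    """
    outcomes: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which future belongs to which URL
        futures = {pool.submit(download_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                outcomes[url] = future.result()
            except Exception as exc:  # keep going, report the failure
                outcomes[url] = exc
    return outcomes
```

With the workaround above, `download_one` would take a URL, stream it to `temp_dir`, and return the local path; `download_all(links, download_one)` then replaces the `for` loop.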
Thank you for the possible workaround!
No worries! I already have the data I need. My plan was to integrate earthaccess into some scripts, but that can wait for now.
The However, whether or not those env vars are used is determined by the boolean value of I suggest attempting to export your |
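The attribute name did not survive in the comment above. In the `requests` library, the session attribute with this role is `Session.trust_env`, so the sketch below assumes that is what was meant. It uses `Session.merge_environment_settings` to show which proxies a session would apply to a request, without sending anything over the network. The proxy address is a placeholder and `effective_proxies` is a hypothetical helper name.

```python
import os

import requests

# Endpoint taken from the traceback above; nothing is actually requested.
CMR_URL = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"


def effective_proxies(session: requests.Session, url: str = CMR_URL) -> dict:
    """Return the proxy mapping the session would use for a request to url."""
    settings = session.merge_environment_settings(url, {}, None, None, None)
    return dict(settings["proxies"])


if __name__ == "__main__":
    # Placeholder proxy; on the HPC system this would already be exported.
    os.environ["https_proxy"] = "http://proxy.example:3128"

    session = requests.Session()
    print("trust_env =", session.trust_env, "->", effective_proxies(session))

    session.trust_env = False  # environment proxy variables are now ignored
    print("trust_env =", session.trust_env, "->", effective_proxies(session))
```

If the second case matches what happens inside earthaccess, that would explain why `curl` works through the proxy while the library's requests do not.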
@maawoo, if you want a workaround until we can come up with a robust and secure solution, here's something based upon the thread from #823. This is pulled from a combination of code from a few comments in that PR, with some minor renaming/refactoring. First, define a

    import os
    from functools import cache, wraps
    from typing import Callable

    from typing_extensions import ParamSpec

    import earthaccess
    import requests

    P = ParamSpec("P")


    def set_proxies(f: Callable[P, requests.Session]) -> Callable[P, requests.Session]:
        """Wrap a session factory so every session it returns has proxies set."""

        @wraps(f)
        def wrapper(*args: P.args, **kwargs: P.kwargs) -> requests.Session:
            session = f(*args, **kwargs)
            # Copy http_proxy/https_proxy (or their uppercase variants) from
            # the environment into the session, skipping unset or empty values.
            session.proxies.update(
                {
                    scheme: v
                    for scheme in ("http", "https")
                    if (
                        # k is bound by the first argument before the
                        # fallback k.upper() is evaluated
                        v := os.environ.get(
                            k := f"{scheme}_proxy", os.environ.get(k.upper())
                        )
                    )
                }
            )
            return session

        return wrapper

Now you can use

    earthaccess.login()
    auth: earthaccess.Auth = earthaccess.__store__.auth
    # Replace the session factory with a cached, proxy-aware version
    auth.get_session = cache(set_proxies(auth.get_session))

From here, any further
I'm trying to download GEDI data on my university's HPC system. The following sample code results in a ConnectionError:
My initial thought was that the API is not whitelisted in our HTTP/HTTPS proxies, which are set via environment variables. However, according to our sysadmin, this should not be an issue. I was able to confirm this by requesting the same URL via curl:
Any ideas / workarounds would be appreciated!