Failing Containers #78

Open
nmulyk opened this issue Dec 20, 2024 · 3 comments

nmulyk commented Dec 20, 2024

I've had several containers fail this week. I'm spawning containers with the following code:

payload2 = {
    'name': 'nicole-baseband',
    'image': 'images.canfar.net/chimefrb-public/baseband-analysis:latest',
    'cores': 2,
    'ram': 8,
    'kind': 'headless',
    'cmd': 'workflow',
    'args': 'run --site=canfar baseband-nmulyk',
    'env': {'SITE': 'canfar',
            'CHIME_FRB_ACCESS_TOKEN': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoia3NoaW4iLCJleHAiOjE3MjY2MDAyODEsImlzcyI6ImZyYi1tYXN0ZXIiLCJpYXQiOjE3MjY1OTg0ODF9.jbKUKw3QgfKaXZdHBYER63cONnfkPgEIMxtyJigp-DU',
            'CHIME_FRB_REFRESH_TOKEN': 'beb04f3faea1e802cf4a698a2df3f290bff9a112a6328619',
            # 'PYTHONPATH': '/arc/home/user/baseband-analysis/' # you can set this if you want to run your local branch on this headless session
           },
    'replicas': 10
}
sid2 = s.create(**payload2)

When I request 10 or fewer replicas I see the usual logs, although more than half of the containers still fail. When I request more (~30 replicas), I get the following errors:

2024-12-20 16:29:34,000 - skaha-client-skaha.session - INFO - Creating 30 session(s) with parameters:
2024-12-20 16:29:34,000 INFO Creating 30 session(s) with parameters:
2024-12-20 16:29:34,004 - skaha-client-skaha.session - INFO - {'name': 'nicole-baseband', 'image': 'images.canfar.net/chimefrb-public/baseband-analysis:latest', 'cores': 2, 'ram': 8, 'kind': 'headless', 'cmd': 'workflow', 'args': 'run --site=canfar baseband-nmulyk', 'env': {'SITE': 'canfar', 'CHIME_FRB_ACCESS_TOKEN': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoia3NoaW4iLCJleHAiOjE3MjY2MDAyODEsImlzcyI6ImZyYi1tYXN0ZXIiLCJpYXQiOjE3MjY1OTg0ODF9.jbKUKw3QgfKaXZdHBYER63cONnfkPgEIMxtyJigp-DU', 'CHIME_FRB_REFRESH_TOKEN': 'beb04f3faea1e802cf4a698a2df3f290bff9a112a6328619'}}
2024-12-20 16:29:34,004 INFO {'name': 'nicole-baseband', 'image': 'images.canfar.net/chimefrb-public/baseband-analysis:latest', 'cores': 2, 'ram': 8, 'kind': 'headless', 'cmd': 'workflow', 'args': 'run --site=canfar baseband-nmulyk', 'env': {'SITE': 'canfar', 'CHIME_FRB_ACCESS_TOKEN': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoia3NoaW4iLCJleHAiOjE3MjY2MDAyODEsImlzcyI6ImZyYi1tYXN0ZXIiLCJpYXQiOjE3MjY1OTg0ODF9.jbKUKw3QgfKaXZdHBYER63cONnfkPgEIMxtyJigp-DU', 'CHIME_FRB_REFRESH_TOKEN': 'beb04f3faea1e802cf4a698a2df3f290bff9a112a6328619'}}
2024-12-20 16:29:42,713 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:42,716 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:43,078 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:43,848 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:44,842 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:44,846 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:44,868 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:45,927 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:46,109 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:46,174 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:47,125 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:47,269 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:47,371 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:47,636 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:47,981 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:47,982 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:48,032 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:48,045 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:48,112 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-20 16:29:48,147 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10

SSLError Traceback (most recent call last)
File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
704 conn,
705 method,
706 url,
707 timeout=timeout_obj,
708 body=body,
709 headers=headers,
710 chunked=chunked,
711 )
713 # If we're going to release the connection in ``finally:``, then
714 # the response doesn't need to know about the connection. Otherwise
715 # it will also try to release it and we'll have a double-release
716 # mess.

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py:386, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
385 try:
--> 386 self._validate_conn(conn)
387 except (SocketTimeout, BaseSSLError) as e:
388 # Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py:1042, in HTTPSConnectionPool._validate_conn(self, conn)
1041 if not getattr(conn, "sock", None): # AppEngine might not have `.sock`
-> 1042 conn.connect()
1044 if not conn.is_verified:

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/connection.py:419, in HTTPSConnection.connect(self)
417 context.load_default_certs()
--> 419 self.sock = ssl_wrap_socket(
420 sock=conn,
421 keyfile=self.key_file,
422 certfile=self.cert_file,
423 key_password=self.key_password,
424 ca_certs=self.ca_certs,
425 ca_cert_dir=self.ca_cert_dir,
426 ca_cert_data=self.ca_cert_data,
427 server_hostname=server_hostname,
428 ssl_context=context,
429 tls_in_tls=tls_in_tls,
430 )
432 # If we're using all defaults and the connection
433 # is TLSv1 or TLSv1.1 we throw a DeprecationWarning
434 # for the host.

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/util/ssl_.py:418, in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data, tls_in_tls)
417 if key_password is None:
--> 418 context.load_cert_chain(certfile, keyfile)
419 else:

SSLError: [X509: KEY_VALUES_MISMATCH] key values mismatch (_ssl.c:4071)

During handling of the above exception, another exception occurred:

MaxRetryError Traceback (most recent call last)
File /opt/pysetup/.venv/lib/python3.8/site-packages/requests/adapters.py:486, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
485 try:
--> 486 resp = conn.urlopen(
487 method=request.method,
488 url=url,
489 body=request.body,
490 headers=request.headers,
491 redirect=False,
492 assert_same_host=False,
493 preload_content=False,
494 decode_content=False,
495 retries=self.max_retries,
496 timeout=timeout,
497 chunked=chunked,
498 )
500 except (ProtocolError, OSError) as err:

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py:787, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
785 e = ProtocolError("Connection aborted.", e)
--> 787 retries = retries.increment(
788 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
789 )
790 retries.sleep()

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/util/retry.py:592, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
591 if new_retry.is_exhausted():
--> 592 raise MaxRetryError(_pool, url, error or ResponseError(cause))
594 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPSConnectionPool(host='ws-uv.canfar.net', port=443): Max retries exceeded with url: /skaha/v0/session?name=nicole-baseband-8&image=images.canfar.net%2Fchimefrb-public%2Fbaseband-analysis%3Alatest&cores=2&ram=8&kind=headless&cmd=workflow&args=run+--site%3Dcanfar+baseband-nmulyk&env=SITE%3Dcanfar&env=CHIME_FRB_ACCESS_TOKEN%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoia3NoaW4iLCJleHAiOjE3MjY2MDAyODEsImlzcyI6ImZyYi1tYXN0ZXIiLCJpYXQiOjE3MjY1OTg0ODF9.jbKUKw3QgfKaXZdHBYER63cONnfkPgEIMxtyJigp-DU&env=CHIME_FRB_REFRESH_TOKEN%3Dbeb04f3faea1e802cf4a698a2df3f290bff9a112a6328619&env=REPLICA_ID%3D8&env=REPLICA_COUNT%3D30 (Caused by SSLError(SSLError(116, '[X509: KEY_VALUES_MISMATCH] key values mismatch (_ssl.c:4071)')))

During handling of the above exception, another exception occurred:

SSLError Traceback (most recent call last)
Cell In[23], line 18
1 ### spawning 20 containers for all the baseband pipeline stuff (beamforming, merge, analysis)
2 ### you're of course free to use your own chime tokens -- otherwise you'll be masquerading as kshin :-)
3 payload2 = {
4 'name': 'nicole-baseband',
5 'image': 'images.canfar.net/chimefrb-public/baseband-analysis:latest',
(...)
16 'replicas': 30
17 }
---> 18 sid2 = s.create(**payload2)

File /opt/pysetup/.venv/lib/python3.8/site-packages/skaha/session.py:236, in Session.create(self, name, image, cores, ram, kind, gpu, cmd, args, env, replicas)
234 arguments.append({"url": self.server, "params": payload})
235 loop = get_event_loop()
--> 236 results = loop.run_until_complete(scale(self.session.post, arguments))
237 responses: List[str] = []
238 for response in results:

File /opt/pysetup/.venv/lib/python3.8/site-packages/nest_asyncio.py:90, in _patch_loop.<locals>.run_until_complete(self, future)
87 if not f.done():
88 raise RuntimeError(
89 'Event loop stopped before Future completed.')
---> 90 return f.result()

File /usr/local/lib/python3.8/asyncio/futures.py:178, in Future.result(self)
176 self.__log_traceback = False
177 if self._exception is not None:
--> 178 raise self._exception
179 return self._result

File /usr/local/lib/python3.8/asyncio/tasks.py:282, in Task.__step(***failed resolving arguments***)
280 result = coro.send(None)
281 else:
--> 282 result = coro.throw(exc)
283 except StopIteration as exc:
284 if self._must_cancel:
285 # Task is cancelled right before coro stops.

File /opt/pysetup/.venv/lib/python3.8/site-packages/skaha/utils/threaded.py:35, in scale(function, arguments)
30 loop = asyncio.get_event_loop()
31 futures = [
32 loop.run_in_executor(executor, partial(function, **arguments[index]))
33 for index in range(workers)
34 ]
---> 35 return await asyncio.gather(*futures)

File /usr/local/lib/python3.8/asyncio/tasks.py:349, in Task.__wakeup(self, future)
347 def __wakeup(self, future):
348 try:
--> 349 future.result()
350 except BaseException as exc:
351 # This may also be a cancellation.
352 self.__step(exc)

File /usr/local/lib/python3.8/concurrent/futures/thread.py:57, in _WorkItem.run(self)
54 return
56 try:
---> 57 result = self.fn(*self.args, **self.kwargs)
58 except BaseException as exc:
59 self.future.set_exception(exc)

File /opt/pysetup/.venv/lib/python3.8/site-packages/requests/sessions.py:635, in Session.post(self, url, data, json, **kwargs)
624 def post(self, url, data=None, json=None, **kwargs):
625 r"""Sends a POST request. Returns :class:`Response` object.
626
627 :param url: URL for the new :class:`Request` object.
(...)
632 :rtype: requests.Response
633 """
--> 635 return self.request("POST", url, data=data, json=json, **kwargs)

File /opt/pysetup/.venv/lib/python3.8/site-packages/requests/sessions.py:587, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
582 send_kwargs = {
583 "timeout": timeout,
584 "allow_redirects": allow_redirects,
585 }
586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
589 return resp

File /opt/pysetup/.venv/lib/python3.8/site-packages/requests/sessions.py:701, in Session.send(self, request, **kwargs)
698 start = preferred_clock()
700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
703 # Total elapsed time of the request (approximately)
704 elapsed = preferred_clock() - start

File /opt/pysetup/.venv/lib/python3.8/site-packages/requests/adapters.py:517, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
513 raise ProxyError(e, request=request)
515 if isinstance(e.reason, _SSLError):
516 # This branch is for urllib3 v1.22 and later.
--> 517 raise SSLError(e, request=request)
519 raise ConnectionError(e, request=request)
521 except ClosedPoolError as e:

SSLError: HTTPSConnectionPool(host='ws-uv.canfar.net', port=443): Max retries exceeded with url: /skaha/v0/session?name=nicole-baseband-8&image=images.canfar.net%2Fchimefrb-public%2Fbaseband-analysis%3Alatest&cores=2&ram=8&kind=headless&cmd=workflow&args=run+--site%3Dcanfar+baseband-nmulyk&env=SITE%3Dcanfar&env=CHIME_FRB_ACCESS_TOKEN%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoia3NoaW4iLCJleHAiOjE3MjY2MDAyODEsImlzcyI6ImZyYi1tYXN0ZXIiLCJpYXQiOjE3MjY1OTg0ODF9.jbKUKw3QgfKaXZdHBYER63cONnfkPgEIMxtyJigp-DU&env=CHIME_FRB_REFRESH_TOKEN%3Dbeb04f3faea1e802cf4a698a2df3f290bff9a112a6328619&env=REPLICA_ID%3D8&env=REPLICA_COUNT%3D30 (Caused by SSLError(SSLError(116, '[X509: KEY_VALUES_MISMATCH] key values mismatch (_ssl.c:4071)')))
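
The traceback above shows `scale` handing every replica request to a thread pool at once, and the log reports a connection pool of size 10. One way to test whether the failures depend on how many requests are in flight is to submit the replicas in smaller batches. The helper below is only a sketch: `create_in_batches` is a hypothetical name, it assumes `s.create` returns a list of session IDs, and it assumes that giving each batch its own name suffix is acceptable. Note that each batch would then see its own `REPLICA_COUNT`, which may matter to the workflow.

```python
import time
from typing import Any, Callable, Dict, List


def create_in_batches(
    create: Callable[..., List[str]],
    payload: Dict[str, Any],
    total: int,
    batch_size: int = 5,
    pause: float = 2.0,
) -> List[str]:
    """Call `create` repeatedly with at most `batch_size` replicas per call.

    `create` is expected to behave like `s.create` (assumption: it accepts the
    payload as keyword arguments and returns a list of session IDs).
    """
    created: List[str] = []
    remaining = total
    batch_no = 0
    while remaining > 0:
        count = min(batch_size, remaining)
        # Copy the payload so the caller's dict is left untouched, and give
        # each batch a distinct name (hypothetical "-b<N>" suffix).
        batch = dict(payload, replicas=count, name=f"{payload['name']}-b{batch_no}")
        created.extend(create(**batch))
        remaining -= count
        batch_no += 1
        if remaining > 0:
            time.sleep(pause)  # brief gap between batches
    return created
```

With the payload from this comment, the call would look like `sids = create_in_batches(s.create, payload2, total=30)`. If small batches succeed where 30 at once fail, that points toward concurrency rather than the certificate itself.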

nmulyk added the bug label Dec 20, 2024
shinybrar (Owner) commented

@nmulyk Your CADC certificate might be expired. Could you run cadc-get-cert in a terminal and report back?
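
As a complementary check, the failing call in the traceback (`context.load_cert_chain`) can be run on its own against the certificate file. The sketch below uses only the standard library. `check_client_cert` is a hypothetical helper name, and the path `~/.ssl/cadcproxy.pem` is an assumption about where cadc-get-cert writes the certificate; adjust it if your setup differs. It tests whether the certificate and private key in the file load together, which is what the KEY_VALUES_MISMATCH error is about. It does not check the expiry date.

```python
import ssl
from pathlib import Path
from typing import Tuple


def check_client_cert(pem_path: str) -> Tuple[bool, str]:
    """Try to load a combined certificate+key PEM file; return (ok, detail)."""
    path = Path(pem_path).expanduser()
    if not path.is_file():
        return False, f"{path} does not exist"
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        # With no separate keyfile, the private key is read from the same file.
        context.load_cert_chain(certfile=str(path))
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return False, f"could not load {path}: {exc}"
    return True, f"{path} loaded: certificate and key are consistent"


if __name__ == "__main__":
    # Assumed default location of the CADC proxy certificate.
    print(check_client_cert("~/.ssl/cadcproxy.pem"))
```

If this succeeds every time when run alone but `s.create` still fails with many replicas, the file itself is probably fine and the problem is more likely in how it is loaded under load.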


nmulyk commented Dec 20, 2024

Hi @shinybrar, I had the same thought and tried that earlier. Oddly, I can spawn some jobs but most fail.


masonng-astro commented Dec 24, 2024

I'm also trying to spawn containers with the code below, and I'm getting an error similar to Nicole's, even after running cadc-get-cert and using the same CHIME_FRB_ACCESS_TOKEN and CHIME_FRB_REFRESH_TOKEN. The failures only started for me yesterday evening; everything had been fine for the previous few days. Any advice?

### spawning 20 containers for all the baseband pipeline stuff (beamforming, merge, analysis)
payload2 = {
    'name': 'masonng-baseband',
    'image': 'images.canfar.net/chimefrb-public/baseband-analysis:latest',
    'cores': 2,
    'ram': 16,
    'kind': 'headless',
    'cmd': 'workflow',
    'args': 'run --site=canfar baseband-masonng',
    'env': {'SITE': 'canfar',
            'CHIME_FRB_ACCESS_TOKEN': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoibWFzb25uZyIsImV4cCI6MTczMDg1MjI2MiwiaXNzIjoiZnJiLW1hc3RlciIsImlhdCI6MTczMDg1MDQ2Mn0.J8T85nZpRTHGpgz7HvoV0AA_tig7A7nXjI_bvP9C6o0',
            'CHIME_FRB_REFRESH_TOKEN': '6c8c6282db025c5baf6ae348c9be5f43fd9a9eef99984926',
            # 'PYTHONPATH': '/arc/home/user/baseband-analysis/' # you can set this if you want to run your local branch on this headless session
           },
    'replicas': 20
}
sid2 = s.create(**payload2)
2024-12-24 13:17:23,238 - skaha-client-skaha.session - INFO - Creating 20 session(s) with parameters:
2024-12-24 13:17:23,238 INFO Creating 20 session(s) with parameters:
2024-12-24 13:17:23,245 - skaha-client-skaha.session - INFO - {'name': 'masonng-baseband', 'image': 'images.canfar.net/chimefrb-public/baseband-analysis:latest', 'cores': 2, 'ram': 16, 'kind': 'headless', 'cmd': 'workflow', 'args': 'run --site=canfar baseband-masonng', 'env': {'SITE': 'canfar', 'CHIME_FRB_ACCESS_TOKEN': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoibWFzb25uZyIsImV4cCI6MTczMDg1MjI2MiwiaXNzIjoiZnJiLW1hc3RlciIsImlhdCI6MTczMDg1MDQ2Mn0.J8T85nZpRTHGpgz7HvoV0AA_tig7A7nXjI_bvP9C6o0', 'CHIME_FRB_REFRESH_TOKEN': '6c8c6282db025c5baf6ae348c9be5f43fd9a9eef99984926'}}
2024-12-24 13:17:23,245 INFO {'name': 'masonng-baseband', 'image': 'images.canfar.net/chimefrb-public/baseband-analysis:latest', 'cores': 2, 'ram': 16, 'kind': 'headless', 'cmd': 'workflow', 'args': 'run --site=canfar baseband-masonng', 'env': {'SITE': 'canfar', 'CHIME_FRB_ACCESS_TOKEN': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoibWFzb25uZyIsImV4cCI6MTczMDg1MjI2MiwiaXNzIjoiZnJiLW1hc3RlciIsImlhdCI6MTczMDg1MDQ2Mn0.J8T85nZpRTHGpgz7HvoV0AA_tig7A7nXjI_bvP9C6o0', 'CHIME_FRB_REFRESH_TOKEN': '6c8c6282db025c5baf6ae348c9be5f43fd9a9eef99984926'}}
2024-12-24 13:17:31,013 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-24 13:17:31,206 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-24 13:17:31,307 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-24 13:17:31,414 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-24 13:17:31,773 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-24 13:17:31,859 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-24 13:17:32,366 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-24 13:17:32,412 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-24 13:17:32,832 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
2024-12-24 13:17:32,846 WARNING Connection pool is full, discarding connection: ws-uv.canfar.net. Connection pool size: 10
---------------------------------------------------------------------------
SSLError                                  Traceback (most recent call last)
File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
    704     conn,
    705     method,
    706     url,
    707     timeout=timeout_obj,
    708     body=body,
    709     headers=headers,
    710     chunked=chunked,
    711 )
    713 # If we're going to release the connection in ``finally:``, then
    714 # the response doesn't need to know about the connection. Otherwise
    715 # it will also try to release it and we'll have a double-release
    716 # mess.

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py:386, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    385 try:
--> 386     self._validate_conn(conn)
    387 except (SocketTimeout, BaseSSLError) as e:
    388     # Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py:1042, in HTTPSConnectionPool._validate_conn(self, conn)
   1041 if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
-> 1042     conn.connect()
   1044 if not conn.is_verified:

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/connection.py:419, in HTTPSConnection.connect(self)
    417     context.load_default_certs()
--> 419 self.sock = ssl_wrap_socket(
    420     sock=conn,
    421     keyfile=self.key_file,
    422     certfile=self.cert_file,
    423     key_password=self.key_password,
    424     ca_certs=self.ca_certs,
    425     ca_cert_dir=self.ca_cert_dir,
    426     ca_cert_data=self.ca_cert_data,
    427     server_hostname=server_hostname,
    428     ssl_context=context,
    429     tls_in_tls=tls_in_tls,
    430 )
    432 # If we're using all defaults and the connection
    433 # is TLSv1 or TLSv1.1 we throw a DeprecationWarning
    434 # for the host.

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/util/ssl_.py:418, in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data, tls_in_tls)
    417 if key_password is None:
--> 418     context.load_cert_chain(certfile, keyfile)
    419 else:

SSLError: [X509: KEY_VALUES_MISMATCH] key values mismatch (_ssl.c:4071)

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
File /opt/pysetup/.venv/lib/python3.8/site-packages/requests/adapters.py:486, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    485 try:
--> 486     resp = conn.urlopen(
    487         method=request.method,
    488         url=url,
    489         body=request.body,
    490         headers=request.headers,
    491         redirect=False,
    492         assert_same_host=False,
    493         preload_content=False,
    494         decode_content=False,
    495         retries=self.max_retries,
    496         timeout=timeout,
    497         chunked=chunked,
    498     )
    500 except (ProtocolError, OSError) as err:

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/connectionpool.py:787, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    785     e = ProtocolError("Connection aborted.", e)
--> 787 retries = retries.increment(
    788     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    789 )
    790 retries.sleep()

File /opt/pysetup/.venv/lib/python3.8/site-packages/urllib3/util/retry.py:592, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
    591 if new_retry.is_exhausted():
--> 592     raise MaxRetryError(_pool, url, error or ResponseError(cause))
    594 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPSConnectionPool(host='ws-uv.canfar.net', port=443): Max retries exceeded with url: /skaha/v0/session?name=masonng-baseband-11&image=images.canfar.net%2Fchimefrb-public%2Fbaseband-analysis%3Alatest&cores=2&ram=16&kind=headless&cmd=workflow&args=run+--site%3Dcanfar+baseband-masonng&env=SITE%3Dcanfar&env=CHIME_FRB_ACCESS_TOKEN%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoibWFzb25uZyIsImV4cCI6MTczMDg1MjI2MiwiaXNzIjoiZnJiLW1hc3RlciIsImlhdCI6MTczMDg1MDQ2Mn0.J8T85nZpRTHGpgz7HvoV0AA_tig7A7nXjI_bvP9C6o0&env=CHIME_FRB_REFRESH_TOKEN%3D6c8c6282db025c5baf6ae348c9be5f43fd9a9eef99984926&env=REPLICA_ID%3D11&env=REPLICA_COUNT%3D20 (Caused by SSLError(SSLError(116, '[X509: KEY_VALUES_MISMATCH] key values mismatch (_ssl.c:4071)')))

During handling of the above exception, another exception occurred:

SSLError                                  Traceback (most recent call last)
Cell In[131], line 17
      1 ### spawning 20 containers for all the baseband pipeline stuff (beamforming, merge, analysis)
      2 payload2 = {
      3     'name': 'masonng-baseband',
      4     'image': 'images.canfar.net/chimefrb-public/baseband-analysis:latest',
   (...)
     15     'replicas': 20
     16 }
---> 17 sid2 = s.create(**payload2)

File /opt/pysetup/.venv/lib/python3.8/site-packages/skaha/session.py:236, in Session.create(self, name, image, cores, ram, kind, gpu, cmd, args, env, replicas)
    234     arguments.append({"url": self.server, "params": payload})
    235 loop = get_event_loop()
--> 236 results = loop.run_until_complete(scale(self.session.post, arguments))
    237 responses: List[str] = []
    238 for response in results:

File /opt/pysetup/.venv/lib/python3.8/site-packages/nest_asyncio.py:90, in _patch_loop.<locals>.run_until_complete(self, future)
     87 if not f.done():
     88     raise RuntimeError(
     89         'Event loop stopped before Future completed.')
---> 90 return f.result()

File /usr/local/lib/python3.8/asyncio/futures.py:178, in Future.result(self)
    176 self.__log_traceback = False
    177 if self._exception is not None:
--> 178     raise self._exception
    179 return self._result

File /usr/local/lib/python3.8/asyncio/tasks.py:282, in Task.__step(***failed resolving arguments***)
    280         result = coro.send(None)
    281     else:
--> 282         result = coro.throw(exc)
    283 except StopIteration as exc:
    284     if self._must_cancel:
    285         # Task is cancelled right before coro stops.

File /opt/pysetup/.venv/lib/python3.8/site-packages/skaha/utils/threaded.py:35, in scale(function, arguments)
     30 loop = asyncio.get_event_loop()
     31 futures = [
     32     loop.run_in_executor(executor, partial(function, **arguments[index]))
     33     for index in range(workers)
     34 ]
---> 35 return await asyncio.gather(*futures)

File /usr/local/lib/python3.8/asyncio/tasks.py:349, in Task.__wakeup(self, future)
    347 def __wakeup(self, future):
    348     try:
--> 349         future.result()
    350     except BaseException as exc:
    351         # This may also be a cancellation.
    352         self.__step(exc)

File /usr/local/lib/python3.8/concurrent/futures/thread.py:57, in _WorkItem.run(self)
     54     return
     56 try:
---> 57     result = self.fn(*self.args, **self.kwargs)
     58 except BaseException as exc:
     59     self.future.set_exception(exc)

File /opt/pysetup/.venv/lib/python3.8/site-packages/requests/sessions.py:635, in Session.post(self, url, data, json, **kwargs)
    624 def post(self, url, data=None, json=None, **kwargs):
    625     r"""Sends a POST request. Returns :class:`Response` object.
    626 
    627     :param url: URL for the new :class:`Request` object.
   (...)
    632     :rtype: requests.Response
    633     """
--> 635     return self.request("POST", url, data=data, json=json, **kwargs)

File /opt/pysetup/.venv/lib/python3.8/site-packages/requests/sessions.py:587, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    582 send_kwargs = {
    583     "timeout": timeout,
    584     "allow_redirects": allow_redirects,
    585 }
    586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
    589 return resp

File /opt/pysetup/.venv/lib/python3.8/site-packages/requests/sessions.py:701, in Session.send(self, request, **kwargs)
    698 start = preferred_clock()
    700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
    703 # Total elapsed time of the request (approximately)
    704 elapsed = preferred_clock() - start

File /opt/pysetup/.venv/lib/python3.8/site-packages/requests/adapters.py:517, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    513         raise ProxyError(e, request=request)
    515     if isinstance(e.reason, _SSLError):
    516         # This branch is for urllib3 v1.22 and later.
--> 517         raise SSLError(e, request=request)
    519     raise ConnectionError(e, request=request)
    521 except ClosedPoolError as e:

SSLError: HTTPSConnectionPool(host='ws-uv.canfar.net', port=443): Max retries exceeded with url: /skaha/v0/session?name=masonng-baseband-11&image=images.canfar.net%2Fchimefrb-public%2Fbaseband-analysis%3Alatest&cores=2&ram=16&kind=headless&cmd=workflow&args=run+--site%3Dcanfar+baseband-masonng&env=SITE%3Dcanfar&env=CHIME_FRB_ACCESS_TOKEN%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoibWFzb25uZyIsImV4cCI6MTczMDg1MjI2MiwiaXNzIjoiZnJiLW1hc3RlciIsImlhdCI6MTczMDg1MDQ2Mn0.J8T85nZpRTHGpgz7HvoV0AA_tig7A7nXjI_bvP9C6o0&env=CHIME_FRB_REFRESH_TOKEN%3D6c8c6282db025c5baf6ae348c9be5f43fd9a9eef99984926&env=REPLICA_ID%3D11&env=REPLICA_COUNT%3D20 (Caused by SSLError(SSLError(116, '[X509: KEY_VALUES_MISMATCH] key values mismatch (_ssl.c:4071)')))
