Add endpoint-specific Endpoint classes #213
base: master
Adds observation endpoint and related tests
Adds node endpoint class and related tests
Adds resolve endpoint class and related tests
Removes all references to Sparql endpoint (and tests)
Change test to use a `return_value` instead of a `side_effect`, for consistency with other tests
Provide default empty `facet_data`
This is a partial review, I'll take a look at the test classes later today!
datacommons_client/endpoints/node.py
Outdated
if isinstance(expression, str):
  return expression

return (f"[{', '.join(expression)}]"
        if isinstance(expression, list) else expression)
Can this be simplified to only need one `isinstance` check?
Yes indeed!
datacommons_client/endpoints/node.py
Outdated
Args:
    api (API): The API instance providing the environment configuration
        (base URL, headers, authentication) to be used for requests.
    max_pages (Optional[int]|None): Optionally, set the maximum number of pages to fetch.
Nit: Is `|None` redundant with `Optional`?
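A quick way to check the redundancy, using only the standard `typing` module: `Optional[int]` is shorthand for `Union[int, None]`, so adding `None` again does not change the type.

```python
from typing import Optional, Union

# Optional[X] is defined as Union[X, None].
optional_is_union = Optional[int] == Union[int, None]

# Unions are flattened and de-duplicated, so a second None is a no-op.
extra_none_is_noop = Union[Optional[int], None] == Optional[int]
```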
datacommons_client/endpoints/node.py
Outdated
nodes=["geoId/06"],
property="<-"
Update to `node_dcids` and `expression`?
datacommons_client/endpoints/node.py
Outdated
if out:
  expression = f"->{expression}"
  if constraints:
    expression += f"{{{constraints}}}"
else:
  expression = f"<-{expression}"
  if constraints:
    expression += f"{{{constraints}}}"
Deduplicate adding constraints? Maybe have arrow direction be on one line like line 89?
Yes, much better!
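One possible shape of the deduplicated version, as a sketch only: the arrow direction is chosen on a single line and the constraint is appended once. The function name `build_expression` and its signature are hypothetical, and the property and constraint strings in the usage lines are illustrative.

```python
from typing import Optional


def build_expression(properties: str,
                     out: bool = True,
                     constraints: Optional[str] = None) -> str:
  """Builds a relation expression such as "->name" or "<-prop{constraint}"."""
  # Pick the arrow direction in one place.
  arrow = "->" if out else "<-"
  expression = f"{arrow}{properties}"
  # Append the constraint once, wrapped in literal braces.
  if constraints:
    expression += f"{{{constraints}}}"
  return expression


outgoing = build_expression("name")
incoming = build_expression("containedInPlace", out=False, constraints="typeOf:County")
```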
datacommons_client/endpoints/node.py
Outdated
# Normalize the input expression
expression = _normalize_expression_to_string(expression)
Can we skip normalizing here since the payload class also knows to normalize?
For the `expression`, the payload currently only checks that it gets a string (and it currently only accepts a string).
I currently normalize here to enable expressions such as:
->[name, latitude, longitude]
which may contain a direction, multiple properties, and even a constraint.
In the process of building them, the properties need to be listed as a comma-separated string, per this bit of the docs.
I moved the normalization function to the payload script and simplified where we're calling it.
from datacommons_client.endpoints.response import ResolveResponse


def flatten_resolve_response(data: ResolveResponse) -> dict[str, Any]:
Is it possible to make the return type stricter? Maybe `dict[str, list[str] | str]`, or is that not supported by Python?
It is supported! Changed.
  return items


def resolve_correspondence_expression(from_type: str,
It looks like this is meant to be a private helper for `ResolveEndpoint`. I'm new to the Python styleguide but I think the recommendation in this case is to prepend a single underscore to the function name: https://google.github.io/styleguide/pyguide#3162-naming-conventions
    # Send the request and return the response
    return ResolveResponse.from_json(self.post(payload))

  def fetch_dcid_by_name(self,
Nit: `fetch_dcid` -> `fetch_dcids` here and elsewhere for consistency with other methods that can handle either one or multiple inputs.
Fetches DCIDs for entities by their geographic coordinates.

Args:
    latitude (str): Latitude of the entity.
Optional: Include example values for lat and lon
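As a sketch of what documented example values could look like, the snippet below mirrors the `latitude#longitude` formatting used in the method under review. The helper name `format_coordinates` and the sample coordinates are illustrative assumptions.

```python
def format_coordinates(latitude: str, longitude: str) -> str:
  """Joins latitude and longitude into the "lat#lon" form used for the request.

  Args:
      latitude (str): Latitude of the entity, e.g. "37.42".
      longitude (str): Longitude of the entity, e.g. "-122.08".
  """
  return f"{latitude}#{longitude}"


coordinates = format_coordinates("37.42", "-122.08")
```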
    coordinates = f"{latitude}#{longitude}"
    return self.fetch(node_dcids=coordinates, expression=expression)

  def fetch_from_type_to_type(
I see in the design this was called `get_entity_correspondence_dictionary`. WDYT of a name that splits the difference, maybe something like `fetch_entity_type_correspondence`?
I changed the name because it was no longer a dictionary (originally we weren't modelling the API responses; now it returns a `ResolveResponse`, which can be used to produce the dictionary).
But I agree that your suggested name is a better option.
from datacommons_client.endpoints.response import ObservationResponse


@patch(
Is there a particular reason to use base function mocks here vs mock of API elsewhere? If not my preference would be to use one approach for all endpoint tests, with a slight preference for mocking API because it makes the tests more concise.
Not at all - you're right! I've simplified the observation and resolve tests to match using MagicMock for the API, like in node.
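A minimal sketch of the "mock the API" style, using only `unittest.mock.MagicMock` from the standard library. `ExampleEndpoint` is a hypothetical stand-in, not the package's real endpoint class, and the payload keys are illustrative.

```python
from unittest.mock import MagicMock


class ExampleEndpoint:
  """Hypothetical endpoint that delegates all requests to an API object."""

  def __init__(self, api):
    self.api = api

  def fetch(self, node_dcids, expression):
    payload = {"nodes": node_dcids, "property": expression}
    return self.api.post(payload=payload, endpoint="node")


def test_fetch_posts_payload():
  # A single MagicMock replaces the API, so no @patch decorators are needed.
  api_mock = MagicMock()
  api_mock.post.return_value = {"data": {}}
  endpoint = ExampleEndpoint(api=api_mock)

  response = endpoint.fetch(node_dcids=["geoId/06"], expression="<-")

  assert response == {"data": {}}
  api_mock.post.assert_called_once_with(
      payload={"nodes": ["geoId/06"], "property": "<-"}, endpoint="node")


test_fetch_posts_payload()
```

Compared with patching individual base functions, the mock is injected through the constructor, which keeps each test to setup, call, and assertions.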
@patch(
    "datacommons_client.endpoints.base.post_request",
    return_value={"success": True},
)
@patch(
    "datacommons_client.endpoints.base.check_instance_is_valid",
    return_value="https://custom.api/v2",
)
def test_endpoint_post_request(mock_check_instance, mock_post_request):
  """Tests making a POST request using the Endpoint object."""
  api = API(url="https://custom.api/v2")
  endpoint = Endpoint(endpoint="node", api=api)
  payload = {"key": "value"}

  response = endpoint.post(payload=payload, max_pages=5)
  assert response == {"success": True}
  mock_post_request.assert_called_once_with(
      url="https://custom.api/v2/node",
      payload=payload,
      headers=api.headers,
      max_pages=5,
rm duplicate test?
ooops
return post_request(url=url,
                    payload=payload,
                    headers=self.headers,
                    max_pages=max_pages)
When using `max_pages`, is `nextToken` still returned as part of the response? I wonder if we ought to strip it out when it can't be used via this client.
I'm curious too about the use case for `max_pages` and whether instead it might be preferable to still expose the ability to use the token. Or in any case if there should be some way to tell whether the response contains complete or truncated results (I suppose that's what leaving the token in the response accomplishes! Okay talking myself around to it...)
It's good that we discuss this in more detail. In short:
- `max_pages` is really only exposed to the user when they instantiate an `Endpoint` class which supports it. Right now that's only `NodeEndpoint`. It allows the user to limit the number of 'pages' fetched. If `None`, then all are fetched.
- In its current implementation, it will fetch all pages up to `max_pages` or until no more pages are left (whichever happens first).
The whole thing is sort of invisible to the user (in my thinking for convenience and simplicity). In practice, once the response object gets returned to the user (in the Node case `ObservationResponse`) the `next_token` is always `None`... which could be an argument to remove it, to avoid creating the impression that a full response was returned when it may have gotten truncated by `max_pages`. There isn't much utility for the `next_token` from any of these responses, since we don't have anything that allows the user to use it directly (other than a very low-level api call).
A couple of options:
- We could return the actual `next_token` as part of the Response object. In that case, `None` would mean the full content was returned and a token would mean that it was truncated by `max_pages`.
- Instead of returning a `next_token` (given the lack of practical use for it), we could add a flag like `truncated = True` to the Response object, to let the user know (very explicitly) that the response is not complete.
What do you think?
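To make the second option concrete, here is a rough sketch of a pagination loop that stops at `max_pages` and reports whether results were cut short. `fetch_pages`, the `fetch_page` callable, and the page dictionaries are all hypothetical; this is not the package's `post_request` implementation.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Page = Dict[str, Any]


def fetch_pages(fetch_page: Callable[[Optional[str]], Page],
                max_pages: Optional[int] = None) -> Tuple[List[Page], bool]:
  """Fetches pages until none are left or max_pages is reached.

  Returns the pages plus a `truncated` flag that is True only when the
  loop stopped while the server still advertised a next token.
  """
  pages: List[Page] = []
  next_token: Optional[str] = None
  while True:
    page = fetch_page(next_token)
    pages.append(page)
    next_token = page.get("nextToken")
    if not next_token:
      return pages, False  # complete: no further pages exist
    if max_pages is not None and len(pages) >= max_pages:
      return pages, True  # stopped early: more pages were available


def fake_fetch(token: Optional[str]) -> Page:
  """Illustrative three-page backend keyed by token."""
  backend = {
      None: {"data": 1, "nextToken": "a"},
      "a": {"data": 2, "nextToken": "b"},
      "b": {"data": 3},
  }
  return backend[token]


limited_pages, limited_truncated = fetch_pages(fake_fetch, max_pages=2)
all_pages, all_truncated = fetch_pages(fake_fetch)
```

With this shape the caller never handles a token directly, yet can still tell a complete response from a truncated one.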
  )

  # Check the response
  assert isinstance(response, ObservationResponse)
Can we also assert on the contents of the response? Doesn't have to be in every test, we can have a dedicated test for it.
I'm not sure I follow - could you clarify the type of test you'd like to see?
I've written response tests here, which is why I only check the type. But happy to write additional tests if you see that tests for parts of the logic are missing.
- Refactor expression normalization - renamed `expression` to `properties` in cases where the expression is built from the method parameters - updated tests accordingly
MagicMock API instead of individual methods
MagicMock API instead of individual methods
MagicMock API instead of individual methods
Thank you so much @hqpho, this was all very helpful! I've addressed everything, except for:
|
Note: this was previously submitted as #211. I closed that one since the formatting changes had made it difficult to rebase and keep things relatively clean. This is basically the same code, but with a much cleaner Git history, and based off of the current `master` branch (with fixed tests and the expected formatting).
@keyurva and @hqpho - I just requested a review from you, but please feel free to redirect as needed.
Sorry in advance for the slightly bigger PR. Since the different Endpoint classes were just more specific implementations of the base Endpoint, I decided to put them all together instead of 3 separate PRs for quite similar behaviour.
This PR introduces several changes to the package. Broadly, it:
Endpoints:
datacommons_client/endpoints/node.py
datacommons_client/endpoints/observation.py
datacommons_client/endpoints/resolve.py
Other changes
datacommons_client/endpoints/base.py: Updated `post` method to include an optional `max_pages` parameter for pagination support.
datacommons_client/endpoints/payloads.py: Modified `ObservationRequestPayload` to make the `select` field optional and added logic to set default values if not provided.
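For illustration, an optional `select` field with a fallback could be sketched as below. `ExampleObservationPayload`, its `variable_dcids` field, and the specific default list are assumptions made for this example, not the actual `ObservationRequestPayload` definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExampleObservationPayload:
  """Hypothetical payload whose `select` field may be omitted."""
  variable_dcids: List[str]
  select: Optional[List[str]] = None

  def __post_init__(self):
    # Assumed defaults for this sketch; the real default values may differ.
    if self.select is None:
      self.select = ["date", "variable", "entity", "value"]


default_payload = ExampleObservationPayload(variable_dcids=["Count_Person"])
custom_payload = ExampleObservationPayload(variable_dcids=["Count_Person"],
                                           select=["variable", "entity"])
```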