Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add endpoint-specific Endpoint classes #213

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

jm-rivera
Copy link
Contributor

@jm-rivera jm-rivera commented Jan 24, 2025

Note: this was previously submitted as #211. I closed that one since the formatting changes had made it difficult to rebase and keep things relatively clean. This is basically the same code, but with a much cleaner Git history, and based off of the current master branch (with fixed tests and the expected formatting).

@keyurva and @hqpho - I just requested a review from you, but please feel free to redirect as needed.
Sorry in advance for the slightly bigger PR. Since the different Endpoint classes were just more specific implementations of the base Endpoint, I decided to put them all together instead of 3 separate PRs for quite similar behaviour.


This PR introduces several changes to the package. Broadly, it:

  • Introduces the specific Endpoint classes and tests (details below)
  • Removes the Sparql endpoint
  • Adds max_pages support to the Endpoints

Endpoints:

Other changes

Adds observation endpoint and related tests
Adds node endpoint class and related tests
Adds resolve endpoint class and related tests
Removes all references to Sparql endpoint (and tests)
Change test to use a `return_value` instead of a `side_effect`, for consistency with other tests
Provide default empty `facet_data`
@jm-rivera jm-rivera requested review from keyurva and hqpho January 24, 2025 02:50
@jm-rivera jm-rivera self-assigned this Jan 24, 2025
@jm-rivera jm-rivera changed the title Add node-specific Endpoint classes Add endpoint-specific Endpoint classes Jan 24, 2025
Copy link
Collaborator

@hqpho hqpho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a partial review, I'll take a look at the test classes later today!

Comment on lines 11 to 15
if isinstance(expression, str):
return expression

return (f"[{', '.join(expression)}]"
if isinstance(expression, list) else expression)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simplified to only need one isinstance check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed!

Args:
api (API): The API instance providing the environment configuration
(base URL, headers, authentication) to be used for requests.
max_pages (Optional[int]|None): Optionally, set the maximum number of pages to fetch.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is |None is redundant with Optional?

Comment on lines 47 to 48
nodes=["geoId/06"],
property="<-"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to node_dcids and expression?

Comment on lines 119 to 126
if out:
expression = f"->{expression}"
if constraints:
expression += f"{{{constraints}}}"
else:
expression = f"<-{expression}"
if constraints:
expression += f"{{{constraints}}}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deduplicate adding constraints? Maybe have arrow direction be on one line like line 89?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, much better!

Comment on lines 53 to 54
# Normalize the input expression
expression = _normalize_expression_to_string(expression)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we skip normalizing here since the payload class also knows to normalize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the expression, the payload currently only checks that it gets a string (and it currently only accepts a string).

I currently normalize here to enable expressions such as:
->[name, latitude, longitude] which may contain a direction, multiple properties, and even a constraint.

In the process of building them, the properties need to be listed as a comma-separated string, per this bit of the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the normalization function to the payload script and simplified where we're calling it.

from datacommons_client.endpoints.response import ResolveResponse


def flatten_resolve_response(data: ResolveResponse) -> dict[str, Any]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make the return type stricter? Maybe dict[str, list[str] | str] or is that not supported by Python?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is supported! Changed.

return items


def resolve_correspondence_expression(from_type: str,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is meant to be a private helper for ResolveEndpoint. I'm new to the Python styleguide but I think the recommendation in this case is to append a single underscore to the start of the function name: https://google.github.io/styleguide/pyguide#3162-naming-conventions

# Send the request and return the response
return ResolveResponse.from_json(self.post(payload))

def fetch_dcid_by_name(self,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: fetch_dcid -> fetch_dcids here and elsewhere for consistency with other methods that can handle either one or multiple inputs.

Fetches DCIDs for entities by their geographic coordinates.

Args:
latitude (str): Latitude of the entity.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Include example values for lat and lon

coordinates = f"{latitude}#{longitude}"
return self.fetch(node_dcids=coordinates, expression=expression)

def fetch_from_type_to_type(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see in the design this was called get_entity_correspondence_dictionary. WDYT of a name that splits the difference, maybe something like fetch_entity_type_correspondence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the name because it was no longer a dictionary (originally we weren't modelling the API responses, now it returns a ResolveResponse (which can be used to produce the dictionary).

But I agree that your suggested name is a better option.

from datacommons_client.endpoints.response import ObservationResponse


@patch(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason to use base function mocks here vs mock of API elsewhere? If not my preference would be to use one approach for all endpoint tests, with a slight preference for mocking API because it makes the tests more concise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all - you're right! I've simplified the observation and resolve tests to match using MagicMock for the API, like in node.

Comment on lines +119 to +139
@patch(
"datacommons_client.endpoints.base.post_request",
return_value={"success": True},
)
@patch(
"datacommons_client.endpoints.base.check_instance_is_valid",
return_value="https://custom.api/v2",
)
def test_endpoint_post_request(mock_check_instance, mock_post_request):
"""Tests making a POST request using the Endpoint object."""
api = API(url="https://custom.api/v2")
endpoint = Endpoint(endpoint="node", api=api)
payload = {"key": "value"}

response = endpoint.post(payload=payload, max_pages=5)
assert response == {"success": True}
mock_post_request.assert_called_once_with(
url="https://custom.api/v2/node",
payload=payload,
headers=api.headers,
max_pages=5,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm duplicate test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooops

return post_request(url=url,
payload=payload,
headers=self.headers,
max_pages=max_pages)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using max_pages, is nextToken still returned as part of the response? I wonder if we ought to strip it out when it can't be used via this client.

I'm curious too about the use case for max_pages and whether instead it might be preferable to still expose the ability to use the token. Or in any case if there should be some way to tell whether the response contains complete or truncated results (I suppose that's what leaving the token in the response accomplishes! Okay talking myself around to it...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good that we discuss this in more detail. In short:

  • max_pages is really only exposed to the user when they instantiate an Endpoint class which supports it. Right now that's only NodeEndpoint. It allows the user to limit the number of 'pages' fetched. If None then all are fetched.
  • In its current implementation, it will fetch all pages up to max_pages or once no more pages are left (whichever happens first).

The whole thing is sort of invisible to the user (in my thinking for convenience and simplicity). In practice, once the response object gets returned to the user (in the Node case ObservationResponse) the next_token is always None... which could be an argument to remove it, to avoid creating the impression that a full response was returned when it may have gotten truncated by max_pages. There isn't much utility for the next_token from any of these responses, since we don't have anything that allows the user to use it directly (other than a very low-level api call).

A couple of options:

  • We could return the actual next_token as part of the Response object. In that case, None would mean the full content was returned and a token would mean that it was truncated by max_pages.
  • Instead of returning a next_token (given the lack of practical use for it), we could add a flag like truncated = True to the Response object, to let the user know (very explicitly) that the response is not complete.

What do you think?

)

# Check the response
assert isinstance(response, ObservationResponse)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also assert on the contents of the response? Doesn't have to be in every test, we can have a dedicated test for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow - could you clarify the type of test you'd like to see?

I've written response tests here, which is why I only check the type. But happy to write additional tests if you see that tests for parts of the logic are missing.

@jm-rivera jm-rivera requested a review from hqpho January 27, 2025 02:32
@jm-rivera
Copy link
Contributor Author

Thank you so much @hqpho, this was all very helpful!

I've addressed everything, except for:

  • This comment where more details would be helpful
  • This comment where your views on the best way forward would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants