Add endpoint-specific Endpoint classes #213

Open

jm-rivera wants to merge 24 commits into datacommonsorg:master from ONEcampaign:endpoints

Contributor

jm-rivera commented Jan 24, 2025

Note: this was previously submitted as #211. I closed that one since the formatting changes had made it difficult to rebase and keep things relatively clean. This is basically the same code, but with a much cleaner Git history, and based off of the current master branch (with fixed tests and the expected formatting).

@keyurva and @hqpho - I just requested a review from you, but please feel free to redirect as needed.
Sorry in advance for the slightly bigger PR. Since the different Endpoint classes were just more specific implementations of the base Endpoint, I decided to put them all together instead of 3 separate PRs for quite similar behaviour.

This PR introduces several changes to the package. Broadly, it:

Introduces the specific Endpoint classes and tests (details below)
Removes the Sparql endpoint
Adds max_pages support to the Endpoints

Endpoints:

Other changes

datacommons_client/endpoints/base.py: Updated post method to include an optional max_pages parameter for pagination support.
datacommons_client/endpoints/payloads.py: Modified ObservationRequestPayload to make the select field optional and added logic to set default values if not provided.

jm-rivera added 8 commits

January 23, 2025 20:28


          Add default values to Observation

29a5a16


          Add max_pages

ee3bafd


          Observation endpoint

e4f9456

Adds observation endpoint and related tests


          Add node endpoint

6198f75

Adds node endpoint class and related tests


          Add resolve endpoint

Adds resolve endpoint class and related tests


          Remove sparql endpoint

066e6f6

Removes all references to Sparql endpoint (and tests)


          Update test_base.py

86399dc

Change test to use a `return_value` instead of a `side_effect`, for consistency with other tests


          Update observation.py

452581f

Provide default empty `facet_data`

jm-rivera requested review from keyurva and hqpho

January 24, 2025 02:50

jm-rivera self-assigned this

jm-rivera changed the title ~~Add node-specific Endpoint classes~~ Add endpoint-specific Endpoint classes

hqpho reviewed


Collaborator

hqpho left a comment

This is a partial review, I'll take a look at the test classes later today!

datacommons_client/endpoints/node.py Outdated

Comment on lines 11 to 15

+                if isinstance(expression, str):
+                  return expression
+                return (f"[{', '.join(expression)}]"
+                        if isinstance(expression, list) else expression)

Collaborator

hqpho Jan 24, 2025

Can this be simplified to only need one isinstance check?

Contributor Author

jm-rivera Jan 26, 2025

Yes indeed!
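For illustration, a minimal sketch of how the snippet above could collapse to a single `isinstance` check. The function name and signature here are assumptions made for the example; this is not necessarily the code that ended up in the PR.

```python
from typing import List, Union


def normalize_expression_to_string(expression: Union[str, List[str]]) -> str:
    """Return the expression as a string, joining lists as "[a, b, c]".

    Hypothetical helper written for this example only.
    """
    # One isinstance check: lists are joined, anything else passes through.
    if isinstance(expression, list):
        return f"[{', '.join(expression)}]"
    return expression


single = normalize_expression_to_string("name")
multiple = normalize_expression_to_string(["name", "latitude", "longitude"])
print(single)
print(multiple)
```

Strings fall through untouched, so the separate `isinstance(expression, str)` branch is no longer needed.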

datacommons_client/endpoints/node.py Outdated

+                  Args:
+                      api (API): The API instance providing the environment configuration
+                          (base URL, headers, authentication) to be used for requests.
+                      max_pages (Optional[int]|None): Optionally, set the maximum number of pages to fetch.

Collaborator

hqpho Jan 24, 2025

Nit: Is |None redundant with Optional?
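A quick check of the point raised here, using only the standard `typing` module: `Optional[X]` is already shorthand for `Union[X, None]`, so adding `None` a second time produces the same type.

```python
from typing import Optional, Union

# Optional[int] is shorthand for Union[int, None].
is_shorthand = Optional[int] == Union[int, None]

# Adding None again is flattened away, so the result is unchanged.
redundant = Union[Optional[int], None]
is_same = redundant == Optional[int]

print(is_shorthand, is_same)
```

In a docstring, `Optional[int]` on its own therefore carries the same meaning.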

datacommons_client/endpoints/node.py Outdated

Comment on lines 47 to 48

		nodes=["geoId/06"],
		property="<-"

Collaborator

hqpho Jan 24, 2025

Update to node_dcids and expression?

datacommons_client/endpoints/node.py Outdated

Comment on lines 119 to 126

+                  if out:
+                    expression = f"->{expression}"
+                    if constraints:
+                      expression += f"{{{constraints}}}"
+                  else:
+                    expression = f"<-{expression}"
+                    if constraints:
+                      expression += f"{{{constraints}}}"

Collaborator

hqpho Jan 24, 2025

Deduplicate adding constraints? Maybe have arrow direction be on one line like line 89?

Contributor Author

jm-rivera Jan 26, 2025

Yes, much better!
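A sketch of the deduplicated shape suggested above: the arrow direction is chosen on one line and the constraints are appended once. The function name and parameters are hypothetical and only loosely mirror the snippet under review.

```python
from typing import Optional


def build_expression(properties: str,
                     out: bool = True,
                     constraints: Optional[str] = None) -> str:
    """Build a relation expression such as "->name" or "<-prop{constraint}".

    Hypothetical helper written for this example only.
    """
    # Pick the arrow direction on a single line.
    arrow = "->" if out else "<-"
    expression = f"{arrow}{properties}"
    # Append the constraints once, regardless of direction.
    if constraints:
        expression += f"{{{constraints}}}"
    return expression


outgoing = build_expression("name")
incoming = build_expression("containedInPlace", out=False,
                            constraints="typeOf:City")
print(outgoing)
print(incoming)
```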

datacommons_client/endpoints/node.py Outdated

Comment on lines 53 to 54

		# Normalize the input expression
		expression = _normalize_expression_to_string(expression)

Collaborator

hqpho Jan 24, 2025

Can we skip normalizing here since the payload class also knows to normalize?

Contributor Author

jm-rivera Jan 26, 2025

For the expression, the payload currently only checks that it gets a string (and it currently only accepts a string).

I currently normalize here to enable expressions such as:
->[name, latitude, longitude] which may contain a direction, multiple properties, and even a constraint.

In the process of building them, the properties need to be listed as a comma-separated string, per this bit of the docs.

Contributor Author

jm-rivera Jan 26, 2025

I moved the normalization function to the payload script and simplified where we're calling it.

datacommons_client/endpoints/resolve.py Outdated

		from datacommons_client.endpoints.response import ResolveResponse


		def flatten_resolve_response(data: ResolveResponse) -> dict[str, Any]:

Collaborator

hqpho Jan 24, 2025

Is it possible to make the return type stricter? Maybe dict[str, list[str] | str] or is that not supported by Python?

Contributor Author

jm-rivera Jan 26, 2025

It is supported! Changed.
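To illustrate the stricter return type, here is a self-contained sketch. The `Entity` and `Candidate` classes are simplified stand-ins invented for this example, not the package's `ResolveResponse` model, and the flattening rule (a single match becomes a string, several become a list) is an assumption. On Python 3.10+ the alias below can also be spelled `dict[str, list[str] | str]`; `Union` is used here so the sketch also runs on older versions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Candidate:
    """Hypothetical candidate match for a resolved node."""
    dcid: str


@dataclass
class Entity:
    """Hypothetical resolved entity: the queried node plus its candidates."""
    node: str
    candidates: List[Candidate] = field(default_factory=list)


# Equivalent to dict[str, list[str] | str] on Python 3.10+.
FlatResolve = Dict[str, Union[List[str], str]]


def flatten_entities(entities: List[Entity]) -> FlatResolve:
    """Map each node to a single DCID, or to a list when there are several."""
    items: FlatResolve = {}
    for entity in entities:
        dcids = [candidate.dcid for candidate in entity.candidates]
        items[entity.node] = dcids[0] if len(dcids) == 1 else dcids
    return items


flat = flatten_entities([
    Entity("California", [Candidate("geoId/06")]),
    Entity("Georgia", [Candidate("geoId/13"), Candidate("country/GEO")]),
])
print(flat)
```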

datacommons_client/endpoints/resolve.py Outdated

		return items


		def resolve_correspondence_expression(from_type: str,

Collaborator

hqpho Jan 24, 2025

It looks like this is meant to be a private helper for ResolveEndpoint. I'm new to the Python styleguide but I think the recommendation in this case is to prepend a single underscore to the function name: https://google.github.io/styleguide/pyguide#3162-naming-conventions

datacommons_client/endpoints/resolve.py Outdated

+                  # Send the request and return the response
+                  return ResolveResponse.from_json(self.post(payload))
+                def fetch_dcid_by_name(self,

Collaborator

hqpho Jan 24, 2025

Nit: fetch_dcid -> fetch_dcids here and elsewhere for consistency with other methods that can handle either one or multiple inputs.

datacommons_client/endpoints/resolve.py

+                      Fetches DCIDs for entities by their geographic coordinates.
+                      Args:
+                          latitude (str): Latitude of the entity.

Collaborator

hqpho Jan 24, 2025

Optional: Include example values for lat and lon
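As a sketch of what such an example might show: the method under review joins the two values into a single `latitude#longitude` string before calling `fetch`. The helper name and the coordinate values below are illustrative assumptions only.

```python
def build_coordinate_node(latitude: str, longitude: str) -> str:
    """Join latitude and longitude with "#" (hypothetical helper)."""
    return f"{latitude}#{longitude}"


# Illustrative values only; any valid decimal-degree pair would do.
example = build_coordinate_node("37.42", "-122.08")
print(example)
```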

datacommons_client/endpoints/resolve.py Outdated

+                  coordinates = f"{latitude}#{longitude}"
+                  return self.fetch(node_dcids=coordinates, expression=expression)
+                def fetch_from_type_to_type(

Collaborator

hqpho Jan 24, 2025

I see in the design this was called get_entity_correspondence_dictionary. WDYT of a name that splits the difference, maybe something like fetch_entity_type_correspondence?

Contributor Author

jm-rivera Jan 27, 2025

I changed the name because it was no longer a dictionary: originally we weren't modelling the API responses, but now it returns a ResolveResponse (which can be used to produce the dictionary).

But I agree that your suggested name is a better option.

hqpho reviewed


datacommons_client/tests/endpoints/test_observation_endpoint.py Outdated

		from datacommons_client.endpoints.response import ObservationResponse


		@patch(

Collaborator

hqpho Jan 24, 2025

Is there a particular reason to use base function mocks here vs mock of API elsewhere? If not my preference would be to use one approach for all endpoint tests, with a slight preference for mocking API because it makes the tests more concise.

Contributor Author

jm-rivera Jan 27, 2025

Not at all - you're right! I've simplified the observation and resolve tests to match using MagicMock for the API, like in node.
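A minimal sketch of the "mock the API" style discussed in this thread, using `unittest.mock.MagicMock`. `ExampleEndpoint`, its `fetch` method, and the payload shape are hypothetical stand-ins written for this example; they are not the package's endpoint classes.

```python
from unittest.mock import MagicMock


class ExampleEndpoint:
    """Hypothetical stand-in for an endpoint class; not the package's code."""

    def __init__(self, api):
        self.api = api

    def fetch(self, node_dcids):
        payload = {"nodes": node_dcids}
        # Delegate the request to the injected API object.
        return self.api.post(payload=payload, endpoint="node")


def test_fetch_calls_api_post():
    # A single MagicMock replaces the whole API, so no @patch stack is needed.
    api_mock = MagicMock()
    api_mock.post.return_value = {"data": {}}

    endpoint = ExampleEndpoint(api=api_mock)
    response = endpoint.fetch(node_dcids=["geoId/06"])

    assert response == {"data": {}}
    api_mock.post.assert_called_once_with(payload={"nodes": ["geoId/06"]},
                                          endpoint="node")


test_fetch_calls_api_post()
```

Because the mock is passed in through the constructor, the test does not need to know the import path of any request helper, which tends to keep each test shorter than the patch-based variant.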

datacommons_client/tests/endpoints/test_base.py

Comment on lines +119 to +139

+              @patch(
+                  "datacommons_client.endpoints.base.post_request",
+                  return_value={"success": True},
+              )
+              @patch(
+                  "datacommons_client.endpoints.base.check_instance_is_valid",
+                  return_value="https://custom.api/v2",
+              )
+              def test_endpoint_post_request(mock_check_instance, mock_post_request):
+                """Tests making a POST request using the Endpoint object."""
+                api = API(url="https://custom.api/v2")
+                endpoint = Endpoint(endpoint="node", api=api)
+                payload = {"key": "value"}
+                response = endpoint.post(payload=payload, max_pages=5)
+                assert response == {"success": True}
+                mock_post_request.assert_called_once_with(
+                    url="https://custom.api/v2/node",
+                    payload=payload,
+                    headers=api.headers,
+                    max_pages=5,

Collaborator

hqpho Jan 24, 2025

rm duplicate test?

Contributor Author

jm-rivera Jan 27, 2025

ooops

datacommons_client/endpoints/base.py

+                  return post_request(url=url,
+                                      payload=payload,
+                                      headers=self.headers,
+                                      max_pages=max_pages)

Collaborator

hqpho Jan 24, 2025

When using max_pages, is nextToken still returned as part of the response? I wonder if we ought to strip it out when it can't be used via this client.

I'm curious too about the use case for max_pages and whether instead it might be preferable to still expose the ability to use the token. Or in any case if there should be some way to tell whether the response contains complete or truncated results (I suppose that's what leaving the token in the response accomplishes! Okay talking myself around to it...)

Contributor Author

jm-rivera Jan 27, 2025

It's good that we discuss this in more detail. In short:

max_pages is really only exposed to the user when they instantiate an Endpoint class which supports it. Right now that's only NodeEndpoint. It allows the user to limit the number of 'pages' fetched. If None then all are fetched.
In its current implementation, it will fetch all pages up to max_pages or once no more pages are left (whichever happens first).

The whole thing is sort of invisible to the user (in my thinking for convenience and simplicity). In practice, once the response object gets returned to the user (in the Node case ObservationResponse) the next_token is always None... which could be an argument to remove it, to avoid creating the impression that a full response was returned when it may have gotten truncated by max_pages. There isn't much utility for the next_token from any of these responses, since we don't have anything that allows the user to use it directly (other than a very low-level api call).

A couple of options:

We could return the actual next_token as part of the Response object. In that case, None would mean the full content was returned and a token would mean that it was truncated by max_pages.
Instead of returning a next_token (given the lack of practical use for it), we could add a flag like truncated = True to the Response object, to let the user know (very explicitly) that the response is not complete.

What do you think?
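To make the two options concrete, here is a self-contained sketch of a paginated fetch capped by `max_pages`. Everything in it is an assumption for illustration: `fetch_pages`, the `fetch_page` callable, the `"items"` and `"nextToken"` keys, and the fake pages are not taken from the package. It returns the leftover token (the first option), from which a `truncated` flag (the second option) can be derived. The sketch assumes `max_pages` is either `None` or at least 1.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Page = Dict[str, Any]


def fetch_pages(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: Optional[int] = None,
) -> Tuple[List[Any], Optional[str]]:
    """Collect items page by page; stop at max_pages or when pages run out.

    Returns the items plus the unused next token (None if nothing is left).
    """
    items: List[Any] = []
    next_token: Optional[str] = None
    pages_fetched = 0
    while True:
        page = fetch_page(next_token)
        items.extend(page.get("items", []))
        next_token = page.get("nextToken")
        pages_fetched += 1
        if not next_token:
            break  # no more pages on the server side
        if max_pages is not None and pages_fetched >= max_pages:
            break  # cap reached; next_token still points at unread data
    return items, next_token


# Fake three-page backend keyed by the token that requests each page.
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {"items": [1, 2], "nextToken": "t1"},
    "t1": {"items": [3], "nextToken": "t2"},
    "t2": {"items": [4]},
}

capped_items, capped_token = fetch_pages(lambda t: FAKE_PAGES[t], max_pages=2)
truncated = capped_token is not None  # option 2: an explicit flag
all_items, final_token = fetch_pages(lambda t: FAKE_PAGES[t])
print(capped_items, capped_token, truncated)
print(all_items, final_token)
```

Either signal lets a caller distinguish a complete result from one cut short by `max_pages`; the flag is simpler to read, while the token keeps the door open for resuming later.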

datacommons_client/tests/endpoints/test_observation_endpoint.py

+                )
+                # Check the response
+                assert isinstance(response, ObservationResponse)

Collaborator

hqpho Jan 24, 2025

Can we also assert on the contents of the response? Doesn't have to be in every test, we can have a dedicated test for it.

Contributor Author

jm-rivera Jan 27, 2025

I'm not sure I follow - could you clarify the type of test you'd like to see?

I've written response tests here, which is why I only check the type. But happy to write additional tests if you see that tests for parts of the logic are missing.

jm-rivera added 16 commits

January 26, 2025 12:38


          Simplify type check for expressions

3e2c86b


          Update docstrings

0b0d62a


          Deduplicate adding constraints

a8031fa


          Update docstrings

73c9cb7


          Improve node

e54cd9d

- Refactor expression normalization
- renamed `expression` to `properties` in cases where the expression is built from the method parameters
- updated tests accordingly


          Default values

ca864e0


          Rename fetch_latest_observations

cd683e1


          stricter return for flatted response

587abda


          Rename _resolve_correspondence_expression

19500c0


          Rename fetch_dcids_by_name

747b183


          Add resolve example

f55c200


          Rename fetch_entity_type_correspondence

232359f


          Simplify observation endpoint tests

7e87231

MagicMock API instead of individual methods


          Update test_observation_endpoint.py

5898efb

MagicMock API instead of individual methods


          Update test_resolve_endpoint.py

001edd7

MagicMock API instead of individual methods


          Remove duplicate test

afa709f

jm-rivera requested a review from hqpho

January 27, 2025 02:32

Contributor Author

jm-rivera commented Jan 27, 2025

Thank you so much @hqpho, this was all very helpful!

I've addressed everything, except for:

This comment where more details would be helpful
This comment where your views on the best way forward would be great.
