Cognite `extractor-utils` REST extension

The REST extension for Cognite extractor-utils provides a way to easily write your own extractors for RESTful source systems.

The library is currently under development, and should not be used in production environments yet.

Overview

The REST extension for extractor-utils templatizes how the extractor makes HTTP requests to the source, automatically deserializes the response into user-defined DTO classes, and handles uploading data to CDF.

The only parts of the extractor that a user needs to implement are:

Describing how HTTP requests should be constructed using pre-built function decorators
Describing the response schema using Python dataclasses
Implementing a mapping from the source data model to the CDF data model

For example, consider CDF's Events API as a source. We could describe the response schema as an EventsList dataclass:

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RawEvent:
    externalId: Optional[str]
    dataSetId: Optional[int]
    startTime: Optional[int]
    endTime: Optional[int]
    type: Optional[str]
    subtype: Optional[str]
    description: Optional[str]
    metadata: Optional[Dict[str, str]]
    assetIds: Optional[List[Optional[int]]]
    source: Optional[str]
    id: Optional[int]
    lastUpdatedTime: Optional[int]
    createdTime: Optional[int]


@dataclass
class EventsList:
    items: List[RawEvent]
    nextCursor: Optional[str]

We can then write a handler that takes one of these EventsList objects and returns CDF events, represented by instances of the Event class from the cognite.extractorutils.rest.typing module.

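import os
from typing import Generator

from cognite.extractorutils.rest.typing import Event

# RestExtractor is provided by this library; see example.py for the exact import.
extractor = RestExtractor(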
    name="Event extractor",
    description="Extractor from CDF events to CDF events",
    version="1.0.0",
    base_url=f"https://api.cognitedata.com/api/v1/projects/{os.environ['COGNITE_PROJECT']}/",
    headers={"api-key": os.environ["COGNITE_API_KEY"]},
)

@extractor.get("events", response_type=EventsList)
def get_events(events: EventsList) -> Generator[Event, None, None]:
    for event in events.items:
        yield Event(
            external_id=f"testy-{event.id}",
            description=event.description,
            start_time=event.startTime,
            end_time=event.endTime,
            type=event.type,
            subtype=event.subtype,
            metadata=event.metadata,
            source=event.source,
        )

with extractor:
    extractor.run()

A full example is provided in the example.py file.
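
To make the schema-to-dataclass mapping concrete, the sketch below shows, using only the standard library, how a JSON payload shaped like an Events API response lines up with dataclasses like the ones above. The library performs this deserialization itself; the `parse_events_list` helper, the trimmed-down field set, and the sample payload are assumptions made for illustration and are not part of the library.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RawEvent:
    # Trimmed-down version of the RawEvent dataclass above.
    id: Optional[int]
    type: Optional[str]
    description: Optional[str]


@dataclass
class EventsList:
    items: List[RawEvent]
    nextCursor: Optional[str]


def parse_events_list(body: str) -> EventsList:
    """Hypothetical helper: build an EventsList from a JSON string."""
    payload = json.loads(body)
    items = [
        RawEvent(
            id=item.get("id"),
            type=item.get("type"),
            # Fields missing from the payload become None.
            description=item.get("description"),
        )
        for item in payload.get("items", [])
    ]
    return EventsList(items=items, nextCursor=payload.get("nextCursor"))


# Made-up sample payload, not real API output.
sample = '{"items": [{"id": 1, "type": "alarm"}], "nextCursor": null}'
events = parse_events_list(sample)
```

Declaring every field as Optional, as the RawEvent example does, lets a handler cope with responses that omit some fields.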

The return type

If the return type (the response_type argument of the decorator) is set to cognite.extractorutils.rest.http.JsonBody, the raw JSON payload is passed to the handler. This is useful when the payload is hard or impossible to describe with dataclasses.

If the return type is set to requests.Response, the raw response message itself is passed to the handler.
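
As an illustration of the raw-payload case, consider a response whose top-level keys are not known in advance, which a fixed dataclass cannot describe. The sketch below is plain Python and assumes the raw JSON body can be treated as a dict. The function name `readings_from_payload`, the payload shape, and the tuple output are all hypothetical, and registering the handler with the extractor is left out.

```python
from typing import Any, Dict, Iterator, Tuple


def readings_from_payload(payload: Dict[str, Any]) -> Iterator[Tuple[str, float]]:
    """Yield (sensor name, value) pairs from a payload keyed by sensor name."""
    for sensor, reading in payload.items():
        value = reading.get("value")
        if value is None:
            # Skip sensors that did not report a value.
            continue
        yield sensor, float(value)


# Made-up payload: the top-level keys vary from response to response.
payload = {
    "pump-1": {"value": 3.5},
    "pump-2": {"value": None},
    "valve-7": {"value": 12},
}
readings = list(readings_from_payload(payload))
```

In a real handler, the yielded values would be the library's own CDF types rather than tuples.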

Lists at the root

Python dataclasses cannot express JSON structures where the root element is a list. To get around this, such responses are automatically converted to a structure that can be modeled with dataclasses.

A JSON structure with a list as its root element is converted to an object containing a single key, "items", which has the original JSON list as its value, as in the example below.

[{"object_id": 1}, {"object_id": 2}, {"object_id": 3}]

will be converted to

{
    "items": [{"object_id": 1}, {"object_id": 2}, {"object_id": 3}]
}

This does not apply if the return type is set to JsonBody.
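
The conversion described above can be sketched in a few lines of plain Python. This is only an illustration of the rule, not the library's implementation, and the function name `wrap_root_list` is hypothetical.

```python
from typing import Any


def wrap_root_list(payload: Any) -> Any:
    """Wrap a root-level JSON list in an object with a single "items" key."""
    if isinstance(payload, list):
        return {"items": payload}
    # Objects (and anything else) are passed through unchanged.
    return payload


wrapped = wrap_root_list([{"object_id": 1}, {"object_id": 2}, {"object_id": 3}])
unchanged = wrap_root_list({"object_id": 1})
```

A dataclass with a single `items: List[...]` field, like EventsList in the overview, then describes the wrapped structure.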

Contributing

We use poetry to manage dependencies and virtual environments. To develop the REST extension, follow these steps to set up your local environment:

Install poetry: (add --user if desirable)
```
$ pip install poetry
```

Clone repository:

$ git clone git@github.com:cognitedata/python-extractor-utils-rest.git

Move into the newly created local repository:
```
$ cd python-extractor-utils-rest
```
Create virtual environment and install dependencies:
```
$ poetry install
```

All code must pass typing and style checks to be merged. We recommend installing the pre-commit hooks so that these checks run before you commit code:

$ poetry run pre-commit install

This project adheres to the Contributor Covenant v2.0 as a code of conduct.
