Commit

start addressing PR comments
atvaccaro committed Aug 28, 2023
1 parent 1cc440a commit 1b30a04
Showing 6 changed files with 27 additions and 34 deletions.
12 changes: 10 additions & 2 deletions .github/workflows/build-dags.yml
@@ -15,17 +15,25 @@ jobs:
python-version: 3.11
- uses: abatilo/actions-poetry@v2
- name: run mypy and pytest
id: test
working-directory: dags
run: |
poetry export --with=dev --without-hashes --format=requirements.txt > requirements.txt
poetry run pip install -r requirements.txt
poetry run mypy .
echo "VERSION=$(poetry version --short)" >> "$GITHUB_OUTPUT"
# uncomment and move up once we have tests; pytest exits with a non-zero code (5) when no tests are collected
# poetry run pytest
env:
RAW_BUCKET: gs://this-does-not-exist-raw
PARSED_BUCKET: gs://this-does-not-exist-parsed
- uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/build-push-action@v4
with:
context: dags
push: false
tags: 'ghcr.io/jarvusinnovations/transit-data-analytics-demo/dags:test'
push: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
tags: 'ghcr.io/jarvusinnovations/transit-data-analytics-demo/dags:${{ steps.test.outputs.VERSION }}'
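Review note: the `VERSION` step output is interpolated directly into the image tag, so it has to be a valid Docker tag on its own. Plain `poetry version` prints `<name> <version>`, while `poetry version --short` prints only the version. The sketch below is a hypothetical helper, not part of this commit. It shows one way to reduce either output to the bare version and to check it against Docker's tag rules; the function names are made up for illustration.

```python
import re

# Docker tag rules: letters, digits, underscores, periods, and hyphens;
# must not start with a period or hyphen; at most 128 characters.
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def version_from_poetry_output(output: str) -> str:
    """Return the bare version from `poetry version` output.

    Handles both "<name> <version>" (default) and "<version>" (--short).
    """
    return output.strip().split()[-1]


def is_valid_docker_tag(tag: str) -> bool:
    """Check whether a string is usable as a Docker image tag."""
    return TAG_RE.fullmatch(tag) is not None


if __name__ == "__main__":
    raw = "dags 0.1.0\n"  # example output; the real name and version may differ
    print(is_valid_docker_tag(raw.strip()))                       # space is not allowed
    print(is_valid_docker_tag(version_from_poetry_output(raw)))   # bare version is fine
```

Versions with a local segment such as `1.0+local` would still fail this check, because `+` is not allowed in a tag.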
4 changes: 2 additions & 2 deletions .github/workflows/build-fetcher-image.yml
@@ -32,5 +32,5 @@ jobs:
- uses: docker/build-push-action@v4
with:
context: fetcher
push: false
tags: 'ghcr.io/jarvusinnovations/transit-data-analytics-demo/fetcher:test'
push: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
tags: 'ghcr.io/jarvusinnovations/transit-data-analytics-demo/fetcher:latest'
1 change: 1 addition & 0 deletions .github/workflows/lint.yml
@@ -28,6 +28,7 @@ jobs:
# sqlfluff needs dbt to be set up and authenticated
- uses: pre-commit/[email protected]
env:
# skip sqlfluff for now; it's not authenticating to bigquery properly
SKIP: sqlfluff-lint
BIGQUERY_SERVICE_ACCOUNT: /tmp/keyfile
DBT_PROFILES_DIR: warehouse
37 changes: 9 additions & 28 deletions dags/README.md
@@ -4,45 +4,26 @@ This is a [Dagster](https://dagster.io/) project scaffolded with [`dagster proje

## Getting started

First, install your Dagster code location as a Python package. By using the --editable flag, pip will install your Python package in ["editable mode"](https://pip.pypa.io/en/latest/topics/local-project-installs/#editable-installs) so that as you develop, local code changes will automatically apply.

First, ensure that poetry is installed and then install the dependencies.
```bash
pip install -e ".[dev]"
curl -sSL https://install.python-poetry.org | python3 -
poetry install
```

Then, start the Dagster UI web server:

Then, start the Dagster UI web server (optionally specifying a port):
```bash
dagster dev
poetry run dagster dev --port 1234  # --port is optional
```

Open http://localhost:3000 with your browser to see the project.

You can start writing assets in `dags/assets.py`. The assets are automatically loaded into the Dagster code location as you define them.

## Development


### Adding new Python dependencies

You can specify new Python dependencies in `setup.py`.
Open `http://localhost:<port>` (port 3000 by default) in your browser to see the project.

### Unit testing

Tests are in the `dags_tests` directory and you can run tests using `pytest`:

```bash
pytest dags_tests
poetry run pytest dags_tests
```
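
pytest exits with code 5 when it collects no tests, which fails a CI step. Until real tests exist, a placeholder along the following lines would let `poetry run pytest` pass. This is only a sketch; the file name `dags_tests/test_smoke.py` is an assumption and no such file is part of this commit.

```python
# dags_tests/test_smoke.py (hypothetical file name)


def test_placeholder() -> None:
    """Trivial test so that pytest collects at least one item."""
    assert 1 + 1 == 2
```

Replace it with real tests once assets have logic worth checking.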

### Schedules and sensors

If you want to enable Dagster [Schedules](https://docs.dagster.io/concepts/partitions-schedules-sensors/schedules) or [Sensors](https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors) for your jobs, the [Dagster Daemon](https://docs.dagster.io/deployment/dagster-daemon) process must be running. This is done automatically when you run `dagster dev`.

Once your Dagster Daemon is running, you can start turning on schedules and sensors for your jobs.

## Deploy on Dagster Cloud

The easiest way to deploy your Dagster project is to use Dagster Cloud.

Check out the [Dagster Cloud Documentation](https://docs.dagster.cloud) to learn more.
### Deployment
Dagster itself is deployed via hologit and Helm; the [values file](../kubernetes/values/prod-dagster.yml) contains any Kubernetes overrides. The DAG source code in this folder is deployed by pushing a Docker image (currently `ghcr.io/jarvusinnovations/transit-data-analytics-demo/dags:latest`, built from [this folder](./Dockerfile)) that is then referenced by a user code deployment in the values file.
5 changes: 4 additions & 1 deletion dags/dags/__init__.py
@@ -26,7 +26,10 @@ class HivePartitionedPydanticGCSIOManager(PickledObjectGCSIOManager):
def get_path_for_partition(
self, context: Union[InputContext, OutputContext], path: UPath, partition: str
) -> "UPath":
"""Override this method if you want to use a different partitioning scheme
"""
(Docs taken from parent class)
Override this method if you want to use a different partitioning scheme
(for example, if the saving function handles partitioning instead).
The extension will be added later.
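Review note: the body of `get_path_for_partition` is not visible in this hunk. As a rough illustration of what a Hive-style partitioning scheme looks like, here is a minimal sketch that uses `pathlib.PurePosixPath` as a stand-in for `UPath`. The function name and the partition key `dt` are assumptions, not taken from the repository.

```python
from pathlib import PurePosixPath


def hive_partition_path(
    base: PurePosixPath, partition: str, key: str = "dt"
) -> PurePosixPath:
    """Append a Hive-style `key=value` directory to a base path.

    The file extension is deliberately left off, matching the docstring's
    note that the extension is added later.
    """
    return base / f"{key}={partition}"


if __name__ == "__main__":
    # Hypothetical bucket layout and partition value.
    print(hive_partition_path(PurePosixPath("parsed/some_asset"), "2023-08-28"))
```

Hive-style directory names let query engines such as BigQuery infer partition columns from the path.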
2 changes: 1 addition & 1 deletion dags/dags/common.py
@@ -141,7 +141,7 @@ def base64url(self) -> str:
@property
def filename(self) -> str:
params_with_page = {
**{kv.key: kv.value for kv in self.config.query if kv.value}, # exclude secrets
**{kv.key: kv.value for kv in self.config.query if kv.value}, # excludes secrets
**{kv.key: kv.value for kv in self.page},
}
url = requests.Request(url=self.config.url, params=params_with_page).prepare().url
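Review note: to illustrate the dictionary merge in `filename`, here is a minimal standard-library sketch. `urllib.parse.urlencode` stands in for the `requests` URL preparation, plain dicts stand in for the key/value objects, and the URL and parameter names are made up. It assumes that secret parameters are stored with an empty value, as the inline comment suggests.

```python
from urllib.parse import urlencode


def build_url(base_url: str, query: dict[str, str], page: dict[str, str]) -> str:
    """Merge query and page parameters, dropping query entries with empty values."""
    params = {
        **{key: value for key, value in query.items() if value},  # excludes empty values
        **page,  # page parameters override query parameters with the same key
    }
    return f"{base_url}?{urlencode(params)}" if params else base_url


if __name__ == "__main__":
    # Hypothetical feed URL and parameters.
    print(build_url(
        "https://example.com/feed",
        {"api_key": "", "format": "json"},
        {"page": "2"},
    ))
```

Because the empty-valued entry is filtered out before encoding, it never appears in the resulting URL or in any name derived from it.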
