feat: rework db and alembic docker images (teaxyz#2)
* Use Ubuntu and Python base images
* Don't mount a script volume for the db
* Don't use a volume for app code in alembic
* Move db creation DDL to alembic image

Co-authored-by: Toby Padilla <[email protected]>
sanchitram1 and toby authored Oct 4, 2024
1 parent 5c50c3f commit 0e6f931
Showing 9 changed files with 49 additions and 82 deletions.
38 changes: 3 additions & 35 deletions README.md
@@ -12,38 +12,8 @@ are 3 services to it:

 ## Setup

-first run `mkdir -p data/{crates,pkgx,homebrew,npm,pypi,rubys}`, to setup the data
-directory where the fetchers will store the data.
-
-then, running `docker compose up` will setup the db and run the pipeline. a successful
-run will look something like this:
-
-```
-db-1 | 2024-09-23 18:33:31.199 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
-db-1 | 2024-09-23 18:33:31.199 UTC [1] LOG: listening on IPv6 address "::", port 5432
-db-1 | 2024-09-23 18:33:31.202 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
-db-1 | 2024-09-23 18:33:31.230 UTC [30] LOG: database system was shut down at 2024-09-23 18:04:05 UTC
-db-1 | 2024-09-23 18:33:31.242 UTC [1] LOG: database system is ready to accept connections
-alembic-1 | db:5432 - accepting connections
-alembic-1 | INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
-alembic-1 | INFO [alembic.runtime.migration] Will assume transactional DDL.
-alembic-1 | db currently at 0db06140525f (head)
-alembic-1 | INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
-alembic-1 | INFO [alembic.runtime.migration] Will assume transactional DDL.
-alembic-1 | migrations run
-alembic-1 exited with code 0
-alembic-1 | postgresql://postgres:s3cr3t@db:5432/chai
-alembic-1 | s3cr3t
-pipeline-1 | 0.01: [crates_orchestrator]: [DEBUG]: logging is working
-pipeline-1 | 0.01: [main_pipeline]: [DEBUG]: logging is working
-pipeline-1 | 0.01: [DB]: [DEBUG]: logging is working
-pipeline-1 | 0.03: [DB]: [DEBUG]: created engine
-pipeline-1 | 0.03: [DB]: [DEBUG]: created session
-pipeline-1 | 0.03: [DB]: [DEBUG]: connected to postgresql://postgres:s3cr3t@db:5432/chai
-pipeline-1 | 0.03: [crates_orchestrator]: fetching crates packages
-pipeline-1 | 0.03: [crates_fetcher]: [DEBUG]: logging is working
-pipeline-1 | 0.03: [crates_fetcher]: [DEBUG]: adding package manager crates
-```
+1. Run `docker compose build` to create the latest Docker images.
+2. Run `docker compose up` to launch.

 > [!IMPORTANT]
 >
@@ -58,10 +28,8 @@ pipeline-1 | 0.03: [crates_fetcher]: [DEBUG]: adding package manager crates

 if at all you need to do a hard reset, here's the steps

-1. `rm -rf db/data`: removes all the data that was loaded into the db
-1. `rm -rf .venv`: if you created a virtual environment for local dev, this removes it
 1. `rm -rf data`: removes all the data the fetcher is putting
-1. `docker system prune -a -f --volumes`: removes **everything** docker-related
+2. `docker system prune -a -f --volumes`: removes **everything** docker-related

 > [!WARNING]
 >
19 changes: 11 additions & 8 deletions alembic/Dockerfile
@@ -1,8 +1,11 @@
-FROM pkgxdev/pkgx:latest
-# WORKDIR /app
-
-# # install alembic
-# COPY .pkgx.yaml .
-# RUN dev
-
-RUN pkgx install alembic.sqlalchemy.org^1 psycopg.org/psycopg2^2 postgresql.org^16
+FROM ubuntu:24.10
+# RUN pkgx install alembic.sqlalchemy.org^1 psycopg.org/psycopg2^2 postgresql.org^16
+RUN apt -y update && apt -y upgrade
+RUN apt -y install postgresql
+RUN apt -y install alembic
+RUN apt -y install python3-psycopg2
+RUN apt -y install python3-sqlalchemy python3-sqlalchemy-ext
+COPY . .
+WORKDIR /alembic
+RUN chmod +x run_migrations.sh
+ENTRYPOINT ["/alembic/run_migrations.sh"]
File renamed without changes.
17 changes: 14 additions & 3 deletions alembic/run_migrations.sh
@@ -6,11 +6,22 @@ until pg_isready -h db -p 5432 -U postgres; do
   sleep 2
 done

+# create db if needed
+# if [ "$( psql -XtAc "SELECT 1 FROM pg_database WHERE datname='chai'" 2&>/dev/null)" = '1' ]
+if [ "$( psql -XtAc "SELECT 1 FROM pg_database WHERE datname='chai'" -h db -U postgres)" = '1' ]
+then
+  echo "Database 'chai' already exists"
+else
+  echo "Database 'chai' does not exist, creating..."
+  psql -U postgres -h db -f init-script.sql -a
+fi
+
 # migrate
-echo "db currently at $(pkgx +alembic +psycopg.org/psycopg2 alembic current)"
-if pkgx +alembic +psycopg.org/psycopg2 alembic upgrade head; then
+echo "db currently at $(alembic current)"
+if alembic upgrade head
+then
   echo "migrations run successfully"
 else
   echo "migrations failed"
   exit 1
-fi
+fi
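
Note: the create-if-missing step in this hunk can be read as a small reusable function. The following is a minimal sketch rather than the commit's code: `ensure_database` and the `PSQL` override variable are hypothetical names introduced here so the logic can be exercised with a stub. The `psql` flags, the `pg_database` query, and the `db` host and `postgres` user are taken from the script in this diff. The database name is interpolated into the SQL string, so it is assumed to be a trusted constant.

```shell
#!/bin/bash
# Hypothetical helper (not part of the commit): create a database only if
# it does not already exist. PSQL is an assumed override hook so that the
# function can be exercised with a stub instead of a live server.
PSQL="${PSQL:-psql}"

ensure_database() {
  local name="$1"
  local init_script="$2"
  local exists
  # -X: ignore psqlrc, -t: tuples only, -A: unaligned output, -c: run one command.
  # The name is assumed to be a trusted constant; it is not escaped here.
  exists="$("$PSQL" -XtAc "SELECT 1 FROM pg_database WHERE datname='${name}'" -h db -U postgres)"
  if [ "$exists" = "1" ]; then
    echo "Database '${name}' already exists"
  else
    echo "Database '${name}' does not exist, creating..."
    # -f: run the DDL file, -a: echo each statement as it is executed
    "$PSQL" -U postgres -h db -f "$init_script" -a
  fi
}
```

Called as `ensure_database chai init-script.sql`, this follows the same branch as `run_migrations.sh`. Authentication would still come from the `PGPASSWORD` variable set for the alembic service in `docker-compose.yml`.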
23 changes: 11 additions & 12 deletions docker-compose.yml
@@ -7,27 +7,26 @@ services:
     ports:
       - "5435:5432"
     volumes:
-      - ./db/data:/var/lib/postgresql/data
-      - ./db/init-scripts/init-script.sql:/docker-entrypoint-initdb.d/create-database.sql
+      - ./data/db/data:/var/lib/postgresql/data
     healthcheck:
-      test: ["CMD-SHELL", "pg_isready -U postgres -d chai"]
+      test: ["CMD-SHELL", "pg_isready -U postgres"]
       interval: 5s
       timeout: 5s
       retries: 5

   alembic:
-    build: alembic
+    build:
+      context: ./
+      dockerfile: ./alembic/Dockerfile
     environment:
       - CHAI_DATABASE_URL=postgresql://postgres:s3cr3t@db:5432/chai
-      - POSTGRES_PASSWORD=s3cr3t
-    volumes:
-      - .:/app
+      - PGPASSWORD=s3cr3t
     depends_on:
       db:
         condition: service_healthy
-    working_dir: /app/alembic
+    working_dir: /alembic
     entrypoint: ["./run_migrations.sh"]

   pipeline:
     build: src
     environment:
@@ -51,8 +50,8 @@ services:
     volumes:
       - ./monitor:/app
       - /var/run/docker.sock:/var/run/docker.sock
-    working_dir: /app
     depends_on:
       pipeline:
         condition: service_started
-    entrypoint: ["./run_monitor.sh"]
+    working_dir: /usr/src/monitor
+    entrypoint: ["./run_monitor.sh"]
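
Note: the Postgres healthcheck in this file runs `pg_isready` every 5 seconds and marks the container unhealthy after 5 consecutive failures. That health status is what `condition: service_healthy` on the alembic service waits for. As a rough shell analogue, a bounded polling loop could look like the sketch below. `wait_for` is a hypothetical helper, not part of this commit, and it does not model Docker's per-check `timeout`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the commit): run a command until it
# succeeds, trying at most <retries> times and sleeping <interval> seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  local retries="$1"
  local interval="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Sleep only if another attempt is still to come.
    if [ "$attempt" -le "$retries" ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

For example, `wait_for 5 5 pg_isready -h db -p 5432 -U postgres` gives up after five attempts, whereas the `until pg_isready ...` loop in `run_migrations.sh` retries indefinitely.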
8 changes: 4 additions & 4 deletions monitor/Dockerfile
@@ -1,4 +1,4 @@
-FROM pkgxdev/pkgx:latest
-RUN pkgx install python.org^3.11 astral.sh/uv^0
-COPY requirements.txt .
-RUN CC=clang pkgx +clang@18 uv pip install -r requirements.txt --system
+FROM python:3
+WORKDIR /usr/src/monitor
+COPY . .
+RUN pip install --no-cache-dir -r requirements.txt
2 changes: 1 addition & 1 deletion monitor/run_monitor.sh
@@ -1,2 +1,2 @@
 #!/bin/bash
-pkgx +python.org^3.11 python -u main.py
+python -u main.py
16 changes: 4 additions & 12 deletions src/Dockerfile
@@ -1,12 +1,4 @@
-FROM pkgxdev/pkgx
-# WORKDIR /app
-
-# gets pkgx setup
-# COPY .pkgx.yaml .
-# RUN dev
-
-# just install everything
-
-RUN pkgx install python.org^3.11 astral.sh/uv^0 postgresql.org^16
-COPY requirements.txt .
-RUN CC=clang pkgx +clang@18 uv pip install -r requirements.txt --system
+FROM python:3
+WORKDIR /usr/src/pipeline
+COPY . .
+RUN pip install --no-cache-dir -r requirements.txt
8 changes: 1 addition & 7 deletions src/run_pipeline.sh
@@ -1,13 +1,7 @@
 #!/bin/bash

-# wait for db to be ready
-until pg_isready -h db -p 5432 -U postgres; do
-  echo "waiting for database..."
-  sleep 2
-done
-
 # make directory structure
 # working_dir is /app
 mkdir -p data/{crates,pkgx,homebrew,npm,pypi,rubys}

-pkgx +python^3.11 +postgresql.org^16 python -u src/pipeline/main.py crates
+python -u src/pipeline/main.py crates
