Feature: Add Support for GCP #193

Merged
merged 117 commits into master from feature/vertex-pipeline on Jan 3, 2025
Commits (117)
0fb9477
Initial Dockerfile draft
haroldrubio Feb 17, 2024
21f15a7
Add new endpoints
haroldrubio Feb 17, 2024
eeacdcb
Build predict expertise function
haroldrubio Feb 17, 2024
bbf1041
Expect rawPredict format
haroldrubio Feb 17, 2024
398a743
Use entrypoint instead of cmd
haroldrubio Feb 17, 2024
e959335
Fix entrypoint
haroldrubio Feb 18, 2024
f0775eb
Changes to config
haroldrubio Feb 18, 2024
c89cd58
Add production config
haroldrubio Feb 18, 2024
72b41dc
Set containerized
haroldrubio Feb 18, 2024
cc8e602
Remove Redis interaction if container
haroldrubio Feb 18, 2024
2f91392
Adjust model directories
haroldrubio Feb 18, 2024
eda9927
From NVIDIA container
haroldrubio Feb 18, 2024
e8a96f4
Fetch expertise utils
haroldrubio Feb 18, 2024
18cd0e3
Avoid artifact download
haroldrubio Feb 18, 2024
9f17f6b
Move artifact copying to entrypoint
haroldrubio Feb 18, 2024
28bcee5
Add container flag
haroldrubio Feb 18, 2024
4b1e2bf
Update container flag
haroldrubio Feb 18, 2024
490fc2e
Remove type
haroldrubio Feb 19, 2024
006f70b
Add support for instances in request
haroldrubio Feb 19, 2024
6ea81e8
Move artifact loading
haroldrubio Feb 19, 2024
545e1f0
Async load artifacts
haroldrubio Feb 19, 2024
145f6ab
Add startup route to check for artifact loading
haroldrubio Feb 19, 2024
96224f3
Don't call artifacts
haroldrubio Feb 19, 2024
cebd266
Allow token pass in body
haroldrubio Feb 19, 2024
0e57ca0
Pass token
haroldrubio Feb 19, 2024
99aa6a5
Properly handle no token
haroldrubio Feb 19, 2024
0370dff
Isolate token
haroldrubio Feb 19, 2024
c64d273
Load into dict if not a dict
haroldrubio Feb 20, 2024
58d8a69
Add flag to block on artifacts
haroldrubio Feb 20, 2024
e6ad0bf
Rollback blocking
haroldrubio Feb 20, 2024
04b31fe
Block on loading artifacts
haroldrubio Feb 20, 2024
b934318
Log model ready
haroldrubio Feb 20, 2024
bd0acb9
Log URI and bucket
haroldrubio Feb 20, 2024
bd129e1
Point to /app
haroldrubio Feb 20, 2024
6b2cabb
Fix return schema
haroldrubio Feb 20, 2024
2fc2c92
Index into predictions list
haroldrubio Feb 20, 2024
fd2b13b
Fix blob prefix
haroldrubio Feb 20, 2024
f498173
Remove unused script
haroldrubio Feb 20, 2024
bec5001
Fix prefix parsing
haroldrubio Feb 20, 2024
d036d6f
Support reviewer_ids
haroldrubio Feb 21, 2024
21b585d
Fix reviewer IDs bug
haroldrubio Feb 21, 2024
291bbf5
Fix bug in expertise invitation for reviewerIds
haroldrubio Feb 21, 2024
394dfe4
Merge instances on reviewerIds
haroldrubio Feb 21, 2024
16b1e00
Return list of predictions
haroldrubio Feb 21, 2024
f4dad73
Parsing must happen in routes
haroldrubio Feb 21, 2024
7863a1c
Check correctly formed dataset
haroldrubio Feb 22, 2024
c420f22
Fix subscriptable bug
haroldrubio Feb 22, 2024
872c7a7
Remove prod config
haroldrubio Feb 22, 2024
ad339a3
Add retry safety
haroldrubio Feb 22, 2024
982792c
Validate dataset creation
haroldrubio Feb 22, 2024
3cb5fd5
Support count in validation
haroldrubio Feb 22, 2024
61b8078
Get entityA properly
haroldrubio Feb 22, 2024
b17f17d
Move statements
haroldrubio Feb 22, 2024
c2a7cfd
Fix Path bug
haroldrubio Feb 22, 2024
c24b638
Use sub IDs for validation
haroldrubio Feb 22, 2024
16653c8
Fix convert field to path
haroldrubio Feb 22, 2024
559fe45
Add failure explanation
haroldrubio Feb 22, 2024
4fe6965
Create execute_pipeline.py
haroldrubio Feb 23, 2024
1026ea7
Absolute import
haroldrubio Feb 23, 2024
044b3e0
Fix script
haroldrubio Feb 23, 2024
74154c8
Upload results to bucket
haroldrubio Feb 23, 2024
8b9f03d
Fix prefix
haroldrubio Feb 23, 2024
79c3e08
Merge branch 'master' into feature/containerize
haroldrubio Nov 5, 2024
32ed960
Avoid installing SPECTER deps
haroldrubio Nov 5, 2024
0304738
Remove cd into specter
haroldrubio Nov 5, 2024
0088d37
Draft push action
haroldrubio Nov 13, 2024
571820c
Remove VertexParser
haroldrubio Dec 3, 2024
a7b3e03
Remove /predict
haroldrubio Dec 3, 2024
9fc3b9c
Remove import
haroldrubio Dec 3, 2024
1d74cfe
Remove /predict func
haroldrubio Dec 3, 2024
e27972f
Remove import
haroldrubio Dec 3, 2024
72cd9e9
Use auth@v2
haroldrubio Dec 3, 2024
09a5fb0
Bump Miniconda
haroldrubio Dec 3, 2024
2b1c1b6
Dump metadata
haroldrubio Dec 3, 2024
b809b33
Push new action
haroldrubio Dec 4, 2024
4c7862b
Trigger on push to branch
haroldrubio Dec 4, 2024
d2960bb
Use absolute path
haroldrubio Dec 4, 2024
c0ccb55
Use proper path
haroldrubio Dec 4, 2024
87c20b0
Fix argparse
haroldrubio Dec 4, 2024
1039d38
Only try script
haroldrubio Dec 4, 2024
1e063cc
Clean up credentials again
haroldrubio Dec 4, 2024
c499d80
Add some logging
haroldrubio Dec 5, 2024
cee215b
Fix execute and skip build
haroldrubio Dec 6, 2024
298fed7
Add GCPInterface
haroldrubio Dec 6, 2024
1dc5ba2
Add interface tests
haroldrubio Dec 7, 2024
e420a58
Add pipeline tests
haroldrubio Dec 9, 2024
e6d11b1
Reduce image size
haroldrubio Dec 9, 2024
b1bb6cf
Merge branch 'master' into feature/vertex-pipeline
haroldrubio Dec 9, 2024
e4886a7
Use logger
haroldrubio Dec 9, 2024
922d3db
fix tests
haroldrubio Dec 10, 2024
a13a126
Use workflow dispatch
haroldrubio Dec 10, 2024
5073f3b
Parameterize pipeline name
haroldrubio Dec 11, 2024
7ef18ce
Try fix gensim tokenizers
haroldrubio Dec 11, 2024
f6555b4
Revert "Try fix gensim tokenizers"
haroldrubio Dec 11, 2024
9a5dccc
Merge branch 'master' into feature/vertex-pipeline
haroldrubio Dec 15, 2024
8d8a964
Update image to 3.11
haroldrubio Dec 15, 2024
1525bd5
Install openreview-py afterwards
haroldrubio Dec 17, 2024
d00c3da
Merge branch 'master' into feature/vertex-pipeline
haroldrubio Dec 18, 2024
1260794
Start implementing cloud service
haroldrubio Dec 19, 2024
ba5755c
Fix deps
haroldrubio Dec 19, 2024
dde625c
Add logging
haroldrubio Dec 19, 2024
104be3a
Finish service tests
haroldrubio Dec 20, 2024
fc4825a
Clear Redis at beginning of test
haroldrubio Dec 20, 2024
bfed76c
Use filesystem for GCP mock
haroldrubio Dec 20, 2024
a116b24
Wait until complete
haroldrubio Dec 20, 2024
e7f8dda
Log Cloud ID
haroldrubio Dec 20, 2024
012691c
Separate contexts
haroldrubio Dec 21, 2024
0a45107
Log service class
haroldrubio Dec 21, 2024
3cdc3a8
Patch run_once
haroldrubio Dec 21, 2024
b9c268d
Isolate cloud queue
haroldrubio Dec 21, 2024
ce3fb31
Fix variables
haroldrubio Dec 21, 2024
d74d9d4
Merge branch 'master' into feature/vertex-pipeline
haroldrubio Jan 2, 2025
c25932a
Change publication count
haroldrubio Jan 2, 2025
0586070
Log arg list
haroldrubio Jan 2, 2025
5893ee8
update count
haroldrubio Jan 2, 2025
5fbd095
Reduce code reusage
haroldrubio Jan 3, 2025
770d11c
Clean up comments
haroldrubio Jan 3, 2025
72 changes: 72 additions & 0 deletions .github/workflows/push-image.yml
@@ -0,0 +1,72 @@
# This workflow builds and pushes the expertise image to the Artifact Registry

name: push-workflow-image

# Controls when the workflow will run
on:
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

env:
  REGION: us
  KFP_REGION: us-central1
  KFP_REPO: openreview-kfp
  REPO: openreview-docker-images
  PROJECT: sunlit-realm-131518
  IMAGE: expertise-test
  TAG: latest

jobs:
  push-workflow-image:
    # Allow the job to fetch a GitHub ID token
    permissions:
      id-token: write
      contents: read
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Add SSH key
        run: |
          mkdir -p /home/runner/.ssh
          echo "${{ secrets.GCLOUD_SSH_KEY }}" > /home/runner/.ssh/google_compute_engine
          echo "${{ secrets.GCLOUD_SSH_KEY_PUB }}" > /home/runner/.ssh/google_compute_engine.pub
          chmod 600 /home/runner/.ssh/google_compute_engine
          chmod 600 /home/runner/.ssh/google_compute_engine.pub
      - name: Authenticate with Google Cloud
        id: auth
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }}
          create_credentials_file: true
          cleanup_credentials: true
          export_environment_variables: true
      - name: Set Image Tag
        run: echo "IMAGE_TAG=$REGION-docker.pkg.dev/$PROJECT/$REPO/$IMAGE" >> $GITHUB_ENV
      - name: Setup gcloud
        uses: google-github-actions/setup-gcloud@v1
      - name: Setup Docker authentication
        run: gcloud auth configure-docker ${{ env.REGION }}-docker.pkg.dev --quiet
      - name: Setup Python 3.9
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Install Python dependencies
        run: |
          python -m pip install --upgrade pip
          pip install kfp
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ env.IMAGE_TAG }}
      #- name: Run pipeline script
      #  run: |
      #    python expertise/build_pipeline.py \
      #      --region "${{ env.REGION }}" \
      #      --kfp_region "${{ env.KFP_REGION }}" \
      #      --project "${{ env.PROJECT }}" \
      #      --repo "${{ env.REPO }}" \
      #      --kfp_repo "${{ env.KFP_REPO }}" \
      #      --image "${{ env.IMAGE }}" \
      #      --tag "${{ env.TAG }}"
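
Note: as written, the commented-out "Run pipeline script" step omits --kfp_name, which expertise/build_pipeline.py (below) declares with required=True, so the step would fail if re-enabled as-is. For local testing, the equivalent invocation can be reproduced with a small wrapper — a sketch assuming `pip install kfp`, application-default gcloud credentials, and a hypothetical pipeline name:

# Sketch: reproduce the workflow's commented-out pipeline-build step locally.
# Assumes `gcloud auth application-default login` has been run; the --kfp_name
# value is hypothetical (the workflow env block defines none).
import subprocess

subprocess.run(
    [
        "python", "expertise/build_pipeline.py",
        "--region", "us",
        "--kfp_region", "us-central1",
        "--project", "sunlit-realm-131518",
        "--repo", "openreview-docker-images",
        "--kfp_repo", "openreview-kfp",
        "--kfp_name", "expertise-pipeline",  # hypothetical name
        "--image", "expertise-test",
        "--tag", "latest",
    ],
    check=True,
)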
55 changes: 55 additions & 0 deletions Dockerfile
@@ -0,0 +1,55 @@
FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04

WORKDIR /app

ENV PYTHON_VERSION=3.11 \
    HOME="/app" \
    PATH="/app/miniconda/bin:${PATH}" \
    FLASK_ENV=production \
    AIP_STORAGE_URI="gs://openreview-expertise/expertise-utils/" \
    SPECTER_DIR="/app/expertise-utils/specter/" \
    MFR_VOCAB_DIR="/app/expertise-utils/multifacet_recommender/feature_vocab_file" \
    MFR_CHECKPOINT_DIR="/app/expertise-utils/mfr_model_checkpoint/"

COPY . /app/openreview-expertise

RUN apt-get update && apt-get install -y --no-install-recommends \
    wget \
    curl \
    ca-certificates \
    git \
    build-essential \
    && rm -rf /var/lib/apt/lists/* \
    \
    && cd $HOME \
    && wget "https://repo.anaconda.com/miniconda/Miniconda3-py311_24.9.2-0-Linux-x86_64.sh" -O miniconda.sh \
    && echo "62ef806265659c47e37e22e8f9adce29e75c4ea0497e619c280f54c823887c4f miniconda.sh" | sha256sum -c - \
    && bash miniconda.sh -b -p $HOME/miniconda \
    && rm miniconda.sh \
    \
    && conda update -y conda \
    && conda create -y -n expertise python=$PYTHON_VERSION -c conda-forge \
    \
    && . $HOME/miniconda/etc/profile.d/conda.sh \
    && conda activate expertise \
    && conda install pytorch pytorch-cuda=12.4 -c pytorch -c nvidia \
    && conda install -y filelock intel-openmp faiss-cpu -c pytorch \
    && python -m pip install --no-cache-dir -e $HOME/openreview-expertise \
    && python -m pip install --no-cache-dir -I protobuf==3.20.1 \
    && python -m pip install --no-cache-dir numpy==1.26.4 --force-reinstall \
    && python -m pip install openreview-py --force-reinstall \
    && conda clean --all -y \
    && apt-get purge -y build-essential wget curl git \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*

# Add conda environment bin to PATH so that 'python' uses the environment by default
ENV PATH="/app/miniconda/envs/expertise/bin:${PATH}"

RUN mkdir ${HOME}/expertise-utils \
    && cp ${HOME}/openreview-expertise/expertise/service/config/default_container.cfg \
       ${HOME}/openreview-expertise/expertise/service/config/production.cfg

EXPOSE 8080

ENTRYPOINT ["python", "-m", "expertise.service", "--host", "0.0.0.0", "--port", "8080", "--container"]
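
The AIP_STORAGE_URI variable above points the service at the GCS prefix holding the SPECTER and multifacet-recommender artifacts, which are copied into /app/expertise-utils at startup (per the "Fetch expertise utils" and "Move artifact copying to entrypoint" commits). A minimal sketch of that fetch step using google-cloud-storage directly — an illustration, not the repo's actual loader:

# Sketch: download model artifacts from the AIP_STORAGE_URI prefix at startup.
# Assumes the gs://<bucket>/<prefix> convention and a local target directory.
import os
from google.cloud import storage

uri = os.environ["AIP_STORAGE_URI"]  # e.g. gs://openreview-expertise/expertise-utils/
bucket_name, _, prefix = uri[len("gs://"):].partition("/")

client = storage.Client()
for blob in client.list_blobs(bucket_name, prefix=prefix):
    if blob.name.endswith("/"):  # skip directory placeholder objects
        continue
    dest = os.path.join("/app/expertise-utils", os.path.relpath(blob.name, prefix))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    blob.download_to_filename(dest)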
105 changes: 105 additions & 0 deletions expertise/build_pipeline.py
@@ -0,0 +1,105 @@
# pip install kfp
from kfp import dsl
from kfp.v2 import compiler
from kfp.v2.dsl import pipeline
from kfp.registry import RegistryClient
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Builds and Uploads a Kubeflow Pipeline for the Expertise Model")
    parser.add_argument(
        "--region",
        type=str,
        required=True,
        help="Region for Docker Images in Artifact Registry"
    )
    parser.add_argument(
        "--kfp_region",
        type=str,
        required=True,
        help="Region for Kubeflow Pipelines in Artifact Registry"
    )
    parser.add_argument(
        "--project",
        type=str,
        required=True,
        help="GCP Project ID"
    )
    parser.add_argument(
        "--repo",
        type=str,
        required=True,
        help="Name of the Artifact Registry Docker Repository"
    )
    parser.add_argument(
        "--kfp_repo",
        type=str,
        required=True,
        help="Name of the Artifact Registry Kubeflow Repository"
    )
    parser.add_argument(
        "--kfp_name",
        type=str,
        required=True,
        help="Name of the Kubeflow Pipeline"
    )
    parser.add_argument(
        "--image",
        type=str,
        required=True,
        help="Name of the Docker Image"
    )
    parser.add_argument(
        "--tag",
        type=str,
        required=False,
        default='latest',
        help="Tag of the Docker Image"
    )
    args = parser.parse_args()

    # Single pipeline step: run the expertise job inside the pushed container image
    @dsl.container_component
    def execute_expertise_pipeline_op(job_config: str):
        return dsl.ContainerSpec(
            image=f'{args.region}-docker.pkg.dev/{args.project}/{args.repo}/{args.image}:{args.tag}',
            command=['python', '-m', 'expertise.execute_pipeline'],
            args=[job_config]
        )

    @pipeline(
        name=args.kfp_name,
        description='Processes request for user-paper expertise scores'
    )
    def expertise_pipeline(job_config: str):
        import os
        # Setting environment variables within the function
        os.environ["AIP_STORAGE_URI"] = "gs://openreview-expertise/expertise-utils/"
        os.environ["SPECTER_DIR"] = "/app/expertise-utils/specter/"
        os.environ["MFR_VOCAB_DIR"] = "/app/expertise-utils/multifacet_recommender/feature_vocab_file"
        os.environ["MFR_CHECKPOINT_DIR"] = "/app/expertise-utils/multifacet_recommender/mfr_model_checkpoint/"
        op = (execute_expertise_pipeline_op(job_config=job_config)
            .set_cpu_limit('4')
            .set_memory_limit('32G')
            .add_node_selector_constraint('NVIDIA_TESLA_T4')
            .set_accelerator_limit('1')
        )

    compiler.Compiler().compile(
        pipeline_func=expertise_pipeline,
        package_path='expertise_pipeline.yaml'
    )

    client = RegistryClient(host=f"https://{args.kfp_region}-kfp.pkg.dev/{args.project}/{args.kfp_repo}")
    # Re-point 'latest': drop the tag from the previous version before uploading
    client.delete_tag(
        args.kfp_name,
        'latest'
    )

    tags = [args.tag]
    if 'latest' not in tags:
        tags.append('latest')
    templateName, versionName = client.upload_pipeline(
        tags=tags,
        file_name="expertise_pipeline.yaml"
    )
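
Once build_pipeline.py has compiled and uploaded the template, a run can be launched on Vertex AI Pipelines by pointing at the Artifact Registry template URL. A sketch assuming the env values from the workflow above; the pipeline name, staging bucket, and job_config payload are illustrative assumptions:

# Sketch: launch the uploaded pipeline on Vertex AI Pipelines.
# Project/region/repo mirror the workflow env block; display_name, the
# pipeline name in the URL, and the job_config schema are assumptions.
import json
from google.cloud import aiplatform

aiplatform.init(
    project="sunlit-realm-131518",
    location="us-central1",
    staging_bucket="gs://openreview-expertise",  # assumed; used as pipeline root
)

job = aiplatform.PipelineJob(
    display_name="expertise-scores-run",
    # Template uploaded by build_pipeline.py, resolved via its 'latest' tag
    template_path="https://us-central1-kfp.pkg.dev/sunlit-realm-131518/openreview-kfp/expertise-pipeline/latest",
    parameter_values={"job_config": json.dumps({"user": "~Test_User1"})},  # illustrative
)
job.submit()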