refactor: testing idea to wrap coscheduling #69

Merged
merged 28 commits into from
May 24, 2024
4d826e4
docs: add section to README for developer
vsoch Jan 8, 2024
1026549
build: add commands to make for clone and update
vsoch Jan 8, 2024
242d169
fix: restore pod group
vsoch Jan 9, 2024
7b9c470
fix: use podgroup millisecond precision timestamp
vsoch Jan 15, 2024
d6949a0
logs: more for various steps to see what is going on
vsoch Jan 16, 2024
f8ca47e
add examples with lammps to reproduce error
vsoch Jan 16, 2024
275cd04
clean up logging and unused files
vsoch Jan 18, 2024
f243852
support for skeleton grpc server and service/ingress for external client
vsoch Jan 19, 2024
673e34d
feat: add controller base image to build from here
vsoch Feb 17, 2024
41b2ad2
docker: simplify fluence build to use fluxion-go
vsoch Feb 17, 2024
1c0e5a3
ci: add support to build and deploy fluence-controller
vsoch Feb 17, 2024
8add1e0
feat: add start of webhook
vsoch Feb 17, 2024
10d624d
webhook: adding support for adding pod group labels
vsoch Feb 18, 2024
000baac
pod-group: labels for name and size now lead to creation
vsoch Feb 18, 2024
7874d57
fluence: refactor to use new PodGroup
vsoch Feb 18, 2024
8e0b461
feat: podgroup deletion when finished/failed
vsoch Feb 19, 2024
68815a5
feat: add support for other abstractions
vsoch Feb 19, 2024
0e47259
bug: the metav1.MicroTime was not being set
vsoch Feb 19, 2024
956123a
docs: update to design description
vsoch Feb 19, 2024
1037935
testing: gke then eks
vsoch Feb 19, 2024
f52e209
refactor: testing idea to wrap coscheduling
vsoch Mar 7, 2024
726149c
update: adding back in fluence logic
vsoch Mar 14, 2024
5a86a23
feat: add small logger just for fluence
vsoch Mar 15, 2024
8c99f10
test: only allow scheduling first pod
vsoch Apr 5, 2024
d8e67fa
test: adding permit to allow for sibling pod scheduling
vsoch Apr 17, 2024
ef0ed50
go: update to 1.21
vsoch Apr 20, 2024
3bd9cb5
naming: expand short named variables
vsoch May 3, 2024
cbeffce
fix: response to review comments
vsoch May 14, 2024
5 changes: 5 additions & 0 deletions .github/test-kind-config.yaml
@@ -0,0 +1,5 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
2 changes: 2 additions & 0 deletions .github/test.sh
100755 → 100644
@@ -18,6 +18,8 @@ cd upstream/manifests/install/charts
helm install \
--set scheduler.image=ghcr.io/flux-framework/fluence:latest \
--set scheduler.sidecarimage=ghcr.io/flux-framework/fluence-sidecar:latest \
--set controller.image=ghcr.io/flux-framework/fluence-controller:latest \
--set controller.pullPolicy=Never \
--set scheduler.pullPolicy=Never \
--set scheduler.sidecarPullPolicy=Never \
schedscheduler-plugins as-a-second-scheduler/
43 changes: 40 additions & 3 deletions .github/workflows/build-deploy.yaml
@@ -18,7 +18,7 @@ jobs:
name: build fluence
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v3
- uses: actions/setup-go@v4
with:
go-version: ^1.19

@@ -45,7 +45,44 @@ jobs:
- name: Deploy Container
if: (github.event_name != 'pull_request')
run: docker push ${{ env.container }} --all-tags


build-controller:
permissions:
packages: write
env:
container: ghcr.io/flux-framework/fluence-controller
runs-on: ubuntu-latest
name: build fluence-controller
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v4
with:
go-version: ^1.19

- name: Build Containers
run: |
make prepare
make build REGISTRY=ghcr.io/flux-framework CONTROLLER_IMAGE=fluence-controller

- name: Tag Release Image
if: (github.event_name == 'release')
run: |
tag=${GITHUB_REF#refs/tags/}
echo "Tagging and releasing ${{ env.container}}:${tag}"
docker tag ${{ env.container }}:latest ${{ env.container }}:${tag}

- name: GHCR Login
if: (github.event_name != 'pull_request')
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Deploy Container
if: (github.event_name != 'pull_request')
run: docker push ${{ env.container }} --all-tags

build-sidecar:
permissions:
packages: write
@@ -55,7 +92,7 @@ jobs:
name: build sidecar
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v3
- uses: actions/setup-go@v4
with:
go-version: ^1.19

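The "Tag Release Image" step in the workflow above derives the image tag by stripping the `refs/tags/` prefix from `GITHUB_REF`. A minimal sketch of that expansion outside of Actions follows. The `release_tag` helper and the ref value `refs/tags/v0.1.0` are assumptions made for illustration and are not part of this PR.

```shell
#!/usr/bin/env bash
# Sketch: derive a container tag from a Git ref the way the workflow step does.
# release_tag is a hypothetical helper; the ref used below is a made-up example.
release_tag() {
    # "#" removes the shortest matching prefix. A ref that does not start
    # with refs/tags/ (for example a branch ref) is returned unchanged.
    printf '%s\n' "${1#refs/tags/}"
}

container="ghcr.io/flux-framework/fluence-controller"
tag="$(release_tag "refs/tags/v0.1.0")"
echo "Tagging and releasing ${container}:${tag}"
```

Because a non-tag ref passes through unchanged, the step only makes sense on release events, which is what the `if: (github.event_name == 'release')` guard in the workflow enforces.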
60 changes: 50 additions & 10 deletions .github/workflows/test.yaml
@@ -11,38 +11,50 @@ on:

jobs:
build-fluence:

# The scheduler and controller are built together with the hack script
# in the upstream scheduler-plugins
env:
container: ghcr.io/flux-framework/fluence
controller: ghcr.io/flux-framework/fluence-controller
runs-on: ubuntu-latest
name: build fluence
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v3
- uses: actions/setup-go@v4
with:
go-version: ^1.19

- name: Build Containers
run: |
make prepare
make build REGISTRY=ghcr.io/flux-framework SCHEDULER_IMAGE=fluence
make build REGISTRY=ghcr.io/flux-framework SCHEDULER_IMAGE=fluence CONTROLLER_IMAGE=fluence-controller

- name: Save Container
run: docker save ${{ env.container }} | gzip > fluence_latest.tar.gz
- name: Save Containers
run: |
docker save ${{ env.container }} | gzip > fluence_latest.tar.gz
docker save ${{ env.controller }} | gzip > fluence_controller_latest.tar.gz

- name: Upload container artifact
uses: actions/upload-artifact@v4
with:
name: fluence
path: fluence_latest.tar.gz


- name: Upload container artifact
uses: actions/upload-artifact@v4
with:
name: fluence_controller
path: fluence_controller_latest.tar.gz

build-sidecar:
env:
container: ghcr.io/flux-framework/fluence-sidecar
runs-on: ubuntu-latest
name: build sidecar
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v3
- uses: actions/setup-go@v4
with:
go-version: ^1.19

@@ -59,22 +71,23 @@ jobs:
with:
name: fluence_sidecar
path: fluence_sidecar_latest.tar.gz

test-fluence:
needs: [build-fluence, build-sidecar]
permissions:
packages: write
env:
fluence_container: ghcr.io/flux-framework/fluence
sidecar_container: ghcr.io/flux-framework/fluence-sidecar
controller_container: ghcr.io/flux-framework/fluence-controller

runs-on: ubuntu-latest
name: build fluence
name: test fluence
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v3
- uses: actions/setup-go@v4
with:
go-version: ^1.20
go-version: ^1.19

- name: Download fluence artifact
uses: actions/download-artifact@v4
@@ -88,11 +101,27 @@ jobs:
name: fluence_sidecar
path: /tmp

- name: Download fluence_controller artifact
uses: actions/download-artifact@v4
with:
name: fluence_controller
path: /tmp

- name: Make Space For Build
run: |
sudo rm -rf /usr/share/dotnet
sudo rm -rf /usr/local/lib/android
sudo rm -rf /opt/ghc

- name: Load Docker images
run: |
ls /tmp/*.tar.gz
docker load --input /tmp/fluence_sidecar_latest.tar.gz
rm /tmp/fluence_sidecar_latest.tar.gz
docker load --input /tmp/fluence_latest.tar.gz
rm /tmp/fluence_latest.tar.gz
docker load --input /tmp/fluence_controller_latest.tar.gz
rm /tmp/fluence_controller_latest.tar.gz
docker image ls -a | grep fluence

- name: Create Kind Cluster
@@ -101,15 +130,23 @@ jobs:
cluster_name: kind
kubectl_version: v1.28.2
version: v0.20.0
config: ./.github/test-kind-config.yaml

- name: Load Docker Containers into Kind
env:
fluence: ${{ env.fluence_container }}
sidecar: ${{ env.sidecar_container }}
controller: ${{ env.controller_container }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
kind load docker-image ${fluence}
kind load docker-image ${sidecar}
kind load docker-image ${controller}

- name: Install Cert Manager
run: |
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.1/cert-manager.yaml
sleep 10

- name: Test Fluence
run: /bin/bash ./.github/test.sh
@@ -122,6 +159,8 @@ jobs:
docker tag ${{ env.fluence_container }}:latest ${{ env.fluence_container }}:${tag}
echo "Tagging and releasing ${{ env.sidecar_container}}:${tag}"
docker tag ${{ env.sidecar_container }}:latest ${{ env.sidecar_container }}:${tag}
echo "Tagging and releasing ${{ env.controller_container}}:${tag}"
docker tag ${{ env.controller_container }}:latest ${{ env.controller_container }}:${tag}

# If we get here, tests pass, and we can deploy
- name: GHCR Login
@@ -137,3 +176,4 @@ jobs:
run: |
docker push ${{ env.fluence_container }} --all-tags
docker push ${{ env.sidecar_container }} --all-tags
docker push ${{ env.controller_container }} --all-tags
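The test workflow above moves images between jobs by saving each one to a gzipped tarball, uploading it as an artifact, and loading it again before `kind load docker-image`. For reproducing that round trip locally, a rough sketch might look like the following. The helpers `run`, `artifact_name`, `save_image`, and `load_image`, as well as the `DRY_RUN` switch, are hypothetical and not part of this PR; only the `docker` and `kind` commands themselves are taken from the workflow. The sketch also assumes the kind cluster already exists when `load_image` runs for real.

```shell
#!/usr/bin/env bash
# Sketch of the save -> load -> kind load round trip used in test.yaml.
# All function names and DRY_RUN are hypothetical, for illustration only.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Map an image reference to the tarball name the workflow uses, e.g.
# ghcr.io/flux-framework/fluence-controller -> fluence_controller_latest.tar.gz
artifact_name() {
    name="${1##*/}"   # drop registry and organization
    echo "$(printf '%s' "$name" | tr '-' '_')_latest.tar.gz"
}

# Save an image to a gzipped tarball in the current directory.
save_image() {
    image="$1"
    tarball="$(artifact_name "$image")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ docker save ${image} | gzip > ${tarball}"
    else
        docker save "$image" | gzip > "$tarball"
    fi
}

# Load a tarball from /tmp, delete it to free space, then push into kind.
load_image() {
    image="$1"
    tarball="/tmp/$(artifact_name "$image")"
    run docker load --input "$tarball"
    run rm "$tarball"
    run kind load docker-image "$image"
}

# Dry run over the three images built by this PR; nothing is executed.
DRY_RUN=1
for image in \
    ghcr.io/flux-framework/fluence \
    ghcr.io/flux-framework/fluence-sidecar \
    ghcr.io/flux-framework/fluence-controller
do
    save_image "$image"
    load_image "$image"
done
```

Setting `DRY_RUN=0` would execute the commands, in which case the tarballs written by `save_image` need to be moved to `/tmp` first, mirroring the `download-artifact` steps.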
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
plugins
upstream
scheduler-plugins
sig-scheduler-plugins/pkg/fluence/bin/
src/bin
src/fluence/vendor
29 changes: 23 additions & 6 deletions Makefile
@@ -10,24 +10,41 @@ SIDECAR_IMAGE ?= fluence-sidecar:latest
CONTROLLER_IMAGE ?= fluence-controller
SCHEDULER_IMAGE ?= fluence

.PHONY: all build build-sidecar prepare push push-sidecar push-controller
.PHONY: all build build-sidecar clone update push push-sidecar push-controller

all: build-sidecar prepare build
all: prepare build-sidecar build

build-sidecar:
make -C ./src LOCAL_REGISTRY=${REGISTRY} LOCAL_IMAGE=${SIDECAR_IMAGE}

prepare:
clone:
if [ -d "$(CLONE_UPSTREAM)" ]; then echo "Upstream is cloned"; else git clone $(UPSTREAM) ./$(CLONE_UPSTREAM); fi

update: clone
git -C $(CLONE_UPSTREAM) pull origin master

prepare: clone
# These are entirely new directory structures
rm -rf $(CLONE_UPSTREAM)/pkg/fluence
rm -rf $(CLONE_UPSTREAM)/pkg/logger
# rm -rf $(CLONE_UPSTREAM)/cmd/app
rm -rf $(CLONE_UPSTREAM)/pkg/controllers/podgroup_controller.go
rm -rf $(CLONE_UPSTREAM)/cmd/controller/app/server.go
cp -R sig-scheduler-plugins/pkg/logger $(CLONE_UPSTREAM)/pkg/logger
cp -R sig-scheduler-plugins/pkg/fluence $(CLONE_UPSTREAM)/pkg/fluence
cp -R sig-scheduler-plugins/manifests/fluence $(CLONE_UPSTREAM)/manifests/fluence
cp -R sig-scheduler-plugins/pkg/controllers/* $(CLONE_UPSTREAM)/pkg/controllers/
# This is the one exception not from sig-scheduler-plugins because it is needed in both spots
cp -R src/fluence/fluxcli-grpc $(CLONE_UPSTREAM)/pkg/fluence/fluxcli-grpc
# cp -R sig-scheduler-plugins/cmd/app ./upstream/cmd/app
# These are files with subtle changes to add fluence
cp sig-scheduler-plugins/cmd/scheduler/main.go ./upstream/cmd/scheduler/main.go
cp sig-scheduler-plugins/manifests/install/charts/as-a-second-scheduler/templates/deployment.yaml $(CLONE_UPSTREAM)/manifests/install/charts/as-a-second-scheduler/templates/deployment.yaml
cp sig-scheduler-plugins/manifests/install/charts/as-a-second-scheduler/templates/*.yaml $(CLONE_UPSTREAM)/manifests/install/charts/as-a-second-scheduler/templates/
cp sig-scheduler-plugins/manifests/install/charts/as-a-second-scheduler/crds/*.yaml $(CLONE_UPSTREAM)/manifests/install/charts/as-a-second-scheduler/crds/
cp sig-scheduler-plugins/manifests/install/charts/as-a-second-scheduler/values.yaml $(CLONE_UPSTREAM)/manifests/install/charts/as-a-second-scheduler/values.yaml
cp sig-scheduler-plugins/apis/scheduling/v1alpha1/*.go $(CLONE_UPSTREAM)/apis/scheduling/v1alpha1/
cp sig-scheduler-plugins/cmd/controller/app/server.go $(CLONE_UPSTREAM)/cmd/controller/app/server.go

build:
build: prepare
REGISTRY=${REGISTRY} IMAGE=${SCHEDULER_IMAGE} CONTROLLER_IMAGE=${CONTROLLER_IMAGE} $(BASH) $(CLONE_UPSTREAM)/hack/build-images.sh

push-sidecar:
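The Makefile changes above split the old `prepare` target into an idempotent `clone` step followed by an overlay of the `sig-scheduler-plugins` tree onto the upstream checkout. The two ideas can be sketched as plain shell functions. The names `clone_upstream` and `overlay_dir`, the temporary directory layout, and the placeholder URL are assumptions made for illustration and do not appear in the PR.

```shell
#!/usr/bin/env bash
# Sketch of the clone and prepare targets as shell functions.
# clone_upstream and overlay_dir are hypothetical names, not part of the PR.

# Clone only when the target directory is missing, like the clone target.
clone_upstream() {
    dir="$1"
    url="$2"
    if [ -d "$dir" ]; then
        echo "Upstream is cloned"
    else
        git clone "$url" "$dir"
    fi
}

# Replace dest with a fresh copy of src. Removing dest first matters:
# if dest already exists, "cp -R src dest" creates dest/<basename of src>
# instead of replacing the contents, and stale files would survive.
overlay_dir() {
    src="$1"
    dest="$2"
    rm -rf "$dest"
    cp -R "$src" "$dest"
}

# Demo in a throwaway directory that mimics the repository layout.
work="$(mktemp -d)"
mkdir -p "$work/sig-scheduler-plugins/pkg/fluence" "$work/upstream/pkg/fluence"
echo "new" > "$work/sig-scheduler-plugins/pkg/fluence/fluence.go"
echo "old" > "$work/upstream/pkg/fluence/stale.go"

# The directory exists, so git is never invoked; the URL is a placeholder.
clone_upstream "$work/upstream" "https://example.invalid/upstream.git"
overlay_dir "$work/sig-scheduler-plugins/pkg/fluence" "$work/upstream/pkg/fluence"
ls "$work/upstream/pkg/fluence"
```

In the Makefile itself the ordering is expressed through prerequisites (`prepare: clone` and `build: prepare`), so running `make build` alone is enough to clone, overlay, and build.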