Feature/gpu support extended #87

Open
wants to merge 76 commits into base: develop from feature/GPU_support_extended

76 commits (the diff below shows changes from 24 of them)
5b9a16a
adding GPU support, adding test cases, adding NVIDIA runtime support
ksatzke Jul 10, 2020
5628d51
removing local configuration
ksatzke Jul 10, 2020
2d8a7ff
adding description on how to prepare and add a GPU node, and correspo…
ksatzke Jul 23, 2020
9fab0e3
fixing typos
ksatzke Jul 23, 2020
01c5c3d
corrections to description on adding GPU nodes
ksatzke Jul 29, 2020
977ee01
add GPU sandbox type to Makefiles and helm charts
ksatzke Aug 11, 2020
87fd792
add GPU sandbox type to Makefiles and helm charts
ksatzke Aug 11, 2020
32dd904
adding logic to spin up a GPU sandbox on demand
ksatzke Aug 12, 2020
cdd0faf
fixing typos in README
ksatzke Aug 20, 2020
9fba583
configure separate GPU support for management and common workflow ksvc's
Aug 21, 2020
0124123
added configuration of sandbox_gpu container image for wf pods to run …
Sep 29, 2020
cf047f0
improved configuration for workflows calling for GPUs
ksatzke Sep 29, 2020
89e8d24
first cut on extending Workflow class with GPU properties
ksatzke Oct 2, 2020
dfc7cd7
fixing bug on addWorkflow
ksatzke Oct 2, 2020
4508ea6
adding support for dynamic config of helm deployments on GPU to Manag…
ksatzke Oct 5, 2020
e251bf2
removing bug on java function executions
ksatzke Oct 6, 2020
5508abc
fixing a bug on asl_Map state tests with helm deployment
ksatzke Oct 6, 2020
a9ad920
adding first cut on gpu node selection logic for ansible multi-host d…
ksatzke Oct 7, 2020
ea3b5f6
merge develop branch
ksatzke Oct 8, 2020
ef7ed75
fixing bugs on SDK and GPU test configurations
ksatzke Oct 8, 2020
f7571f2
adding logic to configure gpu hosts, fixing bug on deployWorkflow on …
ksatzke Oct 9, 2020
a250fb3
cleanup tests and values.yaml
ksatzke Oct 12, 2020
f86c970
adjustments to available_hosts script and cleanup
ksatzke Oct 12, 2020
e334db1
final adjustments to values.yaml
ksatzke Oct 12, 2020
297a8e0
addressing comments from PR review, first part
ksatzke Oct 13, 2020
677cacd
adding ansible inventory group for GPU host configuration
ksatzke Oct 20, 2020
6941e7e
Merge branch 'feature/GPU_support_extended' of https://github.com/kni…
ksatzke Oct 20, 2020
6c3efd5
fixing errors in GPU deployment description
Nov 5, 2020
eaae6be
fixing errors in GPU deployment description
Nov 5, 2020
f707010
fixing errors in GPU deployment description
Nov 5, 2020
46577ad
1st cut on API modifications to allow configuration of mfn GPU requir…
Nov 11, 2020
dd3a1c3
adding GUI support for indicating assigned GPU cores in function table
abeckn Nov 11, 2020
4d734fa
adding GUI support for indicating assigned GPU cores in function tabl…
abeckn Nov 11, 2020
95d81ca
merging develop branch, first cut on integrating GUI support, partly …
Nov 16, 2020
232b3c0
fixing bug in deployWorkflow choosing the wrong sandbox image
Nov 16, 2020
d494c86
fixing bugs on GPU configuration in Management functions
Nov 16, 2020
a6f84a8
debug GPU parameter modification via GUI
Nov 17, 2020
9055a30
cleanup, tests are passing on gpu machine
ksatzke Nov 18, 2020
ecc1b8f
first cut on logic to deduce sandbox GPU requirements from function d…
ksatzke Nov 18, 2020
3fbcfb1
cleaning asl_Tensorflow_HelloWorld test
ksatzke Nov 18, 2020
99e04b0
adapted values.yaml to lab testbed vagrant conf
ksatzke Nov 30, 2020
1012b31
Merge branch 'develop' into feature/GPU_support_extended
ksatzke Nov 30, 2020
5db4cfb
updated values for kubespray setup
ksatzke Dec 1, 2020
50dc35f
remove hardcoding of imageRepo, adjusting values.yaml for GPU testbed
ksatzke Dec 1, 2020
ff25274
remove blocking of concurrent gpu pods in k8s deployment caused by 'l…
ksatzke Dec 10, 2020
d8c01e9
resolving conflicts with develop branch
ksatzke Dec 10, 2020
5163ede
adding support for configurable GPU core+memory sharing based on gpu-…
ksatzke Jan 12, 2021
47bd2cd
Merge branch 'develop' into feature/GPU_support_extended
ksatzke Jan 12, 2021
defacaf
fixing bug on GPU parameter calculations
ksatzke Jan 15, 2021
730e0f7
WIP: adding logic for node GPU capacity queries to ManagementService
ksatzke Jan 26, 2021
cd779af
use vgpu parameters in kservice setup
ksatzke Jan 27, 2021
09f58e8
adding capability to handle secret token for k8s core API
ksatzke Feb 2, 2021
7311aec
adding GUI and ManagementService changes for GPU parameters
ksatzke Feb 2, 2021
8520d5f
fixing bugs on GPU memory parameter calculations
ksatzke Feb 3, 2021
338fcd3
fixing more bugs on GPU memory parameter calculations
ksatzke Feb 8, 2021
5697446
fixing bugs in deployment script, adjusting values
ksatzke Feb 16, 2021
051b327
fixing bugs in available_hosts scripts
ksatzke Feb 26, 2021
70ce3ec
resolving bugs in host selection logic for deployment
ksatzke Mar 2, 2021
e0e4a86
fixing a bug in workflow GPU resource calculation
ksatzke Mar 25, 2021
d92a604
extending mfn SDK to handle GPU parameters
ksatzke Apr 1, 2021
1de8f8b
Merge branch 'feature/GPU_support_extended' of https://github.com/kni…
ksatzke Apr 1, 2021
48f7546
fixing bugs on ASL tests using GPUs
ksatzke Apr 13, 2021
463cbab
merge develop; update Dockerfile_gpu
iakkus Apr 19, 2021
7a1b157
fix to helm template management.yaml
iakkus Apr 26, 2021
191e0da
Revert "fix to helm template management.yaml"
iakkus Apr 26, 2021
4528bc6
fix to helm template management.yaml after merging with develop
iakkus Apr 26, 2021
ea5e1bd
Merge branch 'develop' into feature/GPU_support_extended
iakkus May 5, 2021
6913fbb
make Dockerfile installation instructions follow the same order
iakkus May 7, 2021
3647851
ansible: fix available hosts script
May 7, 2021
3187c49
management: fix deployWorkflow for bare metal with gpu hosts
iakkus May 7, 2021
700e298
update ansible readme; fixes #117
iakkus May 14, 2021
c46055b
Revert "update ansible readme; fixes #117"
iakkus May 14, 2021
f30f610
function worker: fix addressable function stopping when blocking for …
May 17, 2021
529be65
adding Dockerfile for opencv package
ksatzke Jun 9, 2021
05e6b25
fixing Map state, all tests are running
ksatzke Jun 11, 2021
a198f2b
merging recent develop into GPU feature branch
ksatzke Jan 10, 2022
2 changes: 1 addition & 1 deletion ManagementService/management_init.py
@@ -355,7 +355,7 @@ def printUsage():
sys.path.append(workflowdir)
if os.getenv("KUBERNETES_PORT", None) != None:
import deployWorkflow
url, endpoint_key = deployWorkflow.create_k8s_deployment(email, workflow_info, "Python", management=True)
url, endpoint_key = deployWorkflow.create_k8s_deployment(email, workflow_info, "Python", 0, management=True)
DLCLIENT_MANAGEMENT.putMapEntry("Management_workflow_endpoint_map", endpoint_key, url)
# Kubernetes mode only has one url
endpoint_list = [url]
6 changes: 6 additions & 0 deletions ManagementService/python/addWorkflow.py
@@ -27,6 +27,7 @@ def handle(value, sapi):
success = False

email = data["email"]


if "workflow" in data:
workflow = data["workflow"]
@@ -38,9 +39,14 @@
wf["status"] = "undeployed"
wf["modified"] = time.time()
wf["endpoints"] = []
#wf["gpu_usage"] = None
if "gpu_usage" in workflow:
wf["gpu_usage"] = str(workflow["gpu_usage"])

wf["id"] = hashlib.md5(str(uuid.uuid4()).encode()).hexdigest()

#wf["on_gpu"] = True # add metadata on GPU requirements for this workflow. ToDo: make this configurable via GUI

sapi.put(email + "_workflow_" + wf["id"], json.dumps(wf), True, True)
#sapi.put(email + "_workflow_json_" + wf["id"], "", True, True)
#sapi.put(email + "_workflow_requirements_" + wf["id"], "", True, True)
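For context, a minimal sketch of the request payload this change consumes. The gpu_usage field name comes from the diff; the surrounding payload shape and the values are assumed for illustration:

# Hypothetical addWorkflow input; "gpu_usage" is the new optional field.
data = {
    "email": "user@example.com",
    "workflow": {
        "name": "gpu-demo",
        "gpu_usage": 0.5   # fraction of a GPU requested for this workflow
    }
}
# handle() then stores it on the workflow metadata record:
#   wf["gpu_usage"] = str(workflow["gpu_usage"])   # -> "0.5"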
101 changes: 94 additions & 7 deletions ManagementService/python/deployWorkflow.py
@@ -26,6 +26,23 @@
WF_TYPE_SAND = 0
WF_TYPE_ASL = 1

def get_kv_pairs(testdict, keys, dicts=None):
# find and return kv pairs with particular keys in testdict
if not dicts:
dicts = [testdict]
testdict = [testdict]
data = testdict.pop(0)
if isinstance(data, dict):
data = data.values()
for d in data:
if isinstance(d, dict) or isinstance(d, list): # check d for type
testdict.append(d)
if isinstance(d, dict):
dicts.append(d)
if testdict: # still more data to search
return get_kv_pairs(testdict, keys, dicts)
return [(k, v) for d in dicts for k, v in d.items() if k in keys]
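# Illustrative only, not part of the diff: how get_kv_pairs walks a nested
# dict, using a hypothetical ASL-style document as input:
#
#   doc = {"StartAt": "A",
#          "States": {"A": {"Resource": "f1", "Next": "B"},
#                     "B": {"Resource": "f2", "End": True}}}
#   get_kv_pairs(doc, ["Resource"])
#   # -> [("Resource", "f1"), ("Resource", "f2")]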

def is_asl_workflow(wfobj):
return 'StartAt' in wfobj and 'States' in wfobj and isinstance(wfobj['States'], dict)

@@ -202,7 +219,8 @@ def start_docker_sandbox(host_to_deploy, uid, sid, wid, wname, sandbox_image_nam
try:
print("Starting sandbox docker container for: " + uid + " " + sid + " " + wid + " " + sandbox_image_name)
print("Docker daemon: " + "tcp://" + host_to_deploy[1] + ":2375" + ", environment variables: " + str(env_vars))
client.containers.run(sandbox_image_name, init=True, detach=True, ports={"8080/tcp": None}, ulimits=ulimit_list, auto_remove=True, name=sid, environment=env_vars, extra_hosts={host_to_deploy[0]:host_to_deploy[1]}, log_config=lc)
client.containers.run(sandbox_image_name, init=True, detach=True, ports={"8080/tcp": None}, ulimits=ulimit_list, auto_remove=True, name=sid, environment=env_vars, extra_hosts={host_to_deploy[0]:host_to_deploy[1]}, log_config=lc, runtime="nvidia")
#client.containers.run(sandbox_image_name, init=True, detach=True, ports={"8080/tcp": None}, ulimits=ulimit_list, auto_remove=True, name=sid, environment=env_vars, extra_hosts={host_to_deploy[0]:host_to_deploy[1]}, log_config=lc)
# TEST/DEVELOPMENT: no auto_remove to access sandbox logs
#client.containers.run(sandbox_image_name, init=True, detach=True, ports={"8080/tcp": None}, ulimits=ulimit_list, name=sid, environment=env_vars, extra_hosts={host_to_deploy[0]:host_to_deploy[1]}, log_config=lc)
except Exception as exc:
@@ -241,7 +259,7 @@ def get_workflow_host_port(host_to_deploy, sid):

return success, host_port

def create_k8s_deployment(email, workflow_info, runtime, management=False):
def create_k8s_deployment(email, workflow_info, runtime, gpu_usage, management=False):
# KUBERNETES MODE
new_workflow_conf = {}
conf_file = '/opt/mfn/SandboxAgent/conf/new_workflow.conf'
@@ -258,7 +276,11 @@
raise Exception("Unable to load "+ksvc_file+". Ensure that the configmap has been setup properly", e)

# Kubernetes labels cannot contain @ or _ and should start and end with alphanumeric characters
wfNameSanitized = 'wf-' + workflow_info["workflowId"].replace('@', '-').replace('_', '-').lower() + '-wf'
wfNameSanitized = 'wf-' + workflow_info["workflowId"].replace('@', '-').replace('_', '-').replace('/','-').lower() + '-wf'
#wfActualNameSanitized = 'wf-' + workflow_info["workflowName"].replace('@', '-').replace('_', '-').replace('/','-').lower() + '-wf'
if len(wfNameSanitized) > 63:
print("Error creating kubernetes deployment for "+email+" "+workflow_info["workflowId"] + ", workflow name too long")

emailSanitized = 'u-' + email.replace('@', '-').replace('_', '-').lower() + '-u'
# Pod, Deployment and Hpa names for the new workflow will have a prefix containing the workflow name and user name
app_fullname_prefix = ''
@@ -291,11 +313,40 @@
env.append({'name': 'WORKFLOWID', 'value': workflow_info["workflowId"]})
env.append({'name': 'WORKFLOWNAME', 'value': workflow_info["workflowName"]})

# Special handling for the management container
# apply gpu_usage fraction to k8s deployment configuration
print("GPU sage in create_k8s_service: "+ str(gpu_usage))
use_gpus = gpu_usage

if runtime=="Java": # non gpu python function
# overwrite values from values.yaml for new workflows
#kservice['spec']['template']['spec']['containers'][0]['resources']['limits']['nvidia.com/gpu'] = str(use_gpus)
#kservice['spec']['template']['spec']['containers'][0]['resources']['requests']['nvidia.com/gpu'] = str(use_gpus)
kservice['spec']['template']['spec']['containers'][0]['image'] = "localhost:5000/microfn/sandbox_java"

if not management and use_gpus == 0. and runtime=="Python": # non gpu python function
# overwrite values from values.yaml for new workflows
kservice['spec']['template']['spec']['containers'][0]['resources']['limits'].pop('nvidia.com/gpu', None) # ['nvidia.com/gpu'] = str(use_gpus)
kservice['spec']['template']['spec']['containers'][0]['resources']['requests'].pop('nvidia.com/gpu', None) # ['nvidia.com/gpu'] = str(use_gpus)
kservice['spec']['template']['spec']['containers'][0]['image'] = "localhost:5000/microfn/sandbox"

if not management and use_gpus > 0. and runtime=="Python": # gpu using python function
# overwrite values from values.yaml for new workflows
kservice['spec']['template']['spec']['containers'][0]['resources']['limits']['nvidia.com/gpu'] = str(use_gpus)
kservice['spec']['template']['spec']['containers'][0]['resources']['requests']['nvidia.com/gpu'] = str(use_gpus)
kservice['spec']['template']['spec']['containers'][0]['image'] = "localhost:5000/microfn/sandbox_gpu"

# Special handling for the management container: never run on gpu
if management:
kservice['spec']['template']['spec']['volumes'] = [{ 'name': 'new-workflow-conf', 'configMap': {'name': new_workflow_conf['configmap']}}]
kservice['spec']['template']['spec']['containers'][0]['volumeMounts'] = [{'name': 'new-workflow-conf', 'mountPath': '/opt/mfn/SandboxAgent/conf'}]
kservice['spec']['template']['spec']['serviceAccountName'] = new_workflow_conf['mgmtserviceaccount']

# management container should not consume a GPU and uses the standard sandbox image
if (labels['workflowid'] == "Management"):
kservice['spec']['template']['spec']['containers'][0]['resources']['limits']['nvidia.com/gpu'] = "0"
kservice['spec']['template']['spec']['containers'][0]['resources']['requests']['nvidia.com/gpu'] = "0"
kservice['spec']['template']['spec']['containers'][0]['image'] = "localhost:5000/microfn/sandbox"

if 'HTTP_GATEWAYPORT' in new_workflow_conf:
env.append({'name': 'HTTP_GATEWAYPORT', 'value': new_workflow_conf['HTTP_GATEWAYPORT']})
if 'HTTPS_GATEWAYPORT' in new_workflow_conf:
@@ -325,6 +376,7 @@ def create_k8s_deployment(email, workflow_info, runtime, management=False):
print("ERROR deleting existing kservice")
print(resp.text)

# no change for Java function
print('Creating new kservice')
resp = requests.post(
"https://kubernetes.default:"+os.getenv("KUBERNETES_SERVICE_PORT_HTTPS")+"/apis/serving.knative.dev/v1/namespaces/"+namespace+"/services",
@@ -385,6 +437,8 @@ def handle(value, sapi):
raise Exception("malformed input")
sapi.log(json.dumps(workflow))
wfmeta = sapi.get(email + "_workflow_" + workflow["id"], True)
print("WFMETA in deployWorkflow: "+ str(wfmeta))

if wfmeta is None or wfmeta == "":
raise Exception("workflow metadata is not valid.")
try:
@@ -413,6 +467,8 @@
if is_asl_workflow(wfobj):
wf_type = WF_TYPE_ASL

#use_gpus = int(wfmeta._gpu_usage)

success, errmsg, resource_names, uploaded_resources = check_workflow_functions(wf_type, wfobj, email, sapi)
if not success:
raise Exception("Couldn't deploy workflow; " + errmsg)
@@ -445,6 +501,14 @@
#dlc.put("deployment_info_workflow_" + workflow["id"], json.dumps(deployment_info))
# _XXX_: important!
# put must not be queued as the function currently waits for the container to become ready

if "gpu_usage" in wfmeta and wfmeta["gpu_usage"] != "None":
gpu_usage = float(wfmeta["gpu_usage"])
else:
gpu_usage = 0.

print("deduced gpu_usage: " + str(gpu_usage))

sapi.put("deployment_info_workflow_" + workflow["id"], json.dumps(deployment_info), True, False)

status = "deploying"
@@ -454,7 +518,8 @@
runtime = "Java"
else:
runtime = "Python"
url, endpoint_key = create_k8s_deployment(email, workflow_info, runtime)

url, endpoint_key = create_k8s_deployment(email, workflow_info, runtime, gpu_usage)
if url is not None and len(url) > 0:
status = "deploying"
sapi.addSetEntry(workflow_info["workflowId"] + "_workflow_endpoints", str(url), is_private=True)
@@ -467,7 +532,12 @@
else:
# We're running BARE METAL mode
# _XXX_: due to the queue service still being in java in the sandbox
sandbox_image_name = "microfn/sandbox"

if gpu_usage == 0:
sandbox_image_name = "microfn/sandbox" # default value
elif gpu_usage != 0 and runtime == "Python":
sandbox_image_name = "microfn/sandbox_gpu" # sandbox uses GPU

if any(resource_info_map[res_name]["runtime"] == "Java" for res_name in resource_info_map):
sandbox_image_name = "microfn/sandbox_java"

@@ -477,8 +547,25 @@
if hosts is not None and hosts != "":
hosts = json.loads(hosts)
deployed_hosts = {}
# instruct hosts to start the sandbox and deploy workflow
gpu_hosts = {}
picked_hosts = {}

for hostname in hosts:
#if hostname.endswith("_gpu"):
if "has_gpu" in hosts[hostname]:
hostip = hosts[hostname]
gpu_hosts[hostname] = hostip

# instruct hosts to start the sandbox and deploy workflow
if runtime=="Java" or sandbox_image_name == "microfn/sandbox": # can use any host
Review comment (Member): I thought we had the "microfn/sandbox_java_gpu" image?

picked_hosts = hosts
elif len(gpu_hosts) > 0:
picked_hosts = gpu_hosts
else:
picked_hosts = hosts # fallback as there are no gpu hosts available
print("available GPU hosts is empty. Deploying on general purpose host")

for hostname in picked_hosts: # loop over the picked hosts; gpu hosts were picked for python/gpu workflows
hostip = hosts[hostname]
host_to_deploy = (hostname, hostip)
success, endpoint_key = start_docker_sandbox(host_to_deploy, email, workflow_info["sandboxId"], workflow_info["workflowId"], workflow_info["workflowName"], sandbox_image_name)
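For reference, the bare-metal host-selection rules from the loop above, condensed into a standalone function. This is a sketch only: the real code inlines the logic in handle(), and the shape of the host inventory entries is assumed:

def pick_hosts(hosts, runtime, sandbox_image_name):
    # Hosts whose inventory entry carries a "has_gpu" marker.
    gpu_hosts = {name: val for name, val in hosts.items() if "has_gpu" in val}
    if runtime == "Java" or sandbox_image_name == "microfn/sandbox":
        return hosts       # non-GPU sandboxes can run on any host
    if gpu_hosts:
        return gpu_hosts   # GPU sandboxes prefer GPU hosts
    return hosts           # fallback: no GPU hosts available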
75 changes: 75 additions & 0 deletions Sandbox/Dockerfile_gpu
@@ -0,0 +1,75 @@
# Copyright 2020 The KNIX Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#FROM ubuntu:18.04
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04

# Install (as root)
# Base
RUN apt-get update --fix-missing
RUN apt-get -y --no-install-recommends install build-essential
RUN apt-get -y --no-install-recommends install netbase unzip file libmagic1

# CUDA 10.1 dependencies and tools to build dlib
RUN apt-get -y --no-install-recommends install libsm6 libxrender1 libxrender-dev libxext6 libglib2.0-0 git cmake
RUN apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 libnvinfer-dev=6.0.1-1+cuda10.1 libnvinfer-plugin6=6.0.1-1+cuda10.1

# Python
RUN apt-get -y --no-install-recommends install python3 python3-dev
RUN apt-get -y --no-install-recommends install python3-pip
RUN apt-get -y --no-install-recommends install zlib1g libssl1.0 libsasl2-2 ca-certificates

RUN /usr/bin/python3 -m pip install --upgrade pip

RUN /usr/bin/python3 -m pip install setuptools
RUN /usr/bin/python3 -m pip install "thrift>=0.12.0"
RUN /usr/bin/python3 -m pip install anytree
RUN /usr/bin/python3 -m pip install ujsonpath
RUN /usr/bin/python3 -m pip install requests
RUN /usr/bin/python3 -m pip install retry
# remove warnings from anytree package
RUN /usr/bin/python3 -m pip install fastcache
# Needed for multi-language support (currently just Java)
RUN /usr/bin/python3 -m pip install thriftpy2

# Install dlib for CUDA
RUN git clone https://github.com/davisking/dlib.git
RUN mkdir -p /dlib/build

RUN cmake -H/dlib -B/dlib/build -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1
RUN cmake --build /dlib/build

RUN cd /dlib; python3 /dlib/setup.py install

# Install the face recognition package and tensorflow
RUN pip3 install face_recognition
RUN pip3 install tensorflow==2.1.0
Review comment (Member): I am not sure why we need to install all these custom libraries for the GPU usage. If these are needed by the workflows, then they should specify it in the function requirements.


# Java (for queue service)
RUN apt-get -y --no-install-recommends install openjdk-8-jdk-headless

# Add components (as mfn)
RUN groupadd -o -g 1000 -r mfn && useradd -d /opt/mfn -u 1000 -m -r -g mfn mfn
RUN mkdir /opt/mfn/logs

COPY build/queueservice.jar /opt/mfn/
ADD frontend/frontend /opt/mfn/frontend
ADD build/SandboxAgent.tar.gz /opt/mfn/
ADD build/FunctionWorker.tar.gz /opt/mfn/
ADD build/LoggingService.tar.gz /opt/mfn/

RUN chown mfn:mfn -R /opt/mfn
USER mfn
WORKDIR /opt/mfn
CMD ["python3", "/opt/mfn/SandboxAgent/sandboxagent.py"]
66 changes: 66 additions & 0 deletions Sandbox/Dockerfile_java_gpu
@@ -0,0 +1,66 @@
# Copyright 2020 The KNIX Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#FROM ubuntu:18.04
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04

# Install (as root)
# Base
RUN apt-get update --fix-missing
RUN apt-get -y --no-install-recommends install build-essential
RUN apt-get -y --no-install-recommends install netbase unzip file libmagic1

# Python
RUN apt-get -y --no-install-recommends install python3 python3-dev
RUN apt-get -y --no-install-recommends install python3-pip
RUN apt-get -y --no-install-recommends install zlib1g libssl1.0 libsasl2-2 ca-certificates

RUN /usr/bin/python3 -m pip install --upgrade pip

RUN /usr/bin/python3 -m pip install setuptools
RUN /usr/bin/python3 -m pip install "thrift>=0.12.0"
RUN /usr/bin/python3 -m pip install anytree
RUN /usr/bin/python3 -m pip install ujsonpath
RUN /usr/bin/python3 -m pip install requests
RUN /usr/bin/python3 -m pip install retry
# remove warnings from anytree package
RUN /usr/bin/python3 -m pip install fastcache
# Needed for multi-language support (currently just Java)
RUN /usr/bin/python3 -m pip install thriftpy2

# Java
RUN apt-get -y --no-install-recommends install openjdk-8-jdk-headless

RUN apt-get -y --no-install-recommends install maven

# Add components (as mfn)
RUN groupadd -o -g 1000 -r mfn && useradd -d /opt/mfn -u 1000 -m -r -g mfn mfn
RUN mkdir /opt/mfn/logs

COPY build/queueservice.jar /opt/mfn/
ADD frontend/frontend /opt/mfn/frontend
ADD build/SandboxAgent.tar.gz /opt/mfn/
ADD build/FunctionWorker.tar.gz /opt/mfn/
ADD build/LoggingService.tar.gz /opt/mfn/

ADD build/JavaRequestHandler.tar.gz /opt/mfn/

RUN chmod +x /opt/mfn/JavaRequestHandler/setup_maven.sh
RUN /opt/mfn/JavaRequestHandler/setup_maven.sh True
RUN mvn -Duser.home=/tmp -DskipTests -gs /opt/mfn/JavaRequestHandler/maven/sandbox-mvn-settings.xml -f /opt/mfn/JavaRequestHandler/maven/init-mvn.pom.xml dependency:resolve-plugins

RUN chown mfn:mfn -R /opt/mfn
USER mfn
WORKDIR /opt/mfn
CMD ["python3", "/opt/mfn/SandboxAgent/sandboxagent.py"]
12 changes: 12 additions & 0 deletions Sandbox/Makefile
@@ -20,6 +20,7 @@ include ../build_env.mk

default: build_thrift \
image \
image_gpu \
image_java

clean:
@@ -95,6 +96,16 @@ image: \
build/SandboxAgent.tar.gz
$(call build_image,Dockerfile,microfn/sandbox)

image_gpu: \
Dockerfile_gpu \
build/queueservice.jar \
frontend/frontend \
build/LoggingService.tar.gz \
build/FunctionWorker.tar.gz \
build/SandboxAgent.tar.gz
$(call build_image,Dockerfile_gpu,microfn/sandbox_gpu)


image_java: \
Dockerfile_java \
build/queueservice.jar \
@@ -107,6 +118,7 @@ image_java: \

push: image image_java
$(call push_image,microfn/sandbox)
$(call push_image,microfn/sandbox_gpu)
Review comment (Member): microfn/sandbox_java_gpu? Need to also update the dependencies for the Makefile target.

$(call push_image,microfn/sandbox_java)

