From 5d463dcee804f9aea5e2a0fffb42f509dc889db6 Mon Sep 17 00:00:00 2001 From: Rafael Vasquez Date: Thu, 5 Oct 2023 15:25:19 -0400 Subject: [PATCH 1/8] Adds and restructures docs Signed-off-by: Rafael Vasquez --- README.md | 45 +++----------------------------- docs/images/vmodels.png | Bin 0 -> 4199 bytes docs/overview.md | 40 ++++++++++++++++++++++++++++ docs/vmodels.md | 56 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 99 insertions(+), 42 deletions(-) create mode 100644 docs/images/vmodels.png create mode 100644 docs/overview.md create mode 100644 docs/vmodels.md diff --git a/README.md b/README.md index ef9e2431..fbfc6108 100644 --- a/README.md +++ b/README.md @@ -2,49 +2,10 @@ The ModelMesh framework is a mature, general-purpose model serving management/routing layer designed for high-scale, high-density and frequently-changing model use cases. It works with existing or custom-built model servers and acts as a distributed LRU cache for serving runtime models. -See these [these charts](https://github.com/kserve/modelmesh/files/8854091/modelmesh-jun2022.pdf) for more information on supported features and design details. +See these [charts](https://github.com/kserve/modelmesh/files/8854091/modelmesh-jun2022.pdf) for more information on supported features and design details. For full Kubernetes-based deployment and management of ModelMesh clusters and models, see the [ModelMesh Serving](https://github.com/kserve/modelmesh-serving) repo. This includes a separate controller and provides K8s custom resource based management of ServingRuntimes and InferenceServices along with common, abstracted handling of model repository storage and ready-to-use integrations with some existing OSS model servers. -### Quick-Start +## Get Started -1. 
Wrap your model-loading and invocation logic in this [model-runtime.proto](./src/main/proto/current/model-runtime.proto) gRPC service interface - - `runtimeStatus()` - called only during startup to obtain some basic configuration parameters from the runtime, such as version, capacity, model-loading timeout - - `loadModel()` - load the specified model into memory from backing storage, returning when complete - - `modelSize()` - determine size (mem usage) of previously-loaded model. If very fast, can be omitted and provided instead in the response from `loadModel` - - `unloadModel()` - unload previously loaded model, returning when complete - - Use a separate, arbitrary gRPC service interface for model inferencing requests. It can have any number of methods and they are assumed to be idempotent. See [predictor.proto](src/test/proto/predictor.proto) for a very simple example. - - The methods of your custom applier interface will be called only for already fully-loaded models. -2. Build a grpc server docker container which exposes these interfaces on localhost port 8085 or via a mounted unix domain socket -3. Extend the [Kustomize-based Kubernetes manifests](config) to use your docker image, and with appropriate mem and cpu resource allocations for your container -4. 
Deploy to a Kubernetes cluster as a regular Service, which will expose [this grpc service interface](./src/main/proto/current/model-mesh.proto) via kube-dns (you do not implement this yourself), consume using grpc client of your choice from your upstream service components - - `registerModel()` and `unregisterModel()` for registering/removing models managed by the cluster - - Any custom inferencing interface methods to make a runtime invocation of previously-registered model, making sure to set a `mm-model-id` or `mm-vmodel-id` metadata header (or `-bin` suffix equivalents for UTF-8 ids) - -### Deployment and Upgrades - -Prerequisites: - -- An etcd cluster (shared or otherwise) -- A Kubernetes namespace with the etcd cluster connection details configured as a secret key in [this json format](https://github.com/IBM/etcd-java/blob/master/etcd-json-schema.md) - - Note that if provided, the `root_prefix` attribute _is_ used as a key prefix for all of the framework's use of etcd - -From an operational standpoint, ModelMesh behaves just like any other homogeneous clustered microservice. This means it can be deployed, scaled, migrated and upgraded as a regular Kubernetes deployment without any special coordination needed, and without any impact to live service usage. - -In particular the procedure for live upgrading either the framework container or service runtime container is the same: change the image version in the deployment config yaml and then update it `kubectl apply -f model-mesh-deploy.yaml` - -### Build - -Sample build: - -```bash -GIT_COMMIT=$(git rev-parse HEAD) -BUILD_ID=$(date '+%Y%m%d')-$(git rev-parse HEAD | cut -c -5) -IMAGE_TAG_VERSION="dev" -IMAGE_TAG=${IMAGE_TAG_VERSION}-$(git branch --show-current)_${BUILD_ID} - -docker build -t modelmesh:${IMAGE_TAG} \ - --build-arg imageVersion=${IMAGE_TAG} \ - --build-arg buildId=${BUILD_ID} \ - --build-arg commitSha=${GIT_COMMIT} . 
-``` +To get started with the ModelMesh framework, check out [this guide](/docs/overview.md). diff --git a/docs/images/vmodels.png b/docs/images/vmodels.png new file mode 100644 index 0000000000000000000000000000000000000000..47943abf169defd5653409e7c5a779c2d72ecc90 GIT binary patch literal 4199 zcmc&&d05iv76vuNQbX@lrm2`s+AQsA8=<07mSI_0;yO;Q2`U*X8MtJY3pHwGOp=yc zhFXe*rlw44mQjiuijJe@f|8B`27wn0yEFH>_j&H}Joo;?1Hbc~^PTVap6`9n;fl)< zg!XcS&C@smnhR5tH2j0C@93d)wntUQ=8sX+!6#rWqaSrg?zQ zy4{gKTt`DX}X^)}6h?|y*(ur{u|o7nJe->GuP#;5u-!Ij&+ zKFmGM9POGk;^#0%PeSQ<{|9I(;n)=_ijLS5Al!Et*DZXw2N3pfQUzeIrUqC81U_@H z0fgy-0UrN{pI2&YYtaD#g}r9%LI{Ltub+3LSfC5Gu`{~MILb4F5EJ*QJ6z+xArew~Ya{+ygJk9R9ef*1`_`VC3xdd?-mJ5xMlzYSQM!(5sxb^GK^E%Fsb#`6j z`BqzAt21)CQif$871lF6@`Y(f2Q4vG`$k{_Xrp?aW$uu;kuyj9U^8 z(&(D7AABTdPhugMb>Vz}EUl`Bwc^7M`=l=t8gx6meif9ny)jTkFDmad^p368)^uy> zA^)<|ej-k6eyv}pFrbBsNv|>Aze_Wbu5-lu$9@!UaDnFw8=>&~xXi|s>p2{Qr^YwL zJnE>$7o^#@Fy(Y}PmxeJEIxK??QJY8WcJT*Kj}69X#Z7a@sjnH3Iwvn} zs*C(SrDCn$Zm`X@>i$MMFQP3+&PLH!9@94n&FcjEeCTJF<_=UgGbU!RfqhmC&DlfS zzZ)XPR2$vBv~I9Qi0jiijn>D8Y{GMwyhld&5_ZM%wyD`gV2_DOxdlfS86rb9FVgBO z@$g7=MAYktR@Ua$sSH|XEt-W55k9r=Mg9Kr&dmyI^R^dOb5CAc~N-HXyr%p?k0?Q;3?D&01;DN}#C^pmhw12Z4a9E{+5a+yA=kACvP18i1N zA*RU57mvq}%HcYfDE>;$w;(us`@g`z{~48Ad?PjZ<>zgZ4k#Sod`u|@*SEL8@l%5S zqlc7Q;Lr+KIVI>yXs2ycpeJX3mz_K3irWYJZ(I745KF7Z-`r4-3iL*|8B{bc=}BI0 z1F4DsZE>u}U8Tb0JxYL`_dYl^tfD;CSG~Cc^&b9=Fgn;c<4X=k9*dj|XG?c0Wyj(% zGi0ge7J0&b!KTA`r9ky)p#}RWQ>PrtqV{50J+Xd&Qi>Msa5>#^cleX+KGU7&?bTve z6dZlcT01$HXSt*aiwZK56rDk|@a#*J-)U19a?Q@~=L)OTF?TzNg~wr#Tx5`97R=q;)6 zE`;Jo2ZJW$(gh5l;XfI*W$As zo?w#3bvq_`;*%je?4ZoKX(B69^eUkFJfqOcdX)ytu&!OOgmJvHc*iQ`!Sze-;0v1% zy6h7@Px#DSln83&7HI{YTr}9FnE(!thrhe<8UV z7SJI(Y`eJg3*gMW1i_tXs+RSjL~#0xnKx8sS~_B2_oQ*{9oXw-_WQi{j$Ne9J`TP7 zopth`I4e)~j-1Eka_iDhl3XZuI>}{So;bNO2*P>fVc9|pc^xwLLgm|;9z0?+-ghhH zNKVA3Ne-{d=yCtn_g%+?)n97u(-?=!AEK|Up)ZgL+8VQF8uq-`%b0Cb(C45C)oj$E zccq0P@NTG|(BfiEMW4nm6PKhibJ{da^_3yecO>4F<7lG>XNJBLYQ?7Yy9bZOr1emc 
zqM8m$Q|Sj>iM38O8TZQZl{(NjQ)Wlk1Lud3%cYt~grxi7c~M1N%DB9@Iu8hb4s<5z zVUPykljGYMg3xI(R#e2xng4Zzg_k?7#7!zWyj4y_&kTtMeoWJ8VOE_Z8p?&8W9v__ z#2eOJl$PC-EgK9=T$9Q)C`ciKwX}(LYaKD(;~^tmp`grk11CZwQnzHVOhWml5xHil zQ0uUq30>5Et^|w!;&l@wx|P_ysgbv8D00PN4WMb!TH@Ee_hCbqhNe|!CU6qzUWnTB zonUahv$~IF)mutf#)&7F*Y_XN;8i^-*>l!0^4x!GB%^Pg;P1 z4QJ-epBjlrE{ibL)eL@)^k)%bi=Nz;3cOYEauWHZVPYJa&O%o&7o7;}=oIRL<6o{L z?yi0yo3r-3#9AJGrDg{`jR^MW1iX5FLL_fnCj<#-!EfWvZuvDkJ{FD*%MVgwOe`il z5#+fJ?9;y5@9Z|T?`do+Zw(U!YB;X$^Pr%MQb8^~FOrbfrdoRnJ|rcwKJC)f`ctXXtBbUgDuy|=d~rSLow9dNnaLom zyCX5$G?ZNbJLE!G+JE2gACTTz?rk4JI`TPEgDeaSoakhtuMRPLagV%8_0T zQz*`}yt(>(o(!YMT6SrV&2=L$%(W?FjQ3Zf)ZV>Gi+o2O?%X=%})k#3#Z2xU2 z>k)8UB_Vmy8V!MLx0MUokC(d@{m(~zOVYt&f%8U4qsHZdx*Dn3h2Ze{Sj{!ziAam; zWU~w92S0%`LXBbbD4D9O6}0g~>m(t(t%~G^OgBXBFEuwg6++73)iZ>7(vsa*mFnf! zxqtJMX(EX5_O<1C!{%AhYr+nCc>jV?`kL?_+uknS{drs+kyZLMBgHDUDm-!U>=`E3 zusN-ar6DUoEEIQT*W-Pm7sRwZHb>EciYehLr+Ha$Y=jK}yE=Zmyo*{igs7~i`=|8| zD Date: Thu, 5 Oct 2023 15:29:11 -0400 Subject: [PATCH 2/8] Change flow of MM overviews Signed-off-by: Rafael Vasquez --- README.md | 4 ++-- docs/overview.md | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index fbfc6108..b34e97ce 100644 --- a/README.md +++ b/README.md @@ -2,10 +2,10 @@ The ModelMesh framework is a mature, general-purpose model serving management/routing layer designed for high-scale, high-density and frequently-changing model use cases. It works with existing or custom-built model servers and acts as a distributed LRU cache for serving runtime models. -See these [charts](https://github.com/kserve/modelmesh/files/8854091/modelmesh-jun2022.pdf) for more information on supported features and design details. - For full Kubernetes-based deployment and management of ModelMesh clusters and models, see the [ModelMesh Serving](https://github.com/kserve/modelmesh-serving) repo. 
This includes a separate controller and provides K8s custom resource based management of ServingRuntimes and InferenceServices along with common, abstracted handling of model repository storage and ready-to-use integrations with some existing OSS model servers. +For more information on supported features and design details, see [these charts](https://github.com/kserve/modelmesh/files/8854091/modelmesh-jun2022.pdf). + ## Get Started To get started with the ModelMesh framework, check out [this guide](/docs/overview.md). diff --git a/docs/overview.md b/docs/overview.md index 89e16f94..c1121ee8 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -1,11 +1,11 @@ # Overview -ModelMesh is a distributed LRU cache for serving runtime models. - -See these [charts](https://github.com/kserve/modelmesh/files/8854091/modelmesh-jun2022.pdf) for more information on supported features and design details. +ModelMesh is a mature, general-purpose model serving management/routing layer designed for high-scale, high-density and frequently-changing model use cases. It works with existing or custom-built model servers and acts as a distributed LRU cache for serving runtime models. For full Kubernetes-based deployment and management of ModelMesh clusters and models, see the [ModelMesh Serving](https://github.com/kserve/modelmesh-serving) repo. This includes a separate controller and provides K8s custom resource based management of ServingRuntimes and InferenceServices along with common, abstracted handling of model repository storage and ready-to-use integrations with some existing OSS model servers. +For more information on supported features and design details, see [these charts](https://github.com/kserve/modelmesh/files/8854091/modelmesh-jun2022.pdf). + ## What is a model? In ModelMesh, a **model** refers to an abstraction of machine learning models. It is not aware of the underlying model format. There are two model types: model (regular) and vmodel. 
Regular models in ModelMesh are assumed and required to be immutable. VModels add a layer of indirection in front of the immutable models. See [VModels Reference](/docs/vmodels.md) for further reading. From 3cb39b5199f00eff5d9b9388e54ecdd55d5c5c1c Mon Sep 17 00:00:00 2001 From: Rafael Vasquez Date: Wed, 11 Oct 2023 15:06:07 -0400 Subject: [PATCH 3/8] Update readme and overview Signed-off-by: Rafael Vasquez --- README.md | 37 +------------------------------------ docs/overview.md | 26 +++++++++++++++++++++----- 2 files changed, 22 insertions(+), 41 deletions(-) diff --git a/README.md b/README.md index f6559cda..c6b25d09 100644 --- a/README.md +++ b/README.md @@ -8,44 +8,9 @@ For full Kubernetes-based deployment and management of ModelMesh clusters and mo For more information on supported features and design details, see [these charts](https://github.com/kserve/modelmesh/files/8854091/modelmesh-jun2022.pdf). -## Quickstart - -1. Wrap your model-loading and invocation logic in this [model-runtime.proto](./src/main/proto/current/model-runtime.proto) gRPC service interface - - `runtimeStatus()` - called only during startup to obtain some basic configuration parameters from the runtime, such as version, capacity, model-loading timeout - - `loadModel()` - load the specified model into memory from backing storage, returning when complete - - `modelSize()` - determine size (mem usage) of previously-loaded model. If very fast, can be omitted and provided instead in the response from `loadModel` - - `unloadModel()` - unload previously loaded model, returning when complete - - Use a separate, arbitrary gRPC service interface for model inferencing requests. It can have any number of methods and they are assumed to be idempotent. See [predictor.proto](src/test/proto/predictor.proto) for a very simple example. - - The methods of your custom applier interface will be called only for already fully-loaded models. -2. 
Build a grpc server docker container which exposes these interfaces on localhost port 8085 or via a mounted unix domain socket -3. Extend the [Kustomize-based Kubernetes manifests](config) to use your docker image, and with appropriate mem and cpu resource allocations for your container -4. Deploy to a Kubernetes cluster as a regular Service, which will expose [this grpc service interface](./src/main/proto/current/model-mesh.proto) via kube-dns (you do not implement this yourself), consume using grpc client of your choice from your upstream service components - - `registerModel()` and `unregisterModel()` for registering/removing models managed by the cluster - - Any custom inferencing interface methods to make a runtime invocation of previously-registered model, making sure to set a `mm-model-id` or `mm-vmodel-id` metadata header (or `-bin` suffix equivalents for UTF-8 ids) - -## Deployment and upgrades - -Prerequisites: - -- An `etcd` cluster (shared or otherwise) -- A Kubernetes namespace with the `etcd` cluster connection details configured as a secret key in [this json format](https://github.com/IBM/etcd-java/blob/master/etcd-json-schema.md) - - Note that if provided, the `root_prefix` attribute _is_ used as a key prefix for all of the framework's use of etcd - -From an operational standpoint, ModelMesh behaves just like any other homogeneous clustered microservice. This means it can be deployed, scaled, migrated and upgraded as a regular Kubernetes deployment without any special coordination needed, and without any impact to live service usage. 
- -In particular the procedure for live upgrading either the framework container or service runtime container is the same: change the image version in the deployment config yaml and then update it `kubectl apply -f model-mesh-deploy.yaml` - -## Build - -Sample build: - ## Get Started -docker build -t modelmesh:${IMAGE_TAG} \ - --build-arg imageVersion=${IMAGE_TAG} \ - --build-arg buildId=${BUILD_ID} \ - --build-arg commitSha=${GIT_COMMIT} . -``` +To get started with the ModelMesh framework, check out [this guide](/docs/overview.md). ## Developer guide diff --git a/docs/overview.md b/docs/overview.md index c1121ee8..78ef477c 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -27,14 +27,30 @@ In ModelMesh, a **model** refers to an abstraction of machine learning models. I - `registerModel()` and `unregisterModel()` for registering/removing models managed by the cluster - Any custom inferencing interface methods to make a runtime invocation of previously-registered model, making sure to set a `mm-model-id` or `mm-vmodel-id` metadata header (or `-bin` suffix equivalents for UTF-8 ids) -### Deployment and Upgrades +### Deployment and upgrades Prerequisites: -- An etcd cluster (shared or otherwise) -- A Kubernetes namespace with the etcd cluster connection details configured as a secret key in [this json format](https://github.com/IBM/etcd-java/blob/master/etcd-json-schema.md) - - Note that if provided, the `root_prefix` attribute _is_ used as a key prefix for all the framework's use of etcd +- An `etcd` cluster (shared or otherwise) +- A Kubernetes namespace with the `etcd` cluster connection details configured as a secret key in [this json format](https://github.com/IBM/etcd-java/blob/master/etcd-json-schema.md) + - Note that if provided, the `root_prefix` attribute _is_ used as a key prefix for all of the framework's use of etcd From an operational standpoint, ModelMesh behaves just like any other homogeneous clustered microservice. 
This means it can be deployed, scaled, migrated and upgraded as a regular Kubernetes deployment without any special coordination needed, and without any impact to live service usage. -In particular the procedure for live upgrading either the framework container or service runtime container is the same: change the image version in the deployment config yaml and then update it `kubectl apply -f model-mesh-deploy.yaml` \ No newline at end of file +In particular the procedure for live upgrading either the framework container or service runtime container is the same: change the image version in the deployment config yaml and then update it `kubectl apply -f model-mesh-deploy.yaml` + +### Build + +Sample build: + +```bash +GIT_COMMIT=$(git rev-parse HEAD) +BUILD_ID=$(date '+%Y%m%d')-$(git rev-parse HEAD | cut -c -5) +IMAGE_TAG_VERSION="dev" +IMAGE_TAG=${IMAGE_TAG_VERSION}-$(git branch --show-current)_${BUILD_ID} + +docker build -t modelmesh:${IMAGE_TAG} \ + --build-arg imageVersion=${IMAGE_TAG} \ + --build-arg buildId=${BUILD_ID} \ + --build-arg commitSha=${GIT_COMMIT} . 
+``` \ No newline at end of file From 5b41cfa15ad3052f4814063146b524f58588d6b4 Mon Sep 17 00:00:00 2001 From: Rafael Vasquez Date: Wed, 11 Oct 2023 16:04:24 -0400 Subject: [PATCH 4/8] Update overview, add config, move payload Signed-off-by: Rafael Vasquez --- README.md | 4 +- docs/configuration/README.md | 74 +++++++++++++++++++ docs/configuration/payloads.md | 26 +++++++ docs/configuration/tls.md | 68 +++++++++++++++++ docs/overview.md | 16 ++-- .../ibm/watson/modelmesh/payload/README.md | 34 --------- 6 files changed, 178 insertions(+), 44 deletions(-) create mode 100644 docs/configuration/README.md create mode 100644 docs/configuration/payloads.md create mode 100644 docs/configuration/tls.md delete mode 100644 src/main/java/com/ibm/watson/modelmesh/payload/README.md diff --git a/README.md b/README.md index c6b25d09..0b81a3d5 100644 --- a/README.md +++ b/README.md @@ -10,8 +10,8 @@ For more information on supported features and design details, see [these charts ## Get Started -To get started with the ModelMesh framework, check out [this guide](/docs/overview.md). +To get started with the ModelMesh framework, check out [this overview](/docs/overview.md). ## Developer guide -Check out the [developer guide](developer-guide.md) to learn about development practices for the project. +Use the [developer guide](developer-guide.md) to learn about development practices for the project. diff --git a/docs/configuration/README.md b/docs/configuration/README.md new file mode 100644 index 00000000..3734def1 --- /dev/null +++ b/docs/configuration/README.md @@ -0,0 +1,74 @@ +A core goal of the ModelMesh framework was minimizing the amount of custom configuration required. It should be possible to get up and running without changing most of these things. 
+ ## Model Runtime Configuration + +There are a few basic parameters (some optional) that the model runtime implementation must report in a `RuntimeStatusResponse` message in response to the `ModelRuntime.runtimeStatus` rpc method once it has successfully initialized: + +- `uint64 capacityInBytes` +- `uint32 maxLoadingConcurrency` +- `uint32 modelLoadingTimeoutMs` +- `uint64 defaultModelSizeInBytes` +- `string runtimeVersion` (optional) +- ~~`uint64 numericRuntimeVersion`~~ (deprecated, unused) +- `map<string, MethodInfo> methodInfos` (optional) +- `bool allowAnyMethod` - applicable only if one or more `methodInfos` are provided. +- `bool limitModelConcurrency` - (experimental) + +It's expected that all model runtime instances in the same cluster (with the same Kubernetes deployment config, including image version) will report the same values for these, although it's not strictly necessary. + +## TLS (SSL) Configuration + +This can be configured via environment variables on the ModelMesh container; refer to [the documentation](/docs/configuration/tls.md). + +## Model Auto-Scaling + +Nothing needs to be configured to enable this; it is on by default. There is a single configuration parameter which can optionally be used to tune the sensitivity of the scaling, based on the rate of requests per model. Note that this applies to scaling copies of models within existing pods, not scaling of the pods themselves. + +The scale-up RPM threshold specifies a target request rate per model **copy**, measured in requests per minute. ModelMesh balances requests evenly between loaded copies of a given model, and if one copy's share of requests increases above this threshold, more copies will be added, if possible, in instances (replicas) that do not currently have the model loaded. + +The default for this parameter is 2000 RPM. It can be overridden by setting either the `MM_SCALEUP_RPM_THRESHOLD` environment variable or the `scaleup_rpm_threshold` etcd/zookeeper dynamic config parameter, with the latter taking precedence.
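A minimal Python sketch of the threshold rules described above. `resolve_scaleup_threshold` follows the documented precedence (the dynamic `scaleup_rpm_threshold` parameter over the `MM_SCALEUP_RPM_THRESHOLD` env var over the 2000 RPM default). `target_copies` is a hypothetical, simplified scale-up rule (ceiling of total RPM over the threshold, capped at one copy per replica); it is an illustration only and does not reproduce ModelMesh's actual algorithm, smoothing, or scale-down behaviour.

```python
import math
from typing import Mapping

DEFAULT_SCALEUP_RPM_THRESHOLD = 2000  # documented default, requests per minute per copy


def resolve_scaleup_threshold(env: Mapping[str, str],
                              dynamic_config: Mapping[str, str]) -> int:
    """Return the effective scale-up threshold in RPM."""
    # The etcd/zookeeper dynamic parameter takes precedence over the env var.
    if "scaleup_rpm_threshold" in dynamic_config:
        return int(dynamic_config["scaleup_rpm_threshold"])
    if "MM_SCALEUP_RPM_THRESHOLD" in env:
        return int(env["MM_SCALEUP_RPM_THRESHOLD"])
    return DEFAULT_SCALEUP_RPM_THRESHOLD


def target_copies(total_rpm: float, current_copies: int,
                  threshold: int, replicas: int) -> int:
    """Hypothetical scale-up rule (assumes current_copies >= 1).

    Adds enough copies to keep each copy at or below the threshold,
    capped at one copy per replica. Scale-down is not modelled.
    """
    per_copy_rpm = total_rpm / current_copies  # requests are balanced evenly
    if per_copy_rpm <= threshold:
        return current_copies
    # A large breach can add more than one copy at a time.
    return min(math.ceil(total_rpm / threshold), replicas)


threshold = resolve_scaleup_threshold({"MM_SCALEUP_RPM_THRESHOLD": "1500"}, {})
print(threshold, target_copies(5000, 2, threshold, replicas=10))  # prints: 1500 4
```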
+ +Other points to note: + +- Scale up can happen by more than one additional copy at a time if the request rate breaches the configured threshold by a sufficient amount. +- The number of replicas in the deployment dictates the maximum number of copies that a given model can be scaled to (one in each Pod). +- Models will scale to two copies if they have been used recently regardless of the load - the autoscaling behaviour applies between 2 and N>2 copies. +- Scale-down will occur slowly once the per-copy load remains below the configured threshold for long enough. +- Note that if the runtime is in latency-based auto-scaling mode (when the runtime returns non-default `limitModelConcurrency = true` in the `RuntimeStatusResponse`), scaling is triggered based on measured latencies/queuing rather than request rates, and the RPM threshold parameter will have no effect. + +## Request Header Logging + +To have particular gRPC request metadata headers included in any request-scoped log messages, set the `MM_LOG_REQUEST_HEADERS` environment variable to a json string->string map (object) whose keys are the header names to log and values are the names of corresponding entries to insert into the logger thread context map (MDC). + +Values can be either raw ascii or base64-encoded utf8; in the latter case the corresponding header name must end with `-bin`. For example: +``` +{ + "transaction_id": "txid", + "user_id-bin": "user_id" +} +``` +**Note**: this does not generate new log messages and successful requests aren't logged by default. To log a message for every request, additionally set the `MM_LOG_EACH_INVOKE` environment variable to true. 
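To illustrate the `MM_LOG_REQUEST_HEADERS` mapping, here is a hypothetical Python helper (not part of ModelMesh, which is written in Java) that turns the JSON map and a request's headers into logger context entries, base64-decoding values whose header name ends with `-bin`. It assumes headers arrive as a plain string dictionary and that `-bin` values use standard, padded base64.

```python
import base64
import json
from typing import Dict, Mapping


def build_log_context(config_json: str, headers: Mapping[str, str]) -> Dict[str, str]:
    """Map selected request headers to logger context entries (MDC-style).

    config_json uses the MM_LOG_REQUEST_HEADERS shape: header name -> context key.
    """
    mapping: Dict[str, str] = json.loads(config_json)
    context: Dict[str, str] = {}
    for header_name, context_key in mapping.items():
        value = headers.get(header_name)
        if value is None:
            continue  # header not present on this request
        if header_name.endswith("-bin"):
            # "-bin" headers carry base64-encoded UTF-8 text
            value = base64.b64decode(value).decode("utf-8")
        context[context_key] = value
    return context


config = '{"transaction_id": "txid", "user_id-bin": "user_id"}'
headers = {
    "transaction_id": "abc-123",
    "user_id-bin": base64.b64encode("José".encode("utf-8")).decode("ascii"),
}
print(build_log_context(config, headers))  # {'txid': 'abc-123', 'user_id': 'José'}
```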
+ +## Other Optional Parameters + +Set via environment variables on the ModelMesh container: + +- `MM_SVC_GRPC_PORT` - external grpc port, default 8033 +- `INTERNAL_GRPC_SOCKET_PATH` - unix domain socket, which should be a file location on a persistent volume mounted in both the model-mesh and model runtime containers, defaults to /tmp/mmesh/grpc.sock +- `INTERNAL_SERVING_GRPC_SOCKET_PATH` - unix domain socket to use for inferencing requests, defaults to be same as primary domain socket +- `INTERNAL_GRPC_PORT` - pod-internal grpc port (model runtime localhost), default 8056 +- `INTERNAL_SERVING_GRPC_PORT` - pod-internal grpc port to use for inferencing requests, defaults to be same as primary pod-internal grpc port +- `MM_SVC_GRPC_MAX_MSG_SIZE` - max message size in bytes, default 16MiB +- `MM_SVC_GRPC_MAX_HEADERS_SIZE` - max headers size in bytes, defaults to gRPC default +- `MM_METRICS` - metrics configuration, see Metrics wiki page +- `MM_MULTI_PARALLELISM` - max multi-model request parallelism, default 4 +- `KV_READ_ONLY` (advanced) - run in "read only" mode where new (v)models cannot be registered or unregistered +- `MM_LOG_EACH_INVOKE` - log an INFO level message for every request; default is false, set to true to enable +- `MM_SCALEUP_RPM_THRESHOLD` - see Model auto-scaling above + +**Note**: only one of `INTERNAL_GRPC_SOCKET_PATH` and `INTERNAL_GRPC_PORT` can be set. The same goes for `INTERNAL_SERVING_GRPC_SOCKET_PATH` and `INTERNAL_SERVING_GRPC_PORT`. + +Set dynamically in kv-store (etcd or zookeeper): +- log_each_invocation - dynamic override of `MM_LOG_EACH_INVOKE` env var +- logger_level - TODO +- scaleup_rpm_threshold - dynamic override of `MM_SCALEUP_RPM_THRESHOLD` env var, see [auto-scaling](#model-auto-scaling) above. 
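A hypothetical pre-deployment check for the socket and port settings listed above, sketched in Python. It flags the mutually exclusive pairs and resolves the endpoint used for inferencing requests by falling back from the serving-specific settings to the primary ones, as the parameter list describes. Returning `None` stands in for ModelMesh's own built-in defaults, which this sketch does not attempt to model.

```python
from typing import List, Mapping, Optional, Tuple

# Setting pairs that must not both be set, per the note in the parameter list.
EXCLUSIVE_PAIRS = [
    ("INTERNAL_GRPC_SOCKET_PATH", "INTERNAL_GRPC_PORT"),
    ("INTERNAL_SERVING_GRPC_SOCKET_PATH", "INTERNAL_SERVING_GRPC_PORT"),
]


def find_conflicts(env: Mapping[str, str]) -> List[str]:
    """Return one message for each mutually exclusive pair that is fully set."""
    return [f"{a} and {b} are both set"
            for a, b in EXCLUSIVE_PAIRS if env.get(a) and env.get(b)]


def serving_endpoint(env: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Endpoint for inferencing requests: serving settings first, then primary.

    Returns None when nothing is set (ModelMesh's built-in defaults apply).
    """
    lookup_order = [
        ("INTERNAL_SERVING_GRPC_SOCKET_PATH", "INTERNAL_SERVING_GRPC_PORT"),
        ("INTERNAL_GRPC_SOCKET_PATH", "INTERNAL_GRPC_PORT"),
    ]
    for sock_var, port_var in lookup_order:
        if env.get(sock_var):
            return ("unix", env[sock_var])
        if env.get(port_var):
            return ("tcp", f"localhost:{env[port_var]}")
    return None
```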
diff --git a/docs/configuration/payloads.md b/docs/configuration/payloads.md new file mode 100644 index 00000000..e1eb66d5 --- /dev/null +++ b/docs/configuration/payloads.md @@ -0,0 +1,26 @@ +## Payload Processing Overview +ModelMesh exchanges `Payloads` with models deployed within runtimes. In ModelMesh, a `Payload` consists of information regarding the id of the model and the method of the model being called, together with some data (actual binary requests or responses) and metadata (e.g., headers). + +A `PayloadProcessor` is responsible for processing such `Payloads` for models served by ModelMesh. Examples would include loggers of prediction requests, data sinks for data visualization, model quality assessment, or monitoring tooling. + +`PayloadProcessors` can be configured to look only at payloads that are consumed and produced by certain models, or payloads containing certain headers, etc. This configuration is performed at the ModelMesh instance level. Multiple `PayloadProcessors` can be configured per ModelMesh instance, and each can be set to care about specific portions of the payload (e.g., model inputs, model outputs, metadata, specific headers, etc.). + +As an example, a `PayloadProcessor` can see input data as below: + +```text +[mmesh.ExamplePredictor/predict, Metadata(content-type=application/grpc,user-agent=grpc-java-netty/1.51.1,mm-model-id=myModel,another-custom-header=custom-value,grpc-accept-encoding=gzip,grpc-timeout=1999774u), CompositeByteBuf(ridx: 0, widx: 2000004, cap: 2000004, components=147) +``` + +and/or output data as `ByteBuf`: +```text +java.nio.HeapByteBuffer[pos=0 lim=65 cap=65] +``` + +A `PayloadProcessor` can be configured by means of a whitespace-separated `String` of URIs.
For example, in a URI like `logger:///*?pytorch1234#predict`: +- the scheme represents the type of processor, e.g., `logger` +- the query represents the model id to observe, e.g., `pytorch1234` +- the fragment represents the method to observe, e.g., `predict` + +## Featured `PayloadProcessors`: +- `logger` : logs request/response payloads to `model-mesh` logs (_INFO_ level), e.g., use `logger://*` to log every `Payload` +- `http` : sends request/response payloads to a remote service (via _HTTP POST_), e.g., use `http://10.10.10.1:8080/consumer/kserve/v2` to send every `Payload` to the specified HTTP endpoint \ No newline at end of file diff --git a/docs/configuration/tls.md b/docs/configuration/tls.md new file mode 100644 index 00000000..3a445a0d --- /dev/null +++ b/docs/configuration/tls.md @@ -0,0 +1,68 @@ +## Enable TLS/SSL + +TLS between the ModelMesh container and the model runtime container isn't currently required or supported, since the communication happens within a single pod over localhost. + +However, TLS must be enabled in production deployments for the external gRPC service interfaces exposed by ModelMesh itself (which include your proxied custom gRPC interface). + +To do this, you must provide both a private key file and the corresponding cert file in PEM format, volume-mounting them into the ModelMesh container from a Kubernetes secret. TLS is then enabled by setting the `MM_TLS_KEY_CERT_PATH` and `MM_TLS_PRIVATE_KEY_PATH` env vars on the ModelMesh container to the paths of those mounted files, as demonstrated [here](https://github.com/kserve/modelmesh/blob/main/config/base/patches/tls.yaml#L39-L42). + +The same certificate pair will then also be used for "internal" communication between the model-mesh pods, which is unencrypted otherwise (in prior versions the internal traffic was encrypted unconditionally, but using "hardcoded" certs baked into the image which have now been removed).
+ +## Client Authentication + +To additionally enable TLS Client Auth (aka Mutual Auth, mTLS): + +- Set the `MM_TLS_CLIENT_AUTH` env var to either `REQUIRE` or `OPTIONAL` (case-insensitive) +- Mount pem-format cert(s) to use for trust verification into the container, and set the `MM_TLS_TRUST_CERT_PATH` to a comma-separated list of the mounted paths to these files + +## Certificate Format + +A `PKCS8` format key is required due to netty [only supporting PKCS8 keys](https://github.com/netty/netty/wiki/SslContextBuilder-and-Private-Key). + +For a key cert pair, `server.crt` and `server.key`, you can convert an unencrypted `PKCS1` key to `PKCS8`. + +``` +$ openssl pkcs8 -topk8 -nocrypt -in server.key -out mmesh.key +``` + +If only one hash is displayed, they match. You can also use the above command to verify the original key cert pair `server.crt` and `server.key`. + +### cert-manager +If you are using [cert-manager](https://github.com/cert-manager/cert-manager) on Kubernetes/OpenShift to generate certificates, just ensure that the `.spec.privateKey.encoding` field of your Certificate CR is set to `PKCS8` (it defaults to `PKCS1`). + +## Updating and Rotating Private Keys + +Because the provided certificates are also used for intra-cluster communication, care must be taken when updating to a new private key to avoid potential temporary impact to the service. All pods inter-communicate during rolling upgrade transitions, so the new pods must be able to connect to the old pods and vice versa. If new trust certs are required for the new private key, an update must be performed first to ensure both old and new trust certs are used, and these must both remain present for the subsequent key update. Note that these additional steps are not required if a common and unchanged CA certificate is used for trust purposes. 
+ +There is a dedicated env var `MM_INTERNAL_TRUST_CERTS` which can be used to specify additional trust (public) certificates for intra-cluster communication only. It can be set to one or more comma-separated paths which point to either individual pem-formatted cert files or directories containing certs with `.pem` and/or `.crt` extensions. These paths would correspond to Kube-mounted secrets. Here is an example of the three distinct updates required: + +1. Add `MM_INTERNAL_TRUST_CERTS` pointing to the new cert: +``` +- name: MM_TLS_KEY_CERT_PATH + value: /path/to/existing-keycert.pem +- name: MM_TLS_PRIVATE_KEY_PATH + value: /path/to/existing-key.pem +- name: MM_INTERNAL_TRUST_CERTS + value: /path/to/new-cacert.pem +``` +2. Switch to the new private key pair, with `MM_INTERNAL_TRUST_CERTS` now pointing to the old cert: +``` +- name: MM_TLS_KEY_CERT_PATH + value: /path/to/new-keycert.pem +- name: MM_TLS_PRIVATE_KEY_PATH + value: /path/to/new-key.pem +- name: MM_INTERNAL_TRUST_CERTS + value: /path/to/existing-keycert.pem +``` +3. Optionally remove `MM_INTERNAL_TRUST_CERTS`: +``` +- name: MM_TLS_KEY_CERT_PATH + value: /path/to/new-keycert.pem +- name: MM_TLS_PRIVATE_KEY_PATH + value: /path/to/new-key.pem +``` + +**Note**: these additional steps shouldn't be required if either: + +- The same CA is used for both the old and new public certs (so they are not self-signed) +- Some temporary service disruption is acceptable - this will likely manifest as some longer response times during the upgrade, possibly with some timeouts and failures. It should not persist beyond the rolling update process and the exact magnitude of the impact depends on various factors such as cluster size, loading time, request volume and patterns, etc. \ No newline at end of file diff --git a/docs/overview.md b/docs/overview.md index 78ef477c..f5415d08 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -14,16 +14,16 @@ In ModelMesh, a **model** refers to an abstraction of machine learning models.
I ### Implement a model runtime -1. Wrap your model-loading and invocation logic in this [model-runtime.proto](/src/main/proto/current/model-runtime.proto) gRPC service interface - - `runtimeStatus()` - called only during startup to obtain some basic configuration parameters from the runtime, such as version, capacity, model-loading timeout - - `loadModel()` - load the specified model into memory from backing storage, returning when complete - - `modelSize()` - determine size (mem usage) of previously-loaded model. If very fast, can be omitted and provided instead in the response from `loadModel` - - `unloadModel()` - unload previously loaded model, returning when complete +1. Wrap your model-loading and invocation logic in this [model-runtime.proto](/src/main/proto/current/model-runtime.proto) gRPC service interface. + - `runtimeStatus()` - called only during startup to obtain some basic configuration parameters from the runtime, such as version, capacity, model-loading timeout. + - `loadModel()` - load the specified model into memory from backing storage, returning when complete. + - `modelSize()` - determine size (memory usage) of previously-loaded model. If very fast, can be omitted and provided instead in the response from `loadModel`. + - `unloadModel()` - unload previously loaded model, returning when complete. - Use a separate, arbitrary gRPC service interface for model inferencing requests. It can have any number of methods and they are assumed to be idempotent. See [predictor.proto](/src/test/proto/predictor.proto) for a very simple example. - The methods of your custom applier interface will be called only for already fully-loaded models. -2. Build a grpc server docker container which exposes these interfaces on localhost port 8085 or via a mounted unix domain socket -3. Extend the [Kustomize-based Kubernetes manifests](/config) to use your docker image, and with appropriate mem and cpu resource allocations for your container -4. 
Deploy to a Kubernetes cluster as a regular Service, which will expose [this grpc service interface](/src/main/proto/current/model-mesh.proto) via kube-dns (you do not implement this yourself), consume using grpc client of your choice from your upstream service components +2. Build a gRPC server Docker container that exposes these interfaces on localhost port 8085 or via a mounted Unix domain socket. +3. Extend the [Kustomize-based Kubernetes manifests](/config) to use your Docker image, with appropriate memory and CPU resource allocations for your container. +4. Deploy to a Kubernetes cluster as a regular Service, which will expose [this gRPC service interface](/src/main/proto/current/model-mesh.proto) via kube-dns (you do not implement this yourself). Consume it from your upstream service components using the gRPC client of your choice. - `registerModel()` and `unregisterModel()` for registering/removing models managed by the cluster - Any custom inferencing interface methods to make a runtime invocation of previously-registered model, making sure to set a `mm-model-id` or `mm-vmodel-id` metadata header (or `-bin` suffix equivalents for UTF-8 ids) diff --git a/src/main/java/com/ibm/watson/modelmesh/payload/README.md b/src/main/java/com/ibm/watson/modelmesh/payload/README.md deleted file mode 100644 index 1b4e2464..00000000 --- a/src/main/java/com/ibm/watson/modelmesh/payload/README.md +++ /dev/null @@ -1,34 +0,0 @@ -Processing model-mesh payloads -============================= - -`Model-mesh` exchange `Payloads` with the models deployed within runtimes. -In `model-mesh` a `Payload` consists of information regarding the id of the model and the _method_ of the model being called, together with some data (actual binary requests or responses) and metadata (e.g., headers). -A `PayloadProcessor` is responsible for processing such `Payloads` for models served by `model-mesh`.
- -Reasonable examples of `PayloadProcessors` include loggers of prediction requests, data sinks for data visualization, model quality assessment or monitoring tooling. - -A `PayloadProcessor` can be configured to only look at payloads that are consumed and produced by certain models, or payloads containing certain headers, etc. -This configuration is performed at `ModelMesh` instance level. -Multiple `PayloadProcessors` can be configured per each `ModelMesh` instance. - -Implementations of `PayloadProcessors` can care about only specific portions of the payload (e.g., model inputs, model outputs, metadata, specific headers, etc.). - -A `PayloadProcessor` can see input data like the one in this example: -```text -[mmesh.ExamplePredictor/predict, Metadata(content-type=application/grpc,user-agent=grpc-java-netty/1.51.1,mm-model-id=myModel,another-custom-header=custom-value,grpc-accept-encoding=gzip,grpc-timeout=1999774u), CompositeByteBuf(ridx: 0, widx: 2000004, cap: 2000004, components=147) -``` - -A `PayloadProcessor` can see output data as `ByteBuf` like the one in this example: -```text -java.nio.HeapByteBuffer[pos=0 lim=65 cap=65] -``` - -A `PayloadProcessor` can be configured by means of a whitespace separated `String` of URIs. 
-In a URI like `logger:///*?pytorch1234#predict`: -* the scheme represents the type of processor, e.g., `logger` -* the query represents the model id to observe, e.g., `pytorch1234` -* the fragment represents the method to observe, e.g., `predict` - -Featured `PayloadProcessors`: -* `logger` : logs requests/responses payloads to `model-mesh` logs (_INFO_ level), e.g., use `logger://*` to log every `Payload` -* `http` : sends requests/responses payloads to a remote service (via _HTTP POST_), e.g., use `http://10.10.10.1:8080/consumer/kserve/v2` to send every `Payload` to the specified HTTP endpoint \ No newline at end of file From 7a88f1581fb2841523504984cf14833693392d9c Mon Sep 17 00:00:00 2001 From: Rafael Vasquez Date: Wed, 11 Oct 2023 16:27:42 -0400 Subject: [PATCH 5/8] Adds scaling/rollingupdate doc Signed-off-by: Rafael Vasquez --- docs/configuration/scaling.md | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) create mode 100644 docs/configuration/scaling.md diff --git a/docs/configuration/scaling.md b/docs/configuration/scaling.md new file mode 100644 index 00000000..dd96bacb --- /dev/null +++ b/docs/configuration/scaling.md @@ -0,0 +1,31 @@ +ModelMesh relies on [Kubernetes for rolling updates](https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/). For simplicity and elasticity, ModelMesh does not track update state internally. + +## Scaling Up/Down + +ModelMesh follows the process below, skipping the termination/migration steps when scaling up (adding new pods). + +1. A new Pod with updates starts. +2. Kubernetes waits for the new Pod to report a `Ready` state. +3. Once the new Pod is ready, Kubernetes triggers termination of the old Pod. +4. Once the old Pod receives a termination signal from Kubernetes, it begins migrating its models to other instances. + +Asynchronously, ModelMesh tries to rebalance model distribution among all pods in the `Ready` state.
+ +## Fail Fast with Readiness Probes + +When an update triggers a cluster-wide failure, resulting in the failure to load existing models on new pods, fail-fast protection uses [Readiness Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes) to prevent the old cluster from shutting down completely. + +ModelMesh implements fail-fast behavior by collecting statistics about loading failures during the startup period. Specifically: + +1. Critical Failure - a model that loaded successfully on other pods cannot be loaded on this pod. +2. General Failure - a new model cannot be loaded on this pod. + +These statistics are only collected during the startup period, whose length is controlled by the environment variable `BOOTSTRAP_CLEARANCE_PERIOD_MS`. Once failure statistics exceed the threshold on certain pods, those pods start to report a `NOT READY` state, which prevents the old pods from terminating. + +The default `BOOTSTRAP_CLEARANCE_PERIOD_MS` is 3 minutes (180,000 ms). + +**Note**: you may also want to tweak the readiness probe parameters. For example, increasing `initialDelaySeconds` may help avoid shutting down old pods too early. + +## Rolling Update Configuration + +Specify `maxUnavailable` and `maxSurge` [as described here](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment) to control the rolling update process.
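+
+As a minimal sketch, assuming the ModelMesh pods are managed by a standard Kubernetes Deployment, the following strategy fragment keeps every existing pod serving until its replacement reports `Ready` (`maxUnavailable: 0`) and adds at most one extra pod at a time (`maxSurge: 1`). These values are illustrative assumptions, not ModelMesh defaults; adjust them to your cluster size and spare capacity.
+
+```
+spec:
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxUnavailable: 0   # old pods are removed only after new pods become available
+      maxSurge: 1         # at most one pod above the desired replica count at a time
+```
+
+With `maxUnavailable: 0`, a new pod that never becomes ready because of the fail-fast checks stalls the rollout instead of reducing serving capacity.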
\ No newline at end of file From 412346db4bbcedb93416df4d9c1dc5a3ed421a32 Mon Sep 17 00:00:00 2001 From: Rafael Vasquez Date: Fri, 13 Oct 2023 12:53:43 -0400 Subject: [PATCH 6/8] Adds metrics documentation Signed-off-by: Rafael Vasquez --- docs/metrics.md | 64 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) create mode 100644 docs/metrics.md diff --git a/docs/metrics.md b/docs/metrics.md new file mode 100644 index 00000000..b999135c --- /dev/null +++ b/docs/metrics.md @@ -0,0 +1,64 @@ +ModelMesh publishes a variety of metrics related to model request rates and timings, model loading/unloading rates, times and sizes, internal queuing delays, capacity/usage, cache state/LRU, and more, which can be used in addition to the Kubernetes-level resource metrics. + +### Configuring metrics + +By default, metrics are pushed to Sysdig via the StatsD protocol on UDP port `8126`, but Prometheus-based metrics publishing (pull, instead of push) is also supported and recommended over StatsD. It is not currently the default since there are some annotations which also need to be added to the ModelMesh pod spec before Sysdig will capture the metrics (see [below](#capturing-prometheus-metrics)). + +The `MM_METRICS` env variable can be used to configure or disable how metrics are published: + +- To disable metrics, set it to `disabled`.
+- Otherwise, set to `[:param1=val1;param2=val2;...;paramN=valN]` where `` can be either `statsd` or `prometheus`, and `paramI=valI` are optional associated parameters from the table below: + +| | Purpose | Applies to | Default | +|:------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------:|:----------------------------------:| +| `port` | Port on which to send or serve metrics | statsd (UDP push), prometheus (HTTP/HTTPS serve) | 8126 (statsd), 2112 (prometheus) | +| `fq_names` | Whether to use fully-qualified method names in request metrics | statsd, prometheus | false | +| `legacy` | Whether to publish legacy flavour (non-Sysdig) statsd metrics. Note that the legacy metrics are equivalent but have different names to those in the table below | statsd | false | +| `scheme` | Protocol scheme to use for Prometheus metrics, can be http or https | prometheus | https | + + +### Capturing Prometheus metrics + +Sysdig will only capture Prometheus metrics from pods with the appropriate annotations set. 
In addition to configuring the `MM_METRICS` env var, the following annotations must be configured on the ModelMesh deployment's Pod spec: + +``` +prometheus.io/path: /metrics +prometheus.io/port: "2112" +prometheus.io/scheme: https +prometheus.io/scrape: "true" +``` + +### List of Exposed Metrics + +| Name | Type | Scope | Description | +|:--------------------------------------------:|:--------:|:---------------:|:--------------------------------------------------------------------------:| +| modelmesh_invoke_model | Count | (statsd only) | Count of internal model server inference requests | +| modelmesh_invoke_model_milliseconds | Timing | | Internal model server inference request time | +| modelmesh_api_request | Count | (statsd only) | Count of external inference requests | +| modelmesh_api_request_milliseconds | Timing | | External inference request time | +| modelmesh_request_size_bytes | Size | | Inference request payload size | +| modelmesh_response_size_bytes | Size | | Inference response payload size | +| modelmesh_cache_miss | Count | (statsd only) | Count of inference request cache misses | +| modelmesh_cache_miss_milliseconds | Timing | | Cache miss delay | +| modelmesh_loadmodel | Count | (statsd only) | Count of model loads | +| modelmesh_loadmodel_milliseconds | Timing | | Time taken to load model | +| modelmesh_loadmodel_failure | Count | | Model load failures | +| modelmesh_unloadmodel | Count | (statsd only) | Count of model unloads | +| modelmesh_unloadmodel_milliseconds | Timing | | Time taken to unload model | +| modelmesh_unloadmodel_failure | Count | | Unload model failures (not counting multiple attempts for same copy) | +| modelmesh_unloadmodel_attempt_failure | Count | | Unload model attempt failures | +| modelmesh_req_queue_delay_milliseconds | Timing | | Time spent in inference request queue | +| modelmesh_loading_queue_delay_milliseconds | Timing | | Time spent in model loading queue | +| modelmesh_model_sizing_milliseconds | Timing | | 
Time taken to perform model sizing | +| modelmesh_model_evicted | Count | (statsd only) | Count of model copy evictions | +| modelmesh_age_at_eviction_milliseconds | Age | | Time since model was last used when evicted | +| modelmesh_loaded_model_size_bytes | Size | | Reported size of loaded model | +| modelmesh_models_loaded_total | Gauge | Deployment | Total number of models with at least one loaded copy | +| modelmesh_models_with_failure_total | Gauge | Deployment | Total number of models with one or more recent load failures | +| modelmesh_models_managed_total | Gauge | Deployment | Total number of models managed | +| modelmesh_instance_lru_seconds | Gauge | Pod | Last used time of least recently used model in pod (in secs since epoch) | +| modelmesh_instance_lru_age_seconds | Gauge | Pod | Last used age of least recently used model in pod (secs ago) | +| modelmesh_instance_capacity_bytes | Gauge | Pod | Effective model capacity of pod excluding unload buffer | +| modelmesh_instance_used_bytes | Gauge | Pod | Amount of capacity currently in use by loaded models | +| modelmesh_instance_used_bps | Gauge | Pod | Amount of capacity used in basis points (100ths of percent) | +| modelmesh_instance_models_total | Gauge | Pod | Number of model copies loaded in pod | \ No newline at end of file From 10c412aa26b83e097d9d355a098b1f64ea05692d Mon Sep 17 00:00:00 2001 From: Rafael Vasquez Date: Fri, 13 Oct 2023 12:58:14 -0400 Subject: [PATCH 7/8] Make overview doc the docs readme Signed-off-by: Rafael Vasquez --- README.md | 2 +- docs/{overview.md => README.md} | 0 2 files changed, 1 insertion(+), 1 deletion(-) rename docs/{overview.md => README.md} (100%) diff --git a/README.md b/README.md index 0b81a3d5..10f31a55 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ For more information on supported features and design details, see [these charts ## Get Started -To get started with the ModelMesh framework, check out [this overview](/docs/overview.md). 
+To learn more about and get started with the ModelMesh framework, check out [the documentation](/docs). ## Developer guide diff --git a/docs/overview.md b/docs/README.md similarity index 100% rename from docs/overview.md rename to docs/README.md From 46f33f1ad94881646f191ca70ca081c4347b5549 Mon Sep 17 00:00:00 2001 From: Rafael Vasquez Date: Tue, 17 Oct 2023 14:30:13 -0400 Subject: [PATCH 8/8] Removes duplicate info and link to dev guide Signed-off-by: Rafael Vasquez --- docs/README.md | 32 +++----------------------------- 1 file changed, 3 insertions(+), 29 deletions(-) diff --git a/docs/README.md b/docs/README.md index f5415d08..c2dd834f 100644 --- a/docs/README.md +++ b/docs/README.md @@ -10,9 +10,7 @@ For more information on supported features and design details, see [these charts In ModelMesh, a **model** refers to an abstraction of machine learning models. It is not aware of the underlying model format. There are two model types: model (regular) and vmodel. Regular models in ModelMesh are assumed and required to be immutable. VModels add a layer of indirection in front of the immutable models. See [VModels Reference](/docs/vmodels.md) for further reading. -## Usage - -### Implement a model runtime +## Implement a model runtime 1. Wrap your model-loading and invocation logic in this [model-runtime.proto](/src/main/proto/current/model-runtime.proto) gRPC service interface. - `runtimeStatus()` - called only during startup to obtain some basic configuration parameters from the runtime, such as version, capacity, model-loading timeout. @@ -27,30 +25,6 @@ In ModelMesh, a **model** refers to an abstraction of machine learning models. 
I - `registerModel()` and `unregisterModel()` for registering/removing models managed by the cluster - Any custom inferencing interface methods to make a runtime invocation of previously-registered model, making sure to set a `mm-model-id` or `mm-vmodel-id` metadata header (or `-bin` suffix equivalents for UTF-8 ids) -### Deployment and upgrades - -Prerequisites: - -- An `etcd` cluster (shared or otherwise) -- A Kubernetes namespace with the `etcd` cluster connection details configured as a secret key in [this json format](https://github.com/IBM/etcd-java/blob/master/etcd-json-schema.md) - - Note that if provided, the `root_prefix` attribute _is_ used as a key prefix for all of the framework's use of etcd - -From an operational standpoint, ModelMesh behaves just like any other homogeneous clustered microservice. This means it can be deployed, scaled, migrated and upgraded as a regular Kubernetes deployment without any special coordination needed, and without any impact to live service usage. - -In particular the procedure for live upgrading either the framework container or service runtime container is the same: change the image version in the deployment config yaml and then update it `kubectl apply -f model-mesh-deploy.yaml` - -### Build - -Sample build: - -```bash -GIT_COMMIT=$(git rev-parse HEAD) -BUILD_ID=$(date '+%Y%m%d')-$(git rev-parse HEAD | cut -c -5) -IMAGE_TAG_VERSION="dev" -IMAGE_TAG=${IMAGE_TAG_VERSION}-$(git branch --show-current)_${BUILD_ID} +## Development -docker build -t modelmesh:${IMAGE_TAG} \ - --build-arg imageVersion=${IMAGE_TAG} \ - --build-arg buildId=${BUILD_ID} \ - --build-arg commitSha=${GIT_COMMIT} . -``` \ No newline at end of file +Please see the [Developer Guide](/developer-guide.md) for details. \ No newline at end of file