
feat: Enable inference serving capabilities on sagemaker endpoint #536

Open
gwang111 wants to merge 1 commit into main from inference-serving

Conversation

@gwang111 commented Dec 27, 2024

Description of changes:
Added source code to enable serving capabilities on SageMaker Endpoint.

  • When the serve command is passed on container startup, the inference server script will execute
  • It will then start a Tornado web server in either async or sync mode (see the sketch below)
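
Below is a minimal, hypothetical sketch of what a Tornado-based /invocations route could look like; it is only meant to illustrate the serving flow described above and is not the entrypoint script from this PR. The user_handler name and the port are illustrative assumptions.

import tornado.ioloop
import tornado.web


def user_handler(request):
    # Placeholder for the customer-provided handler (in practice loaded from the
    # configured inference code); yields response chunks so they can be streamed.
    yield request.body.decode("utf-8")


class InvocationsHandler(tornado.web.RequestHandler):
    async def post(self):
        # Stream each chunk produced by the handler back to the caller.
        for chunk in user_handler(self.request):
            self.write(chunk)
            await self.flush()


class PingHandler(tornado.web.RequestHandler):
    def get(self):
        # Health-check endpoint expected by SageMaker hosting.
        self.set_status(200)


if __name__ == "__main__":
    app = tornado.web.Application(
        [
            (r"/invocations", InvocationsHandler),
            (r"/ping", PingHandler),
        ]
    )
    app.listen(8080)  # SageMaker endpoint containers are expected to listen on port 8080
    tornado.ioloop.IOLoop.current().start()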

Testing
Using a basic LCEL inference script:

import logging

import boto3
from langchain_community.llms import Bedrock
from langchain_core.prompts import ChatPromptTemplate

logger = logging.getLogger("sagemaker_distribution.inference_server")

# Single-variable prompt that passes the question straight through to the model.
qa_prompt = ChatPromptTemplate.from_template("{question}")

bedrock_runtime = boto3.client(service_name="bedrock-runtime")

text_gen_model = Bedrock(
    model_id="anthropic.claude-v2",
    client=bedrock_runtime,
)
# Explicitly clear any guardrail configuration for this test.
text_gen_model.guardrails.update(
    {
        "guardrailVersion": None,
        "guardrailIdentifier": None,
    }
)

# LCEL chain: prompt -> Bedrock text-generation model.
lcel_chain = qa_prompt | text_gen_model


def handler(request):
    # Return a generator so the server can stream tokens back to the caller.
    return lcel_chain.stream(request.body)

  • Deployed to a SageMaker Endpoint and tested with:
invoke_endpoint() -> blocks until all text is received, but still functional

invoke_endpoint_with_response_stream() -> streaming call works (see the sketch below)
  • Repeated with an async iterator and it works the same
  • Tested with regular/async invoke and everything is functional as well
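
For reference, a minimal sketch of the streaming invocation path described above; the endpoint name, content type, and payload are placeholders rather than values from this PR.

import boto3

smr = boto3.client("sagemaker-runtime")

response = smr.invoke_endpoint_with_response_stream(
    EndpointName="lcel-inference-endpoint",  # placeholder endpoint name
    ContentType="text/plain",                # assumed content type for this test script
    Body=b"What is Amazon SageMaker?",
)

# The body is an event stream; each event carries a chunk of the generated text.
for event in response["Body"]:
    chunk = event.get("PayloadPart", {}).get("Bytes", b"")
    print(chunk.decode("utf-8"), end="", flush=True)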

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@gwang111 gwang111 changed the title [Feat] Enable inference serving capabilities on sagemaker endpoint using tor… Feat: Enable inference serving capabilities on sagemaker endpoint using tor… Dec 27, 2024
@gwang111 gwang111 changed the title Feat: Enable inference serving capabilities on sagemaker endpoint using tor… feat: Enable inference serving capabilities on sagemaker endpoint Dec 27, 2024
@gwang111 force-pushed the inference-serving branch 3 times, most recently from 9cee77b to bb132ea, on December 27, 2024 21:29
Comment on lines 10 to 11
CODE_DIRECTORY = "SAGEMAKER_INFERENCE_CODE_DIRECTORY"
CODE = "SAGEMAKER_INFERENCE_CODE"
Let's discuss these inputs offline.

We will review these with Saurabh/PM since these will be customer-facing.

Author: Will schedule a meeting next week
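
For context on the two inputs discussed above, here is a hypothetical sketch of how a customer might supply them when creating a SageMaker Model; the values are made up to illustrate the shape and are not documented settings.

import boto3

sm = boto3.client("sagemaker")

sm.create_model(
    ModelName="lcel-inference-model",  # placeholder name
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/sagemaker-distribution:latest",  # placeholder image URI
        "Environment": {
            "SAGEMAKER_INFERENCE_CODE_DIRECTORY": "/opt/ml/model/code",  # assumed example value
            "SAGEMAKER_INFERENCE_CODE": "inference.handler",             # assumed example value
        },
    },
)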

build_artifacts/v2/v2.2/v2.2.0/gpu.env.in (outdated, resolved)
@aws-tianquaw (Contributor)

Can you move your changes under the "template" folder?

For any changes/additions to the cpu/gpu.env.in files, please create an issue (example) so that we'll add them to the next major/minor image releases

@gwang111 (Author) commented Jan 6, 2025

Can you move your changes under the "template" folder?

For any changes/additions to the cpu/gpu.env.in files, please create an issue (example) so that we'll add them to the next major/minor image releases

Moved the code to the v3 template folder since this code will be launching in the next major version release

@gwang111 force-pushed the inference-serving branch 11 times, most recently from 9865a75 to 4153ebb, on January 10, 2025 22:41
@cj-zhang

LGTM! Could you add details about how this was tested though?

@aws-tianquaw (Contributor) left a comment

Please make sure you have E2E integration tests ready for your entrypoint script. Before the automated tests are ready, can you share some manual test results and attach them to this PR? For example, you could build an SMD image locally with your PR and verify it on a SageMaker inference endpoint.

To build images with your changes, you can check out the release-3.0.0 branch and add your changes locally to the folder https://github.com/aws/sagemaker-distribution/tree/release-3.0.0/build_artifacts/v3/v3.0/v3.0.0. Then, run the following commands to build your image:

conda env update --file environment.yml -n sagemaker-distribution
conda activate sagemaker-distribution
python ./src/main.py build --target-patch-version=3.0.0 --skip-tests
