
feat: Enable inference serving capabilities on sagemaker endpoint #536

Open
gwang111 wants to merge 1 commit into main from inference-serving

Conversation

@gwang111 commented Dec 27, 2024

Description of changes:
Added source code to enable serving capabilities on SageMaker Endpoint.

  • When the serve command is passed on container startup, the inference server script will execute
  • It will then start a Tornado web server in either async or sync mode (see the sketch below)
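
Below is a minimal, hypothetical sketch of what a Tornado-based /invocations route could look like; it is only meant to illustrate the serving flow described above and is not the entrypoint script from this PR. The user_handler name and the port are illustrative assumptions.

import tornado.ioloop
import tornado.web


def user_handler(request):
    # Placeholder for the customer-provided handler (in practice loaded from the
    # configured inference code); yields response chunks so they can be streamed.
    yield request.body.decode("utf-8")


class InvocationsHandler(tornado.web.RequestHandler):
    async def post(self):
        # Stream each chunk produced by the handler back to the caller.
        for chunk in user_handler(self.request):
            self.write(chunk)
            await self.flush()


class PingHandler(tornado.web.RequestHandler):
    def get(self):
        # Health-check endpoint expected by SageMaker hosting.
        self.set_status(200)


if __name__ == "__main__":
    app = tornado.web.Application(
        [
            (r"/invocations", InvocationsHandler),
            (r"/ping", PingHandler),
        ]
    )
    app.listen(8080)  # SageMaker endpoint containers are expected to listen on port 8080
    tornado.ioloop.IOLoop.current().start()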

Testing
Using a basic LCEL inference script:

import logging

import boto3
from langchain_community.llms import Bedrock
from langchain_core.prompts import ChatPromptTemplate

logger = logging.getLogger("sagemaker_distribution.inference_server")

# Single-variable prompt that passes the question straight through to the model.
qa_prompt = ChatPromptTemplate.from_template("{question}")

bedrock_runtime = boto3.client(service_name="bedrock-runtime")

text_gen_model = Bedrock(
    model_id="anthropic.claude-v2",
    client=bedrock_runtime,
)
# Explicitly clear any guardrail configuration for this test.
text_gen_model.guardrails.update(
    {
        "guardrailVersion": None,
        "guardrailIdentifier": None,
    }
)

# LCEL chain: prompt -> Bedrock text-generation model.
lcel_chain = qa_prompt | text_gen_model


def handler(request):
    # Return a generator so the server can stream tokens back to the caller.
    return lcel_chain.stream(request.body)

  • Deployed to a SageMaker Endpoint and tested with:
invoke_endpoint() -> blocks until all text is received, but still functional

invoke_endpoint_with_response_stream() -> streaming call works (see the sketch below)
  • Repeated with an async iterator and it works the same
  • Tested with regular/async invoke and everything is functional as well
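
For reference, a minimal sketch of the streaming invocation path described above; the endpoint name, content type, and payload are placeholders rather than values from this PR.

import boto3

smr = boto3.client("sagemaker-runtime")

response = smr.invoke_endpoint_with_response_stream(
    EndpointName="lcel-inference-endpoint",  # placeholder endpoint name
    ContentType="text/plain",                # assumed content type for this test script
    Body=b"What is Amazon SageMaker?",
)

# The body is an event stream; each event carries a chunk of the generated text.
for event in response["Body"]:
    chunk = event.get("PayloadPart", {}).get("Bytes", b"")
    print(chunk.decode("utf-8"), end="", flush=True)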

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@gwang111 gwang111 changed the title [Feat] Enable inference serving capabilities on sagemaker endpoint using tor… Feat: Enable inference serving capabilities on sagemaker endpoint using tor… Dec 27, 2024
@gwang111 gwang111 changed the title Feat: Enable inference serving capabilities on sagemaker endpoint using tor… feat: Enable inference serving capabilities on sagemaker endpoint Dec 27, 2024
@gwang111 force-pushed the inference-serving branch 3 times, most recently from 9cee77b to bb132ea, on December 27, 2024 21:29
Comment on lines 10 to 11
CODE_DIRECTORY = "SAGEMAKER_INFERENCE_CODE_DIRECTORY"
CODE = "SAGEMAKER_INFERENCE_CODE"
Let's discuss these inputs offline.

We will review these with Saurabh/PM since these will be customer-facing.

Author: Will schedule a meeting next week
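
For context on the two inputs discussed above, here is a hypothetical sketch of how a customer might supply them when creating a SageMaker Model; the values are made up to illustrate the shape and are not documented settings.

import boto3

sm = boto3.client("sagemaker")

sm.create_model(
    ModelName="lcel-inference-model",  # placeholder name
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/sagemaker-distribution:latest",  # placeholder image URI
        "Environment": {
            "SAGEMAKER_INFERENCE_CODE_DIRECTORY": "/opt/ml/model/code",  # assumed example value
            "SAGEMAKER_INFERENCE_CODE": "inference.handler",             # assumed example value
        },
    },
)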

build_artifacts/v2/v2.2/v2.2.0/gpu.env.in (outdated, resolved)
@aws-tianquaw (Contributor)

Can you move your changes under the "template" folder?

For any changes/additions to the cpu/gpu.env.in files, please create an issue (example) so that we'll add them to the next major/minor image releases

@gwang111 (Author) commented Jan 6, 2025

Can you move your changes under the "template" folder?

For any changes/additions to the cpu/gpu.env.in files, please create an issue (example) so that we'll add them to the next major/minor image releases

Moved the code to the v3 template folder since this code will be launching in the next major version release

@gwang111 force-pushed the inference-serving branch 11 times, most recently from 9865a75 to 4153ebb, on January 10, 2025 22:41
@cj-zhang

LGTM! Could you add details about how this was tested though?

@aws-tianquaw (Contributor) left a comment

Please make sure you have E2E integration tests ready for your entrypoint script. Before the automated tests are ready, can you share some manual test results and attach them to this PR? For example, you could build an SMD image locally with your PR and verify it on a SageMaker inference endpoint.

To build images with your changes, you can check out the release-3.0.0 branch and add your changes locally to the folder https://github.com/aws/sagemaker-distribution/tree/release-3.0.0/build_artifacts/v3/v3.0/v3.0.0. Then, run the following commands to build your image:

conda env update --file environment.yml -n sagemaker-distribution
conda activate sagemaker-distribution
python ./src/main.py build --target-patch-version=3.0.0 --skip-tests
