LBED: zETL Opensearch lab #107

Merged: 33 commits, Mar 29, 2024

Commits
2e6ab87
Added new lab - DynamoDB to OpenSearch zero-ETL
terhunej Feb 25, 2024
e1c317e
Update ddb-os-zetl-chapter01.en.md
terhunej Feb 25, 2024
de8a050
Update ddb-os-zetl2.jpg
terhunej Feb 25, 2024
1559bc7
fixed images, changed numbering
terhunej Feb 27, 2024
be42adc
fixed numbering
terhunej Feb 27, 2024
b337268
fixed numbering
terhunej Feb 27, 2024
eba531e
fixed numbering
terhunej Feb 27, 2024
1245e55
fixed numbering
terhunej Feb 27, 2024
890948b
fixed numbering
terhunej Feb 27, 2024
3bba7a2
Update ddb-os-zetl-chapter08.en.md
terhunej Feb 27, 2024
13e39d6
Merge branch 'aws-samples:master' into opensearch-lab
terhunej Mar 16, 2024
1609d5d
Formatting, Secrets Manager, Context
terhunej Mar 16, 2024
c51b667
warnings
terhunej Mar 16, 2024
3b2aed4
Split out non-critical package install to separate line
switch180 Mar 21, 2024
73b506f
Merge branch 'master' into opensearch-lab
switch180 Mar 21, 2024
0c635e1
Moved OS pipeline code into repo, building ZIP on deploy
switch180 Mar 21, 2024
0c0ab5f
Fixing zETL refs to S3 and in build system
switch180 Mar 21, 2024
7b0af78
Fixed broken image links
switch180 Mar 21, 2024
a5b1efe
Merge branch 'master' into opensearch-lab
switch180 Mar 22, 2024
b413847
Adding MIT license
switch180 Mar 22, 2024
2054ff6
Fixing formatting and images. Modified some steps.
switch180 Mar 22, 2024
eee6572
Updating w latest template view on stack create
switch180 Mar 22, 2024
8235b07
Implemented Requested Changes
terhunej Mar 28, 2024
4d13ed1
Fixed links, ordering
switch180 Mar 28, 2024
5ed047b
Re-ordered workshops
switch180 Mar 28, 2024
75c36c3
Moving LETL/zETL assets to own folder
switch180 Mar 28, 2024
3a3fb38
LEDA updates for new IAM console and region change to PDX
switch180 Mar 29, 2024
bba3885
LETL -> LBED and many reworks of the guide
switch180 Mar 29, 2024
1b5a38a
Finishing touches
switch180 Mar 29, 2024
d1d964b
Updating all labs getting started with LBED code
switch180 Mar 29, 2024
2a93b79
Swapping with correct diagram for service-config
switch180 Mar 29, 2024
2013de4
Removing extra C9 instructions
switch180 Mar 29, 2024
bad51ea
IDK
switch180 Mar 29, 2024
@@ -0,0 +1,21 @@
---
title: "Exercise Overview"
menuTitle: "Exercise Overview"
date: 2024-02-23T00:00:00-00:00
weight: 10
---
In this module, you will create DynamoDB and OpenSearch Service resources, configure integrations, and execute example queries.
All the initial resources required are deployed via an [AWS CloudFormation](https://aws.amazon.com/cloudformation/) template.
There is one CloudFormation template used in this exercise, but it depends on the Cloud9 environment deployed in **LHOL: Hands-on Labs for Amazon DynamoDB step 1. Getting Started**. The CloudFormation template will deploy the following resources.

CloudFormation Template Resources:
- DynamoDB Table: DynamoDB table to store product descriptions. Has Point-in-time Recovery (PITR) and DynamoDB Streams enabled.
- Amazon OpenSearch Service Domain: Single-node OpenSearch Service cluster to receive data from DynamoDB and act as a vector database.

Dependencies from Cloud9 CloudFormation Template:
- S3 Bucket: Used to store the initial export of DynamoDB data for the Zero-ETL Pipeline.
- IAM Role: Used to grant permissions for pipeline integration and queries.
- Cloud9 IDE: Console for executing commands, building integrations, and running sample queries.


![Final Deployment Architecture](/static/images/migration-environment.png)
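Once you deploy the stack in the next chapter, you can confirm what it created from the command line. A quick check, assuming the stack name *ddbzetl* used later in this guide:

```bash
# List the resources created by the zero-ETL lab stack
aws cloudformation describe-stack-resources \
  --stack-name ddbzetl \
  --region us-west-2 \
  --query 'StackResources[].[ResourceType,PhysicalResourceId]' \
  --output table
```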
@@ -0,0 +1,18 @@
---
title: "Configure Environment"
menuTitle: "Configure Environment"
date: 2024-02-23T00:00:00-00:00
weight: 20
---
In this chapter, you will create the environment on AWS as discussed in the Exercise Overview.
The CloudFormation template used below will create the DynamoDB table and OpenSearch domain, and provide several Outputs to make organizing resource names easier.

1. Launch the CloudFormation template in US West 2 to deploy the resources in your account: [![CloudFormation](/static/images/cloudformation-launch-stack.png)](https://console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/new?stackName=ddbzetl&templateURL=:param{key="lhol_ddb_os_zetl_setup_yaml"})
1. *Optionally, download [the YAML template](:param{key="lhol_ddb_os_zetl_setup_yaml"}) and launch it on your own*
1. Click "Next"
1. Confirm the stack name *ddbzetl* and update parameters if necessary (leave the default options where possible)
![Stack Parameters](/static/images/ddb-os-zetl1.jpg)
1. Click "Next" twice, then check "I acknowledge that AWS CloudFormation might create IAM resources with custom names."
1. Click "Submit"
1. The CloudFormation stack will take about 15 minutes to build the environment.
![Stack Creation](/static/images/ddb-os-zetl2.jpg)
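If you prefer the command line to the console, the same deployment works with the AWS CLI. A minimal sketch, assuming you downloaded the template from the link above and saved it as `ddb-os-zetl.yaml` (a placeholder filename):

```bash
# Create the stack with the same name and IAM acknowledgement as the console flow
aws cloudformation create-stack \
  --stack-name ddbzetl \
  --region us-west-2 \
  --template-body file://ddb-os-zetl.yaml \
  --capabilities CAPABILITY_NAMED_IAM

# Block until creation completes (roughly 15 minutes)
aws cloudformation wait stack-create-complete \
  --stack-name ddbzetl \
  --region us-west-2
```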
@@ -0,0 +1,29 @@
---
title: "Configure OpenSearch Service Permissions"
menuTitle: "Configure OpenSearch Service Permissions"
date: 2024-02-23T00:00:00-00:00
weight: 30
---
The OpenSearch Service domain deployed by the CloudFormation template uses fine-grained access control, which offers additional ways of controlling access to your data on Amazon OpenSearch Service. To configure integrations between OpenSearch Service, DynamoDB, and Bedrock, certain OpenSearch Service permissions need to be mapped to the IAM role being used.

Links to the OpenSearch Dashboards, credentials, and other necessary values are provided in the Outputs of the CloudFormation template. It is recommended that you keep the Outputs open in a browser tab so you can refer to them easily as you work through the lab.

As a best practice in a production environment, you would configure roles with the least privilege required. For simplicity in this lab, we will use the "all_access" OpenSearch Service role.

::alert[_Do not continue unless the CloudFormation Template has finished deploying._]

1. Open the "Outputs" tab of your recently deployed Stack in the CloudFormation Console.
![CloudFormation Outputs](/static/images/ddb-os-zetl3.jpg)
1. Open the link for OSDashboardsURL in a new tab.
1. Log in to Dashboards with the username and password provided in the CloudFormation Outputs. The attributes named "OSMasterUserName" and "OSMasterUserPassword" provide the correct values.
![OpenSearch Service Dashboards](/static/images/ddb-os-zetl4.jpg)
1. Open the top left menu and select "Security".
![Security Settings](/static/images/ddb-os-zet5.jpg)
1. Open the "Roles" tab, then click on the "all_access" role.
![Roles Settings](/static/images/ddb-os-zet6.jpg)
1. Open the "Mapped users" tab, then select "Manage mapping".
![Mapping Settings](/static/images/ddb-os-zet7.jpg)
1. In the "Backend roles" field, enter the ARN provided in the CloudFormation stack Outputs. The attribute named "Role" provides the correct ARN. Click "Map".
![Backend Role Mapping](/static/images/ddb-os-zet8.jpg)
1. Verify that the "all_access" role now has a "Backend role" listed.
![Mapped Backend Role](/static/images/ddb-os-zet9.jpg)
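The mapping can also be done without the Dashboards UI through the Security plugin REST API. A sketch, assuming you set the three placeholder variables below from the stack Outputs and export OPENSEARCH_ENDPOINT as you will in the Configure Integrations chapter:

```bash
# OS_MASTER_USER, OS_MASTER_PASSWORD, and ROLE_ARN are placeholders for the
# OSMasterUserName, OSMasterUserPassword, and Role values from the stack Outputs.
# Note: PUT replaces the role's existing mapping rather than appending to it.
curl --request PUT \
"${OPENSEARCH_ENDPOINT}/_plugins/_security/api/rolesmapping/all_access" \
--header 'Content-Type: application/json' \
--user "${OS_MASTER_USER}:${OS_MASTER_PASSWORD}" \
--data-raw '{
  "backend_roles": ["'${ROLE_ARN}'"]
}'
```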
@@ -0,0 +1,17 @@
---
title: "Enable Amazon Bedrock Models"
menuTitle: "Enable Amazon Bedrock Models"
date: 2024-02-23T00:00:00-00:00
weight: 40
---
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

In this application, Bedrock will be used to make natural language product recommendation queries using OpenSearch Service as a vector database.

Bedrock requires each FM to be enabled before it can be used.

1. Open [Amazon Bedrock Model Access](https://us-west-2.console.aws.amazon.com/bedrock/home?region=us-west-2#/modelaccess)
1. Click on "Manage model access"
![Manage model access](/static/images/ddb-os-zetl10.jpg)
1. Select "Titan Embeddings G1 - Text" and "Claude", then click "Save changes"
![Manage model access](/static/images/ddb-os-zetl11.jpg)
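To confirm from the command line that the embedding model is available in your account, a quick check with the AWS CLI (the access status itself is shown in the console):

```bash
# Look up the Titan embedding model used later in this lab
aws bedrock list-foundation-models \
  --region us-west-2 \
  --query 'modelSummaries[?modelId==`amazon.titan-embed-text-v1`].[modelId,modelName]' \
  --output table
```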
@@ -0,0 +1,27 @@
---
title: "Load DynamoDB Data"
menuTitle: "Load DynamoDB Data"
date: 2024-02-23T00:00:00-00:00
weight: 50
---
Next, you'll load example product data into your DynamoDB Table. Pipelines will move this data into OpenSearch Service in later steps.

1. Open the "Outputs" tab of your recently deployed Stack in the CloudFormation Console.
![CloudFormation Outputs](/static/images/ddb-os-zetl2.jpg)
1. Open the link for Cloud9IdeUrl in a new tab.
1. Download the zip file containing the sample data and scripts.
```bash
wget https://s3.amazonaws.com/amazon-dynamodb-labs.com/assets/OpenSearchPipeline.zip
```
1. Unzip the contents of the zip file.
```bash
unzip OpenSearchPipeline.zip
```
1. Change into the directory.
```bash
cd OpenSearchPipeline
```
1. Load the sample data into your DynamoDB Table.
```bash
aws dynamodb batch-write-item --request-items=file://product_en.json
```
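To confirm the load succeeded, you can count the items in the table. A sketch, assuming a placeholder table name of `ProductDetails`; substitute the table name shown in your stack Outputs:

```bash
# Count the items written by batch-write-item (table name is a placeholder)
aws dynamodb scan \
  --table-name ProductDetails \
  --select COUNT \
  --region us-west-2
```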
@@ -0,0 +1,183 @@
---
title: "Configure Integrations"
menuTitle: "Configure Integrations"
date: 2024-02-23T00:00:00-00:00
weight: 60
---
Next, you'll configure ML and pipeline connectors in OpenSearch Service. These configurations are set up through a series of POST and PUT requests authenticated with AWS Signature Version 4 (SigV4), the standard authentication mechanism used by AWS services. In most cases an SDK abstracts SigV4 away, but here we will build the requests ourselves with curl.

Building a SigV4-signed request requires a session token, access key, and secret access key. You'll first retrieve these from your Cloud9 instance metadata with the provided "credentials.sh" script, which exports the required values as environment variables. In the following steps, you'll also export other values as environment variables for easy substitution into the listed commands.
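For context, a minimal sketch of what such a script might do, assuming it reads the instance metadata service directly (the script shipped with the lab may differ):

```bash
#!/bin/bash
# Hypothetical sketch: fetch temporary credentials from EC2 instance metadata
# and export them with the METADATA_ prefix used throughout this chapter.
ROLE_NAME=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/)
CREDS=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/${ROLE_NAME})
export METADATA_AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.AccessKeyId')
export METADATA_AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.SecretAccessKey')
export METADATA_AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.Token')
export METADATA_AWS_REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
# The real script also exports METADATA_AWS_ROLE, the IAM role ARN used below.
```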

1. Run the credentials.sh script to retrieve and export credentials. These credentials will be used to sign API requests to the OpenSearch cluster. Note the leading "." before "./credentials.sh"; this sources the script so the exported credentials are available in your current shell.
```bash
. ./credentials.sh
```
1. Next, export an environment variable with the OpenSearch endpoint URL. This URL is listed in the CloudFormation Stack Outputs tab as "OSDomainEndpoint". This variable will be used in subsequent commands.
```bash
export OPENSEARCH_ENDPOINT="https://search-ddb-os-xxxx-xxxxxxxxxxxxx.us-west-2.es.amazonaws.com"
```
1. Execute the following curl command to create the OpenSearch ML connector to Bedrock.
```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/connectors/_create' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"name": "Amazon Bedrock Connector: embedding",
"description": "The connector to bedrock Titan embedding model",
"version": 1,
"protocol": "aws_sigv4",
"parameters": {
"region": "'${METADATA_AWS_REGION}'",
"service_name": "bedrock"
},
"credential": {
"roleArn": "'${METADATA_AWS_ROLE}'"
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "https://bedrock-runtime.'${METADATA_AWS_REGION}'.amazonaws.com/model/amazon.titan-embed-text-v1/invoke",
"headers": {
"content-type": "application/json",
"x-amz-content-sha256": "required"
},
"request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
"pre_process_function": "\n StringBuilder builder = new StringBuilder();\n builder.append(\"\\\"\");\n String first = params.text_docs[0];\n builder.append(first);\n builder.append(\"\\\"\");\n def parameters = \"{\" +\"\\\"inputText\\\":\" + builder + \"}\";\n return \"{\" +\"\\\"parameters\\\":\" + parameters + \"}\";",
"post_process_function": "\n def name = \"sentence_embedding\";\n def dataType = \"FLOAT32\";\n if (params.embedding == null || params.embedding.length == 0) {\n return params.message;\n }\n def shape = [params.embedding.length];\n def json = \"{\" +\n \"\\\"name\\\":\\\"\" + name + \"\\\",\" +\n \"\\\"data_type\\\":\\\"\" + dataType + \"\\\",\" +\n \"\\\"shape\\\":\" + shape + \",\" +\n \"\\\"data\\\":\" + params.embedding +\n \"}\";\n return json;\n "
}
]
}'
```
1. Note the "connector_id" returned in the previous command. Export it to an environmental variable for convenient substitution in future commands.
```bash
export CONNECTOR_ID='xxxxxxxxxxxxxx'
```
1. Run the next curl command to register the model group.
```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/model_groups/_register' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"name": "remote_model_group",
"description": "This is an example description"
}'
```
1. Note the "model_group_id" returned in the previous command. Export it to an environmental variable for later substitution.
```bash
export MODEL_GROUP_ID='xxxxxxxxxxxxx'
```
1. The next curl command registers the model.
```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/models/_register' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"name": "Bedrock embedding model",
"function_name": "remote",
"model_group_id": "'${MODEL_GROUP_ID}'",
"description": "embedding model",
"connector_id": "'${CONNECTOR_ID}'"
}'
```
1. Note the "model_id" and export it.
```bash
export MODEL_ID='xxxxxxxxxxxxx'
```
1. Note the "model_id" and export it.
```bash
echo -e "CONNECTOR_ID=${CONNECTOR_ID}\nMODEL_GROUP_ID=${MODEL_GROUP_ID}\nMODEL_ID=${MODEL_ID}"
```
1. Next, we'll deploy the model with the following curl.
```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/models/'${MODEL_ID}'/_deploy' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}"
```
1. Now we can test the model. If you receive results back with a "200" status code, everything is working properly.
```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/models/'${MODEL_ID}'/_predict' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"parameters": {
"inputText": "What is the meaning of life?"
}
}'
```
1. Next, we'll create the Details table mapping pipeline.
```bash
curl --request PUT \
${OPENSEARCH_ENDPOINT}'/_ingest/pipeline/product-en-nlp-ingest-pipeline' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"description": "A text embedding pipeline",
"processors": [
{
"script": {
"source": "def combined_field = \"ProductID: \" + ctx.ProductID + \", Description: \" + ctx.Description + \", ProductName: \" + ctx.ProductName + \", Category: \" + ctx.Category; ctx.combined_field = combined_field;"
}
},
{
"text_embedding": {
"model_id": "'${MODEL_ID}'",
"field_map": {
"combined_field": "product_embedding"
}
}
}
]
}'
```
1. Next, create the Reviews table mapping pipeline.
```bash
curl --request PUT \
${OPENSEARCH_ENDPOINT}'/_ingest/pipeline/product-reviews-nlp-ingest-pipeline' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"description": "A text embedding pipeline",
"processors": [
{
"script": {
"source": "def combined_field = \"ProductID: \" + ctx.ProductID + \", ProductName: \" + ctx.ProductName + \", Comment: \" + ctx.Comment + \", Timestamp: \" + ctx.Timestamp; ctx.combined_field = combined_field;"
}
},
{
"text_embedding": {
"model_id": "m6jIgowBXLzE-9O0CcNs",
"field_map": {
"combined_field": "product_reviews_embedding"
}
}
}
]
}'
```
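As noted earlier, if jq is installed you can capture the returned IDs instead of copying them by hand. A convenience sketch (not part of the lab scripts), shown for the model group registration; the connector and model registrations can be wrapped the same way with `.connector_id` and `.model_id`:

```bash
# Register the model group and capture its ID in one step
export MODEL_GROUP_ID=$(curl --silent --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/model_groups/_register' \
--header 'Content-Type: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{"name": "remote_model_group", "description": "This is an example description"}' \
| jq -r '.model_group_id')
```

You can also dry-run an ingest pipeline before any table data flows through it, using the _simulate API with a made-up document that matches the Details table fields:

```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_ingest/pipeline/product-en-nlp-ingest-pipeline/_simulate' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"docs": [
  {
    "_source": {
      "ProductID": "100",
      "ProductName": "Sample product",
      "Description": "A sample description",
      "Category": "Samples"
    }
  }
]
}'
```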