LBED: zETL Opensearch lab #107

Merged: 33 commits, Mar 29, 2024

Commits
2e6ab87
Added new lab - DynamoDB to OpenSearch zero-ETL
terhunej Feb 25, 2024
e1c317e
Update ddb-os-zetl-chapter01.en.md
terhunej Feb 25, 2024
de8a050
Update ddb-os-zetl2.jpg
terhunej Feb 25, 2024
1559bc7
fixed images, changed numbering
terhunej Feb 27, 2024
be42adc
fixed numbering
terhunej Feb 27, 2024
b337268
fixed numbering
terhunej Feb 27, 2024
eba531e
fixed numbering
terhunej Feb 27, 2024
1245e55
fixed numbering
terhunej Feb 27, 2024
890948b
fixed numbering
terhunej Feb 27, 2024
3bba7a2
Update ddb-os-zetl-chapter08.en.md
terhunej Feb 27, 2024
13e39d6
Merge branch 'aws-samples:master' into opensearch-lab
terhunej Mar 16, 2024
1609d5d
Formatting, Secrets Manager, Context
terhunej Mar 16, 2024
c51b667
warnings
terhunej Mar 16, 2024
3b2aed4
Split out non-critical package install to separate line
switch180 Mar 21, 2024
73b506f
Merge branch 'master' into opensearch-lab
switch180 Mar 21, 2024
0c635e1
Moved OS pipeline code into repo, building ZIP on deploy
switch180 Mar 21, 2024
0c0ab5f
Fixing zETL refs to S3 and in build system
switch180 Mar 21, 2024
7b0af78
Fixed broken image links
switch180 Mar 21, 2024
a5b1efe
Merge branch 'master' into opensearch-lab
switch180 Mar 22, 2024
b413847
Adding MIT license
switch180 Mar 22, 2024
2054ff6
Fixing formatting and images. Modified some steps.
switch180 Mar 22, 2024
eee6572
Updating w latest template view on stack create
switch180 Mar 22, 2024
8235b07
Implemented Requested Changes
terhunej Mar 28, 2024
4d13ed1
Fixed links, ordering
switch180 Mar 28, 2024
5ed047b
Re-ordered workshops
switch180 Mar 28, 2024
75c36c3
Moving LETL/zETL assets to own folder
switch180 Mar 28, 2024
3a3fb38
LEDA updates for new IAM console and region change to PDX
switch180 Mar 29, 2024
bba3885
LETL -> LBED and many reworks of the guide
switch180 Mar 29, 2024
1b5a38a
Finishing touches
switch180 Mar 29, 2024
d1d964b
Updating all labs getting started with LBED code
switch180 Mar 29, 2024
2a93b79
Swapping with correct diagram for service-config
switch180 Mar 29, 2024
2013de4
Removing extra C9 instructions
switch180 Mar 29, 2024
bad51ea
IDK
switch180 Mar 29, 2024
@@ -0,0 +1,21 @@
---
title: "Exercise Overview"
menuTitle: "Exercise Overview"
date: 2024-02-23T00:00:00-00:00
weight: 10
---
In this module, you will create DynamoDB and OpenSearch Service resources, configure integrations, and execute example queries.
All the initial resources required are deployed via an [AWS CloudFormation](https://aws.amazon.com/cloudformation/) template.
There is one CloudFormation template used in this exercise, but it depends on the Cloud9 environment deployed in **LHOL: Hands-on Labs for Amazon DynamoDB step 1. Getting Started**. The CloudFormation template will deploy the following resources.

CloudFormation Template Resources:
- DynamoDB Table: DynamoDB table to store product descriptions. Has Point-in-time Recovery (PITR) and DynamoDB Streams enabled.
- Amazon OpenSearch Service Domain: Single-node OpenSearch Service cluster to receive data from DynamoDB and act as a vector database.

Dependencies from Cloud9 CloudFormation Template:
- S3 Bucket: Used to store the initial export of DynamoDB data for the Zero-ETL Pipeline.
- IAM Role: Used to grant permissions for pipeline integration and queries.
- Cloud9 IDE: Console for executing commands, building integrations, and running sample queries.


![Final Deployment Architecture](/static/images/migration-environment.png)
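Once you deploy the stack in the next chapter, you can confirm what it created from the command line. A quick check, assuming the stack name *ddbzetl* used later in this guide:

```bash
# List the resources created by the zero-ETL lab stack
aws cloudformation describe-stack-resources \
  --stack-name ddbzetl \
  --region us-west-2 \
  --query 'StackResources[].[ResourceType,PhysicalResourceId]' \
  --output table
```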
@@ -0,0 +1,18 @@
---
title: "Configure Environment"
menuTitle: "Configure Environment"
date: 2024-02-23T00:00:00-00:00
weight: 20
---
In this chapter, you will create the environment on AWS as discussed in the Exercise Overview.
The CloudFormation template used below will create the DynamoDB table and OpenSearch domain, and provide several Outputs to make organizing resource names easier.

1. Launch the CloudFormation template in US West 2 to deploy the resources in your account: [![CloudFormation](/static/images/cloudformation-launch-stack.png)](https://console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/new?stackName=ddbzetl&templateURL=:param{key="lhol_ddb_os_zetl_setup_yaml"})
1. *Optionally, download [the YAML template](:param{key="lhol_ddb_os_zetl_setup_yaml"}) and launch it on your own*
1. Click "Next"
1. Confirm the stack name *ddbzetl* and update parameters if necessary (leave the default options where possible)
![Stack Parameters](/static/images/ddb-os-zetl1.jpg)
1. Click "Next" twice, then check "I acknowledge that AWS CloudFormation might create IAM resources with custom names."
1. Click "Submit"
1. The CloudFormation stack will take about 15 minutes to build the environment.
![Stack Creation](/static/images/ddb-os-zetl2.jpg)
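If you prefer the command line to the console, the same deployment works with the AWS CLI. A minimal sketch, assuming you downloaded the template from the link above and saved it as `ddb-os-zetl.yaml` (a placeholder filename):

```bash
# Create the stack with the same name and IAM acknowledgement as the console flow
aws cloudformation create-stack \
  --stack-name ddbzetl \
  --region us-west-2 \
  --template-body file://ddb-os-zetl.yaml \
  --capabilities CAPABILITY_NAMED_IAM

# Block until creation completes (roughly 15 minutes)
aws cloudformation wait stack-create-complete \
  --stack-name ddbzetl \
  --region us-west-2
```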
@@ -0,0 +1,29 @@
---
title: "Configure OpenSearch Service Permissions"
menuTitle: "Configure OpenSearch Service Permissions"
date: 2024-02-23T00:00:00-00:00
weight: 30
---
The OpenSearch Service domain deployed by the CloudFormation template uses fine-grained access control, which offers additional ways of controlling access to your data on Amazon OpenSearch Service. To configure integrations between OpenSearch Service, DynamoDB, and Bedrock, certain OpenSearch Service permissions need to be mapped to the IAM role being used.

Links to the OpenSearch Dashboards, credentials, and other necessary values are provided in the Outputs of the CloudFormation template. It is recommended that you keep the Outputs open in a browser tab so you can refer to them easily as you work through the lab.

As a best practice in a production environment, you would configure roles with the least privilege required. For simplicity in this lab, we will use the "all_access" OpenSearch Service role.

::alert[_Do not continue unless the CloudFormation Template has finished deploying._]

1. Open the "Outputs" tab of your recently deployed Stack in the CloudFormation Console.
![CloudFormation Outputs](/static/images/ddb-os-zetl3.jpg)
1. Open the link for OSDashboardsURL in a new tab.
1. Log in to Dashboards with the username and password provided in the CloudFormation Outputs. The attributes named "OSMasterUserName" and "OSMasterUserPassword" provide the correct values.
![OpenSearch Service Dashboards](/static/images/ddb-os-zetl4.jpg)
1. Open the top left menu and select "Security".
![Security Settings](/static/images/ddb-os-zet5.jpg)
1. Open the "Roles" tab, then click on the "all_access" role.
![Roles Settings](/static/images/ddb-os-zet6.jpg)
1. Open the "Mapped users" tab, then select "Manage mapping".
![Mapping Settings](/static/images/ddb-os-zet7.jpg)
1. In the "Backend roles" field, enter the ARN provided in the CloudFormation stack Outputs. The attribute named "Role" provides the correct ARN. Click "Map".
![Backend Role Mapping](/static/images/ddb-os-zet8.jpg)
1. Verify that the "all_access" role now has a "Backend role" listed.
![Mapped Backend Role](/static/images/ddb-os-zet9.jpg)
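The mapping can also be done without the Dashboards UI through the Security plugin REST API. A sketch, assuming you set the three placeholder variables below from the stack Outputs and export OPENSEARCH_ENDPOINT as you will in the Configure Integrations chapter:

```bash
# OS_MASTER_USER, OS_MASTER_PASSWORD, and ROLE_ARN are placeholders for the
# OSMasterUserName, OSMasterUserPassword, and Role values from the stack Outputs.
# Note: PUT replaces the role's existing mapping rather than appending to it.
curl --request PUT \
"${OPENSEARCH_ENDPOINT}/_plugins/_security/api/rolesmapping/all_access" \
--header 'Content-Type: application/json' \
--user "${OS_MASTER_USER}:${OS_MASTER_PASSWORD}" \
--data-raw '{
  "backend_roles": ["'${ROLE_ARN}'"]
}'
```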
@@ -0,0 +1,17 @@
---
title: "Enable Amazon Bedrock Models"
menuTitle: "Enable Amazon Bedrock Models"
date: 2024-02-23T00:00:00-00:00
weight: 40
---
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

In this application, Bedrock will be used to make natural language product recommendation queries using OpenSearch Service as a vector database.

Bedrock requires each FM to be enabled before it can be used.

1. Open [Amazon Bedrock Model Access](https://us-west-2.console.aws.amazon.com/bedrock/home?region=us-west-2#/modelaccess)
1. Click on "Manage model access"
![Manage model access](/static/images/ddb-os-zetl10.jpg)
1. Select "Titan Embeddings G1 - Text" and "Claude", then click "Save changes"
![Manage model access](/static/images/ddb-os-zetl11.jpg)
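To confirm from the command line that the embedding model is available in your account, a quick check with the AWS CLI (the access status itself is shown in the console):

```bash
# Look up the Titan embedding model used later in this lab
aws bedrock list-foundation-models \
  --region us-west-2 \
  --query 'modelSummaries[?modelId==`amazon.titan-embed-text-v1`].[modelId,modelName]' \
  --output table
```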
@@ -0,0 +1,27 @@
---
title: "Load DynamoDB Data"
menuTitle: "Load DynamoDB Data"
date: 2024-02-23T00:00:00-00:00
weight: 50
---
Next, you'll load example product data into your DynamoDB Table. Pipelines will move this data into OpenSearch Service in later steps.

1. Open the "Outputs" tab of your recently deployed Stack in the CloudFormation Console.
![CloudFormation Outputs](/static/images/ddb-os-zetl2.jpg)
1. Open the link for Cloud9IdeUrl in a new tab.
1. Download the zip file containing the sample data and scripts.
```bash
wget https://s3.amazonaws.com/amazon-dynamodb-labs.com/assets/OpenSearchPipeline.zip
```
1. Unzip the contents of the zip file.
```bash
unzip OpenSearchPipeline.zip
```
1. Change into the directory.
```bash
cd OpenSearchPipeline
```
1. Load the sample data into your DynamoDB Table.
```bash
aws dynamodb batch-write-item --request-items=file://product_en.json
```
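To confirm the load succeeded, you can count the items in the table. A sketch, assuming a placeholder table name of `ProductDetails`; substitute the table name shown in your stack Outputs:

```bash
# Count the items written by batch-write-item (table name is a placeholder)
aws dynamodb scan \
  --table-name ProductDetails \
  --select COUNT \
  --region us-west-2
```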
@@ -0,0 +1,183 @@
---
title: "Configure Integrations"
menuTitle: "Configure Integrations"
date: 2024-02-23T00:00:00-00:00
weight: 60
---
Next, you'll configure ML and pipeline connectors in OpenSearch Service. These configurations are set up through a series of POST and PUT requests authenticated with AWS Signature Version 4 (SigV4), the standard authentication mechanism used by AWS services. In most cases an SDK abstracts SigV4 away, but here we will build the requests ourselves with curl.

Building a SigV4-signed request requires a session token, access key, and secret access key. You'll first retrieve these from your Cloud9 instance metadata with the provided "credentials.sh" script, which exports the required values as environment variables. In the following steps, you'll also export other values as environment variables for easy substitution into the listed commands.
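For context, a minimal sketch of what such a script might do, assuming it reads the instance metadata service directly (the script shipped with the lab may differ):

```bash
#!/bin/bash
# Hypothetical sketch: fetch temporary credentials from EC2 instance metadata
# and export them with the METADATA_ prefix used throughout this chapter.
ROLE_NAME=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/)
CREDS=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/${ROLE_NAME})
export METADATA_AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.AccessKeyId')
export METADATA_AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.SecretAccessKey')
export METADATA_AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.Token')
export METADATA_AWS_REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
# The real script also exports METADATA_AWS_ROLE, the IAM role ARN used below.
```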

1. Run the credentials.sh script to retrieve and export credentials. These credentials will be used to sign API requests to the OpenSearch cluster. Note the leading "." before "./credentials.sh"; this sources the script so the exported credentials are available in your current shell.
```bash
. ./credentials.sh
```
1. Next, export an environment variable with the OpenSearch endpoint URL. This URL is listed in the CloudFormation Stack Outputs tab as "OSDomainEndpoint". This variable will be used in subsequent commands.
```bash
export OPENSEARCH_ENDPOINT="https://search-ddb-os-xxxx-xxxxxxxxxxxxx.us-west-2.es.amazonaws.com"
```
1. Execute the following curl command to create the OpenSearch ML connector to Bedrock.
```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/connectors/_create' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"name": "Amazon Bedrock Connector: embedding",
"description": "The connector to bedrock Titan embedding model",
"version": 1,
"protocol": "aws_sigv4",
"parameters": {
"region": "'${METADATA_AWS_REGION}'",
"service_name": "bedrock"
},
"credential": {
"roleArn": "'${METADATA_AWS_ROLE}'"
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "https://bedrock-runtime.'${METADATA_AWS_REGION}'.amazonaws.com/model/amazon.titan-embed-text-v1/invoke",
"headers": {
"content-type": "application/json",
"x-amz-content-sha256": "required"
},
"request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
"pre_process_function": "\n StringBuilder builder = new StringBuilder();\n builder.append(\"\\\"\");\n String first = params.text_docs[0];\n builder.append(first);\n builder.append(\"\\\"\");\n def parameters = \"{\" +\"\\\"inputText\\\":\" + builder + \"}\";\n return \"{\" +\"\\\"parameters\\\":\" + parameters + \"}\";",
"post_process_function": "\n def name = \"sentence_embedding\";\n def dataType = \"FLOAT32\";\n if (params.embedding == null || params.embedding.length == 0) {\n return params.message;\n }\n def shape = [params.embedding.length];\n def json = \"{\" +\n \"\\\"name\\\":\\\"\" + name + \"\\\",\" +\n \"\\\"data_type\\\":\\\"\" + dataType + \"\\\",\" +\n \"\\\"shape\\\":\" + shape + \",\" +\n \"\\\"data\\\":\" + params.embedding +\n \"}\";\n return json;\n "
}
]
}'
```
1. Note the "connector_id" returned in the previous command. Export it to an environmental variable for convenient substitution in future commands.
```bash
export CONNECTOR_ID='xxxxxxxxxxxxxx'
```
1. Run the next curl command to register the model group.
```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/model_groups/_register' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"name": "remote_model_group",
"description": "This is an example description"
}'
```
1. Note the "model_group_id" returned in the previous command. Export it to an environmental variable for later substitution.
```bash
export MODEL_GROUP_ID='xxxxxxxxxxxxx'
```
1. The next curl command registers the model.
```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/models/_register' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"name": "Bedrock embedding model",
"function_name": "remote",
"model_group_id": "'${MODEL_GROUP_ID}'",
"description": "embedding model",
"connector_id": "'${CONNECTOR_ID}'"
}'
```
1. Note the "model_id" and export it.
```bash
export MODEL_ID='xxxxxxxxxxxxx'
```
1. Note the "model_id" and export it.
```bash
echo -e "CONNECTOR_ID=${CONNECTOR_ID}\nMODEL_GROUP_ID=${MODEL_GROUP_ID}\nMODEL_ID=${MODEL_ID}"
```
1. Next, we'll deploy the model with the following curl.
```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/models/'${MODEL_ID}'/_deploy' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}"
```
1. Now we can test the model. If you receive results back with a "200" status code, everything is working properly.
```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/models/'${MODEL_ID}'/_predict' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"parameters": {
"inputText": "What is the meaning of life?"
}
}'
```
1. Next, we'll create the Details table mapping pipeline.
```bash
curl --request PUT \
${OPENSEARCH_ENDPOINT}'/_ingest/pipeline/product-en-nlp-ingest-pipeline' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"description": "A text embedding pipeline",
"processors": [
{
"script": {
"source": "def combined_field = \"ProductID: \" + ctx.ProductID + \", Description: \" + ctx.Description + \", ProductName: \" + ctx.ProductName + \", Category: \" + ctx.Category; ctx.combined_field = combined_field;"
}
},
{
"text_embedding": {
"model_id": "'${MODEL_ID}'",
"field_map": {
"combined_field": "product_embedding"
}
}
}
]
}'
```
1. Next, create the Reviews table mapping pipeline.
```bash
curl --request PUT \
${OPENSEARCH_ENDPOINT}'/_ingest/pipeline/product-reviews-nlp-ingest-pipeline' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"description": "A text embedding pipeline",
"processors": [
{
"script": {
"source": "def combined_field = \"ProductID: \" + ctx.ProductID + \", ProductName: \" + ctx.ProductName + \", Comment: \" + ctx.Comment + \", Timestamp: \" + ctx.Timestamp; ctx.combined_field = combined_field;"
}
},
{
"text_embedding": {
"model_id": "m6jIgowBXLzE-9O0CcNs",
"field_map": {
"combined_field": "product_reviews_embedding"
}
}
}
]
}'
```
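As noted earlier, if jq is installed you can capture the returned IDs instead of copying them by hand. A convenience sketch (not part of the lab scripts), shown for the model group registration; the connector and model registrations can be wrapped the same way with `.connector_id` and `.model_id`:

```bash
# Register the model group and capture its ID in one step
export MODEL_GROUP_ID=$(curl --silent --request POST \
${OPENSEARCH_ENDPOINT}'/_plugins/_ml/model_groups/_register' \
--header 'Content-Type: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{"name": "remote_model_group", "description": "This is an example description"}' \
| jq -r '.model_group_id')
```

You can also dry-run an ingest pipeline before any table data flows through it, using the _simulate API with a made-up document that matches the Details table fields:

```bash
curl --request POST \
${OPENSEARCH_ENDPOINT}'/_ingest/pipeline/product-en-nlp-ingest-pipeline/_simulate' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "x-amz-security-token: ${METADATA_AWS_SESSION_TOKEN}" \
--aws-sigv4 aws:amz:${METADATA_AWS_REGION}:es \
--user "${METADATA_AWS_ACCESS_KEY_ID}:${METADATA_AWS_SECRET_ACCESS_KEY}" \
--data-raw '{
"docs": [
  {
    "_source": {
      "ProductID": "100",
      "ProductName": "Sample product",
      "Description": "A sample description",
      "Category": "Samples"
    }
  }
]
}'
```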