diff --git a/sdk/python/generative-ai/rag/code_first/README.md b/sdk/python/generative-ai/rag/code_first/README.md
new file mode 100644
index 00000000000..a291b96d56f
--- /dev/null
+++ b/sdk/python/generative-ai/rag/code_first/README.md
@@ -0,0 +1,87 @@
+# AzureML MLIndex Asset creation
+
+MLIndex assets in AzureML represent a model used to generate embeddings from text and an index which can be searched using embedding vectors.
+Read more about their structure [here](./docs/mlindex.md).
+
+## Pre-requisites
+
+0. Install `azure-ai-ml` and `azureml-rag`:
+   - `pip install 'azure-ai-ml==1.10.0a20230825006' --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/`
+   - `pip install -U 'azureml-rag[document_parsing,faiss,cognitive_search]>=0.2.0'`
+1. You have unstructured data.
+   - In one of [AzureML's supported data sources](https://learn.microsoft.com/azure/machine-learning/concept-data?view=azureml-api-2): Blob, ADLSgen2, OneLake, S3, Git
+   - In any of these supported file formats: md, txt, py, pdf, ppt(x), doc(x)
+2. You have an embedding model.
+   - [Create an Azure OpenAI service connection](https://learn.microsoft.com/azure/machine-learning/prompt-flow/concept-connections?view=azureml-api-2)
+   - Use a HuggingFace `sentence-transformer` model (no connection is required; note that a [Custom Runtime](https://promptflow.azurewebsites.net/how-to-guides/how-to-customize-environment-runtime.html) is needed to leverage the MLIndex in PromptFlow)
+3. You have an Index to ingest data to.
+   - [Create an Azure Cognitive Search service connection](https://learn.microsoft.com/azure/machine-learning/prompt-flow/concept-connections?view=azureml-api-2)
+   - Use a Faiss index (no connection is required)
+
+## Let's Ingest and Index
+
+A DataIndex job is configured using the `azure-ai-ml` Python SDK/CLI, either directly in code or with a YAML file.
+
+### SDK
+
+The examples are runnable as Python scripts, assuming the pre-requisites above have been acquired and configured in the script.
+Opening them in VS Code lets you execute each block below a `# %%` comment like a Jupyter notebook cell.
+
+#### Cloud Creation
+
+##### Process this documentation using Azure OpenAI and Azure Cognitive Search
+
+- [local_docs_to_acs_mlindex.py](./data_index_job/local_docs_to_acs_mlindex.py)
+
+##### Index data from S3 using OneLake
+
+- [s3_to_acs_mlindex.py](./data_index_job/s3_to_acs_mlindex.py)
+- [scheduled_s3_to_asc_mlindex.py](./data_index_job/scheduled_s3_to_asc_mlindex.py)
+
+##### Ingest Azure Search docs from GitHub into a Faiss Index
+
+- [cog_search_docs_faiss_mlindex.py](./data_index_job/cog_search_docs_faiss_mlindex.py)
+
+#### Local Creation
+
+##### Process this documentation using Azure OpenAI and Azure Cognitive Search
+
+- [local_docs_to_acs_aoai_mlindex.py](./mlindex_local/local_docs_to_acs_aoai_mlindex.py)
+
+##### Process this documentation using SentenceTransformers and Faiss
+
+- [local_docs_to_faiss_mlindex.py](./mlindex_local/local_docs_to_faiss_mlindex.py)
+- [local_docs_to_faiss_mlindex_with_promptflow.py](./mlindex_local/local_docs_to_faiss_mlindex_with_promptflow.py)
+  - Learn more about [Promptflow here](https://microsoft.github.io/promptflow/)
+
+##### Use Langchain Documents to create an Index
+
+- [langchain_docs_to_mlindex.py](./mlindex_local/langchain_docs_to_mlindex.py)
+
+## Using the MLIndex asset
+
+More information about how to use MLIndex in various places is available [here]().
+
+## Appendix
+
+### Which Embedding Model to use?
+
+There are currently two supported embedding options: OpenAI's `text-embedding-ada-002` embedding model or HuggingFace embedding models. Here are some factors that might influence your decision:
+
+#### OpenAI
+
+OpenAI has [great documentation](https://platform.openai.com/docs/guides/embeddings) on their embedding model `text-embedding-ada-002`; it can handle up to 8191 tokens and can be accessed using [Azure OpenAI](https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models#embeddings-models) or OpenAI directly.
+If you have an existing Azure OpenAI instance you can connect it to AzureML; if you don't, AzureML provisions a default one for you called `Default_AzureOpenAI`.
+The main limitation when using `text-embedding-ada-002` is the cost/quota available for the model. Otherwise it provides high-quality embeddings across a wide array of text domains while being simple to use.
+
+#### HuggingFace
+
+HuggingFace hosts many different models capable of embedding text into fixed-length dense vectors. The [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) ranks the performance of embedding models on a few axes. Not all ranked models can be run locally (e.g. `text-embedding-ada-002` is on the list), though many can, and they range from smaller to larger models. When embedding with HuggingFace the model is loaded locally for inference; this may impact your choice of compute resources.
+
+**NOTE:** The default PromptFlow Runtime does not come with HuggingFace model dependencies installed, so Indexes created using HuggingFace embeddings will not work in PromptFlow by default. **Pick OpenAI if you want to use PromptFlow.**
+
+### Setting up OneLake and S3
+
+[Create a lakehouse with OneLake](https://learn.microsoft.com/fabric/onelake/create-lakehouse-onelake)
+
+[Set up a shortcut to S3](https://learn.microsoft.com/fabric/onelake/create-s3-shortcut)
diff --git a/sdk/python/generative-ai/rag/code_first/data_index_job/cog_search_docs_faiss_mlindex.py b/sdk/python/generative-ai/rag/code_first/data_index_job/cog_search_docs_faiss_mlindex.py
new file mode 100644
index 00000000000..a6bae38fde9
--- /dev/null
+++ b/sdk/python/generative-ai/rag/code_first/data_index_job/cog_search_docs_faiss_mlindex.py
@@ -0,0 +1,173 @@
+# %%[markdown]
+# # Azure Cognitive Search Docs from GitHub to Faiss Index
+
+# %% Prerequisites
+# %pip install 'azure-ai-ml==1.10.0a20230825006' --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/
+# %pip install 'azureml-rag[faiss]>=0.2.0'
+# %pip install 'promptflow[azure]' promptflow-tools promptflow-vectordb
+
+# %% Authenticate to your AzureML Workspace; download a `config.json` from the top right-hand corner menu of the Workspace.
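+# If a `config.json` isn't available, the client can also be constructed explicitly.
+# The subscription, resource group, and workspace names below are placeholders, not values
+# from this sample:
+# ml_client = MLClient(
+#     credential=DefaultAzureCredential(),
+#     subscription_id="<subscription-id>",
+#     resource_group_name="<resource-group>",
+#     workspace_name="<workspace-name>",
+# )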
+from azure.ai.ml import MLClient +from azure.identity import DefaultAzureCredential + +ml_client = MLClient.from_config( + credential=DefaultAzureCredential(), path="config.json" +) + +# %% Create DataIndex configuration +from azureml.rag.dataindex.entities import ( + Data, + DataIndex, + IndexSource, + CitationRegex, + Embedding, + IndexStore, +) + +asset_name = "azure_search_docs_aoai_faiss" + +data_index = DataIndex( + name=asset_name, + description="Azure Cognitive Search docs embedded with text-embedding-ada-002 and indexed in a Faiss Index.", + source=IndexSource( + input_data=Data( + type="uri_folder", + path="", + ), + input_glob="articles/search/**/*", + citation_url="https://learn.microsoft.com/en-us/azure", + # Remove articles from the final citation url and remove the file extension so url points to hosted docs, not GitHub. + citation_url_replacement_regex=CitationRegex( + match_pattern="(.*)/articles/(.*)(\\.[^.]+)$", replacement_pattern="\\1/\\2" + ), + ), + embedding=Embedding( + model="text-embedding-ada-002", + connection="azureml-rag-oai", + cache_path=f"azureml://datastores/workspaceblobstore/paths/embeddings_cache/{asset_name}", + ), + index=IndexStore(type="faiss"), + # name is replaced with a unique value each time the job is run + path=f"azureml://datastores/workspaceblobstore/paths/indexes/{asset_name}/{{name}}", +) + +# %% Use git_clone Component to clone Azure Search docs from github +ml_registry = MLClient(credential=ml_client._credential, registry_name="azureml") + +git_clone_component = ml_registry.components.get("llm_rag_git_clone", label="latest") + +# %% Clone Git Repo and use as input to index_job +from azure.ai.ml.dsl import pipeline +from azureml.rag.dataindex.data_index import index_data + + +@pipeline(default_compute="serverless") +def git_to_faiss( + git_url, + branch_name="", + git_connection_id="", +): + git_clone = git_clone_component(git_repository=git_url, branch_name=branch_name) + git_clone.environment_variables[ + "AZUREML_WORKSPACE_CONNECTION_ID_GIT" + ] = git_connection_id + + index_job = index_data( + description=data_index.description, + data_index=data_index, + input_data_override=git_clone.outputs.output_data, + ml_client=ml_client, + ) + + return index_job.outputs + + +# %% +git_index_job = git_to_faiss("https://github.com/MicrosoftDocs/azure-docs.git") +# Ensure repo cloned each run to get latest, comment out to have first clone reused. 
+git_index_job.settings.force_rerun = True + +# %% Submit the DataIndex Job +git_index_run = ml_client.jobs.create_or_update( + git_index_job, + experiment_name=asset_name, +) +git_index_run + +# %% Wait for it to finish +ml_client.jobs.stream(git_index_run.name) + +# %% Check the created asset, it is a folder on storage containing an MLIndex yaml file +mlindex_docs_index_asset = ml_client.data.get(asset_name, label="latest") +mlindex_docs_index_asset + +# %% Try it out with langchain by loading the MLIndex asset using the azureml-rag SDK +from azureml.rag.mlindex import MLIndex + +mlindex = MLIndex(mlindex_docs_index_asset) + +index = mlindex.as_langchain_vectorstore() +docs = index.similarity_search("How can I enable Semantic Search on my Index?", k=5) +docs + +# %% Take a look at those chunked docs +import json + +for doc in docs: + print(json.dumps({"content": doc.page_content, **doc.metadata}, indent=2)) + +# %% Try it out with Promptflow + +import promptflow + +pf = promptflow.PFClient() + +# %% List all the available connections +for c in pf.connections.list(): + print(c.name + " (" + c.type + ")") + +# %% Load index qna flow +from pathlib import Path + +flow_path = Path.cwd().parent / "flows" / "bring_your_own_data_chat_qna" +mlindex_path = mlindex_docs_index_asset.path + +# %% Put MLIndex uri into Vector DB Lookup tool inputs in [bring_your_own_data_chat_qna/flow.dag.yaml](../flows/bring_your_own_data_chat_qna/flow.dag.yaml) +import re + +with open(flow_path / "flow.dag.yaml", "r") as f: + flow_yaml = f.read() + flow_yaml = re.sub( + r"path: (.*)# Index uri", f"path: {mlindex_path} # Index uri", flow_yaml, re.M + ) +with open(flow_path / "flow.dag.yaml", "w") as f: + f.write(flow_yaml) + +# %% Run qna flow +output = pf.flows.test( + flow_path, + inputs={ + "chat_history": [], + "chat_input": "How recently was Vector Search support added to Azure Cognitive Search?", + }, +) + +chat_output = output["chat_output"] +for part in chat_output: + print(part, end="") + +# %% Run qna flow with multiple inputs +data_path = Path.cwd().parent / "flows" / "data" / "azure_search_docs_questions.jsonl" + +column_mapping = { + "chat_history": "${data.chat_history}", + "chat_input": "${data.chat_input}", + "chat_output": "${data.chat_output}", +} +run = pf.run(flow=flow_path, data=data_path, column_mapping=column_mapping) +pf.stream(run) + +print(f"{run}") + + +# %% diff --git a/sdk/python/generative-ai/rag/code_first/data_index_job/local_docs_to_acs_mlindex.py b/sdk/python/generative-ai/rag/code_first/data_index_job/local_docs_to_acs_mlindex.py new file mode 100644 index 00000000000..8dd05f64d15 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/data_index_job/local_docs_to_acs_mlindex.py @@ -0,0 +1,45 @@ +# %%[markdown] +# # Local Documents to Azure Cognitive Search Index + +# %% Prerequisites +# %pip install 'azure-ai-ml==1.10.0a20230825006' --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ +# %pip install 'azureml-rag[cognitive_search]>=0.2.0' + +# %% Authenticate to you AzureML Workspace, download a `config.json` from the top right hand corner menu of the Workspace. 
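+# Note: this script assumes `config.json` and `local_docs_to_acs_mlindex.yaml` sit next to it,
+# so run it from the `data_index_job` folder for the relative paths below to resolve.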
+from azure.ai.ml import MLClient, load_data +from azure.identity import DefaultAzureCredential + +ml_client = MLClient.from_config( + credential=DefaultAzureCredential(), path="config.json" +) + +# %% Load DataIndex configuration from file +data_index = load_data("local_docs_to_acs_mlindex.yaml") +print(data_index) + +# %% Submit the DataIndex Job +index_job = ml_client.data.index_data(data_index=data_index) + +# %% Wait for it to finish +ml_client.jobs.stream(index_job.name) + +# %% Check the created asset, it is a folder on storage containing an MLIndex yaml file +mlindex_docs_index_asset = ml_client.data.get(data_index.name, label="latest") +mlindex_docs_index_asset + +# %% Try it out with langchain by loading the MLIndex asset using the azureml-rag SDK +from azureml.rag.mlindex import MLIndex + +mlindex = MLIndex(mlindex_docs_index_asset) + +index = mlindex.as_langchain_vectorstore() +docs = index.similarity_search("What is an MLIndex?", k=5) +docs + +# %% Take a look at those chunked docs +import json + +for doc in docs: + print(json.dumps({"content": doc.page_content, **doc.metadata}, indent=2)) + +# %% Try it out with Promptflow diff --git a/sdk/python/generative-ai/rag/code_first/data_index_job/local_docs_to_acs_mlindex.yaml b/sdk/python/generative-ai/rag/code_first/data_index_job/local_docs_to_acs_mlindex.yaml new file mode 100644 index 00000000000..36ca2b81dbd --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/data_index_job/local_docs_to_acs_mlindex.yaml @@ -0,0 +1,23 @@ +$schema: http://azureml/sdk-2-0/DataIndex.json +type: uri_folder +name: mlindex_docs_aoai_acs +description: Python embedded with text-embedding-ada-002 and indexed in Azure Cognitive Search. + +source: + input_data: + type: uri_folder + path: ../ + chunk_size: 200 + citation_url: 'https://github.com/Azure/azureml-examples/tree/main/sdk/python/generative-ai/rag/refresh' + +embedding: + model: azure_open_ai://deployment/text-embedding-ada-002/model/text-embedding-ada-002 + connection: azureml-rag-oai + cache_path: azureml://datastores/workspaceblobstore/paths/embeddings_cache/mlindex_docs_aoai_acs + +index: + type: acs + connection: azureml:azureml-rag-acs + name: mlindex_docs_aoai + +path: azureml://datastores/workspaceblobstore/paths/indexes/mlindex_docs_aoai_acs/{name} diff --git a/sdk/python/generative-ai/rag/code_first/data_index_job/s3_to_acs_mlindex.py b/sdk/python/generative-ai/rag/code_first/data_index_job/s3_to_acs_mlindex.py new file mode 100644 index 00000000000..b9e253e5b34 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/data_index_job/s3_to_acs_mlindex.py @@ -0,0 +1,81 @@ +# %%[markdown] +# # S3 via OneLake to Azure Cognitive Search Index + +# %% Prerequisites +# %pip install 'azure-ai-ml==1.10.0a20230825006' --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ +# %pip install 'azureml-rag[cognitive_search]>=0.2.0' + +# %% Authenticate to an AzureML Workspace, you can download a `config.json` from the top-right-hand corner menu of a Workspace. 
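+# DefaultAzureCredential tries a chain of credential sources (environment variables, managed
+# identity, Azure CLI login, etc.), so an `az login` in your terminal is typically enough locally.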
+from azure.ai.ml import MLClient +from azure.identity import DefaultAzureCredential + +ml_client = MLClient.from_config( + credential=DefaultAzureCredential(), path="config.json" +) + +# %% Create DataIndex configuration +from azureml.rag.dataindex.entities import ( + Data, + DataIndex, + IndexSource, + Embedding, + IndexStore, +) + +asset_name = "s3_aoai_acs" + +data_index = DataIndex( + name=asset_name, + description="S3 data embedded with text-embedding-ada-002 and indexed in Azure Cognitive Search.", + source=IndexSource( + input_data=Data( + type="uri_folder", + path="abfss://9aa7b19e-c117-4a74-8654-cf1559ba9f4f@msit-onelake.dfs.fabric.microsoft.com/1606ee55-ec68-4658-8d6b-58bf8dd26636/Files/lupickup-test-s3", + ), + citation_url="s3://lupickup-test", + ), + embedding=Embedding( + model="text-embedding-ada-002", + connection="azureml-rag-oai", + cache_path=f"azureml://datastores/workspaceblobstore/paths/embeddings_cache/{asset_name}", + ), + index=IndexStore( + type="acs", + connection="azureml-rag-acs", + ), + # name is replaced with a unique value each time the job is run + path=f"azureml://datastores/workspaceblobstore/paths/indexes/{asset_name}/{{name}}", +) + +# %% Create the DataIndex Job to be scheduled +from azure.ai.ml import UserIdentityConfiguration + +index_job = ml_client.data.index_data( + data_index=data_index, + # The DataIndex Job will use the identity of the MLClient within the DataIndex Job to access source data. + identity=UserIdentityConfiguration(), +) + +# %% Wait for it to finish +ml_client.jobs.stream(index_job.name) + +# %% Check the created asset, it is a folder on storage containing an MLIndex yaml file +mlindex_docs_index_asset = ml_client.data.get(data_index.name, label="latest") +mlindex_docs_index_asset + +# %% Try it out with langchain by loading the MLIndex asset using the azureml-rag SDK +from azureml.rag.mlindex import MLIndex + +mlindex = MLIndex(mlindex_docs_index_asset) + +index = mlindex.as_langchain_vectorstore() +docs = index.similarity_search("What is RAG?", k=5) +docs + +# %% Take a look at those chunked docs +import json + +for doc in docs: + print(json.dumps({"content": doc.page_content, **doc.metadata}, indent=2)) + +# %% diff --git a/sdk/python/generative-ai/rag/code_first/data_index_job/scheduled_s3_to_asc_mlindex.py b/sdk/python/generative-ai/rag/code_first/data_index_job/scheduled_s3_to_asc_mlindex.py new file mode 100644 index 00000000000..f30bfa1cb33 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/data_index_job/scheduled_s3_to_asc_mlindex.py @@ -0,0 +1,112 @@ +# %%[markdown] +# # S3 via OneLake to Azure Cognitive Search Index + +# %% Prerequisites +# %pip install 'azure-ai-ml==1.10.0a20230825006' --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ +# %pip install 'azureml-rag[cognitive_search]>=0.2.0' + +# %% Authenticate to an AzureML Workspace, you can download a `config.json` from the top-right-hand corner menu of a Workspace. 
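+# Note: the `path` under `input_data` below is left empty in this sample; fill in your OneLake
+# `abfss://` folder URI (see `s3_to_acs_mlindex.py` for the expected format) before running.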
+from azure.ai.ml import MLClient +from azure.identity import DefaultAzureCredential + +ml_client = MLClient.from_config( + credential=DefaultAzureCredential(), path="config.json" +) + +# %% Create DataIndex configuration +from azureml.rag.dataindex.entities import ( + Data, + DataIndex, + IndexSource, + Embedding, + IndexStore, +) + +asset_name = "s3_aoai_acs" + +data_index = DataIndex( + name=asset_name, + description="S3 data embedded with text-embedding-ada-002 and indexed in Azure Cognitive Search.", + source=IndexSource( + input_data=Data( + type="uri_folder", + path="", + ), + citation_url="s3://lupickup-test", + ), + embedding=Embedding( + model="text-embedding-ada-002", + connection="azureml-rag-oai", + cache_path=f"azureml://datastores/workspaceblobstore/paths/embeddings_cache/{asset_name}", + ), + index=IndexStore( + type="acs", + connection="azureml-rag-acs", + ), + # name is replaced with a unique value each time the job is run + path=f"azureml://datastores/workspaceblobstore/paths/indexes/{asset_name}/{{name}}", +) + +# %% Create the DataIndex Job to be scheduled +from azure.ai.ml import UserIdentityConfiguration + +index_job = ml_client.data.index_data( + data_index=data_index, + # The DataIndex Job will use the identity of the MLClient within the DataIndex Job to access source data. + identity=UserIdentityConfiguration(), + # Instead of submitting the Job and returning the Run a PipelineJob configuration is returned which can be used in with a Schedule. + submit_job=False, +) + +# %% Create Schedule for DataIndex Job +from azure.ai.ml.constants import TimeZone +from azure.ai.ml.entities import JobSchedule, RecurrenceTrigger, RecurrencePattern +from datetime import datetime, timedelta + +schedule_name = "onelake_s3_aoai_acs_mlindex_daily" + +schedule_start_time = datetime.utcnow() + timedelta(minutes=1) +recurrence_trigger = RecurrenceTrigger( + frequency="day", + interval=1, + # schedule=RecurrencePattern(hours=16, minutes=[15]), + start_time=schedule_start_time, + time_zone=TimeZone.UTC, +) + +job_schedule = JobSchedule( + name=schedule_name, + trigger=recurrence_trigger, + create_job=index_job, + properties=index_job.properties, +) + +# %% Enable Schedule +job_schedule_res = ml_client.schedules.begin_create_or_update( + schedule=job_schedule +).result() +job_schedule_res + +# %% Take a look at the schedule in Workpace Portal +f"https://ml.azure.com/schedule/{schedule_name}/details/overview?wsid=/subscriptions/{ml_client.subscription_id}/resourceGroups/{ml_client.resource_group_name}/providers/Microsoft.MachineLearningServices/workspaces/{ml_client.workspace_name}" + +# %% Get the MLIndex Asset +onelake_s3_index_asset = ml_client.data.get(asset_name, label="latest") +onelake_s3_index_asset + +## %% Try it out with langchain by loading the MLIndex asset using the azureml-rag SDK +from azureml.rag.mlindex import MLIndex + +mlindex = MLIndex(onelake_s3_index_asset) + +index = mlindex.as_langchain_vectorstore() +docs = index.similarity_search("What is RAG?", k=5) +docs + +# %% Take a look at those chunked docs +import json + +for doc in docs: + print(json.dumps({"content": doc.page_content, **doc.metadata}, indent=2)) + +# %% diff --git a/sdk/python/generative-ai/rag/code_first/docs/mlindex.md b/sdk/python/generative-ai/rag/code_first/docs/mlindex.md new file mode 100644 index 00000000000..e3e42682d8c --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/docs/mlindex.md @@ -0,0 +1,35 @@ +# MLIndex + +An example MLIndex file: + +```yaml +embeddings: + api_base: 
https://azureml-rag-oai.openai.azure.com + api_type: azure + api_version: 2023-03-15-preview + batch_size: "1" + connection: + id: Default_AzureOpenAI + connection_type: environment + deployment: text-embedding-ada-002 + dimension: 1536 + kind: open_ai + model: text-embedding-ada-002 + schema_version: "2" +index: + api_version: 2023-07-01-preview + connection: + id: /subs//rgs//wss//conns/ + connection_type: workspace_connection + endpoint: https://azureml-rag-acs.search.windows.net + engine: azure-sdk + field_mapping: + content: content + embedding: content_vector_open_ai + filename: sourcefile + metadata: meta_json_string + title: title + url: sourcepage + index: azure-docs-aoai-embeddings-rcts + kind: acs +``` diff --git a/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/.promptflow/flow.tools.json b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/.promptflow/flow.tools.json new file mode 100644 index 00000000000..ea7e81badcc --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/.promptflow/flow.tools.json @@ -0,0 +1,175 @@ +{ + "package": { + "promptflow.tools.embedding.embedding": { + "name": "Embedding", + "description": "Use Open AI's embedding model to create an embedding vector representing the input text.", + "type": "python", + "module": "promptflow.tools.embedding", + "function": "embedding", + "inputs": { + "connection": { + "type": [ + "AzureOpenAIConnection", + "OpenAIConnection" + ] + }, + "deployment_name": { + "type": [ + "string" + ], + "enabled_by": "connection", + "enabled_by_type": [ + "AzureOpenAIConnection" + ], + "capabilities": { + "completion": false, + "chat_completion": false, + "embeddings": true + }, + "model_list": [ + "text-embedding-ada-002", + "text-search-ada-doc-001", + "text-search-ada-query-001" + ] + }, + "model": { + "type": [ + "string" + ], + "enabled_by": "connection", + "enabled_by_type": [ + "OpenAIConnection" + ], + "enum": [ + "text-embedding-ada-002", + "text-search-ada-doc-001", + "text-search-ada-query-001" + ] + }, + "input": { + "type": [ + "string" + ] + } + }, + "package": "promptflow-tools", + "package_version": "0.1.0b5" + }, + "promptflow_vectordb.tool.vector_index_lookup.VectorIndexLookup.search": { + "name": "Vector Index Lookup", + "description": "Search text or vector based query from AzureML Vector Index.", + "type": "python", + "module": "promptflow_vectordb.tool.vector_index_lookup", + "class_name": "VectorIndexLookup", + "function": "search", + "inputs": { + "path": { + "type": [ + "string" + ] + }, + "query": { + "type": [ + "object" + ] + }, + "top_k": { + "default": "3", + "type": [ + "int" + ] + } + }, + "package": "promptflow-vectordb", + "package_version": "0.1.1" + } + }, + "code": { + "generate_prompt_context.py": { + "type": "python", + "inputs": { + "search_result": { + "type": [ + "object" + ] + } + }, + "source": "generate_prompt_context.py", + "function": "generate_prompt_context" + }, + "Prompt_variants.jinja2": { + "type": "prompt", + "inputs": { + "contexts": { + "type": [ + "string" + ] + }, + "chat_history": { + "type": [ + "string" + ] + }, + "chat_input": { + "type": [ + "string" + ] + } + }, + "source": "Prompt_variants.jinja2" + }, + "Prompt_variants__variant_1.jinja2": { + "type": "prompt", + "inputs": { + "contexts": { + "type": [ + "string" + ] + }, + "chat_history": { + "type": [ + "string" + ] + }, + "chat_input": { + "type": [ + "string" + ] + } + }, + "source": "Prompt_variants__variant_1.jinja2" + }, + 
"Prompt_variants__variant_2.jinja2": { + "type": "prompt", + "inputs": { + "contexts": { + "type": [ + "string" + ] + }, + "chat_history": { + "type": [ + "string" + ] + }, + "chat_input": { + "type": [ + "string" + ] + } + }, + "source": "Prompt_variants__variant_2.jinja2" + }, + "chat_with_context.jinja2": { + "type": "llm", + "inputs": { + "prompt_text": { + "type": [ + "string" + ] + } + }, + "source": "chat_with_context.jinja2" + } + } +} \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/Prompt_variants.jinja2 b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/Prompt_variants.jinja2 new file mode 100644 index 00000000000..45baa7d7fc9 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/Prompt_variants.jinja2 @@ -0,0 +1,16 @@ +system: +* You are an AI system designed to answer questions from users in a designated context. When presented with a scenario, you must reply with accuracy to inquirers' inquiries using only descriptors provided in that same context. If there is ever a situation where you are unsure of the potential answers, simply respond with "I don't know. +Please add citation after each sentence when possible in a form "(Source: citation)". +context: {{contexts}} + +chat history: +{% for item in chat_history %} +user: +{{ item.inputs.chat_input }} +assistant: +{{ item.outputs.chat_output }} + +{% endfor %} + +user question: +{{ chat_input }} \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/Prompt_variants__variant_1.jinja2 b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/Prompt_variants__variant_1.jinja2 new file mode 100644 index 00000000000..00cde5ddec6 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/Prompt_variants__variant_1.jinja2 @@ -0,0 +1,16 @@ +system: +* You are an AI agent tasked with helping users by responding with relevant and accurate answers based on the available context. Display your skills by creating a thoughtful response that reflects the provided information. Unleash your creativity! +Please add citation after each sentence when possible in a form "(Source: citation)". +context: {{contexts}} + +chat history: +{% for item in chat_history %} +user: +{{ item.inputs.chat_input }} +assistant: +{{ item.outputs.chat_output }} + +{% endfor %} + +user question: +{{ chat_input }} \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/Prompt_variants__variant_2.jinja2 b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/Prompt_variants__variant_2.jinja2 new file mode 100644 index 00000000000..f72b46960e7 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/Prompt_variants__variant_2.jinja2 @@ -0,0 +1,16 @@ +system: +* You are an AI assistant for helping users answering question given a specific context.You are given a context and you'll be asked a question based on the context.Your answer should be as precise as possible and answer should be only from the context.Your answer should be succinct. +Please add citation after each sentence when possible in a form "(Source: citation)". 
+context: {{contexts}} + +chat history: +{% for item in chat_history %} +user: +{{ item.inputs.chat_input }} +assistant: +{{ item.outputs.chat_output }} + +{% endfor %} + +user question: +{{ chat_input }} \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/chat_with_context.jinja2 b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/chat_with_context.jinja2 new file mode 100644 index 00000000000..b42d4e0465e --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/chat_with_context.jinja2 @@ -0,0 +1,2 @@ +{{prompt_text}} + diff --git a/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/flow.dag.yaml b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/flow.dag.yaml new file mode 100644 index 00000000000..ab07576765e --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/flow.dag.yaml @@ -0,0 +1,96 @@ +inputs: + chat_history: + type: list + chat_input: + type: string + is_chat_input: true +outputs: + chat_output: + type: string + reference: ${chat_with_context.output} + is_chat_output: true +nodes: +- name: embed_the_question + type: python + source: + type: package + tool: promptflow.tools.embedding.embedding + inputs: + connection: azureml-rag-oai + input: ${flow.chat_input} + deployment_name: text-embedding-ada-002 +- name: search_question_from_indexed_docs + type: python + source: + type: package + tool: promptflow_vectordb.tool.vector_index_lookup.VectorIndexLookup.search + inputs: + path: # Index uri + query: ${embed_the_question.output} + top_k: '2' +- name: generate_prompt_context + type: python + source: + type: code + path: generate_prompt_context.py + inputs: + search_result: ${search_question_from_indexed_docs.output} +- name: Prompt_variants + use_variants: true +- name: chat_with_context + type: llm + source: + type: code + path: chat_with_context.jinja2 + inputs: + deployment_name: gpt-35-turbo-16k + temperature: '0' + top_p: '1.0' + stop: '' + max_tokens: '1000' + presence_penalty: '0' + frequency_penalty: '0' + logit_bias: '' + prompt_text: ${Prompt_variants.output} + provider: AzureOpenAI + connection: azureml-rag-oai + api: chat + module: promptflow.tools.aoai +node_variants: + Prompt_variants: + default_variant_id: variant_0 + variants: + variant_0: + node: + type: prompt + source: + type: code + path: Prompt_variants.jinja2 + inputs: + contexts: ${generate_prompt_context.output} + chat_history: ${inputs.chat_history} + chat_input: ${inputs.chat_input} + variant_1: + node: + type: prompt + source: + type: code + path: Prompt_variants__variant_1.jinja2 + inputs: + chat_input: ${inputs.chat_input} + contexts: ${generate_prompt_context.output} + chat_history: ${inputs.chat_history} + variant_2: + node: + type: prompt + source: + type: code + path: Prompt_variants__variant_2.jinja2 + inputs: + contexts: ${generate_prompt_context.output} + chat_history: ${inputs.chat_history} + chat_input: ${inputs.chat_input} +id: bring_your_own_data_chat_qna +name: Bring Your Own Data Chat QnA +environment: + python_requirements_txt: requirements.txt diff --git a/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/flow.meta.yaml b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/flow.meta.yaml new file mode 100644 index 00000000000..334bcd778e5 --- /dev/null +++ 
b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/flow.meta.yaml @@ -0,0 +1,11 @@ +$schema: https://azuremlschemas.azureedge.net/latest/flow.schema.json +name: bring_your_own_data_chat_qna +display_name: Bring Your Own Data Chat QnA +type: chat +path: ./flow.dag.yaml +description: Create flow for multi-round Q&A with GPT3.5 using data from your own indexed files to make the answer more grounded for enterprise chat scenarios. +properties: + promptflow.stage: prod + promptflow.details.type: markdown + promptflow.details.source: README.md + promptflow.batch_inputs: samples.json diff --git a/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/generate_prompt_context.py b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/generate_prompt_context.py new file mode 100644 index 00000000000..30d25ca34e5 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/generate_prompt_context.py @@ -0,0 +1,27 @@ +from typing import List +from promptflow import tool +from promptflow_vectordb.core.contracts import SearchResultEntity + + +@tool +def generate_prompt_context(search_result: List[dict]) -> str: + def format_doc(doc: dict): + return f"Content: {doc['Content']}\nSource: {doc['Source']}" + + SOURCE_KEY = "source" + URL_KEY = "url" + + retrieved_docs = [] + for item in search_result: + entity = SearchResultEntity.from_dict(item) + content = entity.text or "" + + source = "" + if entity.metadata is not None: + if SOURCE_KEY in entity.metadata: + if URL_KEY in entity.metadata[SOURCE_KEY]: + source = entity.metadata[SOURCE_KEY][URL_KEY] or "" + + retrieved_docs.append({"Content": content, "Source": source}) + doc_string = "\n\n".join([format_doc(doc) for doc in retrieved_docs]) + return doc_string diff --git a/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/requirements.txt b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/requirements.txt new file mode 100644 index 00000000000..5f2b966424f --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/bring_your_own_data_chat_qna/requirements.txt @@ -0,0 +1,4 @@ +promptflow[azure] +promptflow-tools +azureml-rag[faiss] +azure-ai-ml \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/.env.example b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/.env.example new file mode 100644 index 00000000000..3eec21a1809 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/.env.example @@ -0,0 +1,17 @@ +# Azure OpenAI, uncomment below section if you want to use Azure OpenAI +# Note: EMBEDDING_MODEL_DEPLOYMENT_NAME and CHAT_MODEL_DEPLOYMENT_NAME are deployment names for Azure OpenAI +OPENAI_API_TYPE=azure +OPENAI_API_BASE= +OPENAI_API_KEY= +OPENAI_API_VERSION=2023-05-15 +CHAT_MODEL_DEPLOYMENT_NAME=gpt-35-turbo + +# OpenAI, uncomment below section if you want to use OpenAI +# Note: EMBEDDING_MODEL_DEPLOYMENT_NAME and CHAT_MODEL_DEPLOYMENT_NAME are model names for OpenAI +#OPENAI_API_KEY= +#OPENAI_ORG_ID= # this is optional +#CHAT_MODEL_DEPLOYMENT_NAME=gpt-3.5-turbo + +PROMPT_TOKEN_LIMIT=2000 +MAX_COMPLETION_TOKENS=256 +VERBOSE=True \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/README.md b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/README.md new file mode 100644 index 00000000000..2b5e8d30b78 --- /dev/null +++ 
b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/README.md @@ -0,0 +1,47 @@ +# Chat with MLIndex + +This is a simple flow that allow you to ask questions about the content of an MLIndex and get answers. +You can run the flow with a URL to an MLIndex and question as argument. +When you ask a question, it will look up the index to retrieve relevant content and post the question with the relevant content to OpenAI chat model (gpt-3.5-turbo or gpt4) to get an answer. + +Tools used in this flow: +- custom `python` Tool + +## Prerequisites + +Install dependencies: +```bash +pip install -r requirements.txt +``` + +## Get started +### Create connection in this folder + +```bash +# create connection needed by flow +if pf connection list | grep open_ai_connection; then + echo "open_ai_connection already exists" +else + pf connection create --file ./azure_openai.yml --name open_ai_connection --set api_key= api_base= +fi +``` + +### SDK Example + +Refer to [local_docs_to_faiss_mlindex_with_promptfow.py](../../mlindex_local/local_docs_to_faiss_mlindex_with_promptfow.py) + +### CLI Example + +```bash +# test with flow inputs, you need local or remote MLIndex (refer to SDK examples to create them) +pf flow test --flow . --inputs question="" mlindex_uri="../../mlindex_local/mlindex_docs_aoai_faiss" + +# (Optional) create a random run name +run_name="doc_questions_"$(openssl rand -hex 12) + +# run with multiline data, --name is optional +pf run create --flow . --data ../data/rag_docs_questions.jsonl --stream --column-mapping question='${data.chat_input}' mlindex_uri='../../mlindex_local/mlindex_docs_aoai_faiss' chat_history='${data.chat_history}' config='{"CHAT_MODEL_DEPLOYMENT_NAME": "gpt-35-turbo", "PROMPT_TOKEN_LIMIT": "2000", "MAX_COMPLETION_TOKENS": "256", "VERBOSE": "True"}' --name $run_name + +# visualize run output details +pf run visualize --name $run_name +``` diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/__init__.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/__init__.py new file mode 100644 index 00000000000..8a8c8f356bc --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/__init__.py @@ -0,0 +1,6 @@ +import sys +import os + +sys.path.append( + os.path.join(os.path.dirname(os.path.abspath(__file__)), "chat_with_index") +) diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/azure_openai.yml b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/azure_openai.yml new file mode 100644 index 00000000000..acfccefff4b --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/azure_openai.yml @@ -0,0 +1,6 @@ +$schema: https://azuremlschemas.azureedge.net/promptflow/latest/AzureOpenAIConnection.schema.json +name: open_ai_connection +type: azure_open_ai +api_key: "" +api_base: "aoai-api-endpoint" +api_type: "azure" diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/batch_run.yaml b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/batch_run.yaml new file mode 100644 index 00000000000..69ce41308d1 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/batch_run.yaml @@ -0,0 +1,14 @@ +$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Run.schema.json +#name: chat_with_docs_default_20230820_162219_559000 +flow: . 
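+# `data` points at a JSONL file of sample questions; `column_mapping` below maps each row's
+# fields onto the flow inputs.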
+data: ../data/rag_docs_questions.jsonl +#run: +column_mapping: + chat_history: ${data.chat_history} + mlindex_uri: ${data.mlindex_uri} + question: ${data.question} + config: + CHAT_MODEL_DEPLOYMENT_NAME: gpt-35-turbo + PROMPT_TOKEN_LIMIT: 3000 + MAX_COMPLETION_TOKENS: 256 + VERBOSE: true \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/chat_with_index_tool.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/chat_with_index_tool.py new file mode 100644 index 00000000000..782822af43c --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/chat_with_index_tool.py @@ -0,0 +1,37 @@ +from promptflow import tool +from src.main import chat_with_index + + +@tool +def chat_with_index_tool(question: str, mlindex_uri: str, history: list, ready: str): + history = convert_chat_history_to_chatml_messages(history) + + stream, context = chat_with_index(question, mlindex_uri, history) + + answer = "" + for str in stream: + answer = answer + str + "" + + return {"answer": answer, "context": context} + + +def convert_chat_history_to_chatml_messages(history): + messages = [] + for item in history: + messages.append({"role": "user", "content": item["inputs"]["question"]}) + messages.append({"role": "assistant", "content": item["outputs"]["answer"]}) + + return messages + + +def convert_chatml_messages_to_chat_history(messages): + history = [] + for i in range(0, len(messages), 2): + history.append( + { + "inputs": {"question": messages[i]["content"]}, + "outputs": {"answer": messages[i + 1]["content"]}, + } + ) + + return history diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/find_context_tool.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/find_context_tool.py new file mode 100644 index 00000000000..f906c495cf8 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/find_context_tool.py @@ -0,0 +1,9 @@ +from promptflow import tool +from src.find_context import find_context + + +@tool +def find_context_tool(question: str, mlindex_uri: str): + prompt, documents = find_context(question, mlindex_uri) + + return {"prompt": prompt, "context": [d.page_content for d in documents]} diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/flow.dag.yaml b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/flow.dag.yaml new file mode 100644 index 00000000000..0105acc714f --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/flow.dag.yaml @@ -0,0 +1,61 @@ +$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json +inputs: + chat_history: + type: list + default: [] + mlindex_uri: + type: string + default: "" + question: + type: string + is_chat_input: true + default: What is an MLIndex? 
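+  # Each key in `config` is copied into an environment variable by setup_env.py before the other nodes run.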
+ config: + type: object + default: + CHAT_MODEL_DEPLOYMENT_NAME: gpt-35-turbo # change to gpt-3.5-turbo when using openai + PROMPT_TOKEN_LIMIT: 3000 + MAX_COMPLETION_TOKENS: 256 + VERBOSE: true +outputs: + answer: + type: string + is_chat_output: true + reference: ${qna_tool.output.answer} + context: + type: string + reference: ${find_context_tool.output.context} +nodes: +- name: setup_env + type: python + source: + type: code + path: setup_env.py + inputs: + connection: azureml-rag-oai + config: ${inputs.config} +- name: rewrite_question_tool + type: python + source: + type: code + path: rewrite_question_tool.py + inputs: + question: ${inputs.question} + history: ${inputs.chat_history} + env_ready_signal: ${setup_env.output} +- name: find_context_tool + type: python + source: + type: code + path: find_context_tool.py + inputs: + mlindex_uri: ${inputs.mlindex_uri} + question: ${rewrite_question_tool.output} +- name: qna_tool + type: python + source: + type: code + path: qna_tool.py + inputs: + prompt: ${find_context_tool.output.prompt} + history: ${inputs.chat_history} \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/qna_tool.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/qna_tool.py new file mode 100644 index 00000000000..7b6ebc6210f --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/qna_tool.py @@ -0,0 +1,22 @@ +from promptflow import tool +from src.qna import qna + + +@tool +def qna_tool(prompt: str, history: list): + stream = qna(prompt, convert_chat_history_to_chatml_messages(history)) + + answer = "" + for str in stream: + answer = answer + str + "" + + return {"answer": answer} + + +def convert_chat_history_to_chatml_messages(history): + messages = [] + for item in history: + messages.append({"role": "user", "content": item["inputs"]["question"]}) + messages.append({"role": "assistant", "content": item["outputs"]["answer"]}) + + return messages diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/requirements.txt b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/requirements.txt new file mode 100644 index 00000000000..04f393cf5a6 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/requirements.txt @@ -0,0 +1,8 @@ +PyPDF2 +azureml-rag[faiss] +openai +jinja2 +python-dotenv +tiktoken +promptflow[azure] +promptflow-tools \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/rewrite_question_tool.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/rewrite_question_tool.py new file mode 100644 index 00000000000..cfcb622039c --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/rewrite_question_tool.py @@ -0,0 +1,7 @@ +from promptflow import tool +from src.rewrite_question import rewrite_question + + +@tool +def rewrite_question_tool(question: str, history: list, env_ready_signal: str): + return rewrite_question(question, history) diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/setup_env.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/setup_env.py new file mode 100644 index 00000000000..46685df8a62 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/setup_env.py @@ -0,0 +1,27 @@ +import os +from typing import Union + +from promptflow import tool +from promptflow.connections import AzureOpenAIConnection, OpenAIConnection + + +@tool +def 
setup_env(connection: Union[AzureOpenAIConnection, OpenAIConnection], config: dict): + if not connection or not config: + return + + if isinstance(connection, AzureOpenAIConnection): + os.environ["OPENAI_API_TYPE"] = "azure" + os.environ["OPENAI_API_BASE"] = connection.api_base + os.environ["OPENAI_API_KEY"] = connection.api_key + os.environ["OPENAI_API_VERSION"] = connection.api_version + + if isinstance(connection, OpenAIConnection): + os.environ["OPENAI_API_KEY"] = connection.api_key + if connection.organization is not None: + os.environ["OPENAI_ORG_ID"] = connection.organization + + for key in config: + os.environ[key] = str(config[key]) + + return "Ready" diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/README.md b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/README.md new file mode 100644 index 00000000000..e1affbcff81 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/README.md @@ -0,0 +1,22 @@ +# Chat with Index +This is a simple Python application that allow you to ask questions about the content of an MLIndex and get answers. +It's a console application that you start with a URI to an MLINdex as argument. When you ask a question, it will look up the index to retrieve relevant content and post the question with the relevant content to OpenAI chat model (gpt-3.5-turbo or gpt4) to get an answer. + +## How it works? + +## Get started +### Create .env file in this folder with below content +``` +OPENAI_API_BASE= +OPENAI_API_KEY= +CHAT_MODEL_DEPLOYMENT_NAME=gpt-35-turbo +PROMPT_TOKEN_LIMIT=3000 +MAX_COMPLETION_TOKENS=256 +VERBOSE=false +``` +Note: CHAT_MODEL_DEPLOYMENT_NAME should point to a chat model like gpt-3.5-turbo or gpt-4 + +### Run the command line +```shell +python main.py +``` diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/__init__.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/__init__.py new file mode 100644 index 00000000000..96a36c3a66e --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/__init__.py @@ -0,0 +1,4 @@ +import sys +import os + +sys.path.append(os.path.dirname(os.path.abspath(__file__))) diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/find_context.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/find_context.py new file mode 100644 index 00000000000..1f6e2b5e360 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/find_context.py @@ -0,0 +1,31 @@ +from jinja2 import Environment, FileSystemLoader +import os + +from utils.oai import render_with_token_limit +from utils.logging import log + +from azureml.rag.mlindex import MLIndex + + +def find_context(question: str, index_path: str): + mlindex = MLIndex(index_path) + index = mlindex.as_native_index_client() + snippets = index.similarity_search(question, k=5) + + template = Environment( + loader=FileSystemLoader(os.path.dirname(os.path.abspath(__file__))) + ).get_template("qna_prompt.md") + token_limit = int(os.environ.get("PROMPT_TOKEN_LIMIT")) + + # Try to render the template with token limit and reduce snippet count if it fails + while True: + try: + prompt = render_with_token_limit( + template, token_limit, question=question, context=enumerate(snippets) + ) + break + except ValueError: + snippets = snippets[:-1] + log(f"Reducing snippet count to {len(snippets)} to fit token limit") + + return prompt, snippets diff --git 
a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/main.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/main.py new file mode 100644 index 00000000000..4bff2780981 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/main.py @@ -0,0 +1,66 @@ +import argparse +from dotenv import load_dotenv +import os + +from qna import qna +from find_context import find_context +from rewrite_question import rewrite_question + +# from build_index import create_faiss_index +# from utils.lock import acquire_lock + + +def chat_with_index(question: str, mlindex_uri: str, history: list): + # with acquire_lock("create_folder.lock"): + # if not os.path.exists(".mlindex"): + # os.makedirs(".mlindex") + + # index_path = create_faiss_index(pdf_path) + q = rewrite_question(question, history) + prompt, context = find_context(q, mlindex_uri) + stream = qna(prompt, history) + + return stream, context + + +def print_stream_and_return_full_answer(stream): + answer = "" + for str in stream: + print(str, end="", flush=True) + answer = answer + str + "" + print(flush=True) + + return answer + + +def main_loop(mlindex_uri: str): + load_dotenv(os.path.join(os.path.dirname(__file__), "..", ".env")) + + history = [] + while True: + question = input("\033[92m" + "$User (type q! to quit): " + "\033[0m") + if question == "q!": + break + + stream, context = chat_with_index(question, mlindex_uri, history) + + print("\033[92m" + "$Bot: " + "\033[0m", end=" ", flush=True) + answer = print_stream_and_return_full_answer(stream) + history = history + [ + {"role": "user", "content": question}, + {"role": "assistant", "content": answer}, + ] + + +def main(): + parser = argparse.ArgumentParser( + description="Ask questions about the contents of an MLIndex." + ) + parser.add_argument("mlindex_uri", help="URI to MLIndex") + args = parser.parse_args() + + main_loop(args.mlindex_uri) + + +if __name__ == "__main__": + main() diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/qna.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/qna.py new file mode 100644 index 00000000000..fc2da77218a --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/qna.py @@ -0,0 +1,15 @@ +import os + +from utils.oai import OAIChat + + +def qna(prompt: str, history: list): + max_completion_tokens = int(os.environ.get("MAX_COMPLETION_TOKENS")) + + chat = OAIChat() + stream = chat.stream( + messages=history + [{"role": "user", "content": prompt}], + max_tokens=max_completion_tokens, + ) + + return stream diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/qna_prompt.md b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/qna_prompt.md new file mode 100644 index 00000000000..054b7b589eb --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/qna_prompt.md @@ -0,0 +1,15 @@ +You're a smart assistant can answer questions based on provided context and previous conversation history between you and human. + +Use the context to answer the question at the end, note that the context has order and importance - e.g. context #1 is more important than #2. + +Try as much as you can to answer based on the provided the context, if you cannot derive the answer from the context, you should say you don't know. +Answer in the same language as the question. 
+ +# Context +{% for i, c in context %} +## Context #{{i+1}} +{{c.page_content}} +{% endfor %} + +# Question +{{question}} \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/rewrite_question.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/rewrite_question.py new file mode 100644 index 00000000000..6e299a6bf04 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/rewrite_question.py @@ -0,0 +1,31 @@ +from jinja2 import Environment, FileSystemLoader +import os +from utils.logging import log +from utils.oai import OAIChat, render_with_token_limit + + +def rewrite_question(question: str, history: list): + template = Environment( + loader=FileSystemLoader(os.path.dirname(os.path.abspath(__file__))) + ).get_template("rewrite_question_prompt.md") + token_limit = int(os.environ["PROMPT_TOKEN_LIMIT"]) + max_completion_tokens = int(os.environ["MAX_COMPLETION_TOKENS"]) + + # Try to render the prompt with token limit and reduce the history count if it fails + while True: + try: + prompt = render_with_token_limit( + template, token_limit, question=question, history=history + ) + break + except ValueError: + history = history[:-1] + log(f"Reducing chat history count to {len(history)} to fit token limit") + + chat = OAIChat() + rewritten_question = chat.generate( + messages=[{"role": "user", "content": prompt}], max_tokens=max_completion_tokens + ) + log(f"Rewritten question: {rewritten_question}") + + return rewritten_question diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/rewrite_question_prompt.md b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/rewrite_question_prompt.md new file mode 100644 index 00000000000..d9c0073d8c4 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/rewrite_question_prompt.md @@ -0,0 +1,33 @@ +You are able to reason from previous conversation and the recent question, to come up with a rewrite of the question which is concise but with enough information that people without knowledge of previous conversation can understand the question. + +A few examples: + +# Example 1 +## Previous conversation +user: Who is Bill Clinton? +assistant: Bill Clinton is an American politician who served as the 42nd President of the United States from 1993 to 2001. +## Question +user: When was he born? +## Rewritten question +When was Bill Clinton born? + +# Example 2 +## Previous conversation +user: What is BERT? +assistant: BERT stands for "Bidirectional Encoder Representations from Transformers." It is a natural language processing (NLP) model developed by Google. +user: What data was used for its training? +assistant: The BERT (Bidirectional Encoder Representations from Transformers) model was trained on a large corpus of publicly available text from the internet. It was trained on a combination of books, articles, websites, and other sources to learn the language patterns and relationships between words. +## Question +user: What NLP tasks can it perform well? +## Rewritten question +What NLP tasks can BERT perform well? + +Now comes the actual work - please respond with the rewritten question in the same language as the question, nothing else. 
+ +## Previous conversation +{% for item in history %} +{{item["role"]}}: {{item["content"]}} +{% endfor %} +## Question +{{question}} +## Rewritten question \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/test.ipynb b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/test.ipynb new file mode 100644 index 00000000000..edc8adff5e5 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/test.ipynb @@ -0,0 +1,55 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from main import chat_with_index, print_stream_and_return_full_answer\n", + "from dotenv import load_dotenv\n", + "\n", + "load_dotenv()\n", + "\n", + "mlindex_uri = \"../../../mlindex_local/mlindex_docs_aoai_faiss\"\n", + "questions = [\n", + " \"What is an MLIndex?\",\n", + " \"What are somes examples I can run which use MLIndex?\",\n", + "]\n", + "\n", + "history = []\n", + "for q in questions:\n", + " stream, context = chat_with_index(q, mlindex_uri, history)\n", + " print(\"User: \" + q, flush=True)\n", + " print(\"Bot: \", end=\"\", flush=True)\n", + " answer = print_stream_and_return_full_answer(stream)\n", + " history = history + [\n", + " {\"role\": \"user\", \"content\": q},\n", + " {\"role\": \"assistant\", \"content\": answer},\n", + " ]" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "pfsdk", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.17" + }, + "stage": "development" + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/__init__.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/__init__.py new file mode 100644 index 00000000000..d55ccad1f57 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/__init__.py @@ -0,0 +1 @@ +__path__ = __import__("pkgutil").extend_path(__path__, __name__) # type: ignore diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/logging.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/logging.py new file mode 100644 index 00000000000..0122c0791f1 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/logging.py @@ -0,0 +1,7 @@ +import os + + +def log(message: str): + verbose = os.environ.get("VERBOSE") + if verbose.lower() == "true": + print(message, flush=True) diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/oai.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/oai.py new file mode 100644 index 00000000000..3df08bd4c4d --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/oai.py @@ -0,0 +1,146 @@ +from typing import List +import openai +import os +import tiktoken +from jinja2 import Template + +from .retry import ( + retry_and_handle_exceptions, + retry_and_handle_exceptions_for_generator, +) +from .logging import log + + +def extract_delay_from_rate_limit_error_msg(text): + import re + + pattern = r"retry after (\d+)" + match = re.search(pattern, text) + if match: + retry_time_from_message = match.group(1) + return 
float(retry_time_from_message) + else: + return 5 # default retry time + + +class OAI: + def __init__(self): + if os.getenv("OPENAI_API_TYPE") is not None: + openai.api_type = os.getenv("OPENAI_API_TYPE") + if os.getenv("OPENAI_API_BASE") is not None: + openai.api_base = os.environ.get("OPENAI_API_BASE") + if os.getenv("OPENAI_API_VERSION") is not None: + openai.api_version = os.environ.get("OPENAI_API_VERSION") + if os.getenv("OPENAI_ORG_ID") is not None: + openai.organization = os.environ.get("OPENAI_ORG_ID") + if os.getenv("OPENAI_API_KEY") is None: + raise ValueError("OPENAI_API_KEY is not set in environment variables") + + openai.api_key = os.environ.get("OPENAI_API_KEY") + + # A few sanity checks + if openai.api_type == "azure" and openai.api_base is None: + raise ValueError( + "OPENAI_API_BASE is not set in environment variables, this is required when api_type==azure" + ) + if openai.api_type == "azure" and openai.api_version is None: + raise ValueError( + "OPENAI_API_VERSION is not set in environment variables, this is required when api_type==azure" + ) + if openai.api_type == "azure" and openai.api_key.startswith("sk-"): + raise ValueError( + "OPENAI_API_KEY should not start with sk- when api_type==azure, are you using openai key by mistake?" + ) + + +class OAIChat(OAI): + @retry_and_handle_exceptions( + exception_to_check=( + openai.error.RateLimitError, + openai.error.APIError, + KeyError, + ), + max_retries=5, + extract_delay_from_error_message=extract_delay_from_rate_limit_error_msg, + ) + def generate(self, messages: list, **kwargs) -> List[float]: + if openai.api_type == "azure": + return openai.ChatCompletion.create( + engine=os.environ.get("CHAT_MODEL_DEPLOYMENT_NAME"), + messages=messages, + **kwargs, + )["choices"][0]["message"]["content"] + else: + return openai.ChatCompletion.create( + model=os.environ.get("CHAT_MODEL_DEPLOYMENT_NAME"), + messages=messages, + **kwargs, + )["choices"][0]["message"]["content"] + + @retry_and_handle_exceptions_for_generator( + exception_to_check=( + openai.error.RateLimitError, + openai.error.APIError, + KeyError, + ), + max_retries=5, + extract_delay_from_error_message=extract_delay_from_rate_limit_error_msg, + ) + def stream(self, messages: list, **kwargs): + if openai.api_type == "azure": + response = openai.ChatCompletion.create( + engine=os.environ.get("CHAT_MODEL_DEPLOYMENT_NAME"), + messages=messages, + stream=True, + **kwargs, + ) + else: + response = openai.ChatCompletion.create( + model=os.environ.get("CHAT_MODEL_DEPLOYMENT_NAME"), + messages=messages, + stream=True, + **kwargs, + ) + + for chunk in response: + if "choices" not in chunk or len(chunk["choices"]) == 0: + continue + delta = chunk["choices"][0]["delta"] + if "content" in delta: + yield delta["content"] + + +class OAIEmbedding(OAI): + @retry_and_handle_exceptions( + exception_to_check=openai.error.RateLimitError, + max_retries=5, + extract_delay_from_error_message=extract_delay_from_rate_limit_error_msg, + ) + def generate(self, text: str) -> List[float]: + if openai.api_type == "azure": + return openai.Embedding.create( + input=text, engine=os.environ.get("EMBEDDING_MODEL_DEPLOYMENT_NAME") + )["data"][0]["embedding"] + else: + return openai.Embedding.create( + input=text, model=os.environ.get("EMBEDDING_MODEL_DEPLOYMENT_NAME") + )["data"][0]["embedding"] + + +def count_token(text: str) -> int: + encoding = tiktoken.get_encoding("cl100k_base") + return len(encoding.encode(text)) + + +def render_with_token_limit(template: Template, token_limit: int, **kwargs) -> str: 
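+ """Render the template with the given kwargs and return the text; raises ValueError if the rendered prompt exceeds token_limit."""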
+ text = template.render(**kwargs) + token_count = count_token(text) + if token_count > token_limit: + message = f"token count {token_count} exceeds limit {token_limit}" + log(message) + raise ValueError(message) + return text + + +if __name__ == "__main__": + print(count_token("hello world")) diff --git a/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/retry.py b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/retry.py new file mode 100644 index 00000000000..652467cc3c2 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/chat-with-index/src/utils/retry.py @@ -0,0 +1,92 @@ +from typing import Tuple, Union, Optional +import functools +import time +import random + + +def retry_and_handle_exceptions( + exception_to_check: Union[Exception, Tuple[Exception]], + max_retries: int = 3, + initial_delay: float = 1, + exponential_base: float = 2, + jitter: bool = False, + extract_delay_from_error_message: Optional[any] = None, +): + def deco_retry(func): + @functools.wraps(func) + def wrapper(*args, **kwargs): + delay = initial_delay + for i in range(max_retries): + try: + return func(*args, **kwargs) + except exception_to_check as e: + if i == max_retries - 1: + raise Exception( + "Func execution failed after {0} retries: {1}".format( + max_retries, e + ) + ) + delay *= exponential_base * (1 + jitter * random.random()) + delay_from_error_message = None + if extract_delay_from_error_message is not None: + delay_from_error_message = extract_delay_from_error_message( + str(e) + ) + final_delay = ( + delay_from_error_message if delay_from_error_message else delay + ) + print( + "Func execution failed. Retrying in {0} seconds: {1}".format( + final_delay, e + ) + ) + time.sleep(final_delay) + + return wrapper + + return deco_retry + + +def retry_and_handle_exceptions_for_generator( + exception_to_check: Union[Exception, Tuple[Exception]], + max_retries: int = 3, + initial_delay: float = 1, + exponential_base: float = 2, + jitter: bool = False, + extract_delay_from_error_message: Optional[any] = None, +): + def deco_retry(func): + @functools.wraps(func) + def wrapper(*args, **kwargs): + delay = initial_delay + for i in range(max_retries): + try: + for value in func(*args, **kwargs): + yield value + break + except exception_to_check as e: + if i == max_retries - 1: + raise Exception( + "Func execution failed after {0} retries: {1}".format( + max_retries, e + ) + ) + delay *= exponential_base * (1 + jitter * random.random()) + delay_from_error_message = None + if extract_delay_from_error_message is not None: + delay_from_error_message = extract_delay_from_error_message( + str(e) + ) + final_delay = ( + delay_from_error_message if delay_from_error_message else delay + ) + print( + "Func execution failed. Retrying in {0} seconds: {1}".format( + final_delay, e + ) + ) + time.sleep(final_delay) + + return wrapper + + return deco_retry diff --git a/sdk/python/generative-ai/rag/code_first/flows/data/azure_search_docs_questions.jsonl b/sdk/python/generative-ai/rag/code_first/flows/data/azure_search_docs_questions.jsonl new file mode 100644 index 00000000000..de4cada1fad --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/data/azure_search_docs_questions.jsonl @@ -0,0 +1,3 @@ +{"chat_history":[], "chat_input": "Hi", "chat_output": "Hello! 
How can I assist you today?"} +{"chat_history":[], "chat_input": "When was Vector Search introduced to Azure Cognitive Search?", "chat_output": ""} +{"chat_history":[], "chat_input": "How do I configure my index to use Vector Search?", "chat_output": ""} \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/flows/data/rag_docs_questions.jsonl b/sdk/python/generative-ai/rag/code_first/flows/data/rag_docs_questions.jsonl new file mode 100644 index 00000000000..465b05198d4 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/flows/data/rag_docs_questions.jsonl @@ -0,0 +1,3 @@ +{"chat_history":[], "chat_input": "Hi", "answer": "Hello! How can I assist you today?"} +{"chat_history":[], "chat_input": "What is an MLIndex?", "answer": "MLIndex is a representation of a model used to generate embeddings from text and an index which can be searched using embedding vectors. It contains information such as the API base, API type, connection type, deployment, model, schema version, endpoint, field mapping, and more."} +{"chat_history":[], "chat_input": "What are some examples I can run which use MLIndex?", "answer": ""} \ No newline at end of file diff --git a/sdk/python/generative-ai/rag/code_first/mlindex_local/langchain_docs_to_mlindex.py b/sdk/python/generative-ai/rag/code_first/mlindex_local/langchain_docs_to_mlindex.py new file mode 100644 index 00000000000..b3674d5f2c3 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/mlindex_local/langchain_docs_to_mlindex.py @@ -0,0 +1,50 @@ +# %%[markdown] +# # Build an ACS Index using langchain data loaders and MLIndex SDK + +# %% Pre-requisites +# %pip install 'azure-ai-ml==1.10.0a20230825006' --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ +# %pip install 'azureml-rag[cognitive_search]>=0.2.0' +# %pip install wikipedia + +# %% Get Azure Cognitive Search and Azure OpenAI Connections +from azure.ai.ml import MLClient +from azure.identity import DefaultAzureCredential + +ml_client = MLClient.from_config( + credential=DefaultAzureCredential(), path="config.json" +) + +acs_connection = ml_client.connections.get("azureml-rag-acs") +aoai_connection = ml_client.connections.get("azureml-rag-oai") + +# %% https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/wikipedia.html +from langchain.document_loaders import WikipediaLoader + +docs = WikipediaLoader(query="HUNTER X HUNTER", load_max_docs=10).load() +len(docs) + +# %% +from langchain.text_splitter import MarkdownTextSplitter + +split_docs = MarkdownTextSplitter.from_tiktoken_encoder( + chunk_size=1024 +).split_documents(docs) + +# %% +from azureml.rag.mlindex import MLIndex + +# Process the Wikipedia documents into an Azure Cognitive Search index using Azure OpenAI embeddings +mlindex = MLIndex.from_documents( + documents=split_docs, + embeddings_model="azure_open_ai://deployment/text-embedding-ada-002/model/text-embedding-ada-002", + embeddings_connection=aoai_connection, + embeddings_container="./.embeddings_cache/hunter_x_hunter_aoai_acs", + index_type="acs", + index_connection=acs_connection, + index_config={"index_name": "hunter_x_hunter_aoai_acs"}, +) + +# %% Query documents, use with inferencing framework +index = mlindex.as_langchain_vectorstore() +docs = index.similarity_search("What is bungie gum?", k=5) +print(docs) diff --git a/sdk/python/generative-ai/rag/code_first/mlindex_local/local_docs_to_acs_aoai_mlindex.py b/sdk/python/generative-ai/rag/code_first/mlindex_local/local_docs_to_acs_aoai_mlindex.py new file mode 100644 index
00000000000..82fe8585542 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/mlindex_local/local_docs_to_acs_aoai_mlindex.py @@ -0,0 +1,42 @@ +# %%[markdown] +# # Build an ACS Index using MLIndex SDK + +# %% Pre-requisites +# %pip install 'azure-ai-ml==1.10.0a20230825006' --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ +# %pip install 'azureml-rag[document_parsing,cognitive_search]>=0.2.0' + +# %% Get Azure Cognitive Search and Azure OpenAI Connections +from azure.ai.ml import MLClient +from azure.identity import DefaultAzureCredential + +ml_client = MLClient.from_config(credential=DefaultAzureCredential()) + +acs_connection = ml_client.connections.get("azureml-rag-acs") +aoai_connection = ml_client.connections.get("azureml-rag-oai") + +# %% +from azureml.rag.mlindex import MLIndex + +# Process data into an Azure Cognitive Search index using Azure OpenAI embeddings +mlindex = MLIndex.from_files( + source_uri="../", + source_glob="**/*", + chunk_size=200, + embeddings_model="azure_open_ai://deployment/text-embedding-ada-002/model/text-embedding-ada-002", + embeddings_connection=aoai_connection, + embeddings_container="./.embeddings_cache/mlindex_docs_aoai_acs", + index_type="acs", + index_connection=acs_connection, + index_config={"index_name": "mlindex_docs_aoai_acs"}, + output_path="./acs_open_ai_index", +) + +# %% Load MLIndex from local +from azureml.rag.mlindex import MLIndex + +mlindex = MLIndex("./acs_open_ai_index") + +# %% Query documents, use with inferencing framework +index = mlindex.as_langchain_vectorstore() +docs = index.similarity_search("Topic in my data.", k=5) +print(docs) diff --git a/sdk/python/generative-ai/rag/code_first/mlindex_local/local_docs_to_faiss_mlindex.py b/sdk/python/generative-ai/rag/code_first/mlindex_local/local_docs_to_faiss_mlindex.py new file mode 100644 index 00000000000..fbde424b966 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/mlindex_local/local_docs_to_faiss_mlindex.py @@ -0,0 +1,29 @@ +# %%[markdown] +# # Build a Faiss Index using MLIndex SDK + +# %% Pre-requisites +# %pip install 'azure-ai-ml==1.10.0a20230825006' --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ +# %pip install 'azureml-rag[document_parsing,faiss,hugging_face]>=0.2.0' + +# %% +from azureml.rag.mlindex import MLIndex + +# Process data into FAISS Index using HuggingFace embeddings +mlindex = MLIndex.from_files( + source_uri="../", + source_glob="**/*", + chunk_size=200, + # embeddings_model=sentence_transformers.SentenceTransformer('sentence-transformers/all-mpnet-base-v2'), + embeddings_model="hugging_face://model/sentence-transformers/all-mpnet-base-v2", + embeddings_container="./.embeddings_cache/mlindex_docs_mpnet_faiss", + index_type="faiss", +) + +# %% Query documents, use with inferencing framework +index = mlindex.as_langchain_vectorstore() +docs = index.similarity_search("Topic in my data.", k=5) +print(docs) + +# %% Save for later +mlindex.save("./different_index_path") +mlindex = MLIndex("./different_index_path") diff --git a/sdk/python/generative-ai/rag/code_first/mlindex_local/local_docs_to_faiss_mlindex_with_promptflow.py b/sdk/python/generative-ai/rag/code_first/mlindex_local/local_docs_to_faiss_mlindex_with_promptflow.py new file mode 100644 index 00000000000..a4c01631eb5 --- /dev/null +++ b/sdk/python/generative-ai/rag/code_first/mlindex_local/local_docs_to_faiss_mlindex_with_promptflow.py @@ -0,0 +1,87 @@ +# %% Pre-requisites +# %pip install
'azure-ai-ml==1.10.0a20230825006' --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ +# %pip install 'azureml-rag[document_parsing,faiss]>=0.2.0' +# %pip install -U 'promptflow[azure]' promptflow-tools promptflow-vectordb + +# %% Get Azure OpenAI Connection +from azure.ai.ml import MLClient +from azure.identity import DefaultAzureCredential + +ml_client = MLClient.from_config(credential=DefaultAzureCredential()) + +aoai_connection = ml_client.connections.get("azureml-rag-oai") + +# %% Build MLIndex +from azureml.rag.mlindex import MLIndex + +# Process data into FAISS Index using Azure OpenAI embeddings +mlindex_name = "mlindex_docs_aoai_faiss" +mlindex_local_path = f"./{mlindex_name}" + +mlindex = MLIndex.from_files( + source_uri="../", + source_glob="**/*", + chunk_size=200, + embeddings_model="azure_open_ai://deployment/text-embedding-ada-002/model/text-embedding-ada-002", + embeddings_connection=aoai_connection, + embeddings_container=f"./.embeddings_cache/{mlindex_name}", + index_type="faiss", + output_path=mlindex_local_path, +) + +# %% Get Promptflow client +import promptflow + +pf = promptflow.PFClient() + +# %% List all the available connections +for c in pf.connections.list(): + print(c.name + " (" + c.type + ")") + +# %% Load index qna flow +from pathlib import Path + +flow_path = Path.cwd().parent / "flows" / "chat-with-index" + + +# %% Run qna flow +output = pf.flows.test( + flow_path, + inputs={ + "chat_history": [], + "mlindex_uri": str(Path.cwd() / mlindex_local_path), + "question": "what is an MLIndex?", + }, +) + +answer = output["answer"] +for part in answer: + print(part, end="") + +print(output["context"]) + +# %% Run qna flow with multiple inputs +data_path = Path.cwd().parent / "flows" / "data" / "rag_docs_questions.jsonl" + +config = { + "CHAT_MODEL_DEPLOYMENT_NAME": "gpt-35-turbo", + "PROMPT_TOKEN_LIMIT": 2000, + "MAX_COMPLETION_TOKENS": 256, + "VERBOSE": True, +} + +column_mapping = { + "chat_history": "${data.chat_history}", + "mlindex_uri": str( + Path.cwd() / mlindex_local_path, + ), + "question": "${data.chat_input}", + "answer": "${data.answer}", + "config": config, +} +run = pf.run(flow=flow_path, data=data_path, column_mapping=column_mapping) +pf.stream(run) + +print(f"{run}") + +# %%
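+# (Optional) Inspect the batch run results. A minimal sketch: it assumes the installed promptflow version exposes PFClient.get_details and PFClient.get_metrics; the returned columns and available metrics depend on the flow and SDK version. +details = pf.get_details(run)  # per-line inputs and outputs as a pandas DataFrame +print(details.head()) + +metrics = pf.get_metrics(run)  # aggregate metrics logged by the flow, if any +print(metrics)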