diff --git a/_freeze/embeddings/applications/execute-results/html.json b/_freeze/embeddings/applications/execute-results/html.json index d2cb0b0..081fbf7 100644 --- a/_freeze/embeddings/applications/execute-results/html.json +++ b/_freeze/embeddings/applications/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "c7c6aaa88acff3e085f8bcd0144e88ab", + "hash": "4dcd8d01a2de3b1e2e0f0407e336d87c", "result": { "engine": "jupyter", - "markdown": "---\ntitle: Applications\nformat:\n html:\n code-fold: true\n---\n\n", + "markdown": "---\ntitle: Applications\nformat:\n html:\n code-fold: true\n---\n\nBuild a bot that can answer questions based on documents!\nResource: https://platform.openai.com/docs/tutorials/web-qa-embeddings\n\n", "supporting": [ "applications_files" ], diff --git a/_freeze/llm/gpt_api/execute-results/html.json b/_freeze/llm/gpt_api/execute-results/html.json index 556f51d..313a3ee 100644 --- a/_freeze/llm/gpt_api/execute-results/html.json +++ b/_freeze/llm/gpt_api/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "dc7e1b2db90c90886a913f41ce9fd215", + "hash": "b873702b4a61fcaca14b721868545856", "result": { "engine": "jupyter", - "markdown": "---\ntitle: The OpenAI API\nformat:\n html:\n code-fold: false\n---\n\nResource: [OpenAI API docs](https://platform.openai.com/docs/introduction){.external}\n\n\nLet's get started with the OpenAI API for GPT. \n\n\n### Authentication\n\nGetting started with the OpenAI Chat Completions API requires signing up for an account on the OpenAI platform. \nOnce you've registered, you'll gain access to an API key, which serves as a unique identifier for your application to authenticate requests to the API. \nThis key is essential for ensuring secure communication between your application and OpenAI's servers. \nWithout proper authentication, your requests will be rejected.\nYou can create your own account, but for the seminar we will provide the client with the credential within the Jupyterlab (TODO: Link).\n\n::: {#a9cb3d89 .cell execution_count=1}\n``` {.python .cell-code}\n# setting up the client in Python\n\nimport os\nfrom openai import OpenAI\n\nclient = OpenAI(\n api_key=os.environ.get(\"OPENAI_API_KEY\")\n)\n```\n:::\n\n\n### Requesting Completions\n\nMost interaction with GPT and other models consist in generating completions for certain tasks (TODO: Link to completions)\n\nTo request completions from the OpenAI API, we use Python to send HTTP requests to the designated API endpoint. \nThese requests are structured to include various parameters that guide the generation of text completions. \nThe most fundamental parameter is the prompt text, which sets the context for the completion. \nAdditionally, you can specify the desired model configuration, such as the engine to use (e.g., \"gpt-4\"), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity (TODO: Link parameterization)\n\n::: {#662daa55 .cell execution_count=2}\n``` {.python .cell-code}\n# creating a completion\nchat_completion = client.chat.completions.create(\n messages=[\n {\n \"role\": \"user\",\n \"content\": \"How old is the earth?\",\n }\n ],\n model=\"gpt-3.5-turbo\"\n)\n```\n:::\n\n\n### Processing\n\nOnce the OpenAI API receives your request, it proceeds to process the provided prompt using the specified model. \nThis process involves analyzing the context provided by the prompt and leveraging the model's pre-trained knowledge to generate text completions. 
\nThe model employs advanced natural language processing techniques to ensure that the generated completions are coherent and contextually relevant. \nBy drawing from its extensive training data and understanding of human language, the model aims to produce responses that closely align with human-like communication.\n\n### Response\n\nAfter processing your request, the OpenAI API returns a JSON-formatted response containing the generated text completions. \nDepending on the specifics of your request, you may receive multiple completions, each accompanied by additional information such as a confidence score indicating the model's level of certainty in the generated text. \nThis response provides valuable insights into the quality and relevance of the completions, allowing you to tailor your application's behavior accordingly.\n\n### Error Handling\n\nWhile interacting with the OpenAI API, it's crucial to implement robust error handling mechanisms to gracefully manage any potential issues that may arise. \nCommon errors include providing invalid parameters, experiencing authentication failures due to an incorrect API key, or encountering rate limiting restrictions. B\ny handling errors effectively, you can ensure the reliability and resilience of your application, minimizing disruptions to the user experience and maintaining smooth operation under varying conditions. \nImplementing proper error handling practices is essential for building robust and dependable applications that leverage the capabilities of the OpenAI Chat Completions API effectively.\n\n", + "markdown": "---\ntitle: The OpenAI API\nformat:\n html:\n code-fold: false\n---\n\n::: {.callout-note}\nResource: [OpenAI API docs](https://platform.openai.com/docs/introduction){.external}\n:::\n\n\n\nLet's get started with the OpenAI API for GPT. \n\n\n### Authentication\n\nGetting started with the OpenAI Chat Completions API requires signing up for an account on the OpenAI platform. \nOnce you've registered, you'll gain access to an API key, which serves as a unique identifier for your application to authenticate requests to the API. \nThis key is essential for ensuring secure communication between your application and OpenAI's servers. \nWithout proper authentication, your requests will be rejected.\nYou can create your own account, but for the seminar we will provide the client with the credential within the Jupyterlab (TODO: Link).\n\n::: {#1c41ead3 .cell execution_count=1}\n``` {.python .cell-code}\n# setting up the client in Python\n\nimport os\nfrom openai import OpenAI\n\nclient = OpenAI(\n api_key=os.environ.get(\"OPENAI_API_KEY\")\n)\n```\n:::\n\n\n### Requesting Completions\n\nMost interaction with GPT and other models consist in generating completions for certain tasks (TODO: Link to completions)\n\nTo request completions from the OpenAI API, we use Python to send HTTP requests to the designated API endpoint. \nThese requests are structured to include various parameters that guide the generation of text completions. \nThe most fundamental parameter is the prompt text, which sets the context for the completion. 
\nAdditionally, you can specify the desired model configuration, such as the engine to use (e.g., \"gpt-4\"), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity (TODO: Link parameterization)\n\n::: {#a7e7ff6f .cell execution_count=2}\n``` {.python .cell-code}\n# creating a completion\nchat_completion = client.chat.completions.create(\n messages=[\n {\n \"role\": \"user\",\n \"content\": \"How old is the earth?\",\n }\n ],\n model=\"gpt-3.5-turbo\"\n)\n```\n:::\n\n\n### Processing\n\nOnce the OpenAI API receives your request, it proceeds to process the provided prompt using the specified model. \nThis process involves analyzing the context provided by the prompt and leveraging the model's pre-trained knowledge to generate text completions. \nThe model employs advanced natural language processing techniques to ensure that the generated completions are coherent and contextually relevant. \nBy drawing from its extensive training data and understanding of human language, the model aims to produce responses that closely align with human-like communication.\n\n### Response\n\nAfter processing your request, the OpenAI API returns a JSON-formatted response containing the generated text completions. \nDepending on the specifics of your request, you may receive multiple completions, each accompanied by additional information such as a confidence score indicating the model's level of certainty in the generated text. \nThis response provides valuable insights into the quality and relevance of the completions, allowing you to tailor your application's behavior accordingly.\n\n### Error Handling\n\nWhile interacting with the OpenAI API, it's crucial to implement robust error handling mechanisms to gracefully manage any potential issues that may arise. \nCommon errors include providing invalid parameters, experiencing authentication failures due to an incorrect API key, or encountering rate limiting restrictions. B\ny handling errors effectively, you can ensure the reliability and resilience of your application, minimizing disruptions to the user experience and maintaining smooth operation under varying conditions. \nImplementing proper error handling practices is essential for building robust and dependable applications that leverage the capabilities of the OpenAI Chat Completions API effectively.\n\n", "supporting": [ "gpt_api_files" ], diff --git a/_freeze/llm/parameterization/execute-results/html.json b/_freeze/llm/parameterization/execute-results/html.json index ffb6c73..28f7be4 100644 --- a/_freeze/llm/parameterization/execute-results/html.json +++ b/_freeze/llm/parameterization/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "bc2adb1e48802eac607894bda896dc2e", + "hash": "a20b274126bd9f16f0ae718708e16f43", "result": { "engine": "jupyter", - "markdown": "---\ntitle: Parameterization of GPT\nformat:\n html:\n code-fold: true\n---\n\n- **Temperature**: Temperature is a parameter that controls the randomness of the generated text. Lower temperatures result in more deterministic outputs, where the model tends to choose the most likely tokens at each step. Higher temperatures introduce more randomness, allowing the model to explore less likely tokens and produce more creative outputs. 
It's often used to balance between generating safe, conservative responses and more novel, imaginative ones.\n\n- **Max Tokens**: Max Tokens limits the maximum length of the generated text by specifying the maximum number of tokens (words or subwords) allowed in the output. This parameter helps to control the length of the response and prevent the model from generating overly long or verbose outputs, which may not be suitable for certain applications or contexts.\n\n- **Top P (Nucleus Sampling)**: Top P, also known as nucleus sampling, dynamically selects a subset of the most likely tokens based on their cumulative probability until the cumulative probability exceeds a certain threshold (specified by the parameter). This approach ensures diversity in the generated text while still prioritizing tokens with higher probabilities. It's particularly useful for generating diverse and contextually relevant responses.\n\n- **Frequency Penalty**: Frequency Penalty penalizes tokens based on their frequency in the generated text. Tokens that appear more frequently are assigned higher penalties, discouraging the model from repeatedly generating common or redundant tokens. This helps to promote diversity in the generated text and prevent the model from producing overly repetitive outputs.\n\n- **Presence Penalty**: Presence Penalty penalizes tokens that are already present in the input prompt. By discouraging the model from simply echoing or replicating the input text, this parameter encourages the generation of responses that go beyond the provided context. It's useful for generating more creative and novel outputs that are not directly predictable from the input.\n\n- **Stop Sequence**: Stop Sequence specifies a sequence of tokens that, if generated by the model, signals it to stop generating further text. This parameter is commonly used to indicate the desired ending or conclusion of the generated text. It helps to control the length of the response and ensure that the model generates text that aligns with specific requirements or constraints.\n\n- **Repetition Penalty**: Repetition Penalty penalizes repeated tokens in the generated text by assigning higher penalties to tokens that appear multiple times within a short context window. This encourages the model to produce more varied outputs by avoiding unnecessary repetition of tokens. It's particularly useful for generating coherent and diverse text without excessive redundancy.\n\n- **Length Penalty**: Length Penalty penalizes the length of the generated text by applying a penalty factor to longer sequences. This helps to balance between generating concise and informative responses while avoiding excessively long or verbose outputs. 
Length Penalty is often used to control the length of the generated text and ensure that it remains coherent and contextually relevant.\n\n\n\n## Roles: \n\n::: {#3a526819 .cell execution_count=1}\n``` {.python .cell-code}\nfrom openai import OpenAI\nclient = OpenAI()\n\ncompletion = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"system\", \"content\": \"You are a poetic assistant, skilled in explaining complex programming concepts with creative flair.\"},\n {\"role\": \"user\", \"content\": \"Compose a poem that explains the concept of recursion in programming.\"}\n ]\n)\n\nprint(completion.choices[0].message)\n```\n:::\n\n\n## Function calling: \nhttps://platform.openai.com/docs/guides/function-calling\n\n", + "markdown": "---\ntitle: Parameterization of GPT\nformat:\n html:\n code-fold: false\n code-wrap: true\n---\n\nThe GPT models provided by OpenAI offer a variety of parameters that can change the way the language model responds. \nBelow you can find a list of the most important ones.\n\n- **Temperature**: Temperature (`temperature`) is a parameter that controls the randomness of the generated text. Lower temperatures result in more deterministic outputs, where the model tends to choose the most likely tokens at each step. Higher temperatures introduce more randomness, allowing the model to explore less likely tokens and produce more creative outputs. It's often used to balance between generating safe, conservative responses and more novel, imaginative ones.\n\n- **Max Tokens**: Max Tokens (`max_tokens`) limits the maximum length of the generated text by specifying the maximum number of tokens (words or subwords) allowed in the output. This parameter helps to control the length of the response and prevent the model from generating overly long or verbose outputs, which may not be suitable for certain applications or contexts.\n\n- **Top P (Nucleus Sampling)**: Top P (`top_p`), also known as nucleus sampling, dynamically selects a subset of the most likely tokens based on their cumulative probability until the cumulative probability exceeds a certain threshold (specified by the parameter). This approach ensures diversity in the generated text while still prioritizing tokens with higher probabilities. It's particularly useful for generating diverse and contextually relevant responses.\n\n- **Frequency Penalty**: Frequency Penalty (`frequency_penalty`) penalizes tokens based on their frequency in the generated text. Tokens that appear more frequently are assigned higher penalties, discouraging the model from repeatedly generating common or redundant tokens. This helps to promote diversity in the generated text and prevent the model from producing overly repetitive outputs.\n\n- **Presence Penalty**: Presence Penalty (`presence_penalty`) penalizes tokens that have already appeared in the text so far. By discouraging the model from simply echoing or replicating the input text, this parameter encourages the generation of responses that go beyond the provided context. It's useful for generating more creative and novel outputs that are not directly predictable from the input.\n\n- **Stop Sequence**: Stop Sequence (`stop`) specifies a sequence of tokens that, if generated by the model, signals it to stop generating further text. This parameter is commonly used to indicate the desired ending or conclusion of the generated text. 
It helps to control the length of the response and ensure that the model generates text that aligns with specific requirements or constraints.\n\n\n## Roles: \n\nIn order to cover most tasks you want to perform using a chat format, the OpenAI API lets you define different `roles` in the chat. \nThe available roles are `system`, `assistant`, `user` and `tool`. \nYou should already be familiar with two of them by now: \nThe `user` role corresponds to the actual user prompting the language model, while all answers are given with the `assistant` role.\n\nThe `system` role can now be given to provide some additional general instructions to the language model that are typically not a user input, for example, the style in which the model responds. \nIn this case, an example is better than any explanation.\n\n::: {#875ab431 .cell execution_count=1}\n``` {.python .cell-code}\nimport os\nfrom llm_utils.client import get_openai_client\n\nMODEL = \"gpt4\"\n\nclient = get_openai_client(\n model=MODEL,\n config_path=os.environ.get(\"CONFIG_PATH\")\n)\n\ncompletion = client.chat.completions.create(\n model=MODEL,\n messages=[\n {\"role\": \"system\", \"content\": \"You are an annoyed technician working in a help center for dish washers, who answers in short, unfriendly bursts.\"},\n {\"role\": \"user\", \"content\": \"My dish washer does not clean the dishes, what could be the reason.\"}\n ]\n)\n\nprint(completion.choices[0].message.content)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nCould be anything. Blocked spray arm. Clogged filter. Faulty pump. Detergent issue. Check all that.\n```\n:::\n:::\n\n\n## Function calling: {#sec-test} \n\nAs we have seen, most interactions with a language model happen in the form of a chat with almost \"free\" questions or instructions and answers.\nWhile this seems the most natural in most cases, it is not always a practical format if we want to use a language model for very specific purposes.\nThis happens particularly often when we want to employ a language model in business situations, where we require a consistent output of the model.\n\nAs an example, let us try to use GPT for sentiment analysis (see also [here](../nlp/overview.qmd#sec-sentiment-analysis)).\nLet's say we want GPT to classify a text into one of the following four categories: \n\n::: {#ce80e6f9 .cell execution_count=2}\n``` {.python .cell-code}\nsentiment_categories = [\n \"positive\", \n \"negative\",\n \"neutral\",\n \"mixed\"\n]\n```\n:::\n\n\nWe could do the following:\n\n\n\n::: {#40c825ed .cell execution_count=4}\n``` {.python .cell-code}\nmessages = []\nmessages.append(\n {\"role\": \"system\", \"content\": f\"Classify the given text into one of the following sentiment categories: {sentiment_categories}.\"}\n)\nmessages.append(\n {\"role\": \"user\", \"content\": \"I really did not like the movie.\"}\n)\n\nresponse = client.chat.completions.create(\n messages=messages,\n model=MODEL\n)\n\nprint(f\"Response: '{response.choices[0].message.content}'\")\n```\n:::\n\n\n::: {#c6ffeb88 .cell execution_count=5}\n\n::: {.cell-output .cell-output-stdout}\n```\nResponse: 'Category: Negative'\n```\n:::\n:::\n\n\nIt is easy to spot the problem: GPT does not necessarily answer in the way we expect or want it to. 
\nIn this case, instead of simply returning the correct category, it also returns the string `Category: ` alongside it (and capitalized `Negative`).\nSo if we were to use the answer in a program or data base, we'd now again have to use some NLP techniques to parse it in order eventually retrieve **exactly** the category we were looking for: `negative`. \nWhat we need instead is a way to constrain GPT to a specific way of answering, and this is where `functions` or `tools` come into play (see also [Function calling](https://platform.openai.com/docs/guides/function-calling){.external} and [Function calling (cookbook)](https://cookbook.openai.com/examples/how_to_call_functions_with_chat_models){.external}).\n\nThis concept allows us to specify the exact output format we expect to receive from GPT (it is called functions since ideally we want to call a function directly on the output of GPT so it has to be in a specific format). \n\n::: {#b78f10d6 .cell execution_count=6}\n``` {.python .cell-code}\n# this looks intimidating but isn't that complicated\ntools = [\n {\n \"type\": \"function\",\n \"function\": {\n \"name\": \"analyze_sentiment\",\n \"description\": \"Analyze the sentiment in a given text.\",\n \"parameters\": {\n \"type\": \"object\",\n \"properties\": {\n \"sentiment\": {\n \"type\": \"string\",\n \"enum\": sentiment_categories,\n \"description\": f\"The sentiment of the text.\"\n }\n },\n \"required\": [\"sentiment\"],\n }\n }\n }\n]\n```\n:::\n\n\n::: {#e1e40f2a .cell execution_count=7}\n``` {.python .cell-code}\nmessages = []\nmessages.append(\n {\"role\": \"system\", \"content\": f\"Classify the given text into one of the following sentiment categories: {sentiment_categories}.\"}\n)\nmessages.append(\n {\"role\": \"user\", \"content\": \"I really did not like the movie.\"}\n)\n\nresponse = client.chat.completions.create(\n messages=messages,\n model=MODEL,\n tools=tools,\n tool_choice={\n \"type\": \"function\", \n \"function\": {\"name\": \"analyze_sentiment\"}}\n)\n\nprint(f\"Response: '{response.choices[0].message.tool_calls[0].function.arguments}'\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nResponse: '{\n\"sentiment\": \"negative\"\n}'\n```\n:::\n:::\n\n\nWe can now easily extract what we need: \n\n::: {#5e3c869b .cell execution_count=8}\n``` {.python .cell-code}\nimport json \nresult = json.loads(response.choices[0].message.tool_calls[0].function.arguments) # remember that the answer is a string\nprint(result[\"sentiment\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nnegative\n```\n:::\n:::\n\n\nWe can also include multiple function parameters if our desired output has multiple components.\nLet's try to include another parameter which includes the `reason` for the sentiment.\n\n::: {#9c902d9f .cell execution_count=9}\n``` {.python .cell-code}\ntools = [\n {\n \"type\": \"function\",\n \"function\": {\n \"name\": \"analyze_sentiment\",\n \"description\": \"Analyze the sentiment in a given text.\",\n \"parameters\": {\n \"type\": \"object\",\n \"properties\": {\n \"sentiment\": {\n \"type\": \"string\",\n \"enum\": sentiment_categories,\n \"description\": f\"The sentiment of the text.\"\n },\n \"reason\": {\n \"type\": \"string\",\n \"description\": \"The reason for the sentiment in few words. 
If there is no information, do not make assumptions and leave blank.\"\n }\n },\n \"required\": [\"sentiment\", \"reason\"],\n }\n }\n }\n]\n```\n:::\n\n\n::: {#d77c3b61 .cell execution_count=10}\n``` {.python .cell-code}\nmessages = []\nmessages.append(\n {\"role\": \"system\", \"content\": f\"Classify the given text into one of the following sentiment categories: {sentiment_categories}. If you can, also extract the reason.\"}\n)\nmessages.append(\n {\"role\": \"user\", \"content\": \"I loved the movie, Johnny Depp is a great actor.\"}\n)\n\nresponse = client.chat.completions.create(\n messages=messages,\n model=MODEL,\n tools=tools,\n tool_choice={\n \"type\": \"function\", \n \"function\": {\"name\": \"analyze_sentiment\"}}\n)\n\nprint(f\"Response: '{response.choices[0].message.tool_calls[0].function.arguments}'\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nResponse: '{\n\"sentiment\": \"positive\",\n\"reason\": \"Appreciation for the movie and actor\"\n}'\n```\n:::\n:::\n\n\nHere, again, we could also constrain the possibilities for the `reason` to a certain set. \nHence, functions are a great way to get more consistent answers from the language model so that we can use it in applications.\n\n", "supporting": [ "parameterization_files" ], diff --git a/_freeze/llm/prompting/execute-results/html.json b/_freeze/llm/prompting/execute-results/html.json index bb0b0f3..0a75cb9 100644 --- a/_freeze/llm/prompting/execute-results/html.json +++ b/_freeze/llm/prompting/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "d550bb699b724b045b6d42c1602ebf1c", + "hash": "c0652d9a13ad67e81575d6e1d131ce24", "result": { "engine": "jupyter", - "markdown": "---\ntitle: Prompting\nformat:\n html:\n code-fold: true\n---\n\n**Resources:** \n- https://platform.openai.com/docs/guides/prompt-engineering\n- \n\n", + "markdown": "---\ntitle: Prompting\nformat:\n html:\n code-fold: true\n---\n\nLearning prompting is a science in itself. \nThe difficulty lies in the probabilistic nature of the language models. \nThat means that small changes to your prompt (that you might even find insignificant) can have a large impact on the result or answer.\nIn particular, the effects do not have to be \"logical\", i.e., they do not have to follow from your changes in a comprehensible or reproducible way. \nThis can sometimes be frustrating, but it can also be avoided in many cases by following the right instructions for prompting. \nFor these, it is best to follow the advice of the model's creators.\n\n\n::: {.callout-note}\n_The following is taken from the [OpenAI Guide](https://platform.openai.com/docs/guides/prompt-engineering){.external}_\n:::\n\n#### Write clear instructions\nThese models can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see. The less the model has to guess at what you want, the more likely you’ll get it.\n\nTactics:\n\n- Include details in your query to get more relevant answers\n- Ask the model to adopt a persona\n- Use delimiters to clearly indicate distinct parts of the input\n- Specify the steps required to complete a task\n- Provide examples\n- Specify the desired length of the output\n
\n\n#### Provide reference text\nLanguage models can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to these models can help in answering with fewer fabrications.\n\nTactics:\n\n- Instruct the model to answer using a reference text\n- Instruct the model to answer with citations from a reference text\n
\n\n#### Split complex tasks into simpler subtasks\nJust as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to a language model. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.\n\nTactics:\n\n- Use intent classification to identify the most relevant instructions for a user query\n- For dialogue applications that require very long conversations, summarize or filter previous dialogue\n- Summarize long documents piecewise and construct a full summary recursively\n
\n\n#### Give the model time to \"think\"\nIf asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, models make more reasoning errors when trying to answer right away, rather than taking time to work out an answer. Asking for a \"chain of thought\" before an answer can help the model reason its way toward correct answers more reliably.\n\nTactics:\n\n- Instruct the model to work out its own solution before rushing to a conclusion\n- Use inner monologue or a sequence of queries to hide the model's reasoning process\n- Ask the model if it missed anything on previous passes\n
\n\n#### Use external tools\nCompensate for the weaknesses of the model by feeding it the outputs of other tools. For example, a text retrieval system (sometimes called RAG or retrieval augmented generation) can tell the model about relevant documents. A code execution engine like OpenAI's Code Interpreter can help the model do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a language model, offload it to get the best of both.\n\nTactics:\n\n- Use embeddings-based search to implement efficient knowledge retrieval\n- Use code execution to perform more accurate calculations or call external APIs\n- Give the model access to specific functions\n
\n\n#### Test changes systematically\nImproving performance is easier if you can measure it. In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore to be sure that a change is net positive to performance it may be necessary to define a comprehensive test suite (also known an as an \"eval\").\n\nTactic:\n\n- Evaluate model outputs with reference to gold-standard answers\n\n", "supporting": [ "prompting_files" ], diff --git a/_freeze/nlp/overview/execute-results/html.json b/_freeze/nlp/overview/execute-results/html.json index f49bb38..867a963 100644 --- a/_freeze/nlp/overview/execute-results/html.json +++ b/_freeze/nlp/overview/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "511ca1f1badee45c381c211cb06bcc5e", + "hash": "f5b2521cd9ea65125fb9f9725dfe0627", "result": { "engine": "jupyter", - "markdown": "---\ntitle: Overview of NLP\nformat:\n html:\n code-fold: false\n---\n\n## A short history of Natural Language Processing\n\nThe field of Natural Language Processing (NLP) has undergone a remarkable evolution, spanning decades and driven by the convergence of computer science, artificial intelligence, and linguistics. \nFrom its nascent stages to its current state, NLP has witnessed transformative shifts, propelled by groundbreaking research and technological advancements. \nToday, it stands as a testament to humanity's quest to bridge the gap between human language and machine comprehension. \nThe journey through NLP's history offers profound insights into its trajectory and the challenges encountered along the way.\n\n#### Early Days: Rule-Based Approaches (1960s-1980s)\nIn its infancy, NLP relied heavily on rule-based approaches, where researchers painstakingly crafted sets of linguistic rules to analyze and manipulate text. \nThis period, spanning from the 1960s to the 1980s, saw significant efforts in tasks such as part-of-speech tagging, named entity recognition, and machine translation. \nHowever, rule-based systems struggled to cope with the inherent ambiguity and complexity of natural language. \nDifferent languages presented unique challenges, necessitating the development of language-specific rulesets. Despite their limitations, rule-based approaches laid the groundwork for future advancements in NLP.\n\n#### Rise of Statistical Methods (1990s-2000s)\nThe 1990s marked a pivotal shift in NLP with the emergence of statistical methods as a viable alternative to rule-based approaches. \nResearchers began harnessing the power of statistics and probabilistic models to analyze large corpora of text. \nTechniques like Hidden Markov Models and Conditional Random Fields gained prominence, offering improved performance in tasks such as text classification, sentiment analysis, and information extraction. \nStatistical methods represented a departure from rigid rule-based systems, allowing for greater flexibility and adaptability. \nHowever, they still grappled with the nuances and intricacies of human language, particularly in handling ambiguity and context.\n\n#### Machine Learning Revolution (2010s)\nThe advent of the 2010s witnessed a revolution in NLP fueled by the rise of machine learning, particularly deep learning. \nWith the availability of vast amounts of annotated data and unprecedented computational power, researchers explored neural network architectures tailored for NLP tasks. 
\nRecurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) gained traction, demonstrating impressive capabilities in tasks such as sentiment analysis, text classification, and sequence generation. \nThese models represented a significant leap forward in NLP, enabling more nuanced and context-aware language processing.\n\n#### Large Language Models: Transformers (2010s-Present)\nThe latter half of the 2010s heralded the rise of large language models, epitomized by the revolutionary Transformer architecture.\nPowered by self-attention mechanisms, Transformers excel at capturing long-range dependencies in text and generating coherent and contextually relevant responses. \nPre-trained on massive text corpora, models like GPT (Generative Pretrained Transformer) have achieved unprecedented performance across a wide range of NLP tasks, including machine translation, question-answering, and language understanding. \nTheir ability to leverage vast amounts of data and learn intricate patterns has propelled NLP to new heights of sophistication.\n\n#### Challenges in NLP\nDespite the remarkable progress, NLP grapples with a myriad of challenges that continue to shape its trajectory:\n\n- **Ambiguity of Language**: The inherent ambiguity of natural language poses significant challenges in accurately interpreting meaning, especially in tasks like sentiment analysis and named entity recognition.\n \n- **Different Languages**: NLP systems often struggle with languages other than English, facing variations in syntax, semantics, and cultural nuances, requiring tailored approaches for each language.\n\n- **Bias**: NLP models can perpetuate biases present in the training data, leading to unfair or discriminatory outcomes, particularly in tasks like text classification and machine translation.\n\n- **Importance of Context**: Understanding context is paramount for NLP tasks, as the meaning of words and phrases can vary drastically depending on the surrounding context.\n\n- **World Knowledge**: NLP systems lack comprehensive world knowledge, hindering their ability to understand references, idioms, and cultural nuances embedded in text.\n\n- **Common Sense Reasoning**: Despite advancements, NLP models still struggle with common sense reasoning, often producing nonsensical or irrelevant responses in complex scenarios.\n\n#### Conclusion\nThe journey of NLP from rule-based systems to large language models has been marked by remarkable progress and continuous innovation. \nWhile challenges persist, ongoing research and development efforts hold the promise of overcoming these obstacles and unlocking new frontiers in language understanding. \nAs NLP continues to evolve, driven by advancements in machine learning and computational resources, it brings us closer to the realization of truly intelligent systems capable of understanding and interacting with human language in profound ways.\n\n\n## Classic NLP tasks/applications\n\n#### Part-of-Speech Tagging\nPart-of-speech tagging involves labeling each word in a sentence with its corresponding grammatical category, such as noun, verb, adjective, or adverb. \nFor example, in the sentence \"The cat is sleeping,\" part-of-speech tagging would identify \"cat\" as a noun and \"sleeping\" as a verb. \nThis task is crucial for many NLP applications, including language understanding, information retrieval, and machine translation. 
\nAccurate part-of-speech tagging lays the foundation for deeper linguistic analysis and improves the performance of downstream tasks.\n\n
\nCode example\n\n::: {#d87df8e1 .cell execution_count=1}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"The sun sets behind the mountains, casting a golden glow across the sky.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Find the maximum length of token text and POS tag\nmax_token_length = max(len(token.text) for token in doc)\nmax_pos_length = max(len(token.pos_) for token in doc)\n\n# Print each token along with its part-of-speech tag\nfor token in doc:\n print(f\"Token: {token.text.ljust(max_token_length)} | POS Tag: {token.pos_.ljust(max_pos_length)}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nToken: The | POS Tag: DET \nToken: sun | POS Tag: NOUN \nToken: sets | POS Tag: VERB \nToken: behind | POS Tag: ADP \nToken: the | POS Tag: DET \nToken: mountains | POS Tag: NOUN \nToken: , | POS Tag: PUNCT\nToken: casting | POS Tag: VERB \nToken: a | POS Tag: DET \nToken: golden | POS Tag: ADJ \nToken: glow | POS Tag: NOUN \nToken: across | POS Tag: ADP \nToken: the | POS Tag: DET \nToken: sky | POS Tag: NOUN \nToken: . | POS Tag: PUNCT\n```\n:::\n:::\n\n\n
\n\n\n\n#### Named Entity Recognition\nNamed Entity Recognition (NER) involves identifying and classifying named entities in text, such as people, organizations, locations, dates, and more. For instance, in the sentence \"Apple is headquartered in Cupertino,\" NER would identify \"Apple\" as an organization and \"Cupertino\" as a location. \nNER is essential for various applications, including information retrieval, document summarization, and question-answering systems. Accurate NER enables machines to extract meaningful information from unstructured text data.\n\n
\nCode example\n\n::: {#acc47b23 .cell execution_count=2}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"Apple is considering buying a startup called U.K. based company in London for $1 billion.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Print each token along with its Named Entity label\nfor ent in doc.ents:\n print(f\"Entity: {ent.text.ljust(20)} | Label: {ent.label_}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nEntity: Apple | Label: ORG\nEntity: U.K. | Label: GPE\nEntity: London | Label: GPE\nEntity: $1 billion | Label: MONEY\n```\n:::\n:::\n\n\n
\n\n\n\n#### Machine Translation\nMachine Translation (MT) aims to automatically translate text from one language to another, facilitating communication across language barriers. \nFor example, translating a sentence from English to Spanish or vice versa. \nMT systems utilize sophisticated algorithms and linguistic models to generate accurate translations while preserving the original meaning and nuances of the text. \nMT has numerous practical applications, including cross-border communication, localization of software and content, and global commerce.\n\n#### Sentiment Analysis\nSentiment Analysis involves analyzing text data to determine the sentiment or opinion expressed within it, such as positive, negative, or neutral. \nFor instance, analyzing product reviews to gauge customer satisfaction or monitoring social media sentiment towards a brand. \nSentiment Analysis employs machine learning algorithms to classify text based on sentiment, enabling businesses to understand customer feedback, track public opinion, and make data-driven decisions.\n\n
\nCode example\n\n::: {#02b6acb2 .cell execution_count=3}\n``` {.python .cell-code}\n# python -m textblob.download_corpora\n\nfrom textblob import TextBlob\n\n# Example text\ntext = \"I love TextBlob! It's an amazing library for natural language processing.\"\n\n# Perform sentiment analysis with TextBlob\nblob = TextBlob(text)\nsentiment_score = blob.sentiment.polarity\n\n# Determine sentiment label based on sentiment score\nif sentiment_score > 0:\n sentiment_label = \"Positive\"\nelif sentiment_score < 0:\n sentiment_label = \"Negative\"\nelse:\n sentiment_label = \"Neutral\"\n\n# Print sentiment analysis results\nprint(f\"Text: {text}\")\nprint(f\"Sentiment Score: {sentiment_score:.2f}\")\nprint(f\"Sentiment Label: {sentiment_label}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: I love TextBlob! It's an amazing library for natural language processing.\nSentiment Score: 0.44\nSentiment Label: Positive\n```\n:::\n:::\n\n\n
\n\n\n#### Text Classification\nText Classification is the task of automatically categorizing text documents into predefined categories or classes. \nFor example, classifying news articles into topics like politics, sports, or entertainment. \nText Classification is widely used in various domains, including email spam detection, sentiment analysis, and content categorization. \nIt enables organizations to organize and process large volumes of textual data efficiently, leading to improved decision-making and information retrieval.\n\n
\nCode example\n\n::: {#356c23f3 .cell execution_count=4}\n``` {.python .cell-code}\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.svm import SVC\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import LabelEncoder\n\n# Example labeled dataset\ntexts = [\n \"I love this product!\",\n \"This product is terrible.\",\n \"Great service, highly recommended.\",\n \"I had a bad experience with this company.\",\n]\nlabels = [\n \"Positive\",\n \"Negative\",\n \"Positive\",\n \"Negative\",\n]\n\n# Create a TF-IDF vectorizer\nvectorizer = TfidfVectorizer()\n\n# Encode labels as integers\nlabel_encoder = LabelEncoder()\nencoded_labels = label_encoder.fit_transform(labels)\n\n# Create a pipeline with TF-IDF vectorizer and SVM classifier\nclassifier = make_pipeline(vectorizer, SVC(kernel='linear'))\n\n# Train the classifier\nclassifier.fit(texts, encoded_labels)\n\n# Example test text\ntest_text = \"This product exceeded my expectations.\"\n\n# Predict the label for the test text\npredicted_label = classifier.predict([test_text])[0]\n\n# Decode the predicted label back to original label\npredicted_label_text = label_encoder.inverse_transform([predicted_label])[0]\n\n# Print the predicted label\nprint(f\"Text: {test_text}\")\nprint(f\"Predicted Label: {predicted_label_text}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: This product exceeded my expectations.\nPredicted Label: Negative\n```\n:::\n:::\n\n\n
\n\n\n#### Information Extraction\nInformation Extraction involves automatically extracting structured information from unstructured text data, such as documents, articles, or web pages. \nThis includes identifying entities, relationships, and events mentioned in the text. \nFor example, extracting names of people mentioned in news articles or detecting company acquisitions from financial reports. \nInformation Extraction plays a crucial role in tasks like knowledge base construction, data integration, and business intelligence.\n\n#### Question-Answering\nQuestion-Answering (QA) systems aim to automatically generate accurate answers to user queries posed in natural language. \nThese systems comprehend the meaning of questions and retrieve relevant information from a knowledge base or text corpus to provide precise responses. \nFor example, answering factual questions like \"Who is the president of the United States?\" or \"What is the capital of France?\". \nQA systems are essential for information retrieval, virtual assistants, and educational applications, enabling users to access information quickly and efficiently.\n\n", + "markdown": "---\ntitle: Overview of NLP\nformat:\n html:\n code-fold: false\n---\n\n## A short history of Natural Language Processing\n\nThe field of Natural Language Processing (NLP) has undergone a remarkable evolution, spanning decades and driven by the convergence of computer science, artificial intelligence, and linguistics. \nFrom its nascent stages to its current state, NLP has witnessed transformative shifts, propelled by groundbreaking research and technological advancements. \nToday, it stands as a testament to humanity's quest to bridge the gap between human language and machine comprehension. \nThe journey through NLP's history offers profound insights into its trajectory and the challenges encountered along the way.\n\n#### Early Days: Rule-Based Approaches (1960s-1980s)\nIn its infancy, NLP relied heavily on rule-based approaches, where researchers painstakingly crafted sets of linguistic rules to analyze and manipulate text. \nThis period, spanning from the 1960s to the 1980s, saw significant efforts in tasks such as part-of-speech tagging, named entity recognition, and machine translation. \nHowever, rule-based systems struggled to cope with the inherent ambiguity and complexity of natural language. \nDifferent languages presented unique challenges, necessitating the development of language-specific rulesets. Despite their limitations, rule-based approaches laid the groundwork for future advancements in NLP.\n\n#### Rise of Statistical Methods (1990s-2000s)\nThe 1990s marked a pivotal shift in NLP with the emergence of statistical methods as a viable alternative to rule-based approaches. \nResearchers began harnessing the power of statistics and probabilistic models to analyze large corpora of text. \nTechniques like Hidden Markov Models and Conditional Random Fields gained prominence, offering improved performance in tasks such as text classification, sentiment analysis, and information extraction. \nStatistical methods represented a departure from rigid rule-based systems, allowing for greater flexibility and adaptability. \nHowever, they still grappled with the nuances and intricacies of human language, particularly in handling ambiguity and context.\n\n#### Machine Learning Revolution (2010s)\nThe advent of the 2010s witnessed a revolution in NLP fueled by the rise of machine learning, particularly deep learning. 
\nWith the availability of vast amounts of annotated data and unprecedented computational power, researchers explored neural network architectures tailored for NLP tasks. \nRecurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) gained traction, demonstrating impressive capabilities in tasks such as sentiment analysis, text classification, and sequence generation. \nThese models represented a significant leap forward in NLP, enabling more nuanced and context-aware language processing.\n\n#### Large Language Models: Transformers (2010s-Present)\nThe latter half of the 2010s heralded the rise of large language models, epitomized by the revolutionary Transformer architecture.\nPowered by self-attention mechanisms, Transformers excel at capturing long-range dependencies in text and generating coherent and contextually relevant responses. \nPre-trained on massive text corpora, models like GPT (Generative Pre-trained Transformer) have achieved unprecedented performance across a wide range of NLP tasks, including machine translation, question-answering, and language understanding. \nTheir ability to leverage vast amounts of data and learn intricate patterns has propelled NLP to new heights of sophistication.\n\n#### Challenges in NLP\nDespite the remarkable progress, NLP grapples with a myriad of challenges that continue to shape its trajectory:\n\n- **Ambiguity of Language**: The inherent ambiguity of natural language poses significant challenges in accurately interpreting meaning, especially in tasks like sentiment analysis and named entity recognition.\n \n- **Different Languages**: NLP systems often struggle with languages other than English, facing variations in syntax, semantics, and cultural nuances, requiring tailored approaches for each language.\n\n- **Bias**: NLP models can perpetuate biases present in the training data, leading to unfair or discriminatory outcomes, particularly in tasks like text classification and machine translation.\n\n- **Importance of Context**: Understanding context is paramount for NLP tasks, as the meaning of words and phrases can vary drastically depending on the surrounding context.\n\n- **World Knowledge**: NLP systems lack comprehensive world knowledge, hindering their ability to understand references, idioms, and cultural nuances embedded in text.\n\n- **Common Sense Reasoning**: Despite advancements, NLP models still struggle with common sense reasoning, often producing nonsensical or irrelevant responses in complex scenarios.\n\n#### Conclusion\nThe journey of NLP from rule-based systems to large language models has been marked by remarkable progress and continuous innovation. \nWhile challenges persist, ongoing research and development efforts hold the promise of overcoming these obstacles and unlocking new frontiers in language understanding. \nAs NLP continues to evolve, driven by advancements in machine learning and computational resources, it brings us closer to the realization of truly intelligent systems capable of understanding and interacting with human language in profound ways.\n\n\n## Classic NLP tasks/applications\n\n#### Part-of-Speech Tagging\nPart-of-speech tagging involves labeling each word in a sentence with its corresponding grammatical category, such as noun, verb, adjective, or adverb. \nFor example, in the sentence \"The cat is sleeping,\" part-of-speech tagging would identify \"cat\" as a noun and \"sleeping\" as a verb. 
\nThis task is crucial for many NLP applications, including language understanding, information retrieval, and machine translation. \nAccurate part-of-speech tagging lays the foundation for deeper linguistic analysis and improves the performance of downstream tasks.\n\n
\nCode example\n\n::: {#1de29527 .cell execution_count=1}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"The sun sets behind the mountains, casting a golden glow across the sky.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Find the maximum length of token text and POS tag\nmax_token_length = max(len(token.text) for token in doc)\nmax_pos_length = max(len(token.pos_) for token in doc)\n\n# Print each token along with its part-of-speech tag\nfor token in doc:\n print(f\"Token: {token.text.ljust(max_token_length)} | POS Tag: {token.pos_.ljust(max_pos_length)}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nToken: The | POS Tag: DET \nToken: sun | POS Tag: NOUN \nToken: sets | POS Tag: VERB \nToken: behind | POS Tag: ADP \nToken: the | POS Tag: DET \nToken: mountains | POS Tag: NOUN \nToken: , | POS Tag: PUNCT\nToken: casting | POS Tag: VERB \nToken: a | POS Tag: DET \nToken: golden | POS Tag: ADJ \nToken: glow | POS Tag: NOUN \nToken: across | POS Tag: ADP \nToken: the | POS Tag: DET \nToken: sky | POS Tag: NOUN \nToken: . | POS Tag: PUNCT\n```\n:::\n:::\n\n\n
\n\n\n\n#### Named Entity Recognition\nNamed Entity Recognition (NER) involves identifying and classifying named entities in text, such as people, organizations, locations, dates, and more. For instance, in the sentence \"Apple is headquartered in Cupertino,\" NER would identify \"Apple\" as an organization and \"Cupertino\" as a location. \nNER is essential for various applications, including information retrieval, document summarization, and question-answering systems. Accurate NER enables machines to extract meaningful information from unstructured text data.\n\n
\nCode example\n\n::: {#e9d253d7 .cell execution_count=2}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"Apple is considering buying a startup called U.K. based company in London for $1 billion.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Print each token along with its Named Entity label\nfor ent in doc.ents:\n print(f\"Entity: {ent.text.ljust(20)} | Label: {ent.label_}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nEntity: Apple | Label: ORG\nEntity: U.K. | Label: GPE\nEntity: London | Label: GPE\nEntity: $1 billion | Label: MONEY\n```\n:::\n:::\n\n\n
\n\n\n#### Machine Translation\nMachine Translation (MT) aims to automatically translate text from one language to another, facilitating communication across language barriers. \nFor example, translating a sentence from English to Spanish or vice versa. \nMT systems utilize sophisticated algorithms and linguistic models to generate accurate translations while preserving the original meaning and nuances of the text. \nMT has numerous practical applications, including cross-border communication, localization of software and content, and global commerce.\n\n#### Sentiment Analysis {#sec-sentiment-analysis}\n\nSentiment Analysis involves analyzing text data to determine the sentiment or opinion expressed within it, such as positive, negative, or neutral. \nFor instance, analyzing product reviews to gauge customer satisfaction or monitoring social media sentiment towards a brand. \nSentiment Analysis employs machine learning algorithms to classify text based on sentiment, enabling businesses to understand customer feedback, track public opinion, and make data-driven decisions.\n\n
\nCode example\n\n::: {#8ee69b14 .cell execution_count=3}\n``` {.python .cell-code}\n# python -m textblob.download_corpora\n\nfrom textblob import TextBlob\n\n# Example text\ntext = \"I love TextBlob! It's an amazing library for natural language processing.\"\n\n# Perform sentiment analysis with TextBlob\nblob = TextBlob(text)\nsentiment_score = blob.sentiment.polarity\n\n# Determine sentiment label based on sentiment score\nif sentiment_score > 0:\n sentiment_label = \"Positive\"\nelif sentiment_score < 0:\n sentiment_label = \"Negative\"\nelse:\n sentiment_label = \"Neutral\"\n\n# Print sentiment analysis results\nprint(f\"Text: {text}\")\nprint(f\"Sentiment Score: {sentiment_score:.2f}\")\nprint(f\"Sentiment Label: {sentiment_label}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: I love TextBlob! It's an amazing library for natural language processing.\nSentiment Score: 0.44\nSentiment Label: Positive\n```\n:::\n:::\n\n\n
\n\n\n#### Text Classification\nText Classification is the task of automatically categorizing text documents into predefined categories or classes. \nFor example, classifying news articles into topics like politics, sports, or entertainment. \nText Classification is widely used in various domains, including email spam detection, sentiment analysis, and content categorization. \nIt enables organizations to organize and process large volumes of textual data efficiently, leading to improved decision-making and information retrieval.\n\n
\nCode example\n\n::: {#b55c09cb .cell execution_count=4}\n``` {.python .cell-code}\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.svm import SVC\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import LabelEncoder\n\n# Example labeled dataset\ntexts = [\n \"I love this product!\",\n \"This product is terrible.\",\n \"Great service, highly recommended.\",\n \"I had a bad experience with this company.\",\n]\nlabels = [\n \"Positive\",\n \"Negative\",\n \"Positive\",\n \"Negative\",\n]\n\n# Create a TF-IDF vectorizer\nvectorizer = TfidfVectorizer()\n\n# Encode labels as integers\nlabel_encoder = LabelEncoder()\nencoded_labels = label_encoder.fit_transform(labels)\n\n# Create a pipeline with TF-IDF vectorizer and SVM classifier\nclassifier = make_pipeline(vectorizer, SVC(kernel='linear'))\n\n# Train the classifier\nclassifier.fit(texts, encoded_labels)\n\n# Example test text\ntest_text = \"This product exceeded my expectations.\"\n\n# Predict the label for the test text\npredicted_label = classifier.predict([test_text])[0]\n\n# Decode the predicted label back to original label\npredicted_label_text = label_encoder.inverse_transform([predicted_label])[0]\n\n# Print the predicted label\nprint(f\"Text: {test_text}\")\nprint(f\"Predicted Label: {predicted_label_text}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: This product exceeded my expectations.\nPredicted Label: Negative\n```\n:::\n:::\n\n\n
\n\n\n#### Information Extraction\nInformation Extraction involves automatically extracting structured information from unstructured text data, such as documents, articles, or web pages. \nThis includes identifying entities, relationships, and events mentioned in the text. \nFor example, extracting names of people mentioned in news articles or detecting company acquisitions from financial reports. \nInformation Extraction plays a crucial role in tasks like knowledge base construction, data integration, and business intelligence.\n\n#### Question-Answering\nQuestion-Answering (QA) systems aim to automatically generate accurate answers to user queries posed in natural language. \nThese systems comprehend the meaning of questions and retrieve relevant information from a knowledge base or text corpus to provide precise responses. \nFor example, answering factual questions like \"Who is the president of the United States?\" or \"What is the capital of France?\". \nQA systems are essential for information retrieval, virtual assistants, and educational applications, enabling users to access information quickly and efficiently.\n\n", "supporting": [ "overview_files" ], diff --git a/_freeze/nlp/tokenization/execute-results/html.json b/_freeze/nlp/tokenization/execute-results/html.json index 9be1827..686e0e3 100644 --- a/_freeze/nlp/tokenization/execute-results/html.json +++ b/_freeze/nlp/tokenization/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "015b9b99e573753e3289e6d5f046eca5", + "hash": "b9faa2cc27aa2ac93360727c228d3d38", "result": { "engine": "jupyter", - "markdown": "---\ntitle: Tokenization\nformat:\n html:\n code-fold: false\n---\n\nTODO: Some introductory sentence.\n\n## Simple word tokenization\nA key element for a computer to understand the words we speak or type is the concept of word tokenization. \nFor a human, the sentence \n\n::: {#4561108d .cell execution_count=1}\n``` {.python .cell-code}\nsentence = \"I love reading science fiction books or books about science.\"\n```\n:::\n\n\nis easy to understand since we are able to split the sentence into its individual parts in order to figure out the meaning of the full sentence.\nFor a computer, the sentence is just a simple string of characters, like any other word or longer text.\nIn order to make a computer understand the meaning of a sentence, we need to help break it down into its relevant parts.\n\nSimply put, word tokenization is the process of breaking down a piece of text into individual words or so-called tokens. \nIt is like taking a sentence and splitting it into smaller pieces, where each piece represents a word.\nWord tokenization involves analyzing the text character by character and identifying boundaries between words. \nIt uses various rules and techniques to decide where one word ends and the next one begins. \nFor example, spaces, punctuation marks, and special characters often serve as natural boundaries between words.\n\nSo let's start breaking down the sentence into its individual parts.\n\n::: {#36ccc15c .cell execution_count=2}\n``` {.python .cell-code}\ntokenized_sentence = sentence.split(\" \")\nprint(tokenized_sentence)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['I', 'love', 'reading', 'science', 'fiction', 'books', 'or', 'books', 'about', 'science.']\n```\n:::\n:::\n\n\nOnce we have tokenized the sentence, we can start anaylzing it with some simple statistical methods. \nFor example, in order to figure out what the sentence might be about, we could count the most frequent words. 
\n\n::: {#4b5bb732 .cell execution_count=3}\n``` {.python .cell-code}\nfrom collections import Counter\n\ntoken_counter = Counter(tokenized_sentence)\nprint(token_counter.most_common(2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[('books', 2), ('I', 1)]\n```\n:::\n:::\n\n\nUnfortunately, we already realize that we have not done the best job with our \"tokenizer\": The second occurence of the word `science` is missing do to the punctuation. \nWhile this is great as it holds information about the ending of a sentence, it disturbs our analysis here, so let's get rid of it. \n\n::: {#54b1654a .cell execution_count=4}\n``` {.python .cell-code}\ntokenized_sentence = sentence.replace(\".\", \" \").split(\" \")\n\ntoken_counter = Counter(tokenized_sentence)\nprint(token_counter.most_common(2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[('science', 2), ('books', 2)]\n```\n:::\n:::\n\n\nSo that worked.\nAs you can imagine, tokenization can get increasingly difficult when we have to deal with all sorts of situations in larger corpora of texts (see also the exercise). \nSo it is great that there are already all sorts of libraries available that can help us with this process. \n\n::: {#7a06e431 .cell execution_count=5}\n``` {.python .cell-code}\nfrom nltk.tokenize import wordpunct_tokenize\nfrom string import punctuation\n\ntokenized_sentence = wordpunct_tokenize(sentence)\ntokenized_sentence = [t for t in tokenized_sentence if t not in punctuation]\nprint(tokenized_sentence)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['I', 'love', 'reading', 'science', 'fiction', 'books', 'or', 'books', 'about', 'science']\n```\n:::\n:::\n\n\n## Advanced word tokenization\n\nTODO: Write\n\n\nFrom the docs: \n\nhttps://platform.openai.com/tokenizer\n\nA helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word (so 100 tokens ~= 75 words).\n\n", + "markdown": "---\ntitle: Tokenization\nformat:\n html:\n code-fold: false\n---\n\nTODO: Some introductory sentence.\n\n## Simple word tokenization\nA key element for a computer to understand the words we speak or type is the concept of word tokenization. \nFor a human, the sentence \n\n::: {#706b1324 .cell execution_count=1}\n``` {.python .cell-code}\nsentence = \"I love reading science fiction books or books about science.\"\n```\n:::\n\n\nis easy to understand since we are able to split the sentence into its individual parts in order to figure out the meaning of the full sentence.\nFor a computer, the sentence is just a simple string of characters, like any other word or longer text.\nIn order to make a computer understand the meaning of a sentence, we need to help break it down into its relevant parts.\n\nSimply put, word tokenization is the process of breaking down a piece of text into individual words or so-called tokens. \nIt is like taking a sentence and splitting it into smaller pieces, where each piece represents a word.\nWord tokenization involves analyzing the text character by character and identifying boundaries between words. \nIt uses various rules and techniques to decide where one word ends and the next one begins. 
\nFor example, spaces, punctuation marks, and special characters often serve as natural boundaries between words.\n\nSo let's start breaking down the sentence into its individual parts.\n\n::: {#650aa91e .cell execution_count=2}\n``` {.python .cell-code}\ntokenized_sentence = sentence.split(\" \")\nprint(tokenized_sentence)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['I', 'love', 'reading', 'science', 'fiction', 'books', 'or', 'books', 'about', 'science.']\n```\n:::\n:::\n\n\nOnce we have tokenized the sentence, we can start analyzing it with some simple statistical methods. \nFor example, in order to figure out what the sentence might be about, we could count the most frequent words. \n\n::: {#f5e22bc0 .cell execution_count=3}\n``` {.python .cell-code}\nfrom collections import Counter\n\ntoken_counter = Counter(tokenized_sentence)\nprint(token_counter.most_common(2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[('books', 2), ('I', 1)]\n```\n:::\n:::\n\n\nUnfortunately, we already realize that we have not done the best job with our \"tokenizer\": The second occurrence of the word `science` is missing due to the punctuation. \nWhile the punctuation is useful, as it holds information about the ending of a sentence, it disturbs our analysis here, so let's get rid of it. \n\n::: {#4547c10a .cell execution_count=4}\n``` {.python .cell-code}\ntokenized_sentence = sentence.replace(\".\", \" \").split(\" \")\n\ntoken_counter = Counter(tokenized_sentence)\nprint(token_counter.most_common(2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[('science', 2), ('books', 2)]\n```\n:::\n:::\n\n\nSo that worked.\nAs you can imagine, tokenization can get increasingly difficult when we have to deal with all sorts of situations in larger corpora of texts (see also the exercise). \nSo it is great that there are already all sorts of libraries available that can help us with this process. \n\n::: {#f635468e .cell execution_count=5}\n``` {.python .cell-code}\nfrom nltk.tokenize import wordpunct_tokenize\nfrom string import punctuation\n\ntokenized_sentence = wordpunct_tokenize(sentence)\ntokenized_sentence = [t for t in tokenized_sentence if t not in punctuation]\nprint(tokenized_sentence)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['I', 'love', 'reading', 'science', 'fiction', 'books', 'or', 'books', 'about', 'science']\n```\n:::\n:::\n\n\n## Advanced word tokenization\n\nThe above ideas illustrate well the core idea of tokenization: splitting text into smaller chunks that we can feed to a language model.\nIn practice, especially in models like GPT, a critical component is the vocabulary or the set of unique words or tokens the model understands.\nTraditional approaches use fixed-size vocabularies, which means every unique word in the corpus has its own representation (index or embedding) in the model's vocabulary. \nHowever, as the vocabulary size increases (for example, by including more languages), so does the memory requirement, which can be impractical for large-scale language models. \nOne solution is the so-called byte-pair encoding.\nByte pair encoding is a data compression technique specifically designed to tackle the issue of large vocabularies in language models. \nInstead of assigning a unique index or embedding to each token, byte pair encoding identifies frequent pairs of characters (bytes) within the corpus and represents them as a single token. 
\nThis effectively reduces the size of the vocabulary while preserving the essential information needed for language modeling tasks.\n\n\n### How Byte Pair Encoding Works:\n\n1. **Tokenization**: The first step in byte pair encoding is tokenization, where the text corpus is broken down into individual tokens. These tokens could be characters, subwords, or words, depending on the tokenization strategy used.\n\n2. **Pair Identification**: Next, the algorithm identifies pairs of characters (bytes) that occur frequently within the corpus. These pairs are typically consecutive characters in the text.\n\n3. **Replacement with Single Token**: Once frequent pairs are identified, they are replaced with a single token. This effectively reduces the number of unique tokens in the vocabulary.\n\n4. **Iterative Process**: The process of identifying frequent pairs and replacing them with single tokens is iterative. It continues until a predefined stopping criterion is met, such as reaching a target vocabulary size or when no more frequent pairs can be found.\n\n5. **Vocabulary Construction**: After the iterative process, a vocabulary is constructed, consisting of the single tokens generated through pair replacement, along with any remaining tokens from the original tokenization process.\n\n6. **Encoding and Decoding**: During training and inference, text data is encoded using the constructed vocabulary, where each token is represented by its corresponding index in the vocabulary. During decoding, the indices are mapped back to their respective tokens.\n\n\n::: {.callout-tip}\nIt is very illustrative to use the OpenAI [tokenizer](https://platform.openai.com/tokenizer){.external} to see how a sentence is split up into different tokens.\nTry mixing languages and standard as well as rarer words and observe how they are split up.\n\nAnother detailed example can be found [here](https://www.geeksforgeeks.org/byte-pair-encoding-bpe-in-nlp/){.external}.\n:::\n\n\n\n### Advantages of Byte Pair Encoding:\n\n1. **Efficient Memory Usage**: Byte pair encoding significantly reduces the size of the vocabulary, leading to more efficient memory usage, especially in large-scale language models.\n\n2. **Retains Information**: Despite reducing the vocabulary size, byte pair encoding retains important linguistic information by capturing frequent character pairs.\n\n3. **Flexible**: Byte pair encoding is flexible and can be adapted to different tokenization strategies and corpus characteristics.\n\n\n### Limitations and Considerations:\n\n1. **Computational Overhead**: The iterative nature of byte pair encoding can be computationally intensive, especially for large corpora.\n\n2. **Loss of Granularity**: While byte pair encoding reduces vocabulary size, it may lead to a loss of granularity, especially for rare or out-of-vocabulary words.\n\n3. **Tokenization Strategy**: The effectiveness of byte pair encoding depends on the tokenization strategy used and the characteristics of the corpus.\n\n\n\n::: {.callout-tip}\n__From the [OpenAI Guide](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them){.external}__:\n\nA helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. 
This translates to roughly ¾ of a word (so 100 tokens ~= 75 words).\n:::\n\n", "supporting": [ "tokenization_files" ], diff --git a/_quarto.yml b/_quarto.yml index 5a734cd..60085ee 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -62,6 +62,7 @@ website: - llm/prompting.qmd - llm/parameterization.qmd - llm/exercises/ex_gpt_parameterization.ipynb + - llm/exercises/ex_gpt_ner_with_function_calls.ipynb - section: "Embeddings" contents: diff --git a/docs/about/assignment.html b/docs/about/assignment.html index b9c2e54..5f31739 100644 --- a/docs/about/assignment.html +++ b/docs/about/assignment.html @@ -285,6 +285,12 @@ Exercise: GPT Parameterization + + diff --git a/docs/about/projects.html b/docs/about/projects.html index 046bd99..6588e86 100644 --- a/docs/about/projects.html +++ b/docs/about/projects.html @@ -285,6 +285,12 @@ Exercise: GPT Parameterization + + diff --git a/docs/about/schedule.html b/docs/about/schedule.html index c5776cd..4cc751f 100644 --- a/docs/about/schedule.html +++ b/docs/about/schedule.html @@ -285,6 +285,12 @@ Exercise: GPT Parameterization + + diff --git a/docs/embeddings/applications.html b/docs/embeddings/applications.html index 9c270ce..33545f0 100644 --- a/docs/embeddings/applications.html +++ b/docs/embeddings/applications.html @@ -285,6 +285,12 @@ Exercise: GPT Parameterization + + @@ -416,6 +422,7 @@

Applications

+

Build a bot that can answer questions based on documents! Resource: https://platform.openai.com/docs/tutorials/web-qa-embeddings

diff --git a/docs/embeddings/clustering.html b/docs/embeddings/clustering.html index 5970247..b0aad0a 100644 --- a/docs/embeddings/clustering.html +++ b/docs/embeddings/clustering.html @@ -319,6 +319,12 @@ Exercise: GPT Parameterization + + diff --git a/docs/embeddings/embeddings.html b/docs/embeddings/embeddings.html index 6bab2fc..2fc83ed 100644 --- a/docs/embeddings/embeddings.html +++ b/docs/embeddings/embeddings.html @@ -31,7 +31,7 @@ - + @@ -285,6 +285,12 @@ Exercise: GPT Parameterization + + @@ -841,8 +847,8 @@

Embeddings

diff --git a/docs/llm/exercises/ex_gpt_start.html b/docs/llm/exercises/ex_gpt_start.html index 20c09d8..1f355dd 100644 --- a/docs/llm/exercises/ex_gpt_start.html +++ b/docs/llm/exercises/ex_gpt_start.html @@ -285,6 +285,12 @@ Exercise: GPT Parameterization + + diff --git a/docs/llm/gpt.html b/docs/llm/gpt.html index 0039f80..4dabfb7 100644 --- a/docs/llm/gpt.html +++ b/docs/llm/gpt.html @@ -285,6 +285,12 @@ Exercise: GPT Parameterization + + diff --git a/docs/llm/gpt_api.html b/docs/llm/gpt_api.html index ef3646c..30fde58 100644 --- a/docs/llm/gpt_api.html +++ b/docs/llm/gpt_api.html @@ -319,6 +319,12 @@ Exercise: GPT Parameterization + + @@ -450,12 +456,24 @@

The OpenAI API

+
+
+
+ +
+
+Note +
+
+

Resource: OpenAI API docs

+
+

Let’s get started with the OpenAI API for GPT.

Authentication

Getting started with the OpenAI Chat Completions API requires signing up for an account on the OpenAI platform. Once you’ve registered, you’ll gain access to an API key, which serves as a unique identifier for your application to authenticate requests to the API. This key is essential for ensuring secure communication between your application and OpenAI’s servers. Without proper authentication, your requests will be rejected. You can create your own account, but for the seminar we will provide the client with the credential within the Jupyterlab (TODO: Link).

-
+
# setting up the client in Python
 
 import os
@@ -470,7 +488,7 @@ 

Authentication

Requesting Completions

Most interaction with GPT and other models consist in generating completions for certain tasks (TODO: Link to completions)

To request completions from the OpenAI API, we use Python to send HTTP requests to the designated API endpoint. These requests are structured to include various parameters that guide the generation of text completions. The most fundamental parameter is the prompt text, which sets the context for the completion. Additionally, you can specify the desired model configuration, such as the engine to use (e.g., “gpt-4”), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity (TODO: Link parameterization)

-
+
# creating a completion
 chat_completion = client.chat.completions.create(
     messages=[
diff --git a/docs/llm/intro.html b/docs/llm/intro.html
index b6a3775..e0c0482 100644
--- a/docs/llm/intro.html
+++ b/docs/llm/intro.html
@@ -285,6 +285,12 @@
   
  Exercise: GPT Parameterization
   
+ + diff --git a/docs/llm/parameterization.html b/docs/llm/parameterization.html index fe59998..73e7807 100644 --- a/docs/llm/parameterization.html +++ b/docs/llm/parameterization.html @@ -319,6 +319,12 @@ Exercise: GPT Parameterization
+ + @@ -450,39 +456,191 @@

Parameterization of GPT

+

The GPT models offered by OpenAI provide a variety of parameters that can change the way the language model responds. Below you can find a list of the most important ones; a short usage sketch follows the list.

    -
  • Temperature: Temperature is a parameter that controls the randomness of the generated text. Lower temperatures result in more deterministic outputs, where the model tends to choose the most likely tokens at each step. Higher temperatures introduce more randomness, allowing the model to explore less likely tokens and produce more creative outputs. It’s often used to balance between generating safe, conservative responses and more novel, imaginative ones.

  • -
  • Max Tokens: Max Tokens limits the maximum length of the generated text by specifying the maximum number of tokens (words or subwords) allowed in the output. This parameter helps to control the length of the response and prevent the model from generating overly long or verbose outputs, which may not be suitable for certain applications or contexts.

  • -
  • Top P (Nucleus Sampling): Top P, also known as nucleus sampling, dynamically selects a subset of the most likely tokens based on their cumulative probability until the cumulative probability exceeds a certain threshold (specified by the parameter). This approach ensures diversity in the generated text while still prioritizing tokens with higher probabilities. It’s particularly useful for generating diverse and contextually relevant responses.

  • -
  • Frequency Penalty: Frequency Penalty penalizes tokens based on their frequency in the generated text. Tokens that appear more frequently are assigned higher penalties, discouraging the model from repeatedly generating common or redundant tokens. This helps to promote diversity in the generated text and prevent the model from producing overly repetitive outputs.

  • -
  • Presence Penalty: Presence Penalty penalizes tokens that are already present in the input prompt. By discouraging the model from simply echoing or replicating the input text, this parameter encourages the generation of responses that go beyond the provided context. It’s useful for generating more creative and novel outputs that are not directly predictable from the input.

  • -
  • Stop Sequence: Stop Sequence specifies a sequence of tokens that, if generated by the model, signals it to stop generating further text. This parameter is commonly used to indicate the desired ending or conclusion of the generated text. It helps to control the length of the response and ensure that the model generates text that aligns with specific requirements or constraints.

  • -
  • Repetition Penalty: Repetition Penalty penalizes repeated tokens in the generated text by assigning higher penalties to tokens that appear multiple times within a short context window. This encourages the model to produce more varied outputs by avoiding unnecessary repetition of tokens. It’s particularly useful for generating coherent and diverse text without excessive redundancy.

  • -
  • Length Penalty: Length Penalty penalizes the length of the generated text by applying a penalty factor to longer sequences. This helps to balance between generating concise and informative responses while avoiding excessively long or verbose outputs. Length Penalty is often used to control the length of the generated text and ensure that it remains coherent and contextually relevant.

  • +
  • Temperature: Temperature (temperature) is a parameter that controls the randomness of the generated text. Lower temperatures result in more deterministic outputs, where the model tends to choose the most likely tokens at each step. Higher temperatures introduce more randomness, allowing the model to explore less likely tokens and produce more creative outputs. It’s often used to balance between generating safe, conservative responses and more novel, imaginative ones.

  • +
  • Max Tokens: Max Tokens (max_tokens) limits the maximum length of the generated text by specifying the maximum number of tokens (words or subwords) allowed in the output. This parameter helps to control the length of the response and prevent the model from generating overly long or verbose outputs, which may not be suitable for certain applications or contexts.

  • +
  • Top P (Nucleus Sampling): Top P (top_p), also known as nucleus sampling, dynamically selects a subset of the most likely tokens based on their cumulative probability until the cumulative probability exceeds a certain threshold (specified by the parameter). This approach ensures diversity in the generated text while still prioritizing tokens with higher probabilities. It’s particularly useful for generating diverse and contextually relevant responses.

  • +
  • Frequency Penalty: Frequency Penalty (frequency_penalty) penalizes tokens based on their frequency in the generated text. Tokens that appear more frequently are assigned higher penalties, discouraging the model from repeatedly generating common or redundant tokens. This helps to promote diversity in the generated text and prevent the model from producing overly repetitive outputs.

  • +
  • Presence Penalty: Presence Penalty (presence_penalty) penalizes tokens that are already present in the input prompt. By discouraging the model from simply echoing or replicating the input text, this parameter encourages the generation of responses that go beyond the provided context. It’s useful for generating more creative and novel outputs that are not directly predictable from the input.

  • +
  • Stop Sequence: Stop Sequence (stop) specifies a sequence of tokens that, if generated by the model, signals it to stop generating further text. This parameter is commonly used to indicate the desired ending or conclusion of the generated text. It helps to control the length of the response and ensure that the model generates text that aligns with specific requirements or constraints.
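As a rough sketch of how these parameters are used, they are simply passed as keyword arguments to the Chat Completions call. The `client` is assumed to be configured as in the API chapter; the model name, prompt and values below are illustrative only.

```python
# Sketch: passing the parameters described above to the Chat Completions endpoint.
# Assumes `client` has been set up as shown in the API chapter.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a two-sentence story about a robot."}],
    temperature=0.7,        # lower = more deterministic, higher = more random
    max_tokens=100,         # cap on the length of the generated answer
    top_p=0.9,              # nucleus sampling threshold
    frequency_penalty=0.5,  # penalize tokens the model repeats too often
    presence_penalty=0.0,   # penalize tokens that have already appeared
    stop=["\n\n"],          # stop once a blank line is generated
)

print(response.choices[0].message.content)
```

Any parameter you leave out simply falls back to the API's default value.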

Roles:

-
-
-Code -
from openai import OpenAI
-client = OpenAI()
+

In order to cover most tasks you want to perform using a chat format, the OpenAI API lets you define different roles in the chat. The available roles are system, assistant, user and tools. You should already be familiar with two of them by now: the user role corresponds to the actual user prompting the language model, and all answers are given with the assistant role.

+

The system role can now be given to provide some additional general instructions to the language model that are typically not a user input, for example, the style in which the model responds. In this case, an example is better than any explanation.

+
+
import os
+from llm_utils.client import get_openai_client
 
-completion = client.chat.completions.create(
-  model="gpt-3.5-turbo",
-  messages=[
-    {"role": "system", "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."},
-    {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
-  ]
-)
-
-print(completion.choices[0].message)
-
+MODEL = "gpt4" + +client = get_openai_client( + model=MODEL, + config_path=os.environ.get("CONFIG_PATH") +) + +completion = client.chat.completions.create( + model="MODEL", + messages=[ + {"role": "system", "content": "You are an annoyed technician working in a help center for dish washers, who answers in short, unfriendly bursts."}, + {"role": "user", "content": "My dish washer does not clean the dishes, what could be the reason."} + ] +) + +print(completion.choices[0].message.content)
+
+
Could be anything. Blocked spray arm. Clogged filter. Faulty pump. Detergent issue. Check all that.
+
-
-

Function calling:

-

https://platform.openai.com/docs/guides/function-calling

+
+

Function calling:

+

As we have seen, most interactions with a language model happen in the form of a chat with almost “free” questions or instructions and answers. While this seems the most natural format in most cases, it is not always practical if we want to use a language model for very specific purposes. This happens particularly often when we want to employ a language model in business situations, where we require consistent output from the model.

+

As an example, let us try to use GPT for sentiment analysis (see also here). Let’s say we want GPT to classify a text into one of the following four categories:

+
+
sentiment_categories = [
+    "positive", 
+    "negative",
+    "neutral",
+    "mixed"
+]
+
+

We could do the following:

+
+
messages = []
+messages.append(
+    {"role": "system", "content": f"Classify the given text into one of the following sentiment categories: {sentiment_categories}."}
+)
+messages.append(
+    {"role": "user", "content": "I really did not like the movie."}
+)
+
+response = client.chat.completions.create(
+    messages=messages,
+    model=MODEL
+)
+
+print(f"Response: '{response.choices[0].message.content}'")
+
+
+
+
Response: 'Category: Negative'
+
+
+

It is easy to spot the problem: GPT does not necessarily answer in the way we expect or want it to. In this case, instead of simply returning the correct category, it also returns the string Category: alongside it (and a capitalized Negative). So if we were to use the answer in a program or database, we’d again have to use some NLP techniques to parse it in order to eventually retrieve exactly the category we were looking for: negative. What we need instead is a way to constrain GPT to a specific way of answering, and this is where functions or tools come into play (see also Function calling and Function calling (cookbook)).

+

This concept allows us to specify the exact output format we expect to receive from GPT (it is called functions since ideally we want to call a function directly on the output of GPT so it has to be in a specific format).

+
+
# this looks intimidating but isn't that complicated
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "analyze_sentiment",
+            "description": "Analyze the sentiment in a given text.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "sentiment": {
+                        "type": "string",
+                        "enum": sentiment_categories,
+                        "description": f"The sentiment of the text."
+                    }
+                },
+                "required": ["sentiment"],
+            }
+        }
+    }
+]
+
+
+
messages = []
+messages.append(
+    {"role": "system", "content": f"Classify the given text into one of the following sentiment categories: {sentiment_categories}."}
+)
+messages.append(
+    {"role": "user", "content": "I really did not like the movie."}
+)
+
+response = client.chat.completions.create(
+    messages=messages,
+    model=MODEL,
+    tools=tools,
+    tool_choice={
+        "type": "function", 
+        "function": {"name": "analyze_sentiment"}}
+)
+
+print(f"Response: '{response.choices[0].message.tool_calls[0].function.arguments}'")
+
+
Response: '{
+"sentiment": "negative"
+}'
+
+
+

We can now easily extract what we need:

+
+
import json 
+result = json.loads(response.choices[0].message.tool_calls[0].function.arguments) # remember that the answer is a string
+print(result["sentiment"])
+
+
negative
+
+
+

We can also include multiple function parameters if our desired output has multiple components. Let's try to add another parameter that captures the reason for the sentiment.

+
+
tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "analyze_sentiment",
+            "description": "Analyze the sentiment in a given text.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "sentiment": {
+                        "type": "string",
+                        "enum": sentiment_categories,
+                        "description": f"The sentiment of the text."
+                    },
+                    "reason": {
+                        "type": "string",
+                        "description": "The reason for the sentiment in a few words. If there is no information, do not make assumptions and leave it blank."
+                    }
+                },
+                "required": ["sentiment", "reason"],
+            }
+        }
+    }
+]
+
+
+
messages = []
+messages.append(
+    {"role": "system", "content": f"Classify the given text into one of the following sentiment categories: {sentiment_categories}. If you can, also extract the reason."}
+)
+messages.append(
+    {"role": "user", "content": "I loved the movie, Johnny Depp is a great actor."}
+)
+
+response = client.chat.completions.create(
+    messages=messages,
+    model=MODEL,
+    tools=tools,
+    tool_choice={
+        "type": "function", 
+        "function": {"name": "analyze_sentiment"}}
+)
+
+print(f"Response: '{response.choices[0].message.tool_calls[0].function.arguments}'")
+
+
Response: '{
+"sentiment": "positive",
+"reason": "Appreciation for the movie and actor"
+}'
+
+
+

Here, again, we could also constrain the possible values for the reason to a fixed set, as in the sketch below. Hence, functions are a great way to get more consistent answers from the language model so that we can use it in applications.
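As a sketch of that idea, we would add an enum to the reason parameter as well. The `possible_reasons` list below is invented purely for illustration and is not part of the original example; `sentiment_categories` is the list defined earlier on this page.

```python
# Sketch: also constrain "reason" to a fixed, made-up set of categories.
possible_reasons = ["acting", "story", "visual effects", "price", "other"]

tools = [
    {
        "type": "function",
        "function": {
            "name": "analyze_sentiment",
            "description": "Analyze the sentiment in a given text.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sentiment": {
                        "type": "string",
                        "enum": sentiment_categories,
                        "description": "The sentiment of the text."
                    },
                    "reason": {
                        "type": "string",
                        "enum": possible_reasons,  # the model must pick one of these
                        "description": "The main reason for the sentiment."
                    }
                },
                "required": ["sentiment", "reason"]
            }
        }
    }
]
```

The rest of the call (passing `tools` and `tool_choice` to `client.chat.completions.create`) stays exactly as shown above.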

diff --git a/docs/llm/prompting.html b/docs/llm/prompting.html index a9d2d5f..7859158 100644 --- a/docs/llm/prompting.html +++ b/docs/llm/prompting.html @@ -285,6 +285,12 @@ Exercise: GPT Parameterization + + @@ -416,9 +422,82 @@

Prompting

-

Resources: - https://platform.openai.com/docs/guides/prompt-engineering -

+

Learning prompting is a science in itself. The difficulty lies in the probabilistic nature of language models: small changes to your prompt (that you might even find insignificant) can have a large impact on the result. In particular, the effects do not have to be “logical”, i.e., they do not follow from your changes in a comprehensible or reproducible way. This can sometimes be frustrating, but it can be avoided in many cases by following the right prompting guidelines. To do so, let's follow the advice of the model's creators.

+
+
+
+ +
+
+Note +
+
+
+

The following is taken from the OpenAI Guide

+
+
+
+

Write clear instructions

+

These models can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see. The less the model has to guess at what you want, the more likely you’ll get it.

+

Tactics:

+
    +
  • Include details in your query to get more relevant answers
  • +
  • Ask the model to adopt a persona
  • +
  • Use delimiters to clearly indicate distinct parts of the input
  • +
  • Specify the steps required to complete a task
  • +
  • Provide examples
  • +
  • Specify the desired length of the output

  • +
+
+
+
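As a small illustration of these tactics (not taken from the guide itself; the persona, delimiter convention, model name and text are placeholders, and `client` is assumed to be configured as in the API chapter), a persona can be set via the system message and delimiters can mark the text to be processed:

```python
# Sketch: "adopt a persona" + "use delimiters" in a single request.
article = "GPT models expose parameters such as temperature and top_p that control sampling."

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a patient tutor. Answer in at most two sentences."},
        {"role": "user", "content": f'Summarize the text delimited by triple quotes.\n\n"""{article}"""'},
    ],
)

print(response.choices[0].message.content)
```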

Provide reference text

+

Language models can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to these models can help in answering with fewer fabrications.

+

Tactics:

+
    +
  • Instruct the model to answer using a reference text
  • +
  • Instruct the model to answer with citations from a reference text

  • +
+
+
+
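A possible sketch of this tactic (again not from the guide; the reference text and question are made up, and `client` is assumed to be configured as in the API chapter) is to pass the reference inside delimiters and instruct the model to answer only from it:

```python
# Sketch: instruct the model to answer only from a provided reference text.
reference = "The seminar takes place on Fridays at 10:00 in room B-101."

messages = [
    {"role": "system", "content": "Answer only using the reference text delimited by triple quotes. If the answer is not contained in it, reply 'I could not find this in the reference.'"},
    {"role": "user", "content": f'"""{reference}"""\n\nQuestion: Where does the seminar take place?'},
]

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)
```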

Split complex tasks into simpler subtasks

+

Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to a language model. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.

+

Tactics:

+
    +
  • Use intent classification to identify the most relevant instructions for a user query
  • +
  • For dialogue applications that require very long conversations, summarize or filter previous dialogue
  • +
  • Summarize long documents piecewise and construct a full summary recursively

  • +
+
+
+

Give the model time to “think”

+

If asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, models make more reasoning errors when trying to answer right away, rather than taking time to work out an answer. Asking for a “chain of thought” before an answer can help the model reason its way toward correct answers more reliably.

+

Tactics:

+
    +
  • Instruct the model to work out its own solution before rushing to a conclusion
  • +
  • Use inner monologue or a sequence of queries to hide the model’s reasoning process
  • +
  • Ask the model if it missed anything on previous passes

  • +
+
+
+
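As a minimal sketch of the first tactic (not from the guide; the instruction wording and question are illustrative, and `client` is assumed to be configured as in the API chapter), the model is asked to work through the problem before committing to an answer:

```python
# Sketch: ask the model to reason step by step before giving a final answer.
messages = [
    {"role": "system", "content": "First work out your own solution step by step. Only then state the final answer on a separate line starting with 'Answer:'."},
    {"role": "user", "content": "A train travels 240 km in 3 hours. How far does it travel in 5 hours at the same speed?"},
]

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)
```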

Use external tools

+

Compensate for the weaknesses of the model by feeding it the outputs of other tools. For example, a text retrieval system (sometimes called RAG or retrieval augmented generation) can tell the model about relevant documents. A code execution engine like OpenAI’s Code Interpreter can help the model do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a language model, offload it to get the best of both.

+

Tactics:

+
    +
  • Use embeddings-based search to implement efficient knowledge retrieval
  • +
  • Use code execution to perform more accurate calculations or call external APIs
  • +
  • Give the model access to specific functions

  • +
+
+
+

Test changes systematically

+

Improving performance is easier if you can measure it. In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore, to be sure that a change is net positive to performance, it may be necessary to define a comprehensive test suite (also known as an “eval”).

+

Tactic:

+
    +
  • Evaluate model outputs with reference to gold-standard answers
  • +
+
Back to top