From d3c921deec8ec458889b6e39247783aab652b186 Mon Sep 17 00:00:00 2001 From: Julian Date: Sat, 30 Mar 2024 16:22:43 +0100 Subject: [PATCH] added page navigation --- _freeze/llm/intro/execute-results/html.json | 4 +- .../nlp/overview/execute-results/html.json | 4 +- _quarto.yml | 3 + docs/about/assignment.html | 20 ++- docs/about/projects.html | 20 ++- docs/about/schedule.html | 20 ++- docs/emb_exercise.html | 2 +- docs/embeddings/applications.html | 20 ++- docs/embeddings/clustering.html | 20 ++- docs/embeddings/embeddings.html | 20 ++- .../exercises/ex_emb_similarity.html | 20 ++- docs/embeddings/visualization.html | 20 ++- docs/ethics/bias.html | 20 ++- docs/ethics/data_privacy.html | 16 +- docs/index.html | 16 +- .../exercises/ex_gpt_parameterization.html | 20 ++- docs/llm/gpt.html | 20 ++- docs/llm/intro.html | 28 ++- docs/llm/parameterization.html | 20 ++- docs/nlp/exercises/ex_fuzzy_matching.html | 20 ++- docs/nlp/exercises/ex_tfidf.html | 20 ++- docs/nlp/exercises/ex_tokenization.html | 20 ++- docs/nlp/exercises/ex_word_matching.html | 20 ++- docs/nlp/fuzzy_matching.html | 22 ++- docs/nlp/overview.html | 36 ++-- docs/nlp/statistical_text_analysis.html | 20 ++- docs/nlp/tokenization.html | 20 ++- docs/resources.html | 12 +- docs/search.json | 166 +++++++++++++++--- docs/test.html | 12 +- docs/test_viz.html | 2 +- llm/intro.qmd | 2 +- nlp/overview.qmd | 2 +- 33 files changed, 569 insertions(+), 118 deletions(-) diff --git a/_freeze/llm/intro/execute-results/html.json b/_freeze/llm/intro/execute-results/html.json index 6528817..0849fa2 100644 --- a/_freeze/llm/intro/execute-results/html.json +++ b/_freeze/llm/intro/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "96d9b1c6acd0d48a4f59c1b1bb2c4b59", + "hash": "8dca43e41c2fe7ee9d13d53651464ebf", "result": { "engine": "jupyter", - "markdown": "---\ntitle: Introduction\nformat:\n html:\n code-fold: true\n---\n\n", + "markdown": "---\ntitle: Introduction to LLM\nformat:\n html:\n code-fold: true\n---\n\n", "supporting": [ "intro_files" ], diff --git a/_freeze/nlp/overview/execute-results/html.json b/_freeze/nlp/overview/execute-results/html.json index d8dddb5..f49bb38 100644 --- a/_freeze/nlp/overview/execute-results/html.json +++ b/_freeze/nlp/overview/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "cf8a3908662cdc94eef520942522a001", + "hash": "511ca1f1badee45c381c211cb06bcc5e", "result": { "engine": "jupyter", - "markdown": "---\ntitle: Overview\nformat:\n html:\n code-fold: false\n---\n\n## A short history of Natural Language Processing\n\nThe field of Natural Language Processing (NLP) has undergone a remarkable evolution, spanning decades and driven by the convergence of computer science, artificial intelligence, and linguistics. \nFrom its nascent stages to its current state, NLP has witnessed transformative shifts, propelled by groundbreaking research and technological advancements. \nToday, it stands as a testament to humanity's quest to bridge the gap between human language and machine comprehension. \nThe journey through NLP's history offers profound insights into its trajectory and the challenges encountered along the way.\n\n#### Early Days: Rule-Based Approaches (1960s-1980s)\nIn its infancy, NLP relied heavily on rule-based approaches, where researchers painstakingly crafted sets of linguistic rules to analyze and manipulate text. \nThis period, spanning from the 1960s to the 1980s, saw significant efforts in tasks such as part-of-speech tagging, named entity recognition, and machine translation. 
\nHowever, rule-based systems struggled to cope with the inherent ambiguity and complexity of natural language. \nDifferent languages presented unique challenges, necessitating the development of language-specific rulesets. Despite their limitations, rule-based approaches laid the groundwork for future advancements in NLP.\n\n#### Rise of Statistical Methods (1990s-2000s)\nThe 1990s marked a pivotal shift in NLP with the emergence of statistical methods as a viable alternative to rule-based approaches. \nResearchers began harnessing the power of statistics and probabilistic models to analyze large corpora of text. \nTechniques like Hidden Markov Models and Conditional Random Fields gained prominence, offering improved performance in tasks such as text classification, sentiment analysis, and information extraction. \nStatistical methods represented a departure from rigid rule-based systems, allowing for greater flexibility and adaptability. \nHowever, they still grappled with the nuances and intricacies of human language, particularly in handling ambiguity and context.\n\n#### Machine Learning Revolution (2010s)\nThe advent of the 2010s witnessed a revolution in NLP fueled by the rise of machine learning, particularly deep learning. \nWith the availability of vast amounts of annotated data and unprecedented computational power, researchers explored neural network architectures tailored for NLP tasks. \nRecurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) gained traction, demonstrating impressive capabilities in tasks such as sentiment analysis, text classification, and sequence generation. \nThese models represented a significant leap forward in NLP, enabling more nuanced and context-aware language processing.\n\n#### Large Language Models: Transformers (2010s-Present)\nThe latter half of the 2010s heralded the rise of large language models, epitomized by the revolutionary Transformer architecture.\nPowered by self-attention mechanisms, Transformers excel at capturing long-range dependencies in text and generating coherent and contextually relevant responses. \nPre-trained on massive text corpora, models like GPT (Generative Pretrained Transformer) have achieved unprecedented performance across a wide range of NLP tasks, including machine translation, question-answering, and language understanding. 
\nTheir ability to leverage vast amounts of data and learn intricate patterns has propelled NLP to new heights of sophistication.\n\n#### Challenges in NLP\nDespite the remarkable progress, NLP grapples with a myriad of challenges that continue to shape its trajectory:\n\n- **Ambiguity of Language**: The inherent ambiguity of natural language poses significant challenges in accurately interpreting meaning, especially in tasks like sentiment analysis and named entity recognition.\n \n- **Different Languages**: NLP systems often struggle with languages other than English, facing variations in syntax, semantics, and cultural nuances, requiring tailored approaches for each language.\n\n- **Bias**: NLP models can perpetuate biases present in the training data, leading to unfair or discriminatory outcomes, particularly in tasks like text classification and machine translation.\n\n- **Importance of Context**: Understanding context is paramount for NLP tasks, as the meaning of words and phrases can vary drastically depending on the surrounding context.\n\n- **World Knowledge**: NLP systems lack comprehensive world knowledge, hindering their ability to understand references, idioms, and cultural nuances embedded in text.\n\n- **Common Sense Reasoning**: Despite advancements, NLP models still struggle with common sense reasoning, often producing nonsensical or irrelevant responses in complex scenarios.\n\n#### Conclusion\nThe journey of NLP from rule-based systems to large language models has been marked by remarkable progress and continuous innovation. \nWhile challenges persist, ongoing research and development efforts hold the promise of overcoming these obstacles and unlocking new frontiers in language understanding. \nAs NLP continues to evolve, driven by advancements in machine learning and computational resources, it brings us closer to the realization of truly intelligent systems capable of understanding and interacting with human language in profound ways.\n\n\n## Classic NLP tasks/applications\n\n#### Part-of-Speech Tagging\nPart-of-speech tagging involves labeling each word in a sentence with its corresponding grammatical category, such as noun, verb, adjective, or adverb. \nFor example, in the sentence \"The cat is sleeping,\" part-of-speech tagging would identify \"cat\" as a noun and \"sleeping\" as a verb. \nThis task is crucial for many NLP applications, including language understanding, information retrieval, and machine translation. \nAccurate part-of-speech tagging lays the foundation for deeper linguistic analysis and improves the performance of downstream tasks.\n\n
\nCode example\n\n::: {#5d217b2c .cell execution_count=1}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"The sun sets behind the mountains, casting a golden glow across the sky.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Find the maximum length of token text and POS tag\nmax_token_length = max(len(token.text) for token in doc)\nmax_pos_length = max(len(token.pos_) for token in doc)\n\n# Print each token along with its part-of-speech tag\nfor token in doc:\n    print(f\"Token: {token.text.ljust(max_token_length)} | POS Tag: {token.pos_.ljust(max_pos_length)}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nToken: The       | POS Tag: DET  \nToken: sun       | POS Tag: NOUN \nToken: sets      | POS Tag: VERB \nToken: behind    | POS Tag: ADP  \nToken: the       | POS Tag: DET  \nToken: mountains | POS Tag: NOUN \nToken: ,         | POS Tag: PUNCT\nToken: casting   | POS Tag: VERB \nToken: a         | POS Tag: DET  \nToken: golden    | POS Tag: ADJ  \nToken: glow      | POS Tag: NOUN \nToken: across    | POS Tag: ADP  \nToken: the       | POS Tag: DET  \nToken: sky       | POS Tag: NOUN \nToken: .         | POS Tag: PUNCT\n```\n:::\n:::\n\n\n
\n\n\n\n#### Named Entity Recognition\nNamed Entity Recognition (NER) involves identifying and classifying named entities in text, such as people, organizations, locations, dates, and more. For instance, in the sentence \"Apple is headquartered in Cupertino,\" NER would identify \"Apple\" as an organization and \"Cupertino\" as a location. \nNER is essential for various applications, including information retrieval, document summarization, and question-answering systems. Accurate NER enables machines to extract meaningful information from unstructured text data.\n\n
\nCode example\n\n::: {#9a16816c .cell execution_count=2}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"Apple is considering buying a startup called U.K. based company in London for $1 billion.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Print each token along with its Named Entity label\nfor ent in doc.ents:\n    print(f\"Entity: {ent.text.ljust(20)} | Label: {ent.label_}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nEntity: Apple                | Label: ORG\nEntity: U.K.                 | Label: GPE\nEntity: London               | Label: GPE\nEntity: $1 billion           | Label: MONEY\n```\n:::\n:::\n\n\n
\n\n\n\n#### Machine Translation\nMachine Translation (MT) aims to automatically translate text from one language to another, facilitating communication across language barriers. \nFor example, translating a sentence from English to Spanish or vice versa. \nMT systems utilize sophisticated algorithms and linguistic models to generate accurate translations while preserving the original meaning and nuances of the text. \nMT has numerous practical applications, including cross-border communication, localization of software and content, and global commerce.\n\n#### Sentiment Analysis\nSentiment Analysis involves analyzing text data to determine the sentiment or opinion expressed within it, such as positive, negative, or neutral. \nFor instance, analyzing product reviews to gauge customer satisfaction or monitoring social media sentiment towards a brand. \nSentiment Analysis employs machine learning algorithms to classify text based on sentiment, enabling businesses to understand customer feedback, track public opinion, and make data-driven decisions.\n\n
\nCode example\n\n::: {#2e03e0b7 .cell execution_count=3}\n``` {.python .cell-code}\n# python -m textblob.download_corpora\n\nfrom textblob import TextBlob\n\n# Example text\ntext = \"I love TextBlob! It's an amazing library for natural language processing.\"\n\n# Perform sentiment analysis with TextBlob\nblob = TextBlob(text)\nsentiment_score = blob.sentiment.polarity\n\n# Determine sentiment label based on sentiment score\nif sentiment_score > 0:\n    sentiment_label = \"Positive\"\nelif sentiment_score < 0:\n    sentiment_label = \"Negative\"\nelse:\n    sentiment_label = \"Neutral\"\n\n# Print sentiment analysis results\nprint(f\"Text: {text}\")\nprint(f\"Sentiment Score: {sentiment_score:.2f}\")\nprint(f\"Sentiment Label: {sentiment_label}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: I love TextBlob! It's an amazing library for natural language processing.\nSentiment Score: 0.44\nSentiment Label: Positive\n```\n:::\n:::\n\n\n
\n\n\n#### Text Classification\nText Classification is the task of automatically categorizing text documents into predefined categories or classes. \nFor example, classifying news articles into topics like politics, sports, or entertainment. \nText Classification is widely used in various domains, including email spam detection, sentiment analysis, and content categorization. \nIt enables organizations to organize and process large volumes of textual data efficiently, leading to improved decision-making and information retrieval.\n\n
\nCode example\n\n::: {#bfae8fa2 .cell execution_count=4}\n``` {.python .cell-code}\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.svm import SVC\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import LabelEncoder\n\n# Example labeled dataset\ntexts = [\n    \"I love this product!\",\n    \"This product is terrible.\",\n    \"Great service, highly recommended.\",\n    \"I had a bad experience with this company.\",\n]\nlabels = [\n    \"Positive\",\n    \"Negative\",\n    \"Positive\",\n    \"Negative\",\n]\n\n# Create a TF-IDF vectorizer\nvectorizer = TfidfVectorizer()\n\n# Encode labels as integers\nlabel_encoder = LabelEncoder()\nencoded_labels = label_encoder.fit_transform(labels)\n\n# Create a pipeline with TF-IDF vectorizer and SVM classifier\nclassifier = make_pipeline(vectorizer, SVC(kernel='linear'))\n\n# Train the classifier\nclassifier.fit(texts, encoded_labels)\n\n# Example test text\ntest_text = \"This product exceeded my expectations.\"\n\n# Predict the label for the test text\npredicted_label = classifier.predict([test_text])[0]\n\n# Decode the predicted label back to original label\npredicted_label_text = label_encoder.inverse_transform([predicted_label])[0]\n\n# Print the predicted label\nprint(f\"Text: {test_text}\")\nprint(f\"Predicted Label: {predicted_label_text}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: This product exceeded my expectations.\nPredicted Label: Negative\n```\n:::\n:::\n\n\n
\n\n\n#### Information Extraction\nInformation Extraction involves automatically extracting structured information from unstructured text data, such as documents, articles, or web pages. \nThis includes identifying entities, relationships, and events mentioned in the text. \nFor example, extracting names of people mentioned in news articles or detecting company acquisitions from financial reports. \nInformation Extraction plays a crucial role in tasks like knowledge base construction, data integration, and business intelligence.\n\n#### Question-Answering\nQuestion-Answering (QA) systems aim to automatically generate accurate answers to user queries posed in natural language. \nThese systems comprehend the meaning of questions and retrieve relevant information from a knowledge base or text corpus to provide precise responses. \nFor example, answering factual questions like \"Who is the president of the United States?\" or \"What is the capital of France?\". \nQA systems are essential for information retrieval, virtual assistants, and educational applications, enabling users to access information quickly and efficiently.\n\n", + "markdown": "---\ntitle: Overview of NLP\nformat:\n html:\n code-fold: false\n---\n\n## A short history of Natural Language Processing\n\nThe field of Natural Language Processing (NLP) has undergone a remarkable evolution, spanning decades and driven by the convergence of computer science, artificial intelligence, and linguistics. \nFrom its nascent stages to its current state, NLP has witnessed transformative shifts, propelled by groundbreaking research and technological advancements. \nToday, it stands as a testament to humanity's quest to bridge the gap between human language and machine comprehension. \nThe journey through NLP's history offers profound insights into its trajectory and the challenges encountered along the way.\n\n#### Early Days: Rule-Based Approaches (1960s-1980s)\nIn its infancy, NLP relied heavily on rule-based approaches, where researchers painstakingly crafted sets of linguistic rules to analyze and manipulate text. \nThis period, spanning from the 1960s to the 1980s, saw significant efforts in tasks such as part-of-speech tagging, named entity recognition, and machine translation. \nHowever, rule-based systems struggled to cope with the inherent ambiguity and complexity of natural language. \nDifferent languages presented unique challenges, necessitating the development of language-specific rulesets. Despite their limitations, rule-based approaches laid the groundwork for future advancements in NLP.\n\n#### Rise of Statistical Methods (1990s-2000s)\nThe 1990s marked a pivotal shift in NLP with the emergence of statistical methods as a viable alternative to rule-based approaches. \nResearchers began harnessing the power of statistics and probabilistic models to analyze large corpora of text. \nTechniques like Hidden Markov Models and Conditional Random Fields gained prominence, offering improved performance in tasks such as text classification, sentiment analysis, and information extraction. \nStatistical methods represented a departure from rigid rule-based systems, allowing for greater flexibility and adaptability. \nHowever, they still grappled with the nuances and intricacies of human language, particularly in handling ambiguity and context.\n\n#### Machine Learning Revolution (2010s)\nThe advent of the 2010s witnessed a revolution in NLP fueled by the rise of machine learning, particularly deep learning. 
\nWith the availability of vast amounts of annotated data and unprecedented computational power, researchers explored neural network architectures tailored for NLP tasks. \nRecurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) gained traction, demonstrating impressive capabilities in tasks such as sentiment analysis, text classification, and sequence generation. \nThese models represented a significant leap forward in NLP, enabling more nuanced and context-aware language processing.\n\n#### Large Language Models: Transformers (2010s-Present)\nThe latter half of the 2010s heralded the rise of large language models, epitomized by the revolutionary Transformer architecture.\nPowered by self-attention mechanisms, Transformers excel at capturing long-range dependencies in text and generating coherent and contextually relevant responses. \nPre-trained on massive text corpora, models like GPT (Generative Pretrained Transformer) have achieved unprecedented performance across a wide range of NLP tasks, including machine translation, question-answering, and language understanding. \nTheir ability to leverage vast amounts of data and learn intricate patterns has propelled NLP to new heights of sophistication.\n\n#### Challenges in NLP\nDespite the remarkable progress, NLP grapples with a myriad of challenges that continue to shape its trajectory:\n\n- **Ambiguity of Language**: The inherent ambiguity of natural language poses significant challenges in accurately interpreting meaning, especially in tasks like sentiment analysis and named entity recognition.\n \n- **Different Languages**: NLP systems often struggle with languages other than English, facing variations in syntax, semantics, and cultural nuances, requiring tailored approaches for each language.\n\n- **Bias**: NLP models can perpetuate biases present in the training data, leading to unfair or discriminatory outcomes, particularly in tasks like text classification and machine translation.\n\n- **Importance of Context**: Understanding context is paramount for NLP tasks, as the meaning of words and phrases can vary drastically depending on the surrounding context.\n\n- **World Knowledge**: NLP systems lack comprehensive world knowledge, hindering their ability to understand references, idioms, and cultural nuances embedded in text.\n\n- **Common Sense Reasoning**: Despite advancements, NLP models still struggle with common sense reasoning, often producing nonsensical or irrelevant responses in complex scenarios.\n\n#### Conclusion\nThe journey of NLP from rule-based systems to large language models has been marked by remarkable progress and continuous innovation. \nWhile challenges persist, ongoing research and development efforts hold the promise of overcoming these obstacles and unlocking new frontiers in language understanding. \nAs NLP continues to evolve, driven by advancements in machine learning and computational resources, it brings us closer to the realization of truly intelligent systems capable of understanding and interacting with human language in profound ways.\n\n\n## Classic NLP tasks/applications\n\n#### Part-of-Speech Tagging\nPart-of-speech tagging involves labeling each word in a sentence with its corresponding grammatical category, such as noun, verb, adjective, or adverb. \nFor example, in the sentence \"The cat is sleeping,\" part-of-speech tagging would identify \"cat\" as a noun and \"sleeping\" as a verb. 
\nThis task is crucial for many NLP applications, including language understanding, information retrieval, and machine translation. \nAccurate part-of-speech tagging lays the foundation for deeper linguistic analysis and improves the performance of downstream tasks.\n\n
\nCode example\n\n::: {#d87df8e1 .cell execution_count=1}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"The sun sets behind the mountains, casting a golden glow across the sky.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Find the maximum length of token text and POS tag\nmax_token_length = max(len(token.text) for token in doc)\nmax_pos_length = max(len(token.pos_) for token in doc)\n\n# Print each token along with its part-of-speech tag\nfor token in doc:\n    print(f\"Token: {token.text.ljust(max_token_length)} | POS Tag: {token.pos_.ljust(max_pos_length)}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nToken: The       | POS Tag: DET  \nToken: sun       | POS Tag: NOUN \nToken: sets      | POS Tag: VERB \nToken: behind    | POS Tag: ADP  \nToken: the       | POS Tag: DET  \nToken: mountains | POS Tag: NOUN \nToken: ,         | POS Tag: PUNCT\nToken: casting   | POS Tag: VERB \nToken: a         | POS Tag: DET  \nToken: golden    | POS Tag: ADJ  \nToken: glow      | POS Tag: NOUN \nToken: across    | POS Tag: ADP  \nToken: the       | POS Tag: DET  \nToken: sky       | POS Tag: NOUN \nToken: .         | POS Tag: PUNCT\n```\n:::\n:::\n\n\n
\n\n\n\n#### Named Entity Recognition\nNamed Entity Recognition (NER) involves identifying and classifying named entities in text, such as people, organizations, locations, dates, and more. For instance, in the sentence \"Apple is headquartered in Cupertino,\" NER would identify \"Apple\" as an organization and \"Cupertino\" as a location. \nNER is essential for various applications, including information retrieval, document summarization, and question-answering systems. Accurate NER enables machines to extract meaningful information from unstructured text data.\n\n
\nCode example\n\n::: {#acc47b23 .cell execution_count=2}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"Apple is considering buying a startup called U.K. based company in London for $1 billion.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Print each token along with its Named Entity label\nfor ent in doc.ents:\n    print(f\"Entity: {ent.text.ljust(20)} | Label: {ent.label_}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nEntity: Apple                | Label: ORG\nEntity: U.K.                 | Label: GPE\nEntity: London               | Label: GPE\nEntity: $1 billion           | Label: MONEY\n```\n:::\n:::\n\n\n
\n\n\n\n#### Machine Translation\nMachine Translation (MT) aims to automatically translate text from one language to another, facilitating communication across language barriers. \nFor example, translating a sentence from English to Spanish or vice versa. \nMT systems utilize sophisticated algorithms and linguistic models to generate accurate translations while preserving the original meaning and nuances of the text. \nMT has numerous practical applications, including cross-border communication, localization of software and content, and global commerce. \nA short, hedged code sketch for machine translation appears just after the sentiment-analysis example below.\n\n#### Sentiment Analysis\nSentiment Analysis involves analyzing text data to determine the sentiment or opinion expressed within it, such as positive, negative, or neutral. \nFor instance, analyzing product reviews to gauge customer satisfaction or monitoring social media sentiment towards a brand. \nSentiment Analysis employs machine learning algorithms to classify text based on sentiment, enabling businesses to understand customer feedback, track public opinion, and make data-driven decisions.\n\n
\nCode example\n\n::: {#02b6acb2 .cell execution_count=3}\n``` {.python .cell-code}\n# python -m textblob.download_corpora\n\nfrom textblob import TextBlob\n\n# Example text\ntext = \"I love TextBlob! It's an amazing library for natural language processing.\"\n\n# Perform sentiment analysis with TextBlob\nblob = TextBlob(text)\nsentiment_score = blob.sentiment.polarity\n\n# Determine sentiment label based on sentiment score\nif sentiment_score > 0:\n    sentiment_label = \"Positive\"\nelif sentiment_score < 0:\n    sentiment_label = \"Negative\"\nelse:\n    sentiment_label = \"Neutral\"\n\n# Print sentiment analysis results\nprint(f\"Text: {text}\")\nprint(f\"Sentiment Score: {sentiment_score:.2f}\")\nprint(f\"Sentiment Label: {sentiment_label}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: I love TextBlob! It's an amazing library for natural language processing.\nSentiment Score: 0.44\nSentiment Label: Positive\n```\n:::\n:::\n\n\n
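\nCode example (machine translation)\n\nThe machine-translation section above has no accompanying snippet, so here is a minimal, hedged sketch. It assumes the Hugging Face transformers library and the pretrained \"Helsinki-NLP/opus-mt-en-fr\" model, neither of which is used elsewhere on this page, and the model is downloaded on first use; treat it as an illustration of the idea rather than a tested part of this course.\n\n```python\nfrom transformers import pipeline\n\n# Assumption: transformers is installed and the Helsinki-NLP model can be downloaded\ntranslator = pipeline(\"translation_en_to_fr\", model=\"Helsinki-NLP/opus-mt-en-fr\")\n\n# Example text\ntext = \"The sun sets behind the mountains.\"\n\n# The pipeline returns a list with one dictionary per input text\nresult = translator(text)\nprint(result[0][\"translation_text\"])\n```\n\nSwapping the model name (for example \"Helsinki-NLP/opus-mt-en-de\" for English to German) adapts the same pattern to other language pairs.\n\n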
\n\n\n#### Text Classification\nText Classification is the task of automatically categorizing text documents into predefined categories or classes. \nFor example, classifying news articles into topics like politics, sports, or entertainment. \nText Classification is widely used in various domains, including email spam detection, sentiment analysis, and content categorization. \nIt enables organizations to organize and process large volumes of textual data efficiently, leading to improved decision-making and information retrieval.\n\n
\nCode example\n\n::: {#356c23f3 .cell execution_count=4}\n``` {.python .cell-code}\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.svm import SVC\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import LabelEncoder\n\n# Example labeled dataset\ntexts = [\n    \"I love this product!\",\n    \"This product is terrible.\",\n    \"Great service, highly recommended.\",\n    \"I had a bad experience with this company.\",\n]\nlabels = [\n    \"Positive\",\n    \"Negative\",\n    \"Positive\",\n    \"Negative\",\n]\n\n# Create a TF-IDF vectorizer\nvectorizer = TfidfVectorizer()\n\n# Encode labels as integers\nlabel_encoder = LabelEncoder()\nencoded_labels = label_encoder.fit_transform(labels)\n\n# Create a pipeline with TF-IDF vectorizer and SVM classifier\nclassifier = make_pipeline(vectorizer, SVC(kernel='linear'))\n\n# Train the classifier\nclassifier.fit(texts, encoded_labels)\n\n# Example test text\ntest_text = \"This product exceeded my expectations.\"\n\n# Predict the label for the test text\npredicted_label = classifier.predict([test_text])[0]\n\n# Decode the predicted label back to original label\npredicted_label_text = label_encoder.inverse_transform([predicted_label])[0]\n\n# Print the predicted label\nprint(f\"Text: {test_text}\")\nprint(f\"Predicted Label: {predicted_label_text}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: This product exceeded my expectations.\nPredicted Label: Negative\n```\n:::\n:::\n\n\n
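\nCode example (information extraction)\n\nThe information-extraction section below has no accompanying snippet, so a hedged sketch is given here, just before it. It reuses spaCy and the en_core_web_sm model already used on this page; the sentence and the naive subject-verb-object rule are illustrative assumptions, not a production-grade extractor.\n\n```python\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text (hypothetical sentence)\ntext = \"Microsoft acquired GitHub in 2018.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Step 1: extract named entities\nfor ent in doc.ents:\n    print(f\"Entity: {ent.text} | Label: {ent.label_}\")\n\n# Step 2: read a naive (subject, verb, object) triple off the dependency parse\nfor token in doc:\n    if token.dep_ == \"ROOT\":\n        subjects = [child.text for child in token.lefts if child.dep_ == \"nsubj\"]\n        objects = [child.text for child in token.rights if child.dep_ == \"dobj\"]\n        if subjects and objects:\n            print(f\"Relation: ({subjects[0]}, {token.lemma_}, {objects[0]})\")\n```\n\n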
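\nCode example (question answering)\n\nLikewise, a hedged sketch for the question-answering section that follows. It assumes the Hugging Face transformers library and the pretrained \"distilbert-base-cased-distilled-squad\" model, downloaded on first use; neither is a stated dependency of this course, so read the snippet as an illustration of extractive QA only.\n\n```python\nfrom transformers import pipeline\n\n# Assumption: transformers is installed and the SQuAD-tuned model can be downloaded\nqa = pipeline(\"question-answering\", model=\"distilbert-base-cased-distilled-squad\")\n\n# Example context and question\ncontext = \"Paris is the capital of France and its largest city.\"\nquestion = \"What is the capital of France?\"\n\n# The pipeline returns the answer span it found in the context, with a score\nresult = qa(question=question, context=context)\nprint(f\"Answer: {result['answer']} (score: {result['score']:.2f})\")\n```\n\n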
\n\n\n#### Information Extraction\nInformation Extraction involves automatically extracting structured information from unstructured text data, such as documents, articles, or web pages. \nThis includes identifying entities, relationships, and events mentioned in the text. \nFor example, extracting names of people mentioned in news articles or detecting company acquisitions from financial reports. \nInformation Extraction plays a crucial role in tasks like knowledge base construction, data integration, and business intelligence.\n\n#### Question-Answering\nQuestion-Answering (QA) systems aim to automatically generate accurate answers to user queries posed in natural language. \nThese systems comprehend the meaning of questions and retrieve relevant information from a knowledge base or text corpus to provide precise responses. \nFor example, answering factual questions like \"Who is the president of the United States?\" or \"What is the capital of France?\". \nQA systems are essential for information retrieval, virtual assistants, and educational applications, enabling users to access information quickly and efficiently.\n\n", "supporting": [ "overview_files" ], diff --git a/_quarto.yml b/_quarto.yml index 5dceaa2..2ac1b49 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -10,6 +10,9 @@ format: theme: cosmo website: + page-navigation: true + back-to-top-navigation: true + navbar: tools: - icon: github diff --git a/docs/about/assignment.html b/docs/about/assignment.html index 15c1087..04f19b8 100644 --- a/docs/about/assignment.html +++ b/docs/about/assignment.html @@ -30,6 +30,8 @@ + + @@ -180,7 +182,7 @@