From d3c921deec8ec458889b6e39247783aab652b186 Mon Sep 17 00:00:00 2001 From: Julian Date: Sat, 30 Mar 2024 16:22:43 +0100 Subject: [PATCH] added page navigation --- _freeze/llm/intro/execute-results/html.json | 4 +- .../nlp/overview/execute-results/html.json | 4 +- _quarto.yml | 3 + docs/about/assignment.html | 20 ++- docs/about/projects.html | 20 ++- docs/about/schedule.html | 20 ++- docs/emb_exercise.html | 2 +- docs/embeddings/applications.html | 20 ++- docs/embeddings/clustering.html | 20 ++- docs/embeddings/embeddings.html | 20 ++- .../exercises/ex_emb_similarity.html | 20 ++- docs/embeddings/visualization.html | 20 ++- docs/ethics/bias.html | 20 ++- docs/ethics/data_privacy.html | 16 +- docs/index.html | 16 +- .../exercises/ex_gpt_parameterization.html | 20 ++- docs/llm/gpt.html | 20 ++- docs/llm/intro.html | 28 ++- docs/llm/parameterization.html | 20 ++- docs/nlp/exercises/ex_fuzzy_matching.html | 20 ++- docs/nlp/exercises/ex_tfidf.html | 20 ++- docs/nlp/exercises/ex_tokenization.html | 20 ++- docs/nlp/exercises/ex_word_matching.html | 20 ++- docs/nlp/fuzzy_matching.html | 22 ++- docs/nlp/overview.html | 36 ++-- docs/nlp/statistical_text_analysis.html | 20 ++- docs/nlp/tokenization.html | 20 ++- docs/resources.html | 12 +- docs/search.json | 166 +++++++++++++++--- docs/test.html | 12 +- docs/test_viz.html | 2 +- llm/intro.qmd | 2 +- nlp/overview.qmd | 2 +- 33 files changed, 569 insertions(+), 118 deletions(-) diff --git a/_freeze/llm/intro/execute-results/html.json b/_freeze/llm/intro/execute-results/html.json index 6528817..0849fa2 100644 --- a/_freeze/llm/intro/execute-results/html.json +++ b/_freeze/llm/intro/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "96d9b1c6acd0d48a4f59c1b1bb2c4b59", + "hash": "8dca43e41c2fe7ee9d13d53651464ebf", "result": { "engine": "jupyter", - "markdown": "---\ntitle: Introduction\nformat:\n html:\n code-fold: true\n---\n\n", + "markdown": "---\ntitle: Introduction to LLM\nformat:\n html:\n code-fold: true\n---\n\n", "supporting": [ "intro_files" ], diff --git a/_freeze/nlp/overview/execute-results/html.json b/_freeze/nlp/overview/execute-results/html.json index d8dddb5..f49bb38 100644 --- a/_freeze/nlp/overview/execute-results/html.json +++ b/_freeze/nlp/overview/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "cf8a3908662cdc94eef520942522a001", + "hash": "511ca1f1badee45c381c211cb06bcc5e", "result": { "engine": "jupyter", - "markdown": "---\ntitle: Overview\nformat:\n html:\n code-fold: false\n---\n\n## A short history of Natural Language Processing\n\nThe field of Natural Language Processing (NLP) has undergone a remarkable evolution, spanning decades and driven by the convergence of computer science, artificial intelligence, and linguistics. \nFrom its nascent stages to its current state, NLP has witnessed transformative shifts, propelled by groundbreaking research and technological advancements. \nToday, it stands as a testament to humanity's quest to bridge the gap between human language and machine comprehension. \nThe journey through NLP's history offers profound insights into its trajectory and the challenges encountered along the way.\n\n#### Early Days: Rule-Based Approaches (1960s-1980s)\nIn its infancy, NLP relied heavily on rule-based approaches, where researchers painstakingly crafted sets of linguistic rules to analyze and manipulate text. \nThis period, spanning from the 1960s to the 1980s, saw significant efforts in tasks such as part-of-speech tagging, named entity recognition, and machine translation. 
\nHowever, rule-based systems struggled to cope with the inherent ambiguity and complexity of natural language. \nDifferent languages presented unique challenges, necessitating the development of language-specific rulesets. Despite their limitations, rule-based approaches laid the groundwork for future advancements in NLP.\n\n#### Rise of Statistical Methods (1990s-2000s)\nThe 1990s marked a pivotal shift in NLP with the emergence of statistical methods as a viable alternative to rule-based approaches. \nResearchers began harnessing the power of statistics and probabilistic models to analyze large corpora of text. \nTechniques like Hidden Markov Models and Conditional Random Fields gained prominence, offering improved performance in tasks such as text classification, sentiment analysis, and information extraction. \nStatistical methods represented a departure from rigid rule-based systems, allowing for greater flexibility and adaptability. \nHowever, they still grappled with the nuances and intricacies of human language, particularly in handling ambiguity and context.\n\n#### Machine Learning Revolution (2010s)\nThe advent of the 2010s witnessed a revolution in NLP fueled by the rise of machine learning, particularly deep learning. \nWith the availability of vast amounts of annotated data and unprecedented computational power, researchers explored neural network architectures tailored for NLP tasks. \nRecurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) gained traction, demonstrating impressive capabilities in tasks such as sentiment analysis, text classification, and sequence generation. \nThese models represented a significant leap forward in NLP, enabling more nuanced and context-aware language processing.\n\n#### Large Language Models: Transformers (2010s-Present)\nThe latter half of the 2010s heralded the rise of large language models, epitomized by the revolutionary Transformer architecture.\nPowered by self-attention mechanisms, Transformers excel at capturing long-range dependencies in text and generating coherent and contextually relevant responses. \nPre-trained on massive text corpora, models like GPT (Generative Pretrained Transformer) have achieved unprecedented performance across a wide range of NLP tasks, including machine translation, question-answering, and language understanding. 
\nTheir ability to leverage vast amounts of data and learn intricate patterns has propelled NLP to new heights of sophistication.\n\n#### Challenges in NLP\nDespite the remarkable progress, NLP grapples with a myriad of challenges that continue to shape its trajectory:\n\n- **Ambiguity of Language**: The inherent ambiguity of natural language poses significant challenges in accurately interpreting meaning, especially in tasks like sentiment analysis and named entity recognition.\n \n- **Different Languages**: NLP systems often struggle with languages other than English, facing variations in syntax, semantics, and cultural nuances, requiring tailored approaches for each language.\n\n- **Bias**: NLP models can perpetuate biases present in the training data, leading to unfair or discriminatory outcomes, particularly in tasks like text classification and machine translation.\n\n- **Importance of Context**: Understanding context is paramount for NLP tasks, as the meaning of words and phrases can vary drastically depending on the surrounding context.\n\n- **World Knowledge**: NLP systems lack comprehensive world knowledge, hindering their ability to understand references, idioms, and cultural nuances embedded in text.\n\n- **Common Sense Reasoning**: Despite advancements, NLP models still struggle with common sense reasoning, often producing nonsensical or irrelevant responses in complex scenarios.\n\n#### Conclusion\nThe journey of NLP from rule-based systems to large language models has been marked by remarkable progress and continuous innovation. \nWhile challenges persist, ongoing research and development efforts hold the promise of overcoming these obstacles and unlocking new frontiers in language understanding. \nAs NLP continues to evolve, driven by advancements in machine learning and computational resources, it brings us closer to the realization of truly intelligent systems capable of understanding and interacting with human language in profound ways.\n\n\n## Classic NLP tasks/applications\n\n#### Part-of-Speech Tagging\nPart-of-speech tagging involves labeling each word in a sentence with its corresponding grammatical category, such as noun, verb, adjective, or adverb. \nFor example, in the sentence \"The cat is sleeping,\" part-of-speech tagging would identify \"cat\" as a noun and \"sleeping\" as a verb. \nThis task is crucial for many NLP applications, including language understanding, information retrieval, and machine translation. \nAccurate part-of-speech tagging lays the foundation for deeper linguistic analysis and improves the performance of downstream tasks.\n\n
\nCode example\n\n::: {#5d217b2c .cell execution_count=1}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"The sun sets behind the mountains, casting a golden glow across the sky.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Find the maximum length of token text and POS tag\nmax_token_length = max(len(token.text) for token in doc)\nmax_pos_length = max(len(token.pos_) for token in doc)\n\n# Print each token along with its part-of-speech tag\nfor token in doc:\n    print(f\"Token: {token.text.ljust(max_token_length)} | POS Tag: {token.pos_.ljust(max_pos_length)}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nToken: The       | POS Tag: DET  \nToken: sun       | POS Tag: NOUN \nToken: sets      | POS Tag: VERB \nToken: behind    | POS Tag: ADP  \nToken: the       | POS Tag: DET  \nToken: mountains | POS Tag: NOUN \nToken: ,         | POS Tag: PUNCT\nToken: casting   | POS Tag: VERB \nToken: a         | POS Tag: DET  \nToken: golden    | POS Tag: ADJ  \nToken: glow      | POS Tag: NOUN \nToken: across    | POS Tag: ADP  \nToken: the       | POS Tag: DET  \nToken: sky       | POS Tag: NOUN \nToken: .         | POS Tag: PUNCT\n```\n:::\n:::\n\n\n
\n\n\n\n#### Named Entity Recognition\nNamed Entity Recognition (NER) involves identifying and classifying named entities in text, such as people, organizations, locations, dates, and more. For instance, in the sentence \"Apple is headquartered in Cupertino,\" NER would identify \"Apple\" as an organization and \"Cupertino\" as a location. \nNER is essential for various applications, including information retrieval, document summarization, and question-answering systems. Accurate NER enables machines to extract meaningful information from unstructured text data.\n\n
\nCode example\n\n::: {#9a16816c .cell execution_count=2}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"Apple is considering buying a startup called U.K. based company in London for $1 billion.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Print each token along with its Named Entity label\nfor ent in doc.ents:\n    print(f\"Entity: {ent.text.ljust(20)} | Label: {ent.label_}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nEntity: Apple                | Label: ORG\nEntity: U.K.                 | Label: GPE\nEntity: London               | Label: GPE\nEntity: $1 billion           | Label: MONEY\n```\n:::\n:::\n\n\n
\n\n\n\n#### Machine Translation\nMachine Translation (MT) aims to automatically translate text from one language to another, facilitating communication across language barriers. \nFor example, translating a sentence from English to Spanish or vice versa. \nMT systems utilize sophisticated algorithms and linguistic models to generate accurate translations while preserving the original meaning and nuances of the text. \nMT has numerous practical applications, including cross-border communication, localization of software and content, and global commerce.\n\n#### Sentiment Analysis\nSentiment Analysis involves analyzing text data to determine the sentiment or opinion expressed within it, such as positive, negative, or neutral. \nFor instance, analyzing product reviews to gauge customer satisfaction or monitoring social media sentiment towards a brand. \nSentiment Analysis employs machine learning algorithms to classify text based on sentiment, enabling businesses to understand customer feedback, track public opinion, and make data-driven decisions.\n\n
\nCode example\n\n::: {#2e03e0b7 .cell execution_count=3}\n``` {.python .cell-code}\n# python -m textblob.download_corpora\n\nfrom textblob import TextBlob\n\n# Example text\ntext = \"I love TextBlob! It's an amazing library for natural language processing.\"\n\n# Perform sentiment analysis with TextBlob\nblob = TextBlob(text)\nsentiment_score = blob.sentiment.polarity\n\n# Determine sentiment label based on sentiment score\nif sentiment_score > 0:\n    sentiment_label = \"Positive\"\nelif sentiment_score < 0:\n    sentiment_label = \"Negative\"\nelse:\n    sentiment_label = \"Neutral\"\n\n# Print sentiment analysis results\nprint(f\"Text: {text}\")\nprint(f\"Sentiment Score: {sentiment_score:.2f}\")\nprint(f\"Sentiment Label: {sentiment_label}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: I love TextBlob! It's an amazing library for natural language processing.\nSentiment Score: 0.44\nSentiment Label: Positive\n```\n:::\n:::\n\n\n
\n\n\n#### Text Classification\nText Classification is the task of automatically categorizing text documents into predefined categories or classes. \nFor example, classifying news articles into topics like politics, sports, or entertainment. \nText Classification is widely used in various domains, including email spam detection, sentiment analysis, and content categorization. \nIt enables organizations to organize and process large volumes of textual data efficiently, leading to improved decision-making and information retrieval.\n\n
\nCode example\n\n::: {#bfae8fa2 .cell execution_count=4}\n``` {.python .cell-code}\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.svm import SVC\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import LabelEncoder\n\n# Example labeled dataset\ntexts = [\n    \"I love this product!\",\n    \"This product is terrible.\",\n    \"Great service, highly recommended.\",\n    \"I had a bad experience with this company.\",\n]\nlabels = [\n    \"Positive\",\n    \"Negative\",\n    \"Positive\",\n    \"Negative\",\n]\n\n# Create a TF-IDF vectorizer\nvectorizer = TfidfVectorizer()\n\n# Encode labels as integers\nlabel_encoder = LabelEncoder()\nencoded_labels = label_encoder.fit_transform(labels)\n\n# Create a pipeline with TF-IDF vectorizer and SVM classifier\nclassifier = make_pipeline(vectorizer, SVC(kernel='linear'))\n\n# Train the classifier\nclassifier.fit(texts, encoded_labels)\n\n# Example test text\ntest_text = \"This product exceeded my expectations.\"\n\n# Predict the label for the test text\npredicted_label = classifier.predict([test_text])[0]\n\n# Decode the predicted label back to original label\npredicted_label_text = label_encoder.inverse_transform([predicted_label])[0]\n\n# Print the predicted label\nprint(f\"Text: {test_text}\")\nprint(f\"Predicted Label: {predicted_label_text}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: This product exceeded my expectations.\nPredicted Label: Negative\n```\n:::\n:::\n\n\n
\n\n\n#### Information Extraction\nInformation Extraction involves automatically extracting structured information from unstructured text data, such as documents, articles, or web pages. \nThis includes identifying entities, relationships, and events mentioned in the text. \nFor example, extracting names of people mentioned in news articles or detecting company acquisitions from financial reports. \nInformation Extraction plays a crucial role in tasks like knowledge base construction, data integration, and business intelligence.\n\n#### Question-Answering\nQuestion-Answering (QA) systems aim to automatically generate accurate answers to user queries posed in natural language. \nThese systems comprehend the meaning of questions and retrieve relevant information from a knowledge base or text corpus to provide precise responses. \nFor example, answering factual questions like \"Who is the president of the United States?\" or \"What is the capital of France?\". \nQA systems are essential for information retrieval, virtual assistants, and educational applications, enabling users to access information quickly and efficiently.\n\n", + "markdown": "---\ntitle: Overview of NLP\nformat:\n html:\n code-fold: false\n---\n\n## A short history of Natural Language Processing\n\nThe field of Natural Language Processing (NLP) has undergone a remarkable evolution, spanning decades and driven by the convergence of computer science, artificial intelligence, and linguistics. \nFrom its nascent stages to its current state, NLP has witnessed transformative shifts, propelled by groundbreaking research and technological advancements. \nToday, it stands as a testament to humanity's quest to bridge the gap between human language and machine comprehension. \nThe journey through NLP's history offers profound insights into its trajectory and the challenges encountered along the way.\n\n#### Early Days: Rule-Based Approaches (1960s-1980s)\nIn its infancy, NLP relied heavily on rule-based approaches, where researchers painstakingly crafted sets of linguistic rules to analyze and manipulate text. \nThis period, spanning from the 1960s to the 1980s, saw significant efforts in tasks such as part-of-speech tagging, named entity recognition, and machine translation. \nHowever, rule-based systems struggled to cope with the inherent ambiguity and complexity of natural language. \nDifferent languages presented unique challenges, necessitating the development of language-specific rulesets. Despite their limitations, rule-based approaches laid the groundwork for future advancements in NLP.\n\n#### Rise of Statistical Methods (1990s-2000s)\nThe 1990s marked a pivotal shift in NLP with the emergence of statistical methods as a viable alternative to rule-based approaches. \nResearchers began harnessing the power of statistics and probabilistic models to analyze large corpora of text. \nTechniques like Hidden Markov Models and Conditional Random Fields gained prominence, offering improved performance in tasks such as text classification, sentiment analysis, and information extraction. \nStatistical methods represented a departure from rigid rule-based systems, allowing for greater flexibility and adaptability. \nHowever, they still grappled with the nuances and intricacies of human language, particularly in handling ambiguity and context.\n\n#### Machine Learning Revolution (2010s)\nThe advent of the 2010s witnessed a revolution in NLP fueled by the rise of machine learning, particularly deep learning. 
\nWith the availability of vast amounts of annotated data and unprecedented computational power, researchers explored neural network architectures tailored for NLP tasks. \nRecurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) gained traction, demonstrating impressive capabilities in tasks such as sentiment analysis, text classification, and sequence generation. \nThese models represented a significant leap forward in NLP, enabling more nuanced and context-aware language processing.\n\n#### Large Language Models: Transformers (2010s-Present)\nThe latter half of the 2010s heralded the rise of large language models, epitomized by the revolutionary Transformer architecture.\nPowered by self-attention mechanisms, Transformers excel at capturing long-range dependencies in text and generating coherent and contextually relevant responses. \nPre-trained on massive text corpora, models like GPT (Generative Pretrained Transformer) have achieved unprecedented performance across a wide range of NLP tasks, including machine translation, question-answering, and language understanding. \nTheir ability to leverage vast amounts of data and learn intricate patterns has propelled NLP to new heights of sophistication.\n\n#### Challenges in NLP\nDespite the remarkable progress, NLP grapples with a myriad of challenges that continue to shape its trajectory:\n\n- **Ambiguity of Language**: The inherent ambiguity of natural language poses significant challenges in accurately interpreting meaning, especially in tasks like sentiment analysis and named entity recognition.\n \n- **Different Languages**: NLP systems often struggle with languages other than English, facing variations in syntax, semantics, and cultural nuances, requiring tailored approaches for each language.\n\n- **Bias**: NLP models can perpetuate biases present in the training data, leading to unfair or discriminatory outcomes, particularly in tasks like text classification and machine translation.\n\n- **Importance of Context**: Understanding context is paramount for NLP tasks, as the meaning of words and phrases can vary drastically depending on the surrounding context.\n\n- **World Knowledge**: NLP systems lack comprehensive world knowledge, hindering their ability to understand references, idioms, and cultural nuances embedded in text.\n\n- **Common Sense Reasoning**: Despite advancements, NLP models still struggle with common sense reasoning, often producing nonsensical or irrelevant responses in complex scenarios.\n\n#### Conclusion\nThe journey of NLP from rule-based systems to large language models has been marked by remarkable progress and continuous innovation. \nWhile challenges persist, ongoing research and development efforts hold the promise of overcoming these obstacles and unlocking new frontiers in language understanding. \nAs NLP continues to evolve, driven by advancements in machine learning and computational resources, it brings us closer to the realization of truly intelligent systems capable of understanding and interacting with human language in profound ways.\n\n\n## Classic NLP tasks/applications\n\n#### Part-of-Speech Tagging\nPart-of-speech tagging involves labeling each word in a sentence with its corresponding grammatical category, such as noun, verb, adjective, or adverb. \nFor example, in the sentence \"The cat is sleeping,\" part-of-speech tagging would identify \"cat\" as a noun and \"sleeping\" as a verb. 
\nThis task is crucial for many NLP applications, including language understanding, information retrieval, and machine translation. \nAccurate part-of-speech tagging lays the foundation for deeper linguistic analysis and improves the performance of downstream tasks.\n\n
\nCode example\n\n::: {#d87df8e1 .cell execution_count=1}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"The sun sets behind the mountains, casting a golden glow across the sky.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Find the maximum length of token text and POS tag\nmax_token_length = max(len(token.text) for token in doc)\nmax_pos_length = max(len(token.pos_) for token in doc)\n\n# Print each token along with its part-of-speech tag\nfor token in doc:\n    print(f\"Token: {token.text.ljust(max_token_length)} | POS Tag: {token.pos_.ljust(max_pos_length)}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nToken: The       | POS Tag: DET  \nToken: sun       | POS Tag: NOUN \nToken: sets      | POS Tag: VERB \nToken: behind    | POS Tag: ADP  \nToken: the       | POS Tag: DET  \nToken: mountains | POS Tag: NOUN \nToken: ,         | POS Tag: PUNCT\nToken: casting   | POS Tag: VERB \nToken: a         | POS Tag: DET  \nToken: golden    | POS Tag: ADJ  \nToken: glow      | POS Tag: NOUN \nToken: across    | POS Tag: ADP  \nToken: the       | POS Tag: DET  \nToken: sky       | POS Tag: NOUN \nToken: .         | POS Tag: PUNCT\n```\n:::\n:::\n\n\n
\n\n\n\n#### Named Entity Recognition\nNamed Entity Recognition (NER) involves identifying and classifying named entities in text, such as people, organizations, locations, dates, and more. For instance, in the sentence \"Apple is headquartered in Cupertino,\" NER would identify \"Apple\" as an organization and \"Cupertino\" as a location. \nNER is essential for various applications, including information retrieval, document summarization, and question-answering systems. Accurate NER enables machines to extract meaningful information from unstructured text data.\n\n
\nCode example\n\n::: {#acc47b23 .cell execution_count=2}\n``` {.python .cell-code}\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text\ntext = \"Apple is considering buying a startup called U.K. based company in London for $1 billion.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Print each token along with its Named Entity label\nfor ent in doc.ents:\n    print(f\"Entity: {ent.text.ljust(20)} | Label: {ent.label_}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nEntity: Apple                | Label: ORG\nEntity: U.K.                 | Label: GPE\nEntity: London               | Label: GPE\nEntity: $1 billion           | Label: MONEY\n```\n:::\n:::\n\n\n
\n\n\n\n#### Machine Translation\nMachine Translation (MT) aims to automatically translate text from one language to another, facilitating communication across language barriers. \nFor example, translating a sentence from English to Spanish or vice versa. \nMT systems utilize sophisticated algorithms and linguistic models to generate accurate translations while preserving the original meaning and nuances of the text. \nMT has numerous practical applications, including cross-border communication, localization of software and content, and global commerce. \nA short, hedged code sketch for machine translation appears just after the sentiment-analysis example below.\n\n#### Sentiment Analysis\nSentiment Analysis involves analyzing text data to determine the sentiment or opinion expressed within it, such as positive, negative, or neutral. \nFor instance, analyzing product reviews to gauge customer satisfaction or monitoring social media sentiment towards a brand. \nSentiment Analysis employs machine learning algorithms to classify text based on sentiment, enabling businesses to understand customer feedback, track public opinion, and make data-driven decisions.\n\n
\nCode example\n\n::: {#02b6acb2 .cell execution_count=3}\n``` {.python .cell-code}\n# python -m textblob.download_corpora\n\nfrom textblob import TextBlob\n\n# Example text\ntext = \"I love TextBlob! It's an amazing library for natural language processing.\"\n\n# Perform sentiment analysis with TextBlob\nblob = TextBlob(text)\nsentiment_score = blob.sentiment.polarity\n\n# Determine sentiment label based on sentiment score\nif sentiment_score > 0:\n    sentiment_label = \"Positive\"\nelif sentiment_score < 0:\n    sentiment_label = \"Negative\"\nelse:\n    sentiment_label = \"Neutral\"\n\n# Print sentiment analysis results\nprint(f\"Text: {text}\")\nprint(f\"Sentiment Score: {sentiment_score:.2f}\")\nprint(f\"Sentiment Label: {sentiment_label}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: I love TextBlob! It's an amazing library for natural language processing.\nSentiment Score: 0.44\nSentiment Label: Positive\n```\n:::\n:::\n\n\n
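\nCode example (machine translation)\n\nThe machine-translation section above has no accompanying snippet, so here is a minimal, hedged sketch. It assumes the Hugging Face transformers library and the pretrained \"Helsinki-NLP/opus-mt-en-fr\" model, neither of which is used elsewhere on this page, and the model is downloaded on first use; treat it as an illustration of the idea rather than a tested part of this course.\n\n```python\nfrom transformers import pipeline\n\n# Assumption: transformers is installed and the Helsinki-NLP model can be downloaded\ntranslator = pipeline(\"translation_en_to_fr\", model=\"Helsinki-NLP/opus-mt-en-fr\")\n\n# Example text\ntext = \"The sun sets behind the mountains.\"\n\n# The pipeline returns a list with one dictionary per input text\nresult = translator(text)\nprint(result[0][\"translation_text\"])\n```\n\nSwapping the model name (for example \"Helsinki-NLP/opus-mt-en-de\" for English to German) adapts the same pattern to other language pairs.\n\n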
\n\n\n#### Text Classification\nText Classification is the task of automatically categorizing text documents into predefined categories or classes. \nFor example, classifying news articles into topics like politics, sports, or entertainment. \nText Classification is widely used in various domains, including email spam detection, sentiment analysis, and content categorization. \nIt enables organizations to organize and process large volumes of textual data efficiently, leading to improved decision-making and information retrieval.\n\n
\nCode example\n\n::: {#356c23f3 .cell execution_count=4}\n``` {.python .cell-code}\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.svm import SVC\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import LabelEncoder\n\n# Example labeled dataset\ntexts = [\n    \"I love this product!\",\n    \"This product is terrible.\",\n    \"Great service, highly recommended.\",\n    \"I had a bad experience with this company.\",\n]\nlabels = [\n    \"Positive\",\n    \"Negative\",\n    \"Positive\",\n    \"Negative\",\n]\n\n# Create a TF-IDF vectorizer\nvectorizer = TfidfVectorizer()\n\n# Encode labels as integers\nlabel_encoder = LabelEncoder()\nencoded_labels = label_encoder.fit_transform(labels)\n\n# Create a pipeline with TF-IDF vectorizer and SVM classifier\nclassifier = make_pipeline(vectorizer, SVC(kernel='linear'))\n\n# Train the classifier\nclassifier.fit(texts, encoded_labels)\n\n# Example test text\ntest_text = \"This product exceeded my expectations.\"\n\n# Predict the label for the test text\npredicted_label = classifier.predict([test_text])[0]\n\n# Decode the predicted label back to original label\npredicted_label_text = label_encoder.inverse_transform([predicted_label])[0]\n\n# Print the predicted label\nprint(f\"Text: {test_text}\")\nprint(f\"Predicted Label: {predicted_label_text}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nText: This product exceeded my expectations.\nPredicted Label: Negative\n```\n:::\n:::\n\n\n
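\nCode example (information extraction)\n\nThe information-extraction section below has no accompanying snippet, so a hedged sketch is given here, just before it. It reuses spaCy and the en_core_web_sm model already used on this page; the sentence and the naive subject-verb-object rule are illustrative assumptions, not a production-grade extractor.\n\n```python\nimport spacy\n\n# Load the English language model\nnlp = spacy.load(\"en_core_web_sm\")\n\n# Example text (hypothetical sentence)\ntext = \"Microsoft acquired GitHub in 2018.\"\n\n# Process the text with spaCy\ndoc = nlp(text)\n\n# Step 1: extract named entities\nfor ent in doc.ents:\n    print(f\"Entity: {ent.text} | Label: {ent.label_}\")\n\n# Step 2: read a naive (subject, verb, object) triple off the dependency parse\nfor token in doc:\n    if token.dep_ == \"ROOT\":\n        subjects = [child.text for child in token.lefts if child.dep_ == \"nsubj\"]\n        objects = [child.text for child in token.rights if child.dep_ == \"dobj\"]\n        if subjects and objects:\n            print(f\"Relation: ({subjects[0]}, {token.lemma_}, {objects[0]})\")\n```\n\n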
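\nCode example (question answering)\n\nLikewise, a hedged sketch for the question-answering section that follows. It assumes the Hugging Face transformers library and the pretrained \"distilbert-base-cased-distilled-squad\" model, downloaded on first use; neither is a stated dependency of this course, so read the snippet as an illustration of extractive QA only.\n\n```python\nfrom transformers import pipeline\n\n# Assumption: transformers is installed and the SQuAD-tuned model can be downloaded\nqa = pipeline(\"question-answering\", model=\"distilbert-base-cased-distilled-squad\")\n\n# Example context and question\ncontext = \"Paris is the capital of France and its largest city.\"\nquestion = \"What is the capital of France?\"\n\n# The pipeline returns the answer span it found in the context, with a score\nresult = qa(question=question, context=context)\nprint(f\"Answer: {result['answer']} (score: {result['score']:.2f})\")\n```\n\n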
\n\n\n#### Information Extraction\nInformation Extraction involves automatically extracting structured information from unstructured text data, such as documents, articles, or web pages. \nThis includes identifying entities, relationships, and events mentioned in the text. \nFor example, extracting names of people mentioned in news articles or detecting company acquisitions from financial reports. \nInformation Extraction plays a crucial role in tasks like knowledge base construction, data integration, and business intelligence.\n\n#### Question-Answering\nQuestion-Answering (QA) systems aim to automatically generate accurate answers to user queries posed in natural language. \nThese systems comprehend the meaning of questions and retrieve relevant information from a knowledge base or text corpus to provide precise responses. \nFor example, answering factual questions like \"Who is the president of the United States?\" or \"What is the capital of France?\". \nQA systems are essential for information retrieval, virtual assistants, and educational applications, enabling users to access information quickly and efficiently.\n\n", "supporting": [ "overview_files" ], diff --git a/_quarto.yml b/_quarto.yml index 5dceaa2..2ac1b49 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -10,6 +10,9 @@ format: theme: cosmo website: + page-navigation: true + back-to-top-navigation: true + navbar: tools: - icon: github diff --git a/docs/about/assignment.html b/docs/about/assignment.html index 15c1087..04f19b8 100644 --- a/docs/about/assignment.html +++ b/docs/about/assignment.html @@ -30,6 +30,8 @@ + + @@ -180,7 +182,7 @@