diff --git a/_freeze/index/execute-results/html.json b/_freeze/index/execute-results/html.json index 45a24c6..a6ac663 100644 --- a/_freeze/index/execute-results/html.json +++ b/_freeze/index/execute-results/html.json @@ -1,10 +1,10 @@ { - "hash": "488d3d8551c5374ae9788e59b60b9192", + "hash": "3efa4ac4acbec4b77441388f2d04213b", "result": { "engine": "jupyter", - "markdown": "---\ntitle: \"Seminar: Large Language Models\"\nformat:\n html:\n code-fold: true\njupyter: python3\n---\n\n\n\n\n![Robot by DALL-E](assets/dall-e-robot.jpeg){width=350 fig-align=\"left\"}\n\n\nHello and welcome to the seminar **Large Language Models** in the winter semester of 2024/25 at the University of Applied Sciences in Münster.\nOn this website, you will find all the information you need about and around the seminar. \n\n\n### About the seminar\nThe seminar is roughly divided into 3 parts of equal size: theory, training and application. \nIn the theoretical part, you will learn about the most important topics and ideas when it comes to natural language processing and large language models. \nWe will discuss topics like tokenization, matching, statistical text analysis and embeddings to get you started before eventually dealing with large language models and their applications themselves.\nAlready during the theory, we will make sure to code in `Python` alongside all the concepts and see coding examples to get familiar with it.\n\nAfter each small input session on a new topic, we will get to some hands-on training so that you can consolidate the knowledge you just acquired. \nYou will solve a few (coding) exercises around all the topics yourselves. \nTo get everyone fired up as quickly as possible, we have prepared a [Jupyterlab](https://jupyter.fh-muenster.de/){.external} environment that everyone can use for the solution of the exercises.\n\nIn the final part of the seminar we will go ahead and apply our newly acquired knowledge in our own projects.\nAll participants will team up in teams of 2-3 and try to develop and implement their own little prototype for a small application involving a language model.\nMore information and ideas for these projects can be found [here](about/projects.qmd).\n\nBy the way, you can (and maybe absolutely should) use a language model like ChatGPT also during this seminar and the solution of some of the exercises. \nHowever, feel encouraged to try for yourselves first, and make sure you have understood the solution of a language model if you use it.\n\n\n### How to use this script\nThis script is meant to give a comprehensive overview right away from the start.\nFeel free to browse it even before we have reached a specific topic, in particular, if you already have some prior knowledge in the topic. \nAll exercises that we will solve together in this seminar are contained in this script as well, *including their solution*. \nFor all exercises, the (or more precisely, a) solution is hidden behind a *Show solution* button. \nFor the sake of your own learning process, try to solve the exercises yourselves first!\nIf you're stuck, ask for a quick hint. \nIf you still feel like you do not advance any more, *then* check out the solution and try to understand it. 
\nThe solution of the exercises is not part of the evaluation, so it's really for your own progress!\nA \"summary\" of all exercises can be found [here](/resources/exercises.qmd).\n\n:::::: {.callout-important}\nA small disclaimer: This script is not (yet) ridiculously comprehensive.\nAnd, of course, we cannot cover the full realm of NLP and LLM within a 4-days-course. However, you should find everything we will do in the seminar also in this script. If there is something missing, I will make sure to include it as soon as possible, just give me a note. \n:::\n\n\n### What you will learn\nAs this seminar is meant to be an introduction to understanding and working with language models, so we can obviously not cover everything and offer deep insights into all the details. \nInstead, we aim to give you a simple overview of all the necessities to start working with language models APIs and understand why things are working the way they do and how you can apply them in your own applications. \nThe content can already be seen from the navigation bar, but here's a quick walk-through.\nMore precisely, we will walk you through a quick history of natural language processing with some of its challenges and limitations, and introduce you to text processing and analysis techniques such as tokenization, term frequency or bag of words as well as applications such as text classification or sentiment analysis.\nAfterwards, we will give a short introduction to how modern large language models approach these with more sophisticated techniques based on neural networks and vast amounts of training data, before getting more hands-on with the language model API by OpenAI.\nEventually, we will have a quick look into some other applications of embeddings, before quickly discussing some of the ethical considerations when working with language models.\nHave fun! \n\n\n### The schedule\nTBD\n\n#### After the seminar (~1d):\n - Prototype refinement\n - Code review & documentation\n - Refine business case & potential applications of prototype\n - Reflections & lessons learned\n→ *Hand in 2-page summary*\n\n\n### Evaluation\nAll seminar participants will be evaluated in the following way.\n\n- Your presentation on the last day of the seminar: 25%\n- Your prototype: 35%\n- Your summary: 25%\n- Your activity during the seminar: 15%\n\nI will allow myself to give your evaluation a little extra boost for good activity during the seminar. \nThis seminar is designed for everyone to participate, so the more you do, the more fun it will be! \n\n#### What is the summary? \nAs mentioned above, to finalize our seminar I want to you to take roughly a day to refine your prototype and then write a quick summary your project and your learnings.\nThe summary should be 2-3 pages only (kind of like a small executive summary) and contain the following information:\n- What is your prototype? What can I do? \n- What could be a business case for your prototype, or where can it be applied?\n- What are current limitations of your prototype and how could you overcome them?\n- What have been your main learnings during the creation of your prototype (and/or) the seminar itself?\n\nJust hand it in within a couple of weeks after the seminar, it will be a part of your evaluation.\n\n\n:::::: {.callout-note}\nHas this seminar been created with a little help of language models? Absolutely, why wouldn't it? 
:)\n:::\n\n", +    "markdown": "---\ntitle: \"Seminar: Large Language Models\"\nformat:\n html:\n code-fold: true\njupyter: python3\n---\n\n\n\n\n![Robot by DALL-E](assets/dall-e-robot.jpeg){width=350 fig-align=\"left\"}\n\n\nHello and welcome to the seminar **Large Language Models** in the winter semester of 2024/25 at the University of Applied Sciences in Münster.\nOn this website, you will find all the information you need about and around the seminar. \n\n\n### About the seminar\nThe seminar is roughly divided into 3 parts of equal size: theory, training and application. \nIn the theoretical part, you will learn about the most important topics and ideas when it comes to natural language processing and large language models. \nWe will discuss topics like tokenization, matching, statistical text analysis and embeddings to get you started before eventually dealing with large language models and their applications themselves.\nAlready during the theory part, we will code in `Python` alongside all the concepts and walk through coding examples to get familiar with it.\n\nAfter each small input session on a new topic, we will get to some hands-on training so that you can consolidate the knowledge you just acquired. \nYou will solve a few (coding) exercises around all the topics yourselves. \nTo get everyone fired up as quickly as possible, we have prepared a [Jupyterlab](https://jupyter.fh-muenster.de/){.external} environment that everyone can use for solving the exercises.\n\nIn the final part of the seminar, we will apply our newly acquired knowledge in our own projects.\nAll participants will form teams of 2-3 and develop their own little prototype for a small application involving a language model.\nMore information and ideas for these projects can be found [here](about/projects.qmd).\n\nBy the way, you can (and arguably should) use a language model like ChatGPT during this seminar, including for solving some of the exercises. \nHowever, feel encouraged to try for yourselves first, and if you do use a language model, make sure you have understood its solution.\n\n\n### How to use this script\nThis script is meant to give a comprehensive overview right from the start.\nFeel free to browse it even before we have reached a specific topic, in particular if you already have some prior knowledge of the topic. \nAll exercises that we will solve together in this seminar are contained in this script as well, *including their solution*. \nFor all exercises, the (or more precisely, a) solution is hidden behind a *Show solution* button. \nFor the sake of your own learning process, try to solve the exercises yourselves first!\nIf you're stuck, ask for a quick hint. \nIf you still feel like you are not advancing any more, *then* check out the solution and try to understand it. \nThe solutions to the exercises are not part of the evaluation, so they are really for your own progress!\nA \"summary\" of all exercises can be found [here](/resources/exercises.qmd).\n\n:::::: {.callout-important}\nA small disclaimer: This script is not (yet) ridiculously comprehensive.\nAnd, of course, we cannot cover the full realm of NLP and LLMs within a 4-day course. However, you should find everything we will do in the seminar in this script as well. If something is missing, just give me a note and I will include it as soon as possible. 
\n:::\n\n\n### What you will learn\nAs this seminar is meant to be an introduction to understanding and working with language models, we can obviously not cover everything or offer deep insights into all the details. \nInstead, we aim to give you a simple overview of all the necessities to start working with language model APIs, understand why things work the way they do, and see how you can apply them in your own applications. \nThe content can already be seen from the navigation bar, but here's a quick walk-through.\nMore precisely, we will walk you through a quick history of natural language processing with some of its challenges and limitations, and introduce you to text processing and analysis techniques such as tokenization, term frequency or bag of words as well as applications such as text classification or sentiment analysis.\nAfterwards, we will give a short introduction to how modern large language models approach these tasks with more sophisticated techniques based on neural networks and vast amounts of training data, before getting more hands-on with the language model API by OpenAI.\nFinally, we will have a quick look at some other applications of embeddings, before discussing some of the ethical considerations when working with language models.\nHave fun! \n\n\n### A rough schedule\n- Introduction & Getting to know each other & Survey (experiences & expectations) & Learning goals & Evaluation criteria\n- Introduction to the general topic & Python & Jupyter \n- Introduction NLP (tokenization, matching, statistical analysis)\n- Introduction to LLM & OpenAI API\n- Prompting\n- Embeddings\n- Advanced GPT topics (image data, parameterization, tool calling)\n- Real-world examples of applications (& implementation) & limitations\n- *App concept & Group brainstorming*\n- *Project work on prototype & mentoring*\n- *Project presentations* & reflections on the seminar\n- Backup: Ethics and data privacy \n\n\n#### After the seminar (~1d):\n - Prototype refinement\n - Code review & documentation\n - Refine business case & potential applications of prototype\n - Reflections & lessons learned\n→ *Hand in your 2-3 page summary*\n\n\n### Evaluation\nAll seminar participants will be evaluated in the following way.\n\n- Your presentation on the last day of the seminar: 25%\n- Your prototype: 35%\n- Your summary: 25%\n- Your activity during the seminar: 15%\n\nI will allow myself to give your evaluation a little extra boost for good activity during the seminar. \nThis seminar is designed for everyone to participate, so the more you do, the more fun it will be! \n\n#### What is the summary? \nAs mentioned above, to finalize our seminar I want you to take roughly a day to refine your prototype and then write a quick summary of your project and your learnings.\nThe summary should be 2-3 pages only (kind of like a small executive summary) and contain the following information:\n- What is your prototype? What can it do? \n- What could be a business case for your prototype, or where can it be applied?\n- What are current limitations of your prototype, and how could you overcome them?\n- What have been your main learnings during the creation of your prototype and/or the seminar itself?\n\nJust hand it in within a couple of weeks after the seminar; it will be part of your evaluation.\n\n\n:::::: {.callout-note}\nHas this seminar been created with a little help from language models? Absolutely, why wouldn't it? 
:)\n:::\n\n", "supporting": [ - "index_files/figure-html" + "index_files" ], "filters": [], "includes": {} diff --git a/_freeze/llm/gpt_api/execute-results/html.json b/_freeze/llm/gpt_api/execute-results/html.json index c30fd5c..18af113 100644 --- a/_freeze/llm/gpt_api/execute-results/html.json +++ b/_freeze/llm/gpt_api/execute-results/html.json @@ -1,10 +1,10 @@ { - "hash": "d5b6718e4746c52206e82e5d1124cb12", + "hash": "9699c6de3e7ef87cd4298dc99e46caa3", "result": { "engine": "jupyter", - "markdown": "---\ntitle: The OpenAI API\nformat:\n html:\n code-fold: false\n---\n\n::: {.callout-note}\nResource: [OpenAI API docs](https://platform.openai.com/docs/introduction){.external}\n:::\n\n\nLet's finally get started working with GPT. \nIn this seminar, we will use the OpenAI API to work with, but there are many alternatives out there. \nWe have collected a few in the [resources](../resources/apis.qmd).\n\n\n### Authentication\n\nGetting started with the OpenAI Chat Completions API requires signing up for an account on the OpenAI platform. \nOnce you've registered, you'll gain access to an API key, which serves as a unique identifier for your application to authenticate requests to the API. \nThis key is essential for ensuring secure communication between your application and OpenAI's servers. \nWithout proper authentication, your requests will be rejected.\nYou can create your own account, but for the seminar we will provide the client with the credential within the University's [Jupyterlab](https://jupyter.fh-muenster.de/){.external}.\n\n::: {#168232d8 .cell execution_count=1}\n``` {.python .cell-code}\n# setting up the client in Python\n\nimport os\nfrom openai import OpenAI\n\nclient = OpenAI(\n api_key=os.environ.get(\"OPENAI_API_KEY\")\n)\n```\n:::\n\n\n### Requesting Completions\n\nMost interaction with GPT and other models consist in generating completions for prompts, i.e., providing some text with instructions and letting the language model **complete** the text one token after the other as seen [here](../llm/intro.qmd).\n\nTo request completions from the OpenAI API, we use Python to send HTTP requests to the designated API endpoint. \nThese requests are structured to include various parameters that guide the generation of text completions. \nThe most fundamental parameter is the prompt text, which sets the context for the completion. \nAdditionally, you can specify the desired model configuration, such as the engine to use (e.g., \"gpt-4\"), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity (TODO: Link parameterization)\n\n::: {#8533d111 .cell execution_count=2}\n``` {.python .cell-code}\n# creating a completion\nchat_completion = client.chat.completions.create(\n messages=[\n {\n \"role\": \"user\",\n \"content\": \"How old is the earth?\",\n }\n ],\n model=\"gpt-3.5-turbo\" # choose the model\n)\n```\n:::\n\n\n### Processing\n\nOnce the OpenAI API receives your request, it proceeds to process the provided prompt using the specified model. \nThis process involves analyzing the context provided by the prompt and leveraging the model's pre-trained knowledge to generate text completions. \nThe model employs advanced natural language processing techniques to ensure that the generated completions are coherent and contextually relevant. 
\nBy drawing from its extensive training data and understanding of human language, the model aims to produce responses that closely align with human-like communication.\n\n### Response\n\nAfter processing your request, the OpenAI API returns a response containing the generated text completions. \nDepending on the specifics of your request, you may receive multiple completions, each accompanied by additional information such as the amount of token processed in the request, the reason why the model stopped the answer etc. \nThis response provides valuable insights into the quality and relevance of the completions, allowing you to tailor your application's behavior accordingly.\nLet's check it out briefly, before you explore the response object more in-depth in your next exercise.\n\n::: {#4968b9cd .cell execution_count=3}\n``` {.python .cell-code}\n# check out the type of the response\n\nprint(f\"Response object type: {type(chat_completion)}\") # a ChatCompletion object\n\n# print the message we want\nprint(f\"\\nResponse message: {chat_completion.choices[0].message.content}\")\n\n# check the tokens used \nprint(f\"\\nTotal tokens used: {chat_completion.usage.total_tokens}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nResponse object type: \n\nResponse message: The Earth is approximately 4.54 billion years old.\n\nTotal tokens used: 25\n```\n:::\n:::\n\n\n### Error Handling\n\nWhile interacting with the OpenAI API (or any API for that matter), it's crucial to implement some robust error handling mechanisms to manage any potential issues that may arise. \nThe kind of classic errors include providing invalid parameters, experiencing authentication failures due to an incorrect API key, or encountering rate limiting restrictions. \nBut for language models in particular, there are plenty more problems that can arise simply involving the answer we get from the model. \nSome examples are requests involving explicit language or content or restricted content etc. which are typically blocked by the API.\nOther times it might simply happen that a model does not respond in a way you expected, for example, just repeating your input instead of responding properly, or not responding in the format you requested. \nWhenever we are using language model for applications, we need to be aware of this and implement the right measures to handle these situations. \n\n", +    "markdown": "---\ntitle: \"The OpenAI API\"\nformat:\n html:\n code-fold: false\njupyter: python3\n---\n\n\n\n\n::: {.callout-note}\nResource: [OpenAI API docs](https://platform.openai.com/docs/introduction){.external}\n:::\n\n\nLet's finally get started working with GPT. \nIn this seminar, we will work with the OpenAI API, but there are many alternatives out there. \nWe have collected a few in the [resources](../resources/apis.qmd).\n\n\n### Authentication\n\nGetting started with the OpenAI Chat Completions API requires signing up for an account on the OpenAI platform. \nOnce you've registered, you'll gain access to an API key, which serves as a unique identifier for your application to authenticate requests to the API. \nThis key is essential for ensuring secure communication between your application and OpenAI's servers. 
\nWithout proper authentication, your requests will be rejected.\nYou can create your own account, but for the seminar we will provide the client with the credentials within the University's [Jupyterlab](https://jupyter.fh-muenster.de/){.external}.\n\n::: {#9f22d549 .cell execution_count=1}\n``` {.python .cell-code}\n# setting up the client in Python\n\nimport os\nfrom openai import OpenAI\n\nclient = OpenAI(\n api_key=os.environ.get(\"OPENAI_API_KEY\")\n)\n```\n:::\n\n\n### Requesting Completions\n\nMost interactions with GPT and other models consist of generating completions for prompts, i.e., providing some text with instructions and letting the language model **complete** the text one token after the other as seen [here](../llm/intro.qmd).\n\nTo request completions from the OpenAI API, we use Python to send HTTP requests to the designated API endpoint. \nThese requests are structured to include various parameters that guide the generation of text completions. \nThe most fundamental parameter is the prompt text, which sets the context for the completion. \nAdditionally, you can specify the desired model configuration, such as the engine to use (e.g., \"gpt-4\"), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity (see [here](../llm/parameterization.qmd)).\n\n::: {#284368a4 .cell execution_count=2}\n``` {.python .cell-code}\n# creating a completion\nchat_completion = client.chat.completions.create(\n messages=[\n {\n \"role\": \"user\",\n \"content\": \"How old is the earth?\",\n }\n ],\n model=\"gpt-3.5-turbo\" # choose the model\n)\n```\n:::\n\n\n### Processing\n\nOnce the OpenAI API receives your request, it proceeds to process the provided prompt using the specified model. \nThis process involves analyzing the context provided by the prompt and leveraging the model's pre-trained knowledge to generate text completions. \nThe model employs advanced natural language processing techniques to ensure that the generated completions are coherent and contextually relevant. \nBy drawing from its extensive training data and understanding of human language, the model aims to produce responses that closely align with human-like communication.\n\n### Response\n\nAfter processing your request, the OpenAI API returns a response containing the generated text completions. \nDepending on the specifics of your request, you may receive multiple completions, each accompanied by additional information such as the number of tokens processed in the request, the reason why the model stopped generating, etc. 
\nThis response provides valuable insights into the quality and relevance of the completions, allowing you to tailor your application's behavior accordingly.\nLet's check it out briefly, before you explore the response object more in-depth in your next exercise.\n\n::: {#8930d3ad .cell execution_count=3}\n``` {.python .cell-code}\n# check out the type of the response\n\nprint(f\"Response object type: {type(chat_completion)}\") # a ChatCompletion object\n\n# print the message we want\nprint(f\"\\nResponse message: {chat_completion.choices[0].message.content}\")\n\n# check the tokens used \nprint(f\"\\nTotal tokens used: {chat_completion.usage.total_tokens}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nResponse object type: <class 'openai.types.chat.chat_completion.ChatCompletion'>\n\nResponse message: The Earth is estimated to be around 4.5 billion years old.\n\nTotal tokens used: 28\n```\n:::\n:::\n\n\n### Error Handling\n\nWhile interacting with the OpenAI API (or any API for that matter), it's crucial to implement some robust error handling mechanisms to manage any potential issues that may arise. \nClassic errors include providing invalid parameters, experiencing authentication failures due to an incorrect API key, or encountering rate limiting restrictions. \nBut for language models in particular, there are plenty more problems that can arise simply involving the answer we get from the model. \nSome examples are requests involving explicit or otherwise restricted content, which are typically blocked by the API.\nOther times it might simply happen that a model does not respond in a way you expected, for example, just repeating your input instead of responding properly, or not responding in the format you requested. \nWhenever we are using language models in applications, we need to be aware of this and implement the right measures to handle these situations. \n\n", "supporting": [ - "gpt_api_files" + "gpt_api_files/figure-html" ], "filters": [], "includes": {} diff --git a/_freeze/python_intro/overview/execute-results/html.json b/_freeze/python_intro/overview/execute-results/html.json index e3342a7..1f8fbd5 100644 --- a/_freeze/python_intro/overview/execute-results/html.json +++ b/_freeze/python_intro/overview/execute-results/html.json @@ -1,10 +1,10 @@ { - "hash": "517a467b7ba488cc7f7db6a23a802594", + "hash": "84c74b90f33e3375e1bbae10aeeb1b45", "result": { "engine": "jupyter", - "markdown": "---\ntitle: \"Introduction to Python\"\nformat:\n html:\n code-fold: false\njupyter: python3\n---\n\n\n\n\n# TODO: Some introduction\n\n\n# Installing Python on Windows and macOS\n\n### Installing Python on Windows\n\n1. **Download the Installer:**\n - Go to the [official Python website](https://www.python.org/downloads/).\n - Click on the “Download Python” button. This will download the latest version for Windows.\n\n2. **Run the Installer:**\n - Locate the downloaded `.exe` file in your downloads folder and double-click it to run the installer.\n - **Important:** Check the box that says “Add Python to PATH” at the bottom of the installation window.\n - Choose \"Install Now\" for a standard installation or \"Customize installation\" for more options.\n\n3. **Verify Installation:**\n - Open the Command Prompt by searching for `cmd` in the Start menu.\n - Type `python --version` and press Enter. You should see the installed version of Python.\n\n### Installing Python on macOS\n\n1. 
**Download the Installer:**\n - Visit the [official Python website](https://www.python.org/downloads/).\n - Click on the “Download Python” button, which will get the latest version for macOS.\n\n2. **Run the Installer:**\n - Locate the downloaded `.pkg` file and double-click it to launch the installer.\n - Follow the on-screen instructions to complete the installation.\n\n3. **Verify Installation:**\n - Open the Terminal application (you can find it using Spotlight Search by pressing `Command + Space` and typing \"Terminal\").\n - Type `python3 --version` and press Enter. You should see the installed version of Python.\n\n### Additional Setup (Optional)\n\nAfter installing Python, it’s a good idea to install **pip**, Python's package manager, which is included by default in the latest Python versions. You can use pip to install additional libraries and packages as needed.\n\nFor Windows:\n- To install a package, open Command Prompt and type:\n ```bash\n pip install package_name\n ```\n\nFor macOS:\n- Open Terminal and type:\n ```bash\n pip3 install package_name\n ```\n\nThat’s it! You’re now ready to start programming in Python. \n\n\n# Using VSCode\n\nVisual Studio Code (VSCode) is a powerful and popular code editor developed by Microsoft. \nIt is highly extensible, lightweight, and supports a wide range of programming languages, including Python. \nWith its robust features such as IntelliSense, debugging capabilities, and integrated terminal, VSCode is an excellent choice for Python development.\n\n### Getting Started\n\nTo start using Python in VSCode, follow these steps:\n\n1. **Install VSCode**: If you haven’t already, download and install Visual Studio Code from [the official website](https://code.visualstudio.com/).\n\n2. **Install the Python Extension**: \n - Open VSCode.\n - Go to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side or pressing `Ctrl + Shift + X`.\n - Search for \"Python\" and install the official extension provided by Microsoft. This extension adds rich support for Python development, including IntelliSense and linting.\n\n3. **Select the Python Interpreter**:\n - After installing the extension, you need to select the Python interpreter. Press `Ctrl + Shift + P` to open the Command Palette, then type and select **Python: Select Interpreter**.\n - Choose the interpreter that matches your Python installation.\n\n### Writing and Running Python Code\n\n1. **Create a New File**: \n - You can create a new Python file by clicking on `File > New File` or pressing `Ctrl + N`. \n - Save it with a `.py` extension (e.g., `script.py`).\n\n2. **Write Your Code**: \n - Begin writing your Python code in the editor. For example:\n ```python\n print(\"Hello, VSCode!\")\n ```\n\n3. **Run Your Code**:\n - There are multiple ways to run your Python code:\n - **Using the Terminal**: Open the integrated terminal by selecting `View > Terminal` or pressing `` Ctrl + ` `` (backtick). In the terminal, type `python script.py` (replacing `script.py` with your file name) to execute the script.\n - **Run Code Action**: You can also run your code directly from the editor by clicking the play button (▶️) that appears above the code or using the shortcut `Shift + Enter`.\n\n### Debugging in VSCode\n\nVSCode provides powerful debugging features to help you troubleshoot your code:\n\n1. **Set Breakpoints**: Click in the gutter next to the line numbers to set breakpoints where you want the execution to pause.\n\n2. 
**Start Debugging**: Press `F5` or go to the Debug view by clicking on the Debug icon in the Activity Bar. \n3. You can then start debugging your Python script. The Debug Console will allow you to inspect variables, step through code, and evaluate expressions.\n\n### Using Extensions and Features\n\nVSCode has a wide variety of extensions to enhance your Python development experience:\n\n- **Linting**: The Python extension includes linting capabilities that help you catch errors and enforce coding standards. You can enable it in the settings (`Settings > Python > Linting`).\n\n- **IntelliSense**: Take advantage of IntelliSense for code suggestions, autocompletions, and quick documentation. Simply start typing, and relevant suggestions will appear.\n\n- **Jupyter Notebooks**: If you want to work with Jupyter Notebooks directly in VSCode, install the Jupyter extension. This allows you to create, edit, and run notebooks seamlessly.\n\n\n---\n\n# Jupyter Notebooks\nJupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. \nIt is widely used in data science, machine learning, and scientific computing, making it a versatile tool for both beginners and advanced users.\nIn a Jupyter Notebook, you can write and execute code in a variety of programming languages, including Python. \nIt provides an interactive environment where you can document your thought process alongside your code, visualize data, and quickly test ideas without the need for a complete development setup.\n\n### Getting Started\n\nOnce you have Jupyter Notebook up and running, you will typically start by opening a new notebook. Here are the key components and features of Jupyter Notebook to help you navigate and utilize it effectively:\n\n\n\n### The User Interface\n\nUpon launching Jupyter Notebook, you’ll be greeted with a dashboard showing your files and notebooks. You can create a new notebook by selecting \"New\" and then choosing the desired kernel (like Python 3).\n\n- **Notebook Cells:** The main area consists of cells where you can write your code or text. There are two main types of cells:\n - **Code Cells:** Where you write and execute code.\n - **Markdown Cells:** Where you can write formatted text, including headers, lists, and links.\n\n\n\n### Writing and Executing Code\n\nTo write code in a code cell:\n\n1. Click on a cell to make it active.\n2. Type your code into the cell.\n\n#### Example:\n\n::: {#54126b1e .cell execution_count=1}\n``` {.python .cell-code}\nprint(\"Hello, Jupyter!\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nHello, Jupyter!\n```\n:::\n:::\n\n\nTo execute the code, you can either click the \"Run\" button in the toolbar or press `Shift + Enter`. This will run the code and display the output directly below the cell.\n\n\n\n### Using Markdown for Documentation\n\nMarkdown cells allow you to document your code using plain text. You can format your text using Markdown syntax. 
\n\n#### Example:\nTo create a markdown cell with a header, simply type:\n```markdown\n# My Jupyter Notebook\n```\nAfter running the cell, it will render as a formatted header.\n\nYou can also create bullet points, numbered lists, links, and more:\n```markdown\n## Key Features\n- Interactive coding\n- Inline visualizations\n- Rich text support\n```\n\n\n\n### Visualization and Output\n\nJupyter Notebook supports various visualization libraries like Matplotlib, Seaborn, and Plotly, allowing you to create plots and graphs inline.\n\n#### Example:\n\n::: {#765a42d6 .cell execution_count=2}\n``` {.python .cell-code}\nimport matplotlib.pyplot as plt\n\n# Sample data\nx = [1, 2, 3, 4, 5]\ny = [2, 3, 5, 7, 11]\n\n# Creating a plot\nplt.plot(x, y)\nplt.title(\"Sample Plot\")\nplt.xlabel(\"X-axis\")\nplt.ylabel(\"Y-axis\")\nplt.show()\n```\n\n::: {.cell-output .cell-output-display}\n![](overview_files/figure-html/cell-3-output-1.png){width=585 height=449}\n:::\n:::\n\n\nAfter running this code, the plot will be displayed directly beneath the code cell.\n\n\n\n### Saving and Sharing Notebooks\n\nYou can save your notebook by clicking the save icon or using the shortcut `Ctrl + S` (or `Cmd + S` on Mac). Jupyter Notebooks are saved with a `.ipynb` extension.\n\nTo share your notebook, you can export it to different formats, such as HTML or PDF, by using the \"File\" menu. You can also share the `.ipynb` file directly, which can be opened in any Jupyter environment.\n\n\n\n### Keyboard Shortcuts\n\nJupyter Notebook has many handy keyboard shortcuts that can improve your efficiency. Here are a few essential ones:\n\n- `Enter`: Edit the selected cell.\n- `Esc`: Command mode (no editing).\n- `A`: Insert a new cell above.\n- `B`: Insert a new cell below.\n- `DD`: Delete the selected cell.\n- `Z`: Undo the last cell deletion.\n- `Shift + Enter`: Run the current cell and move to the next one.\n- `Ctrl + Enter`: Run the current cell and stay in it.\n\n", + "markdown": "---\ntitle: \"Introduction to Python\"\nformat:\n html:\n code-fold: false\njupyter: python3\n---\n\n\n\n\n\n# Installing Python on Windows and macOS\n\n### Installing Python on Windows\n\n1. **Download the Installer:**\n - Go to the [official Python website](https://www.python.org/downloads/).\n - Click on the “Download Python” button. This will download the latest version for Windows.\n\n2. **Run the Installer:**\n - Locate the downloaded `.exe` file in your downloads folder and double-click it to run the installer.\n - **Important:** Check the box that says “Add Python to PATH” at the bottom of the installation window.\n - Choose \"Install Now\" for a standard installation or \"Customize installation\" for more options.\n\n3. **Verify Installation:**\n - Open the Command Prompt by searching for `cmd` in the Start menu.\n - Type `python --version` and press Enter. You should see the installed version of Python.\n\n### Installing Python on macOS\n\n1. **Download the Installer:**\n - Visit the [official Python website](https://www.python.org/downloads/).\n - Click on the “Download Python” button, which will get the latest version for macOS.\n\n2. **Run the Installer:**\n - Locate the downloaded `.pkg` file and double-click it to launch the installer.\n - Follow the on-screen instructions to complete the installation.\n\n3. **Verify Installation:**\n - Open the Terminal application (you can find it using Spotlight Search by pressing `Command + Space` and typing \"Terminal\").\n - Type `python3 --version` and press Enter. 
You should see the installed version of Python.\n\n\n### Additional Setup (Optional)\n\nAfter installing Python, it’s a good idea to get familiar with **pip**, Python's package manager, which is included by default in the latest Python versions. You can use pip to install additional libraries and packages as needed.\n\nFor Windows:\n- To install a package, open Command Prompt and type:\n ```bash\n pip install package_name\n ```\n\nFor macOS:\n- Open Terminal and type:\n ```bash\n pip3 install package_name\n ```\n\nThat’s it! You’re now ready to start programming in Python. \n\n\n# Using VSCode\n\nVisual Studio Code (VSCode) is a powerful and popular code editor developed by Microsoft. \nIt is highly extensible, lightweight, and supports a wide range of programming languages, including Python. \nWith its robust features such as IntelliSense, debugging capabilities, and integrated terminal, VSCode is an excellent choice for Python development.\n\n### Getting Started\n\nTo start using Python in VSCode, follow these steps:\n\n1. **Install VSCode**: If you haven’t already, download and install Visual Studio Code from [the official website](https://code.visualstudio.com/).\n\n2. **Install the Python Extension**: \n - Open VSCode.\n - Go to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side or pressing `Ctrl + Shift + X`.\n - Search for \"Python\" and install the official extension provided by Microsoft. This extension adds rich support for Python development, including IntelliSense and linting.\n\n3. **Select the Python Interpreter**:\n - After installing the extension, you need to select the Python interpreter. Press `Ctrl + Shift + P` to open the Command Palette, then type and select **Python: Select Interpreter**.\n - Choose the interpreter that matches your Python installation.\n\n### Writing and Running Python Code\n\n1. **Create a New File**: \n - You can create a new Python file by clicking on `File > New File` or pressing `Ctrl + N`. \n - Save it with a `.py` extension (e.g., `script.py`).\n\n2. **Write Your Code**: \n - Begin writing your Python code in the editor. For example:\n ```python\n print(\"Hello, VSCode!\")\n ```\n\n3. **Run Your Code**:\n - There are multiple ways to run your Python code:\n - **Using the Terminal**: Open the integrated terminal by selecting `View > Terminal` or pressing `` Ctrl + ` `` (backtick). In the terminal, type `python script.py` (replacing `script.py` with your file name) to execute the script.\n - **Run Code Action**: You can also run your code directly from the editor by clicking the play button (▶️) that appears above the code or using the shortcut `Shift + Enter`.\n\n### Debugging in VSCode\n\nVSCode provides powerful debugging features to help you troubleshoot your code:\n\n1. **Set Breakpoints**: Click in the gutter next to the line numbers to set breakpoints where you want the execution to pause.\n\n2. **Start Debugging**: Press `F5` or go to the Debug view by clicking on the Debug icon in the Activity Bar. \n3. **Inspect Your Code**: You can then start debugging your Python script. The Debug Console will allow you to inspect variables, step through code, and evaluate expressions.\n\n### Using Extensions and Features\n\nVSCode has a wide variety of extensions to enhance your Python development experience:\n\n- **Linting**: The Python extension includes linting capabilities that help you catch errors and enforce coding standards. 
You can enable it in the settings (`Settings > Python > Linting`).\n\n- **IntelliSense**: Take advantage of IntelliSense for code suggestions, autocompletions, and quick documentation. Simply start typing, and relevant suggestions will appear.\n\n- **Jupyter Notebooks**: If you want to work with Jupyter Notebooks directly in VSCode, install the Jupyter extension. This allows you to create, edit, and run notebooks seamlessly.\n\n\n---\n\n# Jupyter Notebooks\nJupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. \nIt is widely used in data science, machine learning, and scientific computing, making it a versatile tool for both beginners and advanced users.\nIn a Jupyter Notebook, you can write and execute code in a variety of programming languages, including Python. \nIt provides an interactive environment where you can document your thought process alongside your code, visualize data, and quickly test ideas without the need for a complete development setup.\n\n### Getting Started\n\nOnce you have Jupyter Notebook up and running, you will typically start by opening a new notebook. Here are the key components and features of Jupyter Notebook to help you navigate and utilize it effectively:\n\n\n\n### The User Interface\n\nUpon launching Jupyter Notebook, you’ll be greeted with a dashboard showing your files and notebooks. You can create a new notebook by selecting \"New\" and then choosing the desired kernel (like Python 3).\n\n- **Notebook Cells:** The main area consists of cells where you can write your code or text. There are two main types of cells:\n - **Code Cells:** Where you write and execute code.\n - **Markdown Cells:** Where you can write formatted text, including headers, lists, and links.\n\n\n\n### Writing and Executing Code\n\nTo write code in a code cell:\n\n1. Click on a cell to make it active.\n2. Type your code into the cell.\n\n#### Example:\n\n::: {#9c54e726 .cell execution_count=1}\n``` {.python .cell-code}\nprint(\"Hello, Jupyter!\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nHello, Jupyter!\n```\n:::\n:::\n\n\nTo execute the code, you can either click the \"Run\" button in the toolbar or press `Shift + Enter`. This will run the code and display the output directly below the cell.\n\n\n\n### Using Markdown for Documentation\n\nMarkdown cells allow you to document your code using plain text. You can format your text using Markdown syntax. 
\n\n#### Example:\nTo create a markdown cell with a header, simply type:\n```markdown\n# My Jupyter Notebook\n```\nAfter running the cell, it will render as a formatted header.\n\nYou can also create bullet points, numbered lists, links, and more:\n```markdown\n## Key Features\n- Interactive coding\n- Inline visualizations\n- Rich text support\n```\n\n\n\n### Visualization and Output\n\nJupyter Notebook supports various visualization libraries like Matplotlib, Seaborn, and Plotly, allowing you to create plots and graphs inline.\n\n#### Example:\n\n::: {#87025db7 .cell execution_count=2}\n``` {.python .cell-code}\nimport matplotlib.pyplot as plt\n\n# Sample data\nx = [1, 2, 3, 4, 5]\ny = [2, 3, 5, 7, 11]\n\n# Creating a plot\nplt.plot(x, y)\nplt.title(\"Sample Plot\")\nplt.xlabel(\"X-axis\")\nplt.ylabel(\"Y-axis\")\nplt.show()\n```\n\n::: {.cell-output .cell-output-display}\n![](overview_files/figure-html/cell-3-output-1.png){width=585 height=449}\n:::\n:::\n\n\nAfter running this code, the plot will be displayed directly beneath the code cell.\n\n\n\n### Saving and Sharing Notebooks\n\nYou can save your notebook by clicking the save icon or using the shortcut `Ctrl + S` (or `Cmd + S` on Mac). Jupyter Notebooks are saved with a `.ipynb` extension.\n\nTo share your notebook, you can export it to different formats, such as HTML or PDF, by using the \"File\" menu. You can also share the `.ipynb` file directly, which can be opened in any Jupyter environment.\n\n\n\n### Keyboard Shortcuts\n\nJupyter Notebook has many handy keyboard shortcuts that can improve your efficiency. Here are a few essential ones:\n\n- `Enter`: Edit the selected cell.\n- `Esc`: Command mode (no editing).\n- `A`: Insert a new cell above.\n- `B`: Insert a new cell below.\n- `DD`: Delete the selected cell.\n- `Z`: Undo the last cell deletion.\n- `Shift + Enter`: Run the current cell and move to the next one.\n- `Ctrl + Enter`: Run the current cell and stay in it.\n\n", "supporting": [ - "overview_files" + "overview_files/figure-html" ], "filters": [], "includes": {} diff --git a/_freeze/resources/apis/execute-results/html.json b/_freeze/resources/apis/execute-results/html.json index 7c4b5a0..ce919f8 100644 --- a/_freeze/resources/apis/execute-results/html.json +++ b/_freeze/resources/apis/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "3eccaea4c86c5372a56cf81316f62a97", + "hash": "6a1cbccadc45049a431c3116976e7b92", "result": { "engine": "jupyter", - "markdown": "---\ntitle: \"Language Model APIs\"\nformat:\n html:\n code-fold: true\njupyter: python3\n---\n\n\n\n\nCertainly! Here's the list with each title linked to its respective website, followed by the description:\n\n1. [**Google Cloud Natural Language API**](https://cloud.google.com/natural-language) Google Cloud Natural Language API offers a suite of powerful natural language processing capabilities, including sentiment analysis, entity recognition, and syntax analysis. While it may not provide pre-trained large language models like GPT, it offers robust support for various NLP tasks through its RESTful API.\n \n2. [**Microsoft Azure Cognitive Services - Text Analytics**](https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/) Azure Cognitive Services offers Text Analytics, a set of APIs for analyzing unstructured text. It provides functionalities such as sentiment analysis, key phrase extraction, language detection, and entity recognition. 
While it doesn't offer large pre-trained language models, it's suitable for various text analysis tasks and integrates well with other Azure services.\n \n3. [**IBM Watson Natural Language Understanding**](https://www.ibm.com/cloud/watson-natural-language-understanding) Watson Natural Language Understanding is part of IBM's suite of AI-powered tools. It provides advanced text analysis capabilities, including sentiment analysis, entity recognition, concept extraction, and categorization. While it doesn't offer large-scale language generation models like GPT, it's suitable for analyzing and extracting insights from text data.\n \n4. [**Hugging Face Transformers**](https://huggingface.co/transformers/) Hugging Face Transformers is an open-source library that provides a wide range of pre-trained models for natural language processing tasks. It includes popular models like GPT, BERT, and RoBERTa, along with support for fine-tuning and custom model development. While it's not a hosted API service like OpenAI, it offers powerful tools for developers to work with state-of-the-art NLP models.\n \n5. [**DeepAI Text Generation API**](https://deepai.org/machine-learning-model/text-generator) DeepAI offers a Text Generation API that allows users to generate human-like text based on a given prompt. While it may not provide the scale or versatility of GPT-like models, it's suitable for tasks such as generating short-form content, creative writing, and text completion.\n \n6. [**LLama**](https://llama.ai/) LLama is an AI platform that offers large language models and other NLP capabilities for developers and enterprises. It provides access to pre-trained models, including GPT-like architectures, as well as tools for fine-tuning models on custom datasets. LLama aims to democratize access to advanced AI technologies and support a wide range of NLP applications.\n \n7. [**Gemini**](https://gemini.cortical.io/) Gemini by Cortical.io is an AI-based platform that offers natural language understanding and text analytics capabilities. It utilizes semantic folding technology, inspired by the human brain, to analyze and process text data efficiently. While it may not provide large-scale language generation models like GPT, Gemini offers powerful tools for semantic analysis, document clustering, and similarity detection.\n\n", + "markdown": "---\ntitle: \"Language Model APIs\"\nformat:\n html:\n code-fold: true\njupyter: python3\n---\n\n\n\n\n1. [**Google Cloud Natural Language API**](https://cloud.google.com/natural-language) Google Cloud Natural Language API offers a suite of powerful natural language processing capabilities, including sentiment analysis, entity recognition, and syntax analysis. While it may not provide pre-trained large language models like GPT, it offers robust support for various NLP tasks through its RESTful API.\n \n2. [**Microsoft Azure Cognitive Services - Text Analytics**](https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/) Azure Cognitive Services offers Text Analytics, a set of APIs for analyzing unstructured text. It provides functionalities such as sentiment analysis, key phrase extraction, language detection, and entity recognition. While it doesn't offer large pre-trained language models, it's suitable for various text analysis tasks and integrates well with other Azure services.\n \n3. 
[**IBM Watson Natural Language Understanding**](https://www.ibm.com/cloud/watson-natural-language-understanding) Watson Natural Language Understanding is part of IBM's suite of AI-powered tools. It provides advanced text analysis capabilities, including sentiment analysis, entity recognition, concept extraction, and categorization. While it doesn't offer large-scale language generation models like GPT, it's suitable for analyzing and extracting insights from text data.\n \n4. [**Hugging Face Transformers**](https://huggingface.co/transformers/) Hugging Face Transformers is an open-source library that provides a wide range of pre-trained models for natural language processing tasks. It includes popular models like GPT, BERT, and RoBERTa, along with support for fine-tuning and custom model development. While it's not a hosted API service like OpenAI, it offers powerful tools for developers to work with state-of-the-art NLP models.\n \n5. [**DeepAI Text Generation API**](https://deepai.org/machine-learning-model/text-generator) DeepAI offers a Text Generation API that allows users to generate human-like text based on a given prompt. While it may not provide the scale or versatility of GPT-like models, it's suitable for tasks such as generating short-form content, creative writing, and text completion.\n \n6. [**LLama**](https://llama.ai/) LLama is an AI platform that offers large language models and other NLP capabilities for developers and enterprises. It provides access to pre-trained models, including GPT-like architectures, as well as tools for fine-tuning models on custom datasets. LLama aims to democratize access to advanced AI technologies and support a wide range of NLP applications.\n \n7. [**Gemini**](https://gemini.cortical.io/) Gemini by Cortical.io is an AI-based platform that offers natural language understanding and text analytics capabilities. It utilizes semantic folding technology, inspired by the human brain, to analyze and process text data efficiently. 
While it may not provide large-scale language generation models like GPT, Gemini offers powerful tools for semantic analysis, document clustering, and similarity detection.\n\n", "supporting": [ "apis_files" ], diff --git a/_freeze/resources/exercises/execute-results/html.json b/_freeze/resources/exercises/execute-results/html.json index 2a29623..22bb2c3 100644 --- a/_freeze/resources/exercises/execute-results/html.json +++ b/_freeze/resources/exercises/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "91edb69715744c4b8f8de511a89ade3c", + "hash": "c32b99a1801049bc45f684b0378f85c3", "result": { "engine": "jupyter", - "markdown": "---\ntitle: List of exercises\nformat:\n html:\n code-fold: true\n---\n\n#### Natural Language Processing \n[Exercise: Sentence tokenization](../nlp/exercises/ex_tokenization.ipynb)\n\n[Exercise: TF-IDF](../nlp/exercises/ex_tfidf.ipynb)\n\n[Exercise: Word matching](../nlp/exercises/ex_word_matching.ipynb)\n\n[Exercise: Fuzzy matching](../nlp/exercises/ex_fuzzy_matching.ipynb)\n\n\n#### Large Language Models with OpenAI\n[Exercise: OpenAI - Getting started](../llm/exercises/ex_gpt_start.ipynb)\n\n[Exercise: GPT Chatbot](../llm/exercises/ex_gpt_chatbot.ipynb)\n\n[Exercise: GPT Parameterization](../llm/exercises/ex_gpt_parameterization.ipynb)\n\n[Exercise: NER with tool calling](../llm/exercises/ex_gpt_ner_with_function_calls.ipynb)\n\n\n#### Embeddings \n[Exercise: Embedding similarity](../embeddings/exercises/ex_emb_similarity.ipynb)\n\n", + "markdown": "---\ntitle: \"List of exercises\"\nformat:\n html:\n code-fold: true\njupyter: python3\n---\n\n\n\n\n#### Introduction to Python\n[Exercise: Data types](../python_intro/exercises/data_types.ipynb)\n\n[Exercise: String manipulations](../python_intro/exercises/strings.ipynb)\n\n[Exercise: Lists and loops](../python_intro/exercises/lists_and_loops.ipynb)\n\n[Exercise: Conditional statements](../python_intro/exercises/conditional_statements.ipynb)\n\n[Exercise: Functions](../python_intro/exercises/functions.ipynb)\n\n[Exercise: Dictionaries](../python_intro/exercises/dictionaries.ipynb)\n\n[Exercise: Classes](../python_intro/exercises/classes.ipynb)\n\n\n#### Natural Language Processing \n[Exercise: Sentence tokenization](../nlp/exercises/ex_tokenization.ipynb)\n\n[Exercise: TF-IDF](../nlp/exercises/ex_tfidf.ipynb)\n\n[Exercise: Word matching](../nlp/exercises/ex_word_matching.ipynb)\n\n[Exercise: Fuzzy matching](../nlp/exercises/ex_fuzzy_matching.ipynb)\n\n\n#### Large Language Models with OpenAI\n[Exercise: OpenAI - Getting started](../llm/exercises/ex_gpt_start.ipynb)\n\n[Exercise: GPT Chatbot](../llm/exercises/ex_gpt_chatbot.ipynb)\n\n[Exercise: GPT Parameterization](../llm/exercises/ex_gpt_parameterization.ipynb)\n\n[Exercise: NER with tool calling](../llm/exercises/ex_gpt_ner_with_function_calls.ipynb)\n\n\n#### Embeddings \n[Exercise: Embedding similarity](../embeddings/exercises/ex_emb_similarity.ipynb)\n\n", "supporting": [ "exercises_files" ], diff --git a/_freeze/slides/python_intro/introduction/execute-results/html.json b/_freeze/slides/python_intro/introduction/execute-results/html.json new file mode 100644 index 0000000..d2e2766 --- /dev/null +++ b/_freeze/slides/python_intro/introduction/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "1a9e5606af6f60b8c3b490514c474005", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: \"Introduction to Python\"\nformat: \n revealjs:\n theme: default\n chalkboard: true\n footer: \"Seminar: LLM, WiSe 2024/25\"\n logo: ../../assets/logo.svg\n 
slide-number: true\n smaller: true\n---\n\n\n# Installing Python on Windows and macOS\n\n## Installing Python on Windows\n\n1. **Download the Installer:**\n - Go to the [official Python website](https://www.python.org/downloads/).\n - Click on “Download Python” for Windows.\n\n2. **Run the Installer:**\n - Double-click the `.exe` file to run the installer.\n - **Important**: Check “Add Python to PATH”.\n - Choose \"Install Now\" or \"Customize installation\".\n\n3. **Verify Installation:**\n - Open Command Prompt and type `python --version`.\n\n---\n\n## Installing Python on macOS\n\n1. **Download the Installer:**\n - Visit the [official Python website](https://www.python.org/downloads/).\n - Download the macOS `.pkg` installer.\n\n2. **Run the Installer:**\n - Double-click the `.pkg` file and follow the installation steps.\n\n3. **Verify Installation:**\n - Open Terminal and type `python3 --version`.\n\n---\n\n## Additional Setup (Optional)\n\n### Using pip\n\n- **Windows**: \n ```bash\n pip install package_name\n ```\n- **macOS**:\n ```bash\n pip3 install package_name\n ```\n\nNow you're ready to start programming in Python!\n\n---\n\n# Using VSCode\n\nVisual Studio Code (VSCode) is a lightweight, powerful code editor that supports Python through extensions.\n\n## Getting Started\n\n1. **Install VSCode**: Download it from [the official website](https://code.visualstudio.com/).\n2. **Install Python Extension**:\n - Open VSCode.\n - Search for \"Python\" in Extensions (`Ctrl + Shift + X`).\n - Install the official Python extension by Microsoft.\n3. **Select Python Interpreter**:\n - Press `Ctrl + Shift + P` → type **Python: Select Interpreter**.\n\n---\n\n## Writing and Running Python Code\n\n1. **Create a New File**:\n - Click `File > New File` or `Ctrl + N`.\n - Save with `.py` extension.\n2. **Write Code**:\n ```python\n print(\"Hello, VSCode!\")\n ```\n3. **Run Your Code**:\n - Open Terminal (`` Ctrl + ` ``), type `python script.py`, or press `Shift + Enter`.\n\n---\n\n## Debugging in VSCode\n\n1. **Set Breakpoints**: Click next to line numbers to set breakpoints.\n2. **Start Debugging**: Press `F5` or go to the Debug view.\n3. **Inspect Code**: Use the Debug Console to evaluate variables and step through the code.\n\n---\n\n## Extensions and Features\n\n### Linting\n- Helps catch errors and enforce standards.\n- Enable it under `Settings > Python > Linting`.\n\n### IntelliSense\n- Offers code suggestions and autocompletion.\n\n### Jupyter Notebooks\n- Install the Jupyter extension for notebook support directly in VSCode.\n\n\n\n\n# Jupyter Notebooks\n\nJupyter Notebook is a popular tool used for interactive programming, especially in data science.\n\n### Key Features\n- **Interactive coding**\n- **Inline visualizations**\n- **Rich text support** (Markdown)\n\n\n## The User Interface\n\n1. **Notebook Cells**:\n - **Code Cells**: Write and execute code.\n - **Markdown Cells**: Write formatted text (headers, lists, etc.).\n\n2. 
**Execute Code**: Click \"Run\" or press `Shift + Enter`.\n\n---\n\n## Writing and Executing Code\n\nExample:\n\n::: {#ee0edab2 .cell execution_count=1}\n\n::: {.cell-output .cell-output-stdout}\n```\nHello, Jupyter!\n```\n:::\n:::\n\n\n- Runs directly inside the notebook.\n- Output appears below the code cell.\n\n---\n\n## Markdown Documentation\n\nUse Markdown cells for documentation.\n\n#### Example:\n\n```markdown\n# My Jupyter Notebook\n- Bullet points\n- Lists\n```\n\n- Markdown renders as formatted text when the cell is run.\n\n---\n\n## Visualization and Output\n\nJupyter supports libraries like Matplotlib for creating plots inline.\n\n#### Example:\n\n::: {#2b04ceb0 .cell execution_count=2}\n\n::: {.cell-output .cell-output-display}\n![](introduction_files/figure-revealjs/cell-3-output-1.png){width=789 height=411}\n:::\n:::\n\n\n- The plot is displayed below the code cell.\n\n---\n\n## Saving and Sharing Notebooks\n\n1. **Save**: Use `Ctrl + S` (or `Cmd + S` on Mac).\n2. **Export**: Export notebooks to formats like HTML or PDF.\n3. **Sharing**: Share `.ipynb` files for use in Jupyter environments.\n\n---\n\n## Keyboard Shortcuts\n\n- `Enter`: Edit the selected cell.\n- `Esc`: Command mode (no editing).\n- `A`: Insert a cell above.\n- `B`: Insert a cell below.\n- `DD`: Delete selected cell.\n- `Z`: Undo last cell deletion.\n- `Shift + Enter`: Run current cell, move to the next.\n- `Ctrl + Enter`: Run current cell, stay in it.\n\n", + "supporting": [ + "introduction_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/slides/python_intro/introduction/figure-revealjs/cell-3-output-1.png b/_freeze/slides/python_intro/introduction/figure-revealjs/cell-3-output-1.png new file mode 100644 index 0000000..8ab1b59 Binary files /dev/null and b/_freeze/slides/python_intro/introduction/figure-revealjs/cell-3-output-1.png differ diff --git a/docs/index.html b/docs/index.html index 5298895..0fbca77 100644 --- a/docs/index.html +++ b/docs/index.html @@ -497,9 +497,22 @@

How to use this scr

What you will learn

As this seminar is meant to be an introduction to understanding and working with language models, we obviously cannot cover everything or offer deep insights into all the details. Instead, we aim to give you a simple overview of everything you need to start working with language model APIs, to understand why things work the way they do, and to apply them in your own applications. The content can already be seen from the navigation bar, but here’s a quick walk-through. We will take you through a quick history of natural language processing with some of its challenges and limitations, and introduce you to text processing and analysis techniques such as tokenization, term frequency and bag of words, as well as applications such as text classification and sentiment analysis. Afterwards, we will give a short introduction to how modern large language models approach these tasks with more sophisticated techniques based on neural networks and vast amounts of training data, before getting more hands-on with the language model API by OpenAI. Eventually, we will have a quick look into some other applications of embeddings, before discussing some of the ethical considerations when working with language models. Have fun!

-
-

The schedule

-

TBD

+
+

A rough schedule

+
    +
  • Introduction & Getting to know each other & Survey (experiences & expectations) & Learning goals & Evaluation criteria
  • +
  • Introduction to the general topic & Python & Jupyter
  • +
  • Introduction to NLP (tokenization, matching, statistical analysis)
  • +
  • Introduction to LLM & OpenAI API
  • +
  • Prompting
  • +
  • Embeddings
  • +
  • Advanced GPT topics (image data, parameterization, tool calling)
  • +
  • Real-world examples of applications (& implementation) & limitations
  • +
  • App concept & Group brainstorming
  • +
  • Project work on prototype & mentoring
  • +
  • Project presentations & reflections on the seminar
  • +
  • Backup: Ethics and data privacy
  • +

After the seminar (~1d):

    diff --git a/docs/llm/gpt_api.html b/docs/llm/gpt_api.html index 0f72ac0..b14b00c 100644 --- a/docs/llm/gpt_api.html +++ b/docs/llm/gpt_api.html @@ -514,7 +514,7 @@

    The OpenAI API

    Authentication

    Getting started with the OpenAI Chat Completions API requires signing up for an account on the OpenAI platform. Once you’ve registered, you’ll gain access to an API key, which serves as a unique identifier for your application to authenticate requests to the API. This key is essential for ensuring secure communication between your application and OpenAI’s servers. Without proper authentication, your requests will be rejected. You can create your own account, but for the seminar we will provide a client with the credentials already set up inside the University’s Jupyterlab.

    -
    +
    # setting up the client in Python
     
     import os
    @@ -528,8 +528,8 @@ 

    Authentication

    Requesting Completions

    Most interactions with GPT and other models consist of generating completions for prompts, i.e., providing some text with instructions and letting the language model complete the text one token at a time, as seen here.

    -

    To request completions from the OpenAI API, we use Python to send HTTP requests to the designated API endpoint. These requests are structured to include various parameters that guide the generation of text completions. The most fundamental parameter is the prompt text, which sets the context for the completion. Additionally, you can specify the desired model configuration, such as the engine to use (e.g., “gpt-4”), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity (TODO: Link parameterization)

    -
    +

    To request completions from the OpenAI API, we use Python to send HTTP requests to the designated API endpoint. These requests are structured to include various parameters that guide the generation of text completions. The most fundamental parameter is the prompt text, which sets the context for the completion. Additionally, you can specify the model to use (e.g., “gpt-4”), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity (here).

    +
    # creating a completion
     chat_completion = client.chat.completions.create(
         messages=[
    @@ -549,7 +549,7 @@ 

    Processing

    Response

    After processing your request, the OpenAI API returns a response containing the generated text completions. Depending on the specifics of your request, you may receive multiple completions, each accompanied by additional information such as the number of tokens processed in the request, the reason why the model stopped generating, etc. This response provides valuable insights into the quality and relevance of the completions, allowing you to tailor your application’s behavior accordingly. Let’s check it out briefly, before you explore the response object more in-depth in your next exercise.

    -
    +
    # check out the type of the response
     
     print(f"Response object type: {type(chat_completion)}") # a ChatCompletion object
    @@ -562,9 +562,9 @@ 

    Response

    Response object type: <class 'openai.types.chat.chat_completion.ChatCompletion'>
     
    -Response message: The Earth is approximately 4.54 billion years old.
    +Response message: The Earth is estimated to be around 4.5 billion years old.
     
    -Total tokens used: 25
    +Total tokens used: 28
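To make the parameters mentioned above concrete, here is a small sketch — my illustration, not part of the rendered pages — assuming the same `client` object and the openai v1 Python SDK used in the examples above. It attaches the constraints named in the text (maximum tokens, temperature, and the number of completions) to the same kind of request:

```python
# Illustrative sketch only: the request from the example above, extended
# with the constraints mentioned in the text (values are arbitrary).
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "How old is the earth?"}],
    model="gpt-3.5-turbo",  # which model should answer
    max_tokens=50,          # cap on the number of generated tokens
    temperature=0.2,        # lower values make answers more deterministic
    n=2,                    # ask for two alternative completions
)

# every requested completion arrives as one entry in `choices`
for choice in chat_completion.choices:
    print(choice.message.content, "| finish reason:", choice.finish_reason)
```

If a choice's `finish_reason` is `"length"`, the completion was cut off by `max_tokens` rather than finishing naturally — one concrete example of the metadata the response carries alongside the text.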
    diff --git a/docs/python_intro/overview.html b/docs/python_intro/overview.html index 8439ca4..d0b0b00 100644 --- a/docs/python_intro/overview.html +++ b/docs/python_intro/overview.html @@ -497,9 +497,6 @@

    Introduction to Python

    -
    -

    TODO: Some introduction

    -

    Installing Python on Windows and macOS

    @@ -642,7 +639,7 @@

    Writing and Exe

    Example:

    -
    +
    print("Hello, Jupyter!")
    Hello, Jupyter!
    @@ -671,7 +668,7 @@

    Visualization and

    Jupyter Notebook supports various visualization libraries like Matplotlib, Seaborn, and Plotly, allowing you to create plots and graphs inline.

    Example:

    -
    +
    import matplotlib.pyplot as plt
     
     # Sample data
    diff --git a/docs/resources/apis.html b/docs/resources/apis.html
    index 29d4cfb..a40afce 100644
    --- a/docs/resources/apis.html
    +++ b/docs/resources/apis.html
    @@ -187,7 +187,6 @@ 

    Language Model APIs

    -

    Certainly! Here’s the list with each title linked to its respective website, followed by the description:

    1. Google Cloud Natural Language API Google Cloud Natural Language API offers a suite of powerful natural language processing capabilities, including sentiment analysis, entity recognition, and syntax analysis. While it may not provide pre-trained large language models like GPT, it offers robust support for various NLP tasks through its RESTful API.

    2. Microsoft Azure Cognitive Services - Text Analytics Azure Cognitive Services offers Text Analytics, a set of APIs for analyzing unstructured text. It provides functionalities such as sentiment analysis, key phrase extraction, language detection, and entity recognition. While it doesn’t offer large pre-trained language models, it’s suitable for various text analysis tasks and integrates well with other Azure services.

    3. diff --git a/docs/resources/exercises.html b/docs/resources/exercises.html index dee2cf9..e6cfb9a 100644 --- a/docs/resources/exercises.html +++ b/docs/resources/exercises.html @@ -187,6 +187,16 @@

      List of exercises

      +
      +

      Introduction to Python

      +

      Exercise: Data types

      +

      Exercise: String manipulations

      +

      Exercise: Lists and loops

      +

      Exercise: Conditional statements

      +

      Exercise: Functions

      +

      Exercise: Dictionaries

      +

      Exercise: Classes

      +

      Natural Language Processing

      Exercise: Sentence tokenization

      diff --git a/docs/search.json b/docs/search.json index f604e05..40d0cda 100644 --- a/docs/search.json +++ b/docs/search.json @@ -715,7 +715,7 @@ "href": "resources/exercises.html", "title": "List of exercises", "section": "", - "text": "Natural Language Processing\nExercise: Sentence tokenization\nExercise: TF-IDF\nExercise: Word matching\nExercise: Fuzzy matching\n\n\nLarge Language Models with OpenAI\nExercise: OpenAI - Getting started\nExercise: GPT Chatbot\nExercise: GPT Parameterization\nExercise: NER with tool calling\n\n\nEmbeddings\nExercise: Embedding similarity\n\n\n\n\n Back to top", + "text": "Introduction to Python\nExercise: Data types\nExercise: String manipulations\nExercise: Lists and loops\nExercise: Conditional statements\nExercise: Functions\nExercise: Dictionaries\nExercise: Classes\n\n\nNatural Language Processing\nExercise: Sentence tokenization\nExercise: TF-IDF\nExercise: Word matching\nExercise: Fuzzy matching\n\n\nLarge Language Models with OpenAI\nExercise: OpenAI - Getting started\nExercise: GPT Chatbot\nExercise: GPT Parameterization\nExercise: NER with tool calling\n\n\nEmbeddings\nExercise: Embedding similarity\n\n\n\n\n Back to top", "crumbs": [ "Resources", "List of exercises" @@ -977,7 +977,7 @@ "href": "python_intro/overview.html", "title": "Introduction to Python", "section": "", - "text": "TODO: Some introduction\n\n\nInstalling Python on Windows and macOS\n\nInstalling Python on Windows\n\nDownload the Installer:\n\nGo to the official Python website.\nClick on the “Download Python” button. This will download the latest version for Windows.\n\nRun the Installer:\n\nLocate the downloaded .exe file in your downloads folder and double-click it to run the installer.\nImportant: Check the box that says “Add Python to PATH” at the bottom of the installation window.\nChoose “Install Now” for a standard installation or “Customize installation” for more options.\n\nVerify Installation:\n\nOpen the Command Prompt by searching for cmd in the Start menu.\nType python --version and press Enter. You should see the installed version of Python.\n\n\n\n\nInstalling Python on macOS\n\nDownload the Installer:\n\nVisit the official Python website.\nClick on the “Download Python” button, which will get the latest version for macOS.\n\nRun the Installer:\n\nLocate the downloaded .pkg file and double-click it to launch the installer.\nFollow the on-screen instructions to complete the installation.\n\nVerify Installation:\n\nOpen the Terminal application (you can find it using Spotlight Search by pressing Command + Space and typing “Terminal”).\nType python3 --version and press Enter. You should see the installed version of Python.\n\n\n\n\nAdditional Setup (Optional)\nAfter installing Python, it’s a good idea to install pip, Python’s package manager, which is included by default in the latest Python versions. You can use pip to install additional libraries and packages as needed.\nFor Windows: - To install a package, open Command Prompt and type: bash pip install package_name\nFor macOS: - Open Terminal and type: bash pip3 install package_name\nThat’s it! You’re now ready to start programming in Python.\n\n\n\nUsing VSCode\nVisual Studio Code (VSCode) is a powerful and popular code editor developed by Microsoft. It is highly extensible, lightweight, and supports a wide range of programming languages, including Python. 
With its robust features such as IntelliSense, debugging capabilities, and integrated terminal, VSCode is an excellent choice for Python development.\n\nGetting Started\nTo start using Python in VSCode, follow these steps:\n\nInstall VSCode: If you haven’t already, download and install Visual Studio Code from the official website.\nInstall the Python Extension:\n\nOpen VSCode.\nGo to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side or pressing Ctrl + Shift + X.\nSearch for “Python” and install the official extension provided by Microsoft. This extension adds rich support for Python development, including IntelliSense and linting.\n\nSelect the Python Interpreter:\n\nAfter installing the extension, you need to select the Python interpreter. Press Ctrl + Shift + P to open the Command Palette, then type and select Python: Select Interpreter.\nChoose the interpreter that matches your Python installation.\n\n\n\n\nWriting and Running Python Code\n\nCreate a New File:\n\nYou can create a new Python file by clicking on File > New File or pressing Ctrl + N.\nSave it with a .py extension (e.g., script.py).\n\nWrite Your Code:\n\nBegin writing your Python code in the editor. For example:\n\nprint(\"Hello, VSCode!\")\nRun Your Code:\n\nThere are multiple ways to run your Python code:\n\nUsing the Terminal: Open the integrated terminal by selecting View > Terminal or pressing Ctrl + ` (backtick). In the terminal, type python script.py (replacing script.py with your file name) to execute the script.\nRun Code Action: You can also run your code directly from the editor by clicking the play button (▶️) that appears above the code or using the shortcut Shift + Enter.\n\n\n\n\n\nDebugging in VSCode\nVSCode provides powerful debugging features to help you troubleshoot your code:\n\nSet Breakpoints: Click in the gutter next to the line numbers to set breakpoints where you want the execution to pause.\nStart Debugging: Press F5 or go to the Debug view by clicking on the Debug icon in the Activity Bar.\nYou can then start debugging your Python script. The Debug Console will allow you to inspect variables, step through code, and evaluate expressions.\n\n\n\nUsing Extensions and Features\nVSCode has a wide variety of extensions to enhance your Python development experience:\n\nLinting: The Python extension includes linting capabilities that help you catch errors and enforce coding standards. You can enable it in the settings (Settings > Python > Linting).\nIntelliSense: Take advantage of IntelliSense for code suggestions, autocompletions, and quick documentation. Simply start typing, and relevant suggestions will appear.\nJupyter Notebooks: If you want to work with Jupyter Notebooks directly in VSCode, install the Jupyter extension. This allows you to create, edit, and run notebooks seamlessly.\n\n\n\n\n\nJupyter Notebooks\nJupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. It is widely used in data science, machine learning, and scientific computing, making it a versatile tool for both beginners and advanced users. In a Jupyter Notebook, you can write and execute code in a variety of programming languages, including Python. 
It provides an interactive environment where you can document your thought process alongside your code, visualize data, and quickly test ideas without the need for a complete development setup.\n\nGetting Started\nOnce you have Jupyter Notebook up and running, you will typically start by opening a new notebook. Here are the key components and features of Jupyter Notebook to help you navigate and utilize it effectively:\n\n\nThe User Interface\nUpon launching Jupyter Notebook, you’ll be greeted with a dashboard showing your files and notebooks. You can create a new notebook by selecting “New” and then choosing the desired kernel (like Python 3).\n\nNotebook Cells: The main area consists of cells where you can write your code or text. There are two main types of cells:\n\nCode Cells: Where you write and execute code.\nMarkdown Cells: Where you can write formatted text, including headers, lists, and links.\n\n\n\n\nWriting and Executing Code\nTo write code in a code cell:\n\nClick on a cell to make it active.\nType your code into the cell.\n\n\nExample:\n\nprint(\"Hello, Jupyter!\")\n\nHello, Jupyter!\n\n\nTo execute the code, you can either click the “Run” button in the toolbar or press Shift + Enter. This will run the code and display the output directly below the cell.\n\n\n\nUsing Markdown for Documentation\nMarkdown cells allow you to document your code using plain text. You can format your text using Markdown syntax.\n\nExample:\nTo create a markdown cell with a header, simply type:\n# My Jupyter Notebook\nAfter running the cell, it will render as a formatted header.\nYou can also create bullet points, numbered lists, links, and more:\n## Key Features\n- Interactive coding\n- Inline visualizations\n- Rich text support\n\n\n\nVisualization and Output\nJupyter Notebook supports various visualization libraries like Matplotlib, Seaborn, and Plotly, allowing you to create plots and graphs inline.\n\nExample:\n\nimport matplotlib.pyplot as plt\n\n# Sample data\nx = [1, 2, 3, 4, 5]\ny = [2, 3, 5, 7, 11]\n\n# Creating a plot\nplt.plot(x, y)\nplt.title(\"Sample Plot\")\nplt.xlabel(\"X-axis\")\nplt.ylabel(\"Y-axis\")\nplt.show()\n\n\n\n\n\n\n\n\nAfter running this code, the plot will be displayed directly beneath the code cell.\n\n\n\nSaving and Sharing Notebooks\nYou can save your notebook by clicking the save icon or using the shortcut Ctrl + S (or Cmd + S on Mac). Jupyter Notebooks are saved with a .ipynb extension.\nTo share your notebook, you can export it to different formats, such as HTML or PDF, by using the “File” menu. You can also share the .ipynb file directly, which can be opened in any Jupyter environment.\n\n\nKeyboard Shortcuts\nJupyter Notebook has many handy keyboard shortcuts that can improve your efficiency. Here are a few essential ones:\n\nEnter: Edit the selected cell.\nEsc: Command mode (no editing).\nA: Insert a new cell above.\nB: Insert a new cell below.\nDD: Delete the selected cell.\nZ: Undo the last cell deletion.\nShift + Enter: Run the current cell and move to the next one.\nCtrl + Enter: Run the current cell and stay in it.\n\n\n\n\n\n\n Back to top", + "text": "Installing Python on Windows and macOS\n\nInstalling Python on Windows\n\nDownload the Installer:\n\nGo to the official Python website.\nClick on the “Download Python” button. 
This will download the latest version for Windows.\n\nRun the Installer:\n\nLocate the downloaded .exe file in your downloads folder and double-click it to run the installer.\nImportant: Check the box that says “Add Python to PATH” at the bottom of the installation window.\nChoose “Install Now” for a standard installation or “Customize installation” for more options.\n\nVerify Installation:\n\nOpen the Command Prompt by searching for cmd in the Start menu.\nType python --version and press Enter. You should see the installed version of Python.\n\n\n\n\nInstalling Python on macOS\n\nDownload the Installer:\n\nVisit the official Python website.\nClick on the “Download Python” button, which will get the latest version for macOS.\n\nRun the Installer:\n\nLocate the downloaded .pkg file and double-click it to launch the installer.\nFollow the on-screen instructions to complete the installation.\n\nVerify Installation:\n\nOpen the Terminal application (you can find it using Spotlight Search by pressing Command + Space and typing “Terminal”).\nType python3 --version and press Enter. You should see the installed version of Python.\n\n\n\n\nAdditional Setup (Optional)\nAfter installing Python, it’s a good idea to install pip, Python’s package manager, which is included by default in the latest Python versions. You can use pip to install additional libraries and packages as needed.\nFor Windows: - To install a package, open Command Prompt and type: bash pip install package_name\nFor macOS: - Open Terminal and type: bash pip3 install package_name\nThat’s it! You’re now ready to start programming in Python.\n\n\n\nUsing VSCode\nVisual Studio Code (VSCode) is a powerful and popular code editor developed by Microsoft. It is highly extensible, lightweight, and supports a wide range of programming languages, including Python. With its robust features such as IntelliSense, debugging capabilities, and integrated terminal, VSCode is an excellent choice for Python development.\n\nGetting Started\nTo start using Python in VSCode, follow these steps:\n\nInstall VSCode: If you haven’t already, download and install Visual Studio Code from the official website.\nInstall the Python Extension:\n\nOpen VSCode.\nGo to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side or pressing Ctrl + Shift + X.\nSearch for “Python” and install the official extension provided by Microsoft. This extension adds rich support for Python development, including IntelliSense and linting.\n\nSelect the Python Interpreter:\n\nAfter installing the extension, you need to select the Python interpreter. Press Ctrl + Shift + P to open the Command Palette, then type and select Python: Select Interpreter.\nChoose the interpreter that matches your Python installation.\n\n\n\n\nWriting and Running Python Code\n\nCreate a New File:\n\nYou can create a new Python file by clicking on File > New File or pressing Ctrl + N.\nSave it with a .py extension (e.g., script.py).\n\nWrite Your Code:\n\nBegin writing your Python code in the editor. For example:\n\nprint(\"Hello, VSCode!\")\nRun Your Code:\n\nThere are multiple ways to run your Python code:\n\nUsing the Terminal: Open the integrated terminal by selecting View > Terminal or pressing Ctrl + ` (backtick). 
In the terminal, type python script.py (replacing script.py with your file name) to execute the script.\nRun Code Action: You can also run your code directly from the editor by clicking the play button (▶️) that appears above the code or using the shortcut Shift + Enter.\n\n\n\n\n\nDebugging in VSCode\nVSCode provides powerful debugging features to help you troubleshoot your code:\n\nSet Breakpoints: Click in the gutter next to the line numbers to set breakpoints where you want the execution to pause.\nStart Debugging: Press F5 or go to the Debug view by clicking on the Debug icon in the Activity Bar.\nYou can then start debugging your Python script. The Debug Console will allow you to inspect variables, step through code, and evaluate expressions.\n\n\n\nUsing Extensions and Features\nVSCode has a wide variety of extensions to enhance your Python development experience:\n\nLinting: The Python extension includes linting capabilities that help you catch errors and enforce coding standards. You can enable it in the settings (Settings > Python > Linting).\nIntelliSense: Take advantage of IntelliSense for code suggestions, autocompletions, and quick documentation. Simply start typing, and relevant suggestions will appear.\nJupyter Notebooks: If you want to work with Jupyter Notebooks directly in VSCode, install the Jupyter extension. This allows you to create, edit, and run notebooks seamlessly.\n\n\n\n\n\nJupyter Notebooks\nJupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. It is widely used in data science, machine learning, and scientific computing, making it a versatile tool for both beginners and advanced users. In a Jupyter Notebook, you can write and execute code in a variety of programming languages, including Python. It provides an interactive environment where you can document your thought process alongside your code, visualize data, and quickly test ideas without the need for a complete development setup.\n\nGetting Started\nOnce you have Jupyter Notebook up and running, you will typically start by opening a new notebook. Here are the key components and features of Jupyter Notebook to help you navigate and utilize it effectively:\n\n\nThe User Interface\nUpon launching Jupyter Notebook, you’ll be greeted with a dashboard showing your files and notebooks. You can create a new notebook by selecting “New” and then choosing the desired kernel (like Python 3).\n\nNotebook Cells: The main area consists of cells where you can write your code or text. There are two main types of cells:\n\nCode Cells: Where you write and execute code.\nMarkdown Cells: Where you can write formatted text, including headers, lists, and links.\n\n\n\n\nWriting and Executing Code\nTo write code in a code cell:\n\nClick on a cell to make it active.\nType your code into the cell.\n\n\nExample:\n\nprint(\"Hello, Jupyter!\")\n\nHello, Jupyter!\n\n\nTo execute the code, you can either click the “Run” button in the toolbar or press Shift + Enter. This will run the code and display the output directly below the cell.\n\n\n\nUsing Markdown for Documentation\nMarkdown cells allow you to document your code using plain text. 
You can format your text using Markdown syntax.\n\nExample:\nTo create a markdown cell with a header, simply type:\n# My Jupyter Notebook\nAfter running the cell, it will render as a formatted header.\nYou can also create bullet points, numbered lists, links, and more:\n## Key Features\n- Interactive coding\n- Inline visualizations\n- Rich text support\n\n\n\nVisualization and Output\nJupyter Notebook supports various visualization libraries like Matplotlib, Seaborn, and Plotly, allowing you to create plots and graphs inline.\n\nExample:\n\nimport matplotlib.pyplot as plt\n\n# Sample data\nx = [1, 2, 3, 4, 5]\ny = [2, 3, 5, 7, 11]\n\n# Creating a plot\nplt.plot(x, y)\nplt.title(\"Sample Plot\")\nplt.xlabel(\"X-axis\")\nplt.ylabel(\"Y-axis\")\nplt.show()\n\n\n\n\n\n\n\n\nAfter running this code, the plot will be displayed directly beneath the code cell.\n\n\n\nSaving and Sharing Notebooks\nYou can save your notebook by clicking the save icon or using the shortcut Ctrl + S (or Cmd + S on Mac). Jupyter Notebooks are saved with a .ipynb extension.\nTo share your notebook, you can export it to different formats, such as HTML or PDF, by using the “File” menu. You can also share the .ipynb file directly, which can be opened in any Jupyter environment.\n\n\nKeyboard Shortcuts\nJupyter Notebook has many handy keyboard shortcuts that can improve your efficiency. Here are a few essential ones:\n\nEnter: Edit the selected cell.\nEsc: Command mode (no editing).\nA: Insert a new cell above.\nB: Insert a new cell below.\nDD: Delete the selected cell.\nZ: Undo the last cell deletion.\nShift + Enter: Run the current cell and move to the next one.\nCtrl + Enter: Run the current cell and stay in it.\n\n\n\n\n\n\n Back to top", "crumbs": [ "Seminar", "Python Crash Course", @@ -1073,7 +1073,7 @@ "href": "llm/gpt_api.html", "title": "The OpenAI API", "section": "", - "text": "Note\n\n\n\nResource: OpenAI API docs\n\n\nLet’s finally get started working with GPT. In this seminar, we will use the OpenAI API to work with, but there are many alternatives out there. We have collected a few in the resources.\n\nAuthentication\nGetting started with the OpenAI Chat Completions API requires signing up for an account on the OpenAI platform. Once you’ve registered, you’ll gain access to an API key, which serves as a unique identifier for your application to authenticate requests to the API. This key is essential for ensuring secure communication between your application and OpenAI’s servers. Without proper authentication, your requests will be rejected. You can create your own account, but for the seminar we will provide the client with the credential within the University’s Jupyterlab.\n\n# setting up the client in Python\n\nimport os\nfrom openai import OpenAI\n\nclient = OpenAI(\n api_key=os.environ.get(\"OPENAI_API_KEY\")\n)\n\n\n\nRequesting Completions\nMost interaction with GPT and other models consist in generating completions for prompts, i.e., providing some text with instructions and letting the language model complete the text one token after the other as seen here.\nTo request completions from the OpenAI API, we use Python to send HTTP requests to the designated API endpoint. These requests are structured to include various parameters that guide the generation of text completions. The most fundamental parameter is the prompt text, which sets the context for the completion. 
Additionally, you can specify the desired model configuration, such as the engine to use (e.g., “gpt-4”), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity (TODO: Link parameterization)\n\n# creating a completion\nchat_completion = client.chat.completions.create(\n messages=[\n {\n \"role\": \"user\",\n \"content\": \"How old is the earth?\",\n }\n ],\n model=\"gpt-3.5-turbo\" # choose the model\n)\n\n\n\nProcessing\nOnce the OpenAI API receives your request, it proceeds to process the provided prompt using the specified model. This process involves analyzing the context provided by the prompt and leveraging the model’s pre-trained knowledge to generate text completions. The model employs advanced natural language processing techniques to ensure that the generated completions are coherent and contextually relevant. By drawing from its extensive training data and understanding of human language, the model aims to produce responses that closely align with human-like communication.\n\n\nResponse\nAfter processing your request, the OpenAI API returns a response containing the generated text completions. Depending on the specifics of your request, you may receive multiple completions, each accompanied by additional information such as the amount of token processed in the request, the reason why the model stopped the answer etc. This response provides valuable insights into the quality and relevance of the completions, allowing you to tailor your application’s behavior accordingly. Let’s check it out briefly, before you explore the response object more in-depth in your next exercise.\n\n# check out the type of the response\n\nprint(f\"Response object type: {type(chat_completion)}\") # a ChatCompletion object\n\n# print the message we want\nprint(f\"\\nResponse message: {chat_completion.choices[0].message.content}\")\n\n# check the tokens used \nprint(f\"\\nTotal tokens used: {chat_completion.usage.total_tokens}\")\n\nResponse object type: <class 'openai.types.chat.chat_completion.ChatCompletion'>\n\nResponse message: The Earth is approximately 4.54 billion years old.\n\nTotal tokens used: 25\n\n\n\n\nError Handling\nWhile interacting with the OpenAI API (or any API for that matter), it’s crucial to implement some robust error handling mechanisms to manage any potential issues that may arise. The kind of classic errors include providing invalid parameters, experiencing authentication failures due to an incorrect API key, or encountering rate limiting restrictions. But for language models in particular, there are plenty more problems that can arise simply involving the answer we get from the model. Some examples are requests involving explicit language or content or restricted content etc. which are typically blocked by the API. Other times it might simply happen that a model does not respond in a way you expected, for example, just repeating your input instead of responding properly, or not responding in the format you requested. Whenever we are using language model for applications, we need to be aware of this and implement the right measures to handle these situations.\n\n\n\n\n Back to top", + "text": "Note\n\n\n\nResource: OpenAI API docs\n\n\nLet’s finally get started working with GPT. In this seminar, we will use the OpenAI API to work with, but there are many alternatives out there. 
We have collected a few in the resources.\n\nAuthentication\nGetting started with the OpenAI Chat Completions API requires signing up for an account on the OpenAI platform. Once you’ve registered, you’ll gain access to an API key, which serves as a unique identifier for your application to authenticate requests to the API. This key is essential for ensuring secure communication between your application and OpenAI’s servers. Without proper authentication, your requests will be rejected. You can create your own account, but for the seminar we will provide the client with the credential within the University’s Jupyterlab.\n\n# setting up the client in Python\n\nimport os\nfrom openai import OpenAI\n\nclient = OpenAI(\n api_key=os.environ.get(\"OPENAI_API_KEY\")\n)\n\n\n\nRequesting Completions\nMost interaction with GPT and other models consist in generating completions for prompts, i.e., providing some text with instructions and letting the language model complete the text one token after the other as seen here.\nTo request completions from the OpenAI API, we use Python to send HTTP requests to the designated API endpoint. These requests are structured to include various parameters that guide the generation of text completions. The most fundamental parameter is the prompt text, which sets the context for the completion. Additionally, you can specify the desired model configuration, such as the engine to use (e.g., “gpt-4”), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity (here).\n\n# creating a completion\nchat_completion = client.chat.completions.create(\n messages=[\n {\n \"role\": \"user\",\n \"content\": \"How old is the earth?\",\n }\n ],\n model=\"gpt-3.5-turbo\" # choose the model\n)\n\n\n\nProcessing\nOnce the OpenAI API receives your request, it proceeds to process the provided prompt using the specified model. This process involves analyzing the context provided by the prompt and leveraging the model’s pre-trained knowledge to generate text completions. The model employs advanced natural language processing techniques to ensure that the generated completions are coherent and contextually relevant. By drawing from its extensive training data and understanding of human language, the model aims to produce responses that closely align with human-like communication.\n\n\nResponse\nAfter processing your request, the OpenAI API returns a response containing the generated text completions. Depending on the specifics of your request, you may receive multiple completions, each accompanied by additional information such as the amount of token processed in the request, the reason why the model stopped the answer etc. This response provides valuable insights into the quality and relevance of the completions, allowing you to tailor your application’s behavior accordingly. 
Let’s check it out briefly, before you explore the response object more in-depth in your next exercise.\n\n# check out the type of the response\n\nprint(f\"Response object type: {type(chat_completion)}\") # a ChatCompletion object\n\n# print the message we want\nprint(f\"\\nResponse message: {chat_completion.choices[0].message.content}\")\n\n# check the tokens used \nprint(f\"\\nTotal tokens used: {chat_completion.usage.total_tokens}\")\n\nResponse object type: <class 'openai.types.chat.chat_completion.ChatCompletion'>\n\nResponse message: The Earth is estimated to be around 4.5 billion years old.\n\nTotal tokens used: 28\n\n\n\n\nError Handling\nWhile interacting with the OpenAI API (or any API for that matter), it’s crucial to implement some robust error handling mechanisms to manage any potential issues that may arise. The kind of classic errors include providing invalid parameters, experiencing authentication failures due to an incorrect API key, or encountering rate limiting restrictions. But for language models in particular, there are plenty more problems that can arise simply involving the answer we get from the model. Some examples are requests involving explicit language or content or restricted content etc. which are typically blocked by the API. Other times it might simply happen that a model does not respond in a way you expected, for example, just repeating your input instead of responding properly, or not responding in the format you requested. Whenever we are using language model for applications, we need to be aware of this and implement the right measures to handle these situations.\n\n\n\n\n Back to top", "crumbs": [ "Seminar", "Large Language Models", @@ -1556,7 +1556,7 @@ "href": "resources/apis.html", "title": "Language Model APIs", "section": "", - "text": "Certainly! Here’s the list with each title linked to its respective website, followed by the description:\n\nGoogle Cloud Natural Language API Google Cloud Natural Language API offers a suite of powerful natural language processing capabilities, including sentiment analysis, entity recognition, and syntax analysis. While it may not provide pre-trained large language models like GPT, it offers robust support for various NLP tasks through its RESTful API.\nMicrosoft Azure Cognitive Services - Text Analytics Azure Cognitive Services offers Text Analytics, a set of APIs for analyzing unstructured text. It provides functionalities such as sentiment analysis, key phrase extraction, language detection, and entity recognition. While it doesn’t offer large pre-trained language models, it’s suitable for various text analysis tasks and integrates well with other Azure services.\nIBM Watson Natural Language Understanding Watson Natural Language Understanding is part of IBM’s suite of AI-powered tools. It provides advanced text analysis capabilities, including sentiment analysis, entity recognition, concept extraction, and categorization. While it doesn’t offer large-scale language generation models like GPT, it’s suitable for analyzing and extracting insights from text data.\nHugging Face Transformers Hugging Face Transformers is an open-source library that provides a wide range of pre-trained models for natural language processing tasks. It includes popular models like GPT, BERT, and RoBERTa, along with support for fine-tuning and custom model development. 
While it’s not a hosted API service like OpenAI, it offers powerful tools for developers to work with state-of-the-art NLP models.\nDeepAI Text Generation API DeepAI offers a Text Generation API that allows users to generate human-like text based on a given prompt. While it may not provide the scale or versatility of GPT-like models, it’s suitable for tasks such as generating short-form content, creative writing, and text completion.\nLLama LLama is an AI platform that offers large language models and other NLP capabilities for developers and enterprises. It provides access to pre-trained models, including GPT-like architectures, as well as tools for fine-tuning models on custom datasets. LLama aims to democratize access to advanced AI technologies and support a wide range of NLP applications.\nGemini Gemini by Cortical.io is an AI-based platform that offers natural language understanding and text analytics capabilities. It utilizes semantic folding technology, inspired by the human brain, to analyze and process text data efficiently. While it may not provide large-scale language generation models like GPT, Gemini offers powerful tools for semantic analysis, document clustering, and similarity detection.\n\n\n\n\n Back to top", + "text": "Google Cloud Natural Language API Google Cloud Natural Language API offers a suite of powerful natural language processing capabilities, including sentiment analysis, entity recognition, and syntax analysis. While it may not provide pre-trained large language models like GPT, it offers robust support for various NLP tasks through its RESTful API.\nMicrosoft Azure Cognitive Services - Text Analytics Azure Cognitive Services offers Text Analytics, a set of APIs for analyzing unstructured text. It provides functionalities such as sentiment analysis, key phrase extraction, language detection, and entity recognition. While it doesn’t offer large pre-trained language models, it’s suitable for various text analysis tasks and integrates well with other Azure services.\nIBM Watson Natural Language Understanding Watson Natural Language Understanding is part of IBM’s suite of AI-powered tools. It provides advanced text analysis capabilities, including sentiment analysis, entity recognition, concept extraction, and categorization. While it doesn’t offer large-scale language generation models like GPT, it’s suitable for analyzing and extracting insights from text data.\nHugging Face Transformers Hugging Face Transformers is an open-source library that provides a wide range of pre-trained models for natural language processing tasks. It includes popular models like GPT, BERT, and RoBERTa, along with support for fine-tuning and custom model development. While it’s not a hosted API service like OpenAI, it offers powerful tools for developers to work with state-of-the-art NLP models.\nDeepAI Text Generation API DeepAI offers a Text Generation API that allows users to generate human-like text based on a given prompt. While it may not provide the scale or versatility of GPT-like models, it’s suitable for tasks such as generating short-form content, creative writing, and text completion.\nLLama LLama is an AI platform that offers large language models and other NLP capabilities for developers and enterprises. It provides access to pre-trained models, including GPT-like architectures, as well as tools for fine-tuning models on custom datasets. 
LLama aims to democratize access to advanced AI technologies and support a wide range of NLP applications.\nGemini Gemini by Cortical.io is an AI-based platform that offers natural language understanding and text analytics capabilities. It utilizes semantic folding technology, inspired by the human brain, to analyze and process text data efficiently. While it may not provide large-scale language generation models like GPT, Gemini offers powerful tools for semantic analysis, document clustering, and similarity detection.\n\n\n\n\n Back to top", "crumbs": [ "Resources", "Language model APIs" @@ -1578,7 +1578,7 @@ "href": "index.html", "title": "Seminar: Large Language Models", "section": "", - "text": "Robot by DALL-E\n\n\nHello and welcome to the seminar Large Language Models in the winter semester of 2024/25 at the University of Applied Sciences in Münster. On this website, you will find all the information you need about and around the seminar.\n\nAbout the seminar\nThe seminar is roughly divided into 3 parts of equal size: theory, training and application. In the theoretical part, you will learn about the most important topics and ideas when it comes to natural language processing and large language models. We will discuss topics like tokenization, matching, statistical text analysis and embeddings to get you started before eventually dealing with large language models and their applications themselves. Already during the theory, we will make sure to code in Python alongside all the concepts and see coding examples to get familiar with it.\nAfter each small input session on a new topic, we will get to some hands-on training so that you can consolidate the knowledge you just acquired. You will solve a few (coding) exercises around all the topics yourselves. To get everyone fired up as quickly as possible, we have prepared a Jupyterlab environment that everyone can use for the solution of the exercises.\nIn the final part of the seminar we will go ahead and apply our newly acquired knowledge in our own projects. All participants will team up in teams of 2-3 and try to develop and implement their own little prototype for a small application involving a language model. More information and ideas for these projects can be found here.\nBy the way, you can (and maybe absolutely should) use a language model like ChatGPT also during this seminar and the solution of some of the exercises. However, feel encouraged to try for yourselves first, and make sure you have understood the solution of a language model if you use it.\n\n\nHow to use this script\nThis script is meant to give a comprehensive overview right away from the start. Feel free to browse it even before we have reached a specific topic, in particular, if you already have some prior knowledge in the topic. All exercises that we will solve together in this seminar are contained in this script as well, including their solution. For all exercises, the (or more precisely, a) solution is hidden behind a Show solution button. For the sake of your own learning process, try to solve the exercises yourselves first! If you’re stuck, ask for a quick hint. If you still feel like you do not advance any more, then check out the solution and try to understand it. The solution of the exercises is not part of the evaluation, so it’s really for your own progress! A “summary” of all exercises can be found here.\n\n\n\n\n\n\nImportant\n\n\n\nA small disclaimer: This script is not (yet) ridiculously comprehensive. 
And, of course, we cannot cover the full realm of NLP and LLM within a 4-days-course. However, you should find everything we will do in the seminar also in this script. If there is something missing, I will make sure to include it as soon as possible, just give me a note.\n\n\n\n\nWhat you will learn\nAs this seminar is meant to be an introduction to understanding and working with language models, so we can obviously not cover everything and offer deep insights into all the details. Instead, we aim to give you a simple overview of all the necessities to start working with language models APIs and understand why things are working the way they do and how you can apply them in your own applications. The content can already be seen from the navigation bar, but here’s a quick walk-through. More precisely, we will walk you through a quick history of natural language processing with some of its challenges and limitations, and introduce you to text processing and analysis techniques such as tokenization, term frequency or bag of words as well as applications such as text classification or sentiment analysis. Afterwards, we will give a short introduction to how modern large language models approach these with more sophisticated techniques based on neural networks and vast amounts of training data, before getting more hands-on with the language model API by OpenAI. Eventually, we will have a quick look into some other applications of embeddings, before quickly discussing some of the ethical considerations when working with language models. Have fun!\n\n\nThe schedule\nTBD\n\nAfter the seminar (~1d):\n\nPrototype refinement\nCode review & documentation\nRefine business case & potential applications of prototype\nReflections & lessons learned → Hand in 2-page summary\n\n\n\n\nEvaluation\nAll seminar participants will be evaluated in the following way.\n\nYour presentation on the last day of the seminar: 25%\nYour prototype: 35%\nYour summary: 25%\nYour activity during the seminar: 15%\n\nI will allow myself to give your evaluation a little extra boost for good activity during the seminar. This seminar is designed for everyone to participate, so the more you do, the more fun it will be!\n\nWhat is the summary?\nAs mentioned above, to finalize our seminar I want to you to take roughly a day to refine your prototype and then write a quick summary your project and your learnings. The summary should be 2-3 pages only (kind of like a small executive summary) and contain the following information: - What is your prototype? What can I do? - What could be a business case for your prototype, or where can it be applied? - What are current limitations of your prototype and how could you overcome them? - What have been your main learnings during the creation of your prototype (and/or) the seminar itself?\nJust hand it in within a couple of weeks after the seminar, it will be a part of your evaluation.\n\n\n\n\n\n\nNote\n\n\n\nHas this seminar been created with a little help of language models? Absolutely, why wouldn’t it? :)\n\n\n\n\n\n\n\n Back to top", + "text": "Robot by DALL-E\n\n\nHello and welcome to the seminar Large Language Models in the winter semester of 2024/25 at the University of Applied Sciences in Münster. On this website, you will find all the information you need about and around the seminar.\n\nAbout the seminar\nThe seminar is roughly divided into 3 parts of equal size: theory, training and application. 
In the theoretical part, you will learn about the most important topics and ideas when it comes to natural language processing and large language models. We will discuss topics like tokenization, matching, statistical text analysis and embeddings to get you started before eventually dealing with large language models and their applications themselves. Already during the theory, we will make sure to code in Python alongside all the concepts and see coding examples to get familiar with it.\nAfter each small input session on a new topic, we will get to some hands-on training so that you can consolidate the knowledge you just acquired. You will solve a few (coding) exercises around all the topics yourselves. To get everyone fired up as quickly as possible, we have prepared a Jupyterlab environment that everyone can use for the solution of the exercises.\nIn the final part of the seminar we will go ahead and apply our newly acquired knowledge in our own projects. All participants will team up in teams of 2-3 and try to develop and implement their own little prototype for a small application involving a language model. More information and ideas for these projects can be found here.\nBy the way, you can (and maybe absolutely should) use a language model like ChatGPT also during this seminar and the solution of some of the exercises. However, feel encouraged to try for yourselves first, and make sure you have understood the solution of a language model if you use it.\n\n\nHow to use this script\nThis script is meant to give a comprehensive overview right away from the start. Feel free to browse it even before we have reached a specific topic, in particular, if you already have some prior knowledge in the topic. All exercises that we will solve together in this seminar are contained in this script as well, including their solution. For all exercises, the (or more precisely, a) solution is hidden behind a Show solution button. For the sake of your own learning process, try to solve the exercises yourselves first! If you’re stuck, ask for a quick hint. If you still feel like you do not advance any more, then check out the solution and try to understand it. The solution of the exercises is not part of the evaluation, so it’s really for your own progress! A “summary” of all exercises can be found here.\n\n\n\n\n\n\nImportant\n\n\n\nA small disclaimer: This script is not (yet) ridiculously comprehensive. And, of course, we cannot cover the full realm of NLP and LLM within a 4-days-course. However, you should find everything we will do in the seminar also in this script. If there is something missing, I will make sure to include it as soon as possible, just give me a note.\n\n\n\n\nWhat you will learn\nAs this seminar is meant to be an introduction to understanding and working with language models, so we can obviously not cover everything and offer deep insights into all the details. Instead, we aim to give you a simple overview of all the necessities to start working with language models APIs and understand why things are working the way they do and how you can apply them in your own applications. The content can already be seen from the navigation bar, but here’s a quick walk-through. More precisely, we will walk you through a quick history of natural language processing with some of its challenges and limitations, and introduce you to text processing and analysis techniques such as tokenization, term frequency or bag of words as well as applications such as text classification or sentiment analysis. 
Afterwards, we will give a short introduction to how modern large language models approach these with more sophisticated techniques based on neural networks and vast amounts of training data, before getting more hands-on with the language model API by OpenAI. Eventually, we will have a quick look into some other applications of embeddings, before quickly discussing some of the ethical considerations when working with language models. Have fun!\n\n\nA rough schedule\n\nIntroduction & Getting to know each other & Survey (experiences & expectations) & Learning goals & Evaluation criteria\nIntroduction to the general topic & Python & Jupyter\nIntroduction NLP (tokenization, matching, statistical analysis)\nIntroduction to LLM & OpenAI API\nPrompting\nEmbeddings\nAdvanced GPT topics (image data, parameterization, tool calling)\nReal-world examples of applications (& implementation) & limitations\nApp concept & Group brainstorming\nProject work on prototype & mentoring\nProject presentations & reflections on the seminar\nBackup: Ethics and data privacy\n\n\nAfter the seminar (~1d):\n\nPrototype refinement\nCode review & documentation\nRefine business case & potential applications of prototype\nReflections & lessons learned → Hand in 2-page summary\n\n\n\n\nEvaluation\nAll seminar participants will be evaluated in the following way.\n\nYour presentation on the last day of the seminar: 25%\nYour prototype: 35%\nYour summary: 25%\nYour activity during the seminar: 15%\n\nI will allow myself to give your evaluation a little extra boost for good activity during the seminar. This seminar is designed for everyone to participate, so the more you do, the more fun it will be!\n\nWhat is the summary?\nAs mentioned above, to finalize our seminar I want to you to take roughly a day to refine your prototype and then write a quick summary your project and your learnings. The summary should be 2-3 pages only (kind of like a small executive summary) and contain the following information: - What is your prototype? What can I do? - What could be a business case for your prototype, or where can it be applied? - What are current limitations of your prototype and how could you overcome them? - What have been your main learnings during the creation of your prototype (and/or) the seminar itself?\nJust hand it in within a couple of weeks after the seminar, it will be a part of your evaluation.\n\n\n\n\n\n\nNote\n\n\n\nHas this seminar been created with a little help of language models? Absolutely, why wouldn’t it? 
:)\n\n\n\n\n\n\n\n Back to top", "crumbs": [ "Seminar", "About", @@ -1928,87 +1928,80 @@ "section": "Dictionary Comprehensions", "text": "Dictionary Comprehensions\n\n\nWhat Are Dictionary Comprehensions?\n\nA concise way to create dictionaries by transforming or filtering data.\n\nSyntax\n{key_expression: value_expression for item in iterable if condition}\n\n\nExample of Dictionary Comprehension\n\nsquares = {x: x ** 2 for x in range(5)}\nprint(squares) # Output: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}\n\n{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}\n\n\n\n\n\n\n\nSeminar: LLM, WiSe 2024/25" }, + { + "objectID": "slides/about/intro.html#why-should-i-be-here", + "href": "slides/about/intro.html#why-should-i-be-here", + "title": "Seminar: Large Language Models", + "section": "Why should I be here?", + "text": "Why should I be here?\n\n\nUnderstand how LLMs work and where their limitations are!\nLearn how to automate tasks with a language model\nLearn how to process large data amounts with a language model\nLearn how to use a large language model in a product" + }, { "objectID": "slides/about/intro.html#about-the-seminar", "href": "slides/about/intro.html#about-the-seminar", - "title": "Sprint: Large Language Models", + "title": "Seminar: Large Language Models", "section": "About the seminar", "text": "About the seminar\n\nRoughly divided into 3 parts: theory, training, and application\nTheory:\n\nLearn about important topics in natural language processing\nTopics include tokenization, matching, statistical text analysis, language models, and embeddings\nCoding examples in Python alongside theoretical concepts" }, { - "objectID": "slides/about/intro.html#goal", - "href": "slides/about/intro.html#goal", - "title": "Sprint: Large Language Models", - "section": "Goal", - "text": "Goal\n\n4-day introduction to language models\nAim: Simple overview to start working with language model APIs\nUnderstand basics, know how to apply in own applications\nHave fun!" + "objectID": "slides/about/intro.html#intended-learning-outcomes-part-1", + "href": "slides/about/intro.html#intended-learning-outcomes-part-1", + "title": "Seminar: Large Language Models", + "section": "Intended learning outcomes (Part 1)", + "text": "Intended learning outcomes (Part 1)\n\nUnderstand the basics of natural language processing, including its tasks and challenges\nUnderstand the general concept of LLMs and why it makes the above so much easier\nWrite simple Python programs / scripts and use basic data and control structures" + }, + { + "objectID": "slides/about/intro.html#intended-learning-outcomes-part-2", + "href": "slides/about/intro.html#intended-learning-outcomes-part-2", + "title": "Seminar: Large Language Models", + "section": "Intended learning outcomes (Part 2)", + "text": "Intended learning outcomes (Part 2)\n\nAccess an LLM via the OpenAI API and how to work with the result\nUnderstand the concept of text embeddings and use them via the OpenAI API\nUnderstand why and how LLMs can be used for process automation\nUse an LLM in a small application\nHave fun!" 
}, { "objectID": "slides/about/intro.html#content-1", "href": "slides/about/intro.html#content-1", - "title": "Sprint: Large Language Models", + "title": "Seminar: Large Language Models", "section": "Content 1", - "text": "Content 1\n\nQuick overview of classic NLP\n\nText processing (Tokenization, Lemmatization, etc.)\nApplications (Classification, Sentiment Analysis, Matching, etc.)\nChallenges\n\nIntroduction to LLM\n\nText processing with neural networks\nSequence generation & language modeling" + "text": "Content 1\n\nShort introduction to Programming in Python in the context of Natural Language Processing and Large Language Models\n\nBasics (Syntax, Variables, Data Types, Conditional Statements etc.)\nLists & Loops\nDictionaries & Classes" }, { "objectID": "slides/about/intro.html#content-2", "href": "slides/about/intro.html#content-2", - "title": "Sprint: Large Language Models", + "title": "Seminar: Large Language Models", "section": "Content 2", - "text": "Content 2\n\nIntroduction to the OpenAI API\n\nPrompting\nParameterization\nFunction calling\n\nIntroduction to embeddings\n\nSimilarity\nVisualization & Clustering\n\n(Ethics & Privacy)" + "text": "Content 2\n\nQuick overview of classic NLP\n\nText processing (Tokenization, Lemmatization, etc.)\nApplications (Classification, Sentiment Analysis, Matching, etc.)\nChallenges\n\nIntroduction to LLM\n\nText processing with neural networks\nSequence generation & language modeling" }, { - "objectID": "slides/about/intro.html#day-1-24.04.2024", - "href": "slides/about/intro.html#day-1-24.04.2024", - "title": "Sprint: Large Language Models", - "section": "Day 1 (24.04.2024):", - "text": "Day 1 (24.04.2024):\n\nGetting to know each other + intro survey (experiences & expectations) (1h)\nLearning goals & final evaluation criteria (0.5h)\nIntroduction & overview of the topic (0.5h)\nIntroduction to natural language processing & setup of the development environment (4h)\nIntroduction to LLM & getting to know the OpenAI API: Part 1 (2h)" - }, - { - "objectID": "slides/about/intro.html#day-2-25.04.2024", - "href": "slides/about/intro.html#day-2-25.04.2024", - "title": "Sprint: Large Language Models", - "section": "Day 2 (25.04.2024):", - "text": "Day 2 (25.04.2024):\n\nIntroduction to LLM & getting to know the OpenAI API: Part 2 (3h)\nPrompting (1h)\nEmbeddings (2h)\nGroup brainstorming session: Designing a simple app concept involving GPT (2h)\n\n→ At home until next week: refine project ideas (1h)" - }, - { - "objectID": "slides/about/intro.html#day-3-30.04.2024", - "href": "slides/about/intro.html#day-3-30.04.2024", - "title": "Sprint: Large Language Models", - "section": "Day 3 (30.04.2024):", - "text": "Day 3 (30.04.2024):\n\nAdvanced GPT-related topics (1h)\nBusiness-related topics (1h)\nTeam building for hackathon → develop app concepts (1h)\nWork on prototypes (5h)" - }, - { - "objectID": "slides/about/intro.html#day-4-02.05.2024", - "href": "slides/about/intro.html#day-4-02.05.2024", - "title": "Sprint: Large Language Models", - "section": "Day 4 (02.05.2024):", - "text": "Day 4 (02.05.2024):\n\nFinal touches for the prototypes (3h)\nPresentation of app prototypes, peer feedback & evaluation (2h)\nReflections on the seminar (1h)\nEthics & data privacy considerations (backup)" + "objectID": "slides/about/intro.html#content-3", + "href": "slides/about/intro.html#content-3", + "title": "Seminar: Large Language Models", + "section": "Content 3", + "text": "Content 3\n\nIntroduction to the OpenAI 
API\n\nPrompting\nParameterization\nFunction calling\n\nIntroduction to embeddings\n\nSimilarity\nVisualization & Clustering\n\n(Ethics & Privacy)" }, { "objectID": "slides/about/intro.html#after-the-seminar-1d", "href": "slides/about/intro.html#after-the-seminar-1d", - "title": "Sprint: Large Language Models", + "title": "Seminar: Large Language Models", "section": "After the seminar (~1d):", "text": "After the seminar (~1d):\n\nPrototype refinement\nCode review & documentation\nRefine business case & potential applications of prototype\nReflections & lessons learned → Hand in 2-page summary" }, { "objectID": "slides/about/intro.html#evaluation", "href": "slides/about/intro.html#evaluation", - "title": "Sprint: Large Language Models", + "title": "Seminar: Large Language Models", "section": "Evaluation", "text": "Evaluation\n\nYour presentation on the last day of the seminar: 25%\nYour prototype: 35%\nYour summary: 25%\nYour activity during the seminar: 15%" }, { "objectID": "slides/about/intro.html#what-is-the-summary", "href": "slides/about/intro.html#what-is-the-summary", - "title": "Sprint: Large Language Models", + "title": "Seminar: Large Language Models", "section": "What is the summary?", "text": "What is the summary?\n\n2-3 pages only!\nWhat is your prototype? What can I do?\nWhat could be a business case for your prototype, or where can it be applied?\nWhat are current limitations of your prototype and how could you overcome them?\nWhat have been your main learnings during the creation of your prototype (and/or) the seminar itself?" }, { "objectID": "slides/about/intro.html#jupyterlab-exercises", "href": "slides/about/intro.html#jupyterlab-exercises", - "title": "Sprint: Large Language Models", + "title": "Seminar: Large Language Models", "section": "Jupyterlab & Exercises", "text": "Jupyterlab & Exercises\n \nJupyterlab\nTo get started right away, we have prepared a Jupyterlab!\n \nExercises\nAll exercises can be solved in the Jupyterlab, all packages and datasets are pre-installed!\n\n\n\n\nSeminar: LLM, WiSe 2024/25" }, diff --git a/docs/slides/about/intro.html b/docs/slides/about/intro.html index 4fdb049..08b5e97 100644 --- a/docs/slides/about/intro.html +++ b/docs/slides/about/intro.html @@ -10,7 +10,7 @@ - Sprint: Large Language Models + Seminar: Large Language Models @@ -327,11 +327,22 @@
      -

      Sprint: Large Language Models

      +

      Seminar: Large Language Models

      +
      +
      +

      Why should I be here?

      +
      +
        +
      • Understand how LLMs work and what their limitations are!
      • +
      • Learn how to automate tasks with a language model
      • +
      • Learn how to process large amounts of data with a language model
      • +
      • Learn how to use a large language model in a product
      • +
      +

      About the seminar

      @@ -365,18 +376,38 @@

      About the seminar

-
-

Goal

+
+

Intended learning outcomes (Part 1)

+
    +
  • Understand the basics of natural language processing, including its tasks and challenges
  • +
  • Understand the general concept of LLMs and why they make the above so much easier
  • +
  • Write simple Python programs / scripts and use basic data and control structures
  • +
+
+
+

Intended learning outcomes (Part 2)

    -
  • 4-day introduction to language models
  • -
  • Aim: Simple overview to start working with language model APIs
  • -
  • Understand basics, know how to apply in own applications
  • -
  • Have fun!
  • +
  • Access an LLM via the OpenAI API and work with the results

  • +
  • Understand the concept of text embeddings and use them via the OpenAI API

  • +
  • Understand why and how LLMs can be used for process automation

  • +
  • Use an LLM in a small application

  • +
  • Have fun!

Content 1

    +
  • Short introduction to Programming in Python in the context of Natural Language Processing and Large Language Models +
      +
    • Basics (Syntax, Variables, Data Types, Conditional Statements etc.)
    • +
    • Lists & Loops
    • +
    • Dictionaries & Classes
    • +
  • +
+
+
+

Content 2

+
  • Quick overview of classic NLP
    • Text processing (Tokenization, Lemmatization, etc.)
    • @@ -390,8 +421,8 @@

      Content 1

-
-

Content 2

+
+

Content 3

  • Introduction to the OpenAI API
      @@ -412,42 +443,26 @@

      Content 2

      The schedule

-
-

Day 1 (24.04.2024):

-
    -
  • Getting to know each other + intro survey (experiences & expectations) (1h)
  • -
  • Learning goals & final evaluation criteria (0.5h)
  • -
  • Introduction & overview of the topic (0.5h)
  • -
  • Introduction to natural language processing & setup of the development environment (4h)
  • -
  • Introduction to LLM & getting to know the OpenAI API: Part 1 (2h)
  • -
-
-
-

Day 2 (25.04.2024):

-
    -
  • Introduction to LLM & getting to know the OpenAI API: Part 2 (3h)
  • -
  • Prompting (1h)
  • -
  • Embeddings (2h)
  • -
  • Group brainstorming session: Designing a simple app concept involving GPT (2h)
  • -
-

→ At home until next week: refine project ideas (1h)

-
-
-

Day 3 (30.04.2024):

+
+
    -
  • Advanced GPT-related topics (1h)
  • -
  • Business-related topics (1h)
  • -
  • Team building for hackathon → develop app concepts (1h)
  • -
  • Work on prototypes (5h)
  • +
  • Today: Intro & Getting to know each other & Survey (experiences & expectations) & Learning goals & Evaluation criteria
  • +
  • Introduction to the general topic & Python & Jupyter
  • +
  • Introduction to NLP (tokenization, matching, statistical analysis)
  • +
  • Introduction to LLM & OpenAI API
  • +
  • Prompting
  • +
  • Embeddings
-
-

Day 4 (02.05.2024):

+
+
    -
  • Final touches for the prototypes (3h)
  • -
  • Presentation of app prototypes, peer feedback & evaluation (2h)
  • -
  • Reflections on the seminar (1h)
  • -
  • Ethics & data privacy considerations (backup)
  • +
  • Advanced GPT topics (image data, parameterization, tool calling)
  • +
  • Real-world examples of applications (& implementation) & limitations
  • +
  • App concept & Group brainstorming
  • +
  • Project work on prototype & mentoring
  • +
  • Project presentations & reflections on the seminar
  • +
  • Backup: Ethics and data privacy
diff --git a/index.qmd b/index.qmd index 3664da2..3aeb2fa 100644 --- a/index.qmd +++ b/index.qmd @@ -58,8 +58,20 @@ Eventually, we will have a quick look into some other applications of embeddings Have fun! -### The schedule -TBD +### A rough schedule +- Introduction & Getting to know each other & Survey (experiences & expectations) & Learning goals & Evaluation criteria +- Introduction to the general topic & Python & Jupyter +- Introduction NLP (tokenization, matching, statistical analysis) +- Introduction to LLM & OpenAI API +- Prompting +- Embeddings +- Advanced GPT topics (image data, parameterization, tool calling) +- Real-world examples of applications (& implementation) & limitations +- *App concept & Group brainstorming* +- *Project work on prototype & mentoring* +- *Project presentations* & reflections on the seminar +- Backup: Ethics and data privacy + #### After the seminar (~1d): - Prototype refinement diff --git a/llm/gpt_api.qmd b/llm/gpt_api.qmd index 72c2df1..9470e9d 100644 --- a/llm/gpt_api.qmd +++ b/llm/gpt_api.qmd @@ -44,7 +44,7 @@ Most interaction with GPT and other models consist in generating completions for To request completions from the OpenAI API, we use Python to send HTTP requests to the designated API endpoint. These requests are structured to include various parameters that guide the generation of text completions. The most fundamental parameter is the prompt text, which sets the context for the completion. -Additionally, you can specify the desired model configuration, such as the engine to use (e.g., "gpt-4"), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity (TODO: Link parameterization) +Additionally, you can specify the desired model configuration, such as the engine to use (e.g., "gpt-4"), as well as any constraints or preferences for the generated completions, such as the maximum number of tokens or the temperature for controlling creativity ([here](../llm/parameterization.qmd)). ```{python} # creating a completion diff --git a/python_intro/overview.qmd b/python_intro/overview.qmd index 3a757d1..e63b941 100644 --- a/python_intro/overview.qmd +++ b/python_intro/overview.qmd @@ -6,8 +6,6 @@ format: jupyter: python3 --- -# TODO: Some introduction - # Installing Python on Windows and macOS @@ -40,6 +38,7 @@ jupyter: python3 - Open the Terminal application (you can find it using Spotlight Search by pressing `Command + Space` and typing "Terminal"). - Type `python3 --version` and press Enter. You should see the installed version of Python. + ### Additional Setup (Optional) After installing Python, it’s a good idea to install **pip**, Python's package manager, which is included by default in the latest Python versions. You can use pip to install additional libraries and packages as needed. diff --git a/resources/apis.qmd b/resources/apis.qmd index 6e978b6..6f4bc60 100644 --- a/resources/apis.qmd +++ b/resources/apis.qmd @@ -6,8 +6,6 @@ format: jupyter: python3 --- -Certainly! Here's the list with each title linked to its respective website, followed by the description: - 1. [**Google Cloud Natural Language API**](https://cloud.google.com/natural-language) Google Cloud Natural Language API offers a suite of powerful natural language processing capabilities, including sentiment analysis, entity recognition, and syntax analysis. 
While it may not provide pre-trained large language models like GPT, it offers robust support for various NLP tasks through its RESTful API.

2. [**Microsoft Azure Cognitive Services - Text Analytics**](https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/)
   Azure Cognitive Services offers Text Analytics, a set of APIs for analyzing unstructured text. It provides functionalities such as sentiment analysis, key phrase extraction, language detection, and entity recognition. While it doesn't offer large pre-trained language models, it's suitable for various text analysis tasks and integrates well with other Azure services.
diff --git a/resources/exercises.qmd b/resources/exercises.qmd
index ed4f1d4..2e3adb0 100644
--- a/resources/exercises.qmd
+++ b/resources/exercises.qmd
@@ -6,6 +6,21 @@ format:
 jupyter: python3
 ---
 
+#### Introduction to Python
+[Exercise: Data types](../python_intro/exercises/data_types.ipynb)
+
+[Exercise: String manipulations](../python_intro/exercises/strings.ipynb)
+
+[Exercise: Lists and loops](../python_intro/exercises/lists_and_loops.ipynb)
+
+[Exercise: Conditional statements](../python_intro/exercises/conditional_statements.ipynb)
+
+[Exercise: Functions](../python_intro/exercises/functions.ipynb)
+
+[Exercise: Dictionaries](../python_intro/exercises/dictionaries.ipynb)
+
+[Exercise: Classes](../python_intro/exercises/classes.ipynb)
+
 #### Natural Language Processing
 
 [Exercise: Sentence tokenization](../nlp/exercises/ex_tokenization.ipynb)
diff --git a/slides/about/intro.qmd b/slides/about/intro.qmd
index 02ce6b4..ace2a0d 100644
--- a/slides/about/intro.qmd
+++ b/slides/about/intro.qmd
@@ -1,5 +1,5 @@
 ---
-title: "Sprint: Large Language Models"
+title: "Seminar: Large Language Models"
 format:
   revealjs:
     theme: default
@@ -8,6 +8,16 @@ format:
     logo: ../../assets/logo.svg
 ---
 
+## Why should I be here?
+
+::: {.incremental}
+- Understand how LLMs work and what their limitations are!
+- Learn how to automate tasks with a language model
+- Learn how to process large amounts of data with a language model
+- Learn how to use a large language model in a product
+:::
+
+
 ## About the seminar
 - Roughly divided into 3 parts: **theory**, **training**, and **application**
 - **Theory**:
@@ -30,14 +40,29 @@ format:
   - Small application involving a language model
 
 
-## Goal
-- 4-day introduction to language models
-- Aim: Simple overview to start working with language model APIs
-- Understand basics, know how to apply in own applications
+## Intended learning outcomes (Part 1)
+- Understand the basics of natural language processing, including its tasks and challenges
+- Understand the general concept of LLMs and why they make the above so much easier
+- Write simple Python programs / scripts and use basic data and control structures
+
+
+## Intended learning outcomes (Part 2)
+- Access an LLM via the OpenAI API and work with the results
+- Understand the concept of text embeddings and use them via the OpenAI API
+- Understand why and how LLMs can be used for process automation
+- Use an LLM in a small application
+
 - **Have fun!**
 
 
 ## Content 1
+- Short introduction to Programming in Python **in the context** of Natural Language Processing and Large Language Models
+  - Basics (Syntax, Variables, Data Types, Conditional Statements etc.)
+  - Lists & Loops
+  - Dictionaries & Classes
+
+
+## Content 2
 - Quick overview of classic NLP
   - Text processing (Tokenization, Lemmatization, etc.)
   - Applications (Classification, Sentiment Analysis, Matching, etc.)
@@ -46,7 +71,7 @@ format:
   - Text processing with neural networks
   - Sequence generation & language modeling
 
-## Content 2
+## Content 3
 - Introduction to the OpenAI API
   - Prompting
   - Parameterization
@@ -56,34 +81,27 @@ format:
   - Visualization & Clustering
 - (Ethics & Privacy)
 
+
 # The schedule
 
-## Day 1 (24.04.2024):
-- Getting to know each other + intro survey (experiences & expectations) (1h)
-- Learning goals & final evaluation criteria (0.5h)
-- Introduction & overview of the topic (0.5h)
-- Introduction to natural language processing & setup of the development environment (4h)
-- Introduction to LLM & getting to know the OpenAI API: Part 1 (2h)
-
-## Day 2 (25.04.2024):
-- Introduction to LLM & getting to know the OpenAI API: Part 2 (3h)
-- Prompting (1h)
-- Embeddings (2h)
-- Group brainstorming session: Designing a simple app concept involving GPT (2h)
-
-→ At home until next week: refine project ideas (1h)
-
-## Day 3 (30.04.2024):
-- Advanced GPT-related topics (1h)
-- Business-related topics (1h)
-- Team building for hackathon → develop app concepts (1h)
-- Work on prototypes (5h)
-
-## Day 4 (02.05.2024):
-- Final touches for the prototypes (3h)
-- Presentation of app prototypes, peer feedback & evaluation (2h)
-- Reflections on the seminar (1h)
-- Ethics & data privacy considerations (backup)
+---
+
+- **Today**: Intro & Getting to know each other & Survey (experiences & expectations) & Learning goals & Evaluation criteria
+- Introduction to the general topic & Python & Jupyter
+- Introduction to NLP (tokenization, matching, statistical analysis)
+- Introduction to LLM & OpenAI API
+- Prompting
+- Embeddings
+
+---
+
+- Advanced GPT topics (image data, parameterization, tool calling)
+- Real-world examples of applications (& implementation) & limitations
+- *App concept & Group brainstorming*
+- *Project work on prototype & mentoring*
+- *Project presentations* & reflections on the seminar
+- Backup: Ethics and data privacy
+
 
 ## After the seminar (~1d):
 - Prototype refinement
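The `llm/gpt_api.qmd` hunk above describes a completion request (prompt, model/engine, maximum number of tokens, temperature), but the hunk cuts off before the code block it introduces. A minimal sketch of such a request might look as follows, assuming the official `openai` Python package (v1+) with `OPENAI_API_KEY` set in the environment; the model name, prompt, and parameter values here are illustrative and not taken from the repository:

```python
# Minimal sketch of a completion request, assuming the official openai
# package (v1+); the client reads OPENAI_API_KEY from the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",    # the engine named in the text; illustrative choice
    messages=[{"role": "user", "content": "Explain in one sentence what an LLM is."}],
    max_tokens=60,    # constraint on completion length, as mentioned in the text
    temperature=0.7,  # the "creativity" parameter mentioned in the text
)
print(response.choices[0].message.content)
```

The `max_tokens` and `temperature` arguments correspond to the constraints and preferences the paragraph mentions; the parameterization page linked in the hunk covers them in more detail.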