Commit

created exercise summary and some cleanup
Kubus42 committed Apr 7, 2024
1 parent fa68f17 commit 2b6f45d
Showing 36 changed files with 932 additions and 1,179 deletions.
_freeze/exercises/execute-results/html.json (12 additions, 0 deletions)
@@ -0,0 +1,12 @@
{
"hash": "84a442cdb5c872d89b43e3a246e1467f",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: List of exercises\nformat:\n html:\n code-fold: true\n---\n\n#### Natural Language Processing \n[Exercise: Sentence tokenization](nlp/exercises/ex_tokenization.ipynb)\n\n[Exercise: TF-IDF](nlp/exercises/ex_tfidf.ipynb)\n\n[Exercise: Word matching](nlp/exercises/ex_word_matching.ipynb)\n\n[Exercise: Fuzzy matching](nlp/exercises/ex_fuzzy_matching.ipynb)\n\n\n#### Large Language Models with OpenAI\n[Exercise: OpenAI - Getting started](llm/exercises/ex_gpt_start.ipynb)\n\n[Exercise: GPT Chatbot](llm/exercises/ex_gpt_chatbot.ipynb)\n\n[Exercise: GPT Parameterization](llm/exercises/ex_gpt_parameterization.ipynb)\n\n[Exercise: NER with tool calling](llm/exercises/ex_gpt_ner_with_function_calls.ipynb)\n\n\n#### Embeddings \n[Exercise: Embedding similarity](embeddings/exercises/ex_emb_similarity.ipynb)\n\n",
"supporting": [
"exercises_files"
],
"filters": [],
"includes": {}
}
}
_freeze/index/execute-results/html.json (2 additions, 2 deletions)
@@ -1,8 +1,8 @@
{
"hash": "34afe4aa345097b76cbd20ee09f16946",
"hash": "3f1db2b28c2828864960a4ba9bafd0a0",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: 'Sprint: Large Language Models'\nformat:\n html:\n code-fold: true\n---\n\n![Robot by DALL-E](assets/dall-e-robot.jpeg){width=350 fig-align=\"left\"}\n\n\nHello and welcome to the sprint seminar **Large Language Models** in the summer semester of 2024 at the University of Applied Sciences in Münster.\nOn this website, you will find all the information you need about and around the seminar. \n\n\n### About the seminar\nThe seminar is roughly divided into 3 parts of equal size: theory, training and application. \nIn the theoretical part, you will learn about the most important topics and ideas when it comes to natural language processing and large language models. \nWe will discuss topics like tokenization, matching, statistical text analysis and embeddings to get you started before eventually dealing with large language models themselves and their applications themselves.\nAlready during the theory, we will make sure to code in `Python` alongside all the concepts and see coding examples to get familiar with it.\n\nAfter each small input session on a new topic, we will get to some hands-on training so that you can consolidate the knowledge you just acquired. \nYou will solve a few (coding) exercises around all the topics yourselves. \nTo get everyone fired up as quickly as possible, we have prepared a [Jupyterlab](https://jupyter.fh-muenster.de/){.external} environment that everyone can use for the solution of the exercises.\n\nIn the final part of the seminar we will go ahead and apply our newly acquired knowledge in *our own projects*.\nAll participants will team up in teams of 2-3 and try to develop and implement their own little prototype for a small application involving a language model.\nMore information and ideas for these projects can be found [here](about/projects.qmd).\n\nBy the way, you can (and maybe absolutely should) use a language model like ChatGPT also during this seminar and the solution of some of the exercises. \nHowever, feel encouraged to try for yourselves first, and make sure you have understood the solution of a language model if you use it.\n\n\n### How to use this script\nThis script is meant to give a comprehensive overview right away from the start.\nFeel free to browse it even before we have reached a specific topic, in particular, if you already have some prior knowledge in the topic. \nAll exercises that we will solve together in this seminar are contained in this script as well, *including their solution*. \nFor all exercises, the (or more precisely, a) solution is hidden behind a *Show solution* button. \nFor the sake of your own learning process, try to solve the exercises yourselves first!\nIf you're stuck, ask for a quick hint. \nIf you still feel like you do not advance any more, *then* check out the solution and try to understand it. \nThe solution of the exercises is not part of the evaluation, so it's really for your own progress!\nA \"summary\" of all exercises can be found in (TODO: Link).\n\n:::::: {.callout-important}\nA small disclaimer: As this is the first round of the seminar, this script is not (yet) ridiculously comprehensive.\nAnd, of course, we cannot cover the full realm of NLP and LLM within a 4-days-course. However, you should find everything we will do in the seminar also in this script. If there is something missing, I will make sure to include it as soon as possible, just give me a note. 
\n:::\n\n\n### What you will learn\nTODO: When finalized, do a quick summary here.\n\n\n### The schedule\nThis seminar is spread over 4 days of roughly 8 hours, of course with some breaks and modifications if we need them.\nThe schedule for this semester is the following (the included hours are just some estimations):\n\n#### Day 1 (24.04.2024):\n - Getting to know each other + intro survey (experiences & expectations) (1h)\n - Learning goals & final evaluation criteria (0.5h)\n - Introduction & overview of the topic (0.5h)\n - Introduction to natural language processing & setup of the development environment (4h)\n - Introduction to LLM & getting to know the OpenAI API: Part 1 (2h)\n\n#### Day 2 (25.04.2024):\n - Introduction to LLM & getting to know the OpenAI API: Part 2 (3h)\n - Prompting (1h)\n - Embeddings (2h)\n - Group brainstorming session: Designing a simple app concept involving GPT (2h)\n \n→ At home until next week: refine project ideas (1h)\n\n#### Day 3 (30.04.2024):\n - Advanced GPT-related topics (1h)\n - Business-related topics (1h)\n - Team building for hackathon → develop app concepts (1h)\n - Work on prototypes (5h)\n\n#### Day 4 (02.05.2024):\n - Final touches for the prototypes (3h)\n - Presentation of app prototypes, peer feedback & evaluation (2h)\n - Reflections on the seminar (1h)\n - Ethics & data privacy considerations (backup)\n\n#### After the seminar (~1d):\n - Prototype refinement\n - Code review & documentation\n - Refine business case & potential applications of prototype\n - Reflections & lessons learned\n→ *Hand in 2-page summary*\n\n\n### Evaluation\nAll seminar participants will be evaluated in the following way.\n\n- Your presentation on the last day of the seminar: 25%\n- Your prototype: 35%\n- Your summary: 25%\n- Your activity during the seminar: 15%\n\nI will allow myself to give your evaluation a little extra boost for good activity during the seminar. \nThis seminar is designed for everyone to participate, so the more you do, the more fun it will be! \n\n\n:::::: {.callout-note}\nHas this seminar been created with a little help of language models? Absolutely, why wouldn't it? :)\n:::\n\n",
"markdown": "---\ntitle: 'Sprint: Large Language Models'\nformat:\n html:\n code-fold: true\n---\n\n![Robot by DALL-E](assets/dall-e-robot.jpeg){width=350 fig-align=\"left\"}\n\n\nHello and welcome to the sprint seminar **Large Language Models** in the summer semester of 2024 at the University of Applied Sciences in Münster.\nOn this website, you will find all the information you need about and around the seminar. \n\n\n### About the seminar\nThe seminar is roughly divided into 3 parts of equal size: theory, training and application. \nIn the theoretical part, you will learn about the most important topics and ideas when it comes to natural language processing and large language models. \nWe will discuss topics like tokenization, matching, statistical text analysis and embeddings to get you started before eventually dealing with large language models themselves and their applications themselves.\nAlready during the theory, we will make sure to code in `Python` alongside all the concepts and see coding examples to get familiar with it.\n\nAfter each small input session on a new topic, we will get to some hands-on training so that you can consolidate the knowledge you just acquired. \nYou will solve a few (coding) exercises around all the topics yourselves. \nTo get everyone fired up as quickly as possible, we have prepared a [Jupyterlab](https://jupyter.fh-muenster.de/){.external} environment that everyone can use for the solution of the exercises.\n\nIn the final part of the seminar we will go ahead and apply our newly acquired knowledge in *our own projects*.\nAll participants will team up in teams of 2-3 and try to develop and implement their own little prototype for a small application involving a language model.\nMore information and ideas for these projects can be found [here](about/projects.qmd).\n\nBy the way, you can (and maybe absolutely should) use a language model like ChatGPT also during this seminar and the solution of some of the exercises. \nHowever, feel encouraged to try for yourselves first, and make sure you have understood the solution of a language model if you use it.\n\n\n### How to use this script\nThis script is meant to give a comprehensive overview right away from the start.\nFeel free to browse it even before we have reached a specific topic, in particular, if you already have some prior knowledge in the topic. \nAll exercises that we will solve together in this seminar are contained in this script as well, *including their solution*. \nFor all exercises, the (or more precisely, a) solution is hidden behind a *Show solution* button. \nFor the sake of your own learning process, try to solve the exercises yourselves first!\nIf you're stuck, ask for a quick hint. \nIf you still feel like you do not advance any more, *then* check out the solution and try to understand it. \nThe solution of the exercises is not part of the evaluation, so it's really for your own progress!\nA \"summary\" of all exercises can be found [here](exercises.qmd).\n\n:::::: {.callout-important}\nA small disclaimer: As this is the first round of the seminar, this script is not (yet) ridiculously comprehensive.\nAnd, of course, we cannot cover the full realm of NLP and LLM within a 4-days-course. However, you should find everything we will do in the seminar also in this script. If there is something missing, I will make sure to include it as soon as possible, just give me a note. 
\n:::\n\n\n### What you will learn\nTODO: When finalized, do a quick summary here.\n\n\n### The schedule\nThis seminar is spread over 4 days of roughly 8 hours, of course with some breaks and modifications if we need them.\nThe schedule for this semester is the following (the included hours are just some estimations):\n\n#### Day 1 (24.04.2024):\n - Getting to know each other + intro survey (experiences & expectations) (1h)\n - Learning goals & final evaluation criteria (0.5h)\n - Introduction & overview of the topic (0.5h)\n - Introduction to natural language processing & setup of the development environment (4h)\n - Introduction to LLM & getting to know the OpenAI API: Part 1 (2h)\n\n#### Day 2 (25.04.2024):\n - Introduction to LLM & getting to know the OpenAI API: Part 2 (3h)\n - Prompting (1h)\n - Embeddings (2h)\n - Group brainstorming session: Designing a simple app concept involving GPT (2h)\n \n→ At home until next week: refine project ideas (1h)\n\n#### Day 3 (30.04.2024):\n - Advanced GPT-related topics (1h)\n - Business-related topics (1h)\n - Team building for hackathon → develop app concepts (1h)\n - Work on prototypes (5h)\n\n#### Day 4 (02.05.2024):\n - Final touches for the prototypes (3h)\n - Presentation of app prototypes, peer feedback & evaluation (2h)\n - Reflections on the seminar (1h)\n - Ethics & data privacy considerations (backup)\n\n#### After the seminar (~1d):\n - Prototype refinement\n - Code review & documentation\n - Refine business case & potential applications of prototype\n - Reflections & lessons learned\n→ *Hand in 2-page summary*\n\n\n### Evaluation\nAll seminar participants will be evaluated in the following way.\n\n- Your presentation on the last day of the seminar: 25%\n- Your prototype: 35%\n- Your summary: 25%\n- Your activity during the seminar: 15%\n\nI will allow myself to give your evaluation a little extra boost for good activity during the seminar. \nThis seminar is designed for everyone to participate, so the more you do, the more fun it will be! \n\n#### What is the summary? \nAs mentioned above, to finalize our seminar I want to you to take roughly a day to refine your prototype and then write a quick summary your project and your learnings.\nThe summary should be 2-3 pages only (kind of like a small executive summary) and contain the following information:\n- What is your prototype? What can I do? \n- What could be a business case for your prototype, or where can it be applied?\n- What are current limitations of your prototype and how could you overcome them?\n- What have been your main learnings during the creation of your prototype (and/or) the seminar itself?\n\nJust hand it in within a couple of weeks after the seminar, it will be a part of your evaluation.\n\n\n:::::: {.callout-note}\nHas this seminar been created with a little help of language models? Absolutely, why wouldn't it? :)\n:::\n\n",
"supporting": [
"index_files"
],