---
title: "Seminar: Large Language Models"
format:
  html:
    code-fold: true
jupyter: python3
---

![Robot by DALL-E](assets/dall-e-robot.jpeg){width=350 fig-align="left"}


Hello and welcome to the seminar **Large Language Models** in the winter semester of 2024/25 at the University of Applied Sciences in Münster.
On this website, you will find all the information you need about the seminar.


### About the seminar
The seminar is roughly divided into 3 parts of equal size: theory, training and application. 
In the theoretical part, you will learn about the most important topics and ideas when it comes to natural language processing and large language models. 
We will discuss topics like tokenization, matching, statistical text analysis and embeddings to get you started before eventually dealing with large language models and their applications themselves.
Even during the theoretical part, we will write `Python` code alongside the concepts and look at coding examples to get familiar with it.
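
As a small taste of the kind of code this involves, here is a minimal sketch of tokenization and term counting. The naive regular-expression rule and the function names `tokenize` and `term_frequencies` are illustrative assumptions, not the seminar's actual material.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (a deliberately naive rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequencies(text: str) -> Counter:
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


sentence = "The cat sat on the mat. The mat was warm."
tokens = tokenize(sentence)
counts = term_frequencies(sentence)

print(tokens)
print(counts.most_common(2))  # the two most frequent tokens: 'the' (3) and 'mat' (2)
```

Real tokenizers handle punctuation, casing and subword units far more carefully; this sketch only shows the basic idea of turning text into countable units.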

After each small input session on a new topic, we will get to some hands-on training so that you can consolidate the knowledge you just acquired. 
You will solve a few (coding) exercises around all the topics yourselves. 
To get everyone up and running as quickly as possible, we have prepared a [JupyterLab](https://jupyter.fh-muenster.de/){.external} environment that everyone can use to solve the exercises.

In the final part of the seminar we will go ahead and apply our newly acquired knowledge in our own projects.
All participants will form teams of 2-3 and try to develop and implement their own little prototype for a small application involving a language model.
More information and ideas for these projects can be found [here](about/projects.qmd).

By the way, you can (and maybe absolutely should) also use a language model like ChatGPT during this seminar, including for some of the exercises.
However, feel encouraged to try for yourselves first, and if you do use a language model's solution, make sure you have understood it.


### How to use this script
This script is meant to give a comprehensive overview right from the start.
Feel free to browse it even before we have reached a specific topic, in particular if you already have some prior knowledge of the topic.
All exercises that we will solve together in this seminar are contained in this script as well, *including their solutions*.
For all exercises, the (or more precisely, a) solution is hidden behind a *Show solution* button. 
For the sake of your own learning process, try to solve the exercises yourselves first!
If you're stuck, ask for a quick hint. 
If you still feel that you are not making progress, *then* check out the solution and try to understand it.
Your solutions to the exercises are not part of the evaluation, so it's really for your own progress!
A "summary" of all exercises can be found [here](/resources/exercises.qmd).

:::::: {.callout-important}
A small disclaimer: This script is not (yet) ridiculously comprehensive.
And, of course, we cannot cover the full realm of NLP and LLMs within a four-day course. However, everything we do in the seminar should also be in this script. If something is missing, just let me know and I will add it as soon as possible.
:::


### What you will learn
This seminar is meant to be an introduction to understanding and working with language models, so we obviously cannot cover everything or offer deep insights into all the details.
Instead, we aim to give you a simple overview of everything you need to start working with language model APIs, to understand why things work the way they do, and to apply them in your own applications.
The content can already be seen from the navigation bar, but here's a quick walk-through.
More precisely, we will walk you through a quick history of natural language processing with some of its challenges and limitations, and introduce you to text processing and analysis techniques such as tokenization, term frequency or bag of words as well as applications such as text classification or sentiment analysis.
Afterwards, we will give a short introduction to how modern large language models approach these with more sophisticated techniques based on neural networks and vast amounts of training data, before getting more hands-on with the language model API by OpenAI.
Eventually, we will have a quick look into some other applications of embeddings, before quickly discussing some of the ethical considerations when working with language models.
Have fun! 


### A rough schedule
- Introduction & Getting to know each other & Survey (experiences & expectations) & Learning goals & Evaluation criteria
- Introduction to the general topic & Python & Jupyter 
- Introduction to NLP (tokenization, matching, statistical analysis)
- Introduction to LLM & OpenAI API
- Prompting
- Embeddings
- Advanced GPT topics (image data, parameterization, tool calling)
- Real-world examples of applications (& implementation) & limitations
- *App concept & Group brainstorming*
- *Project work on prototype & mentoring*
- *Project presentations* & reflections on the seminar
- Backup: Ethics and data privacy 


#### After the seminar (~1d):
  - Prototype refinement
  - Code review & documentation
  - Refine business case & potential applications of prototype
  - Reflections & lessons learned

→ *Hand in a 2-3 page summary*


### Evaluation
All seminar participants will be evaluated in the following way.

- Your presentation on the last day of the seminar: 25%
- Your prototype: 35%
- Your summary: 25%
- Your activity during the seminar: 15%

I will allow myself to give your evaluation a little extra boost for good activity during the seminar. 
This seminar is designed for everyone to participate, so the more you do, the more fun it will be! 

#### What is the summary? 
As mentioned above, to finalize our seminar I want you to take roughly a day to refine your prototype and then write a quick summary of your project and your learnings.
The summary should be 2-3 pages only (kind of like a small executive summary) and contain the following information:

- What is your prototype? What can it do?
- What could be a business case for your prototype, or where can it be applied?
- What are current limitations of your prototype and how could you overcome them?
- What have been your main learnings during the creation of your prototype (and/or) the seminar itself?

Just hand it in within a couple of weeks after the seminar. It will be part of your evaluation.


:::::: {.callout-note}
Has this seminar been created with a little help of language models? Absolutely, why wouldn't it? :)
:::