- You can create anything you want but it must involve computational science.
- You will be judgged primarially on innovation, inginuity and creativity, as well as the other criteria mentioned here like your final oral presentation.
A loose collection of example workflows to kickstart hackathon projects. Each of these are just starting points that need significant work and focus to make real.
-
Computational genomics
- The notebook at https://colab.research.google.com/drive/1YoVUzH7251VAzCoG3uXLSetfZ-W60gaw steps users through a simplified version of "Case Study 1" from https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000583.
- Tasks:
- Convert the notebook to a workflow description that can be executed on Delta.
- Modify the workflow so that more steps can be run in parallel (for example, the clustering steps).
- Find the optimal number of clusters for the clustering steps.
- Explore alternative algorithms for clustering and feature selection.
-
Enron dataset task
- Enron email dataset in kaggle https://www.kaggle.com/datasets/wcukierski/enron-email-dataset/data .
- Tasks :
- Get the count of user sent emails per year and plot it
- NetworkX graph from emails
- RAG pipeline
- Find the most important person and summarize all the emails
-
Letterbox dataset
- Letterboxd dataset is a more cleaner and elaborate IMDb dataset https://www.kaggle.com/datasets/gsimonx37/letterboxd/data
- Has posters (image data) and multiple csv tabular data
- Devise some tasks that might combine multiple data
-
DocVQA
- massive dataset of visual docs for QA. https://rrc.cvc.uab.es/?ch=17&com=introduction
- Define some tasks based on this dataset
-
Astronomical Image processing workflow
- Most astronomical deep-sky imagery needs some processing.
- Checkout astronomical-pipeline directory in this repo for a working pipeline and related tasks
- LSST datasets, some tools and some tutorials https://github.com/lsst/dp0-2_lsst_io/tree/main
- European Org for Astronomical Research Data processing pipeline https://www.eso.org/sci/software/edps.html https://ftp.eso.org/pub/dfs/pipelines/libraries/edps/edps_tutorial0.9.pdf
-
Astronomy question set
- Checkout astronomy-question-set for astronomy related problem statements
NCSA has a new project to host LLMs that are directly compatible with the OpenAI API.
- API Docs: https://docs.ncsa.ai/
- Playground (experimental: no guarentee that all features work): https://ncsa.ai/
Come see a hackathon organizer and we can provide Azure OpenAI API keys. These are generously subsidised by Microsoft Research. This has access to GPT-4 Turbo, etc. We only have the Azure version, not the regular OpenAI version.
3. UIUC.chat - RAG llm API
The UIUC.chat API allows you to upload many types of documents and chat with them. The API will return "answers that are grounded in your documents" much like Perplexity.ai.
Please email me ([email protected]) if you have any questions or problems! Just a quick casual, email is great, low pressure.
- UIUC.chat https://www.uiuc.chat/
- API docs: https://docs.uiuc.chat/uiuc.chat-api/api-keys
- Tutorial & highlights: https://www.youtube.com/watch?v=IIMCrIoz7LM&ab_channel=KastanDay
Usage:
- Make an account with your Illinois email.
- Create a new project by uploading documents
- This requires supplying an Azure OpenAI key (see above). Enter it on the "Materials" page under Project-wide OpenAl key before continuing.
- Try chatting with your documents on the website, then try via the API.
This is favorite LLM provider, they have a generous free tier, high rate limits, and leading-class features like function calling and json mode.
You'll have to create your own account. I recommend using the models Mistral
and Mixtral
.
- Function calling blog / explainer: https://www.anyscale.com/blog/anyscale-endpoints-json-mode-and-function-calling-features
- Function calling docs: https://docs.endpoints.anyscale.com/text-generation/function-calling
- JSON mode docs: https://docs.endpoints.anyscale.com/text-generation/json-mode/
- OpenAI docs (good to read important notes!): https://platform.openai.com/docs/guides/text-generation/json-mode