Due to projects like Explore the LLMs specializing in model indexing, the custom list has been removed.
- Cerebras GPT-13b (release notes)
- LAION OpenFlamingo | Multi Modal Model and training architecture
- GeoV/GeoV-9b - 9B parameter, in-progress training to 300B tokens (33:1)
- RWKV: Parallelizable RNN with Transformer-level LLM Performance
- CodeGeeX 13B | Multi Language Code Generation Model
- BigCode | Open Scientific collaboration to train a coding LLM
- MOSS by Fudan University, a 16b Chinese/English custom foundation model with additional variants fine-tuned with SFT and for plugin usage
- mPLUG-Owl Multimodal finetuned model for visual/language tasks
- Multimodal-GPT multi-modal visual/language chatbot, using llama with custom LoRA weights and openflamingo-9B.
- Visual-med-alpaca fine-tuning llama-7b on self-instruct data for the biomedical domain. Models are locked behind a request form.
- replit-code focused on Code Completion. The model has been trained on a subset of the Stack Dedup v1.2 dataset.
- VPGTrans transfers a Visual Prompt Generator across LLMs; the resulting VL-Vicuna model is a novel VL-LLM. Paper, code
- salesforce/CodeT5 code assistant; CodeT5+ has been released in 16b and other model sizes
- baichuan-7b Baichuan Intelligent Technology developed baichuan-7B, an open-source language model with 7 billion parameters trained on 1.2 trillion tokens. Supporting Chinese and English, it achieves top performance on authoritative benchmarks (C-EVAL, MMLU)
- ChatGLM2-6B v2 of the GLM 6B open bilingual EN/CN model
- sqlcoder 15B parameter model that outperforms gpt-3.5-turbo for natural language to SQL generation tasks
- CodeShell code LLM with 7b parameters trained on 500b tokens with a context length of 8k, outperforming CodeLlama and StarCoder on HumanEval, weights
- SauerkrautLM-13B-v1 fine-tuned llama-2 13b on a mix of German data augmentation and translations; SauerkrautLM-7b-v1-mistral is a German SauerkrautLM-7b fine-tuned using QLoRA on one A100 80GB with Axolotl
- em_german_leo_mistral a fine-tune of LeoLM Mistral with German instructions
- leo-hessianai-13b-chat-bilingual based on llama-2 13b, a fine-tune of the base leo-hessianai-13b for chat
- WizardMath-70B-V1.0 SOTA Mathematical Reasoning
- Mistral-7B-german-assistant-v3 fine-tuned for German instructions and conversations in Alpaca style ("### User:" / "### Assistant:"), trained with a context length of 8k tokens. The dataset used is deduplicated and cleaned and contains no code. The focus is on instruction following and conversational tasks; see the prompt-construction sketch after this list
- HelixNet Mixture of Experts built from three Mistral-7B models with LoRA; HelixNet-LMoE is an optimized version
- llmware RAG models small LLMs and sentence transformer embedding models specifically fine-tuned for RAG workflows
- openchat Advancing Open-source Language Models with Mixed-Quality Data
- deepseek-coder code language models trained on 2T tokens (87% code, 13% English/Chinese), up to 33B parameters with 16K context size, achieving SOTA performance on coding benchmarks
- Poro SiloGen model checkpoints of a family of multilingual open source LLMs covering all official European languages and code, news
- Mixtral of experts a high-quality sparse Mixture-of-Experts model
- meditron 7B and 70B Llama2-based LLMs fine-tuned for the medical domain
- SeaLLM multilingual LLM for Southeast Asian (SEA) languages 🇬🇧 🇨🇳 🇻🇳 🇮🇩 🇹🇭 🇲🇾 🇰🇭 🇱🇦 🇲🇲 🇵🇭
- seamlessM4T v2 Multimodal Audio and Text Translation between many languages
- aya-101 13b fine-tuned open-access multilingual LLM from Cohere For AI
- SLIM Model Family Small Specialized Function-Calling Models for Multi-Step Automation, focused on enterprise RAG workflows
- Smaug-72B based on Qwen-72B and MoMo-72B-Lora, then fine-tuned by Abacus.AI; the best-performing open LLM on the HF leaderboard as of Feb 2024
- AI21 Jamba production-grade Mamba-based hybrid SSM-Transformer model licensed under Apache 2.0 with 256K context; a 52B MoE with 12B active parameters
- command-r 35B optimized for retrieval augmented generation (RAG) and tool use supporting Embed and Rerank methodology. model weights
- StarCoder2 15B, 7B and 3B code completion models trained on The Stack v2
- command-r-plus a 104B model with highly advanced capabilities including RAG and tool use for English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese
- DBRX base and instruct MoE models from databricks with 132B total parameters and a larger number of smaller experts supporting RoPE and 32K context size
- grok-1 314b MoE model by xAI
- Mixtral-8x22B-v0.1 Sparse MoE model with 176B total and 44B active parameters, 65k context size
- aiXcoder 7B Code LLM for code completion, comprehension, generation
- WizardLM-2-7B Microsoft's WizardLM 2 7B, with the 70B release coming up (backup)
- WizardLM-2-8x22B Microsoft's WizardLM 2 8x22B beating gpt-4-0314 on MT-Bench
- Mixtral-8x22B-Instruct-v0.1 an instruct fine-tuned version of the Mixtral-8x22B-v0.1
- wavecoder-ultra-6.7b covering four general code-related tasks: code generation, code summary, code translation, and code repair
- GemMoE An 8x8 Mixture Of Experts based on Gemma
- Granite family of Code Models from IBM with 3b, 8b, 20b, 34b, base and instruct models for code completion and chat
- DeepSeek-V2 a strong, economical, and efficient Mixture-of-Experts language model with 21B activated parameters
- Yuan2-M32 Mixture of Experts with Attention Router, 32 experts, 2 active, 40B total parameters, 3.7B active, and a max length of 16K
- CodeStral-22B Coding model trained on 80+ languages with instruct and Fill in the Middle tasks, 32k max context
- Mistral-7b-instruct-v0.3 with function calling, new tokenizer and 32k max context
- Aya-23 8B and 35B instruction-tuned multilingual models focusing on 23 languages
- Mamba-Codestral by mistral based on the Mamba2 architecture performing on par with SOTA transformer based code models
- CodeGeeX4 9B multilingual code generation model for chat and instruct with a 128k context length
- Mistral Nemo a 12B model by Mistral and NVIDIA with a 128k context window, available as instruct and base models
- Nuextract a structured-extraction model based on phi-3-mini; it is instructed with a JSON template which the model fills in from the unstructured text provided (see the extraction sketch after this list)
- Llama-3.1 Meta's most advanced models, providing 8b, 70b and 405b base and instruction-tuned variants with a 128k context window and quality on par with current SOTA closed-source models
- Mistral-Large a 123B model beating llama-3.1 and gpt-4o in several categories, with a focus on multilinguality, coding, agentic tasks and reasoning
- InternLM2.5 7B base and chat models focusing on reasoning, math and tool use, with a 1M context window
- Yi-1.5 models focusing on multilingual text understanding, available in 9B and 34B variants
- Phi Microsoft's small language and vision models with small and medium parameter sizes, short and long context lengths and great performance
- Qwen2 English and Chinese models in 0.5b, 1.5b, 7b, and 72b sizes with great performance and 128k context windows for the 7b and 72b models
- codeqwen1.5 base and chat models with 7B parameters and good quality
- granite IBM's code models available in 3b, 8b and 20b sizes as base and instruct variants with up to 128k context size
- codegemma Google's coding models, available as 2b base, 7b base and 7b instruct
- DeepSeekCoderv2 16b and 236b mixture of experts coding models with 128k context length
- gemma2 2b a 2b small language model by Google achieving SOTA performance for sub-3b models on the LLM Leaderboard 2
- llama-3.2 small and medium-sized vision LLMs (11b and 90b) and text-only 1b and 3b models by Meta
- Pixtral 12B LLM with a 400M vision encoder for multi modal image and text inference and 128k sequence length by Mistral
- reader-lm Jina AI's LLM to convert HTML to Markdown, making heuristics, cleanup and content identification an LLM task
- Zamba2 a 7B SOTA SLM for running on-device with 25% faster time-to-first-token and a 20% higher tokens-per-second rate compared to other architectures, using Mamba2 blocks interleaved with shared attention blocks and a LoRA-shared MLP block
- ichigo an open research project extending text-based llama3 to have native "listening" ability, using an early fusion technique, with improved multiturn capabilities and refusal to process inaudible queries
- CursorCore Coding LLMs for use within CursorCore and CursorWeb
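
For the Alpaca-style chat models above (e.g. Mistral-7B-german-assistant-v3), the sketch below shows how a "### User:" / "### Assistant:" prompt can be assembled and run with Hugging Face transformers. This is a minimal sketch: the repository id, the example question, and the generation settings are assumptions, not taken from the model card.

```python
# Minimal sketch: querying an Alpaca-style "### User:" / "### Assistant:" chat model
# with Hugging Face transformers. The model id below is an assumption; substitute the
# actual repository listed on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flozi00/Mistral-7B-german-assistant-v3"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs accelerate

# Build the prompt in the "### User:" / "### Assistant:" format described above.
prompt = "### User: Erkläre kurz, was ein Sprachmodell ist.\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate and strip the prompt tokens from the decoded output.
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
answer = tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(answer)
```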
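
Likewise, a sketch of Nuextract-style template extraction: a JSON template is passed together with the raw text and the model returns the filled template. The repo id and the exact prompt markers ("<|input|>", "### Template:", "### Text:", "<|output|>") are assumptions here; consult the model card for the authoritative format.

```python
# Minimal sketch: JSON-template extraction with a Nuextract-style model.
# Repo id and prompt markers are assumptions; check the model card for the exact format.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "numind/NuExtract"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")

# The template lists the fields to extract; the model fills in the empty values.
template = {"company": "", "founded": "", "headquarters": ""}
text = "Acme Corp was founded in 1999 and is headquartered in Berlin."

prompt = (
    "<|input|>\n### Template:\n" + json.dumps(template, indent=4)
    + "\n### Text:\n" + text + "\n<|output|>\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```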