🏠Home

Open LLM Models List

Because projects like Explore the LLMs specialize in model indexing, the custom list has been removed.

Noteworthy

  • Cerebras GPT-13b (release notes)
  • LAION OpenFlamingo | Multi Modal Model and training architecture
  • GeoV/GeoV-9b - 9B parameter, in-progress training to 300B tokens (33:1)
  • RWKV: Parallelizable RNN with Transformer-level LLM Performance
  • CodeGeeX 13B | Multi Language Code Generation Model
  • BigCode | Open Scientific collaboration to train a coding LLM
  • MOSS by Fudan University, a 16B Chinese/English custom foundational model with additional variants fine-tuned with SFT and for plugin usage
  • mPLUG-Owl Multimodal finetuned model for visual/language tasks
  • Multimodal-GPT multi-modal visual/language chatbot, using llama with custom LoRA weights and openflamingo-9B.
  • Visual-med-alpaca fine-tunes llama-7b on self-instruct data for the biomedical domain. Models are locked behind a request form.
  • replit-code focused on Code Completion. The model has been trained on a subset of the Stack Dedup v1.2 dataset.
  • VPGTrans transfers a Visual Prompt Generator across LLMs; the accompanying VL-Vicuna model is a novel VL-LLM. Paper, code
  • salesforce/CodeT5 code assistant; CodeT5+ has been released in 16B and other model sizes
  • baichuan-7b an open-source language model by Baichuan Intelligent Technology with 7 billion parameters trained on 1.2 trillion tokens. Supporting Chinese and English, it achieves top performance on authoritative benchmarks (C-EVAL, MMLU)
  • ChatGLM2-6B v2 of the GLM 6B open bilingual EN/CN model
  • sqlcoder 15B parameter model that outperforms gpt-3.5-turbo for natural language to SQL generation tasks
  • CodeShell code LLM with 7b parameters trained on 500b tokens and a context length of 8k, outperforming CodeLlama and StarCoder on HumanEval, weights
  • SauerkrautLM-13B-v1 a llama-2 13b fine-tuned on a mix of German data augmentation and translations; SauerkrautLM-7b-v1-mistral is the German SauerkrautLM-7b fine-tuned with QLoRA on one A100 80GB using Axolotl
  • em_german_leo_mistral a fine-tune of LeoLM Mistral with German instructions
  • leo-hessianai-13b-chat-bilingual based on llama-2 13b, a chat fine-tune of the base leo-hessianai-13b
  • WizardMath-70B-V1.0 SOTA Mathematical Reasoning
  • Mistral-7B-german-assistant-v3 fine-tuned for German instructions and conversations in the style of Alpaca ("### User:" / "### Assistant:"), trained with a context length of 8k tokens. The dataset used is deduplicated and cleaned, with no code inside; the focus is on instruction following and conversational tasks (a minimal prompt sketch follows this list)
  • HelixNet a mixture of experts built from three Mistral-7B models with LoRA; HelixNet-LMoE is an optimized version
  • llmware RAG models small LLMs and sentence transformer embedding models specifically fine-tuned for RAG workflows
  • openchat Advancing Open-source Language Models with Mixed-Quality Data
  • deepseek-coder code language models, trained on 2T tokens, 87% code 13% English / Chinese, up to 33B with 16K context size achieving SOTA performance on coding benchmarks
  • Poro SiloGen model checkpoints of a family of multilingual open source LLMs covering all official European languages and code, news
  • Mixtral of experts A high quality Sparse Mixture-of-Experts.
  • meditron 7B and 70B Llama-2-based LLMs fine-tuned and adapted for the medical domain
  • SeaLLM multilingual LLM for Southeast Asian (SEA) languages 🇬🇧 🇨🇳 🇻🇳 🇮🇩 🇹🇭 🇲🇾 🇰🇭 🇱🇦 🇲🇲 🇵🇭
  • seamlessM4T v2 Multimodal Audio and Text Translation between many languages
  • aya-101 a 13B fine-tuned, open-access multilingual LLM from Cohere For AI
  • SLIM Model Family Small Specialized Function-Calling Models for Multi-Step Automation, focused on enterprise RAG workflows
  • Smaug-72B based on Qwen-72B and MoMo-72B-LoRA, then fine-tuned by Abacus.AI; the best-performing open LLM on the HF leaderboard as of Feb 2024
  • AI21 Jamba production-grade Mamba-based hybrid SSM-Transformer Model licensed under Apache 2.0 with 256K context and 52B MoE at 12B each
  • command-r 35B optimized for retrieval augmented generation (RAG) and tool use supporting Embed and Rerank methodology. model weights
  • StarCoder2 15B, 7B and 3B code completion models trained on The Stack v2
  • command-r-plus a 104B model with highly advanced capabilities including RAG and tool use for English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese
  • DBRX base and instruct MoE models from databricks with 132B total parameters and a larger number of smaller experts supporting RoPE and 32K context size
  • grok-1 314b MoE model by xAI
  • Mixtral-8x22B-v0.1 Sparse MoE model with 176B total and 44B active parameters, 65k context size
  • aiXcoder 7B Code LLM for code completion, comprehension, generation
  • WizardLM-2-7B Microsoft's WizardLM 2 7B, with the 70B release coming up (backup)
  • WizardLM-2-8x22B Microsoft's WizardLM 2 8x22B beating gpt-4-0314 on MT-Bench
  • Mixtral-8x22B-Instruct-v0.1 an instruct fine-tuned version of the Mixtral-8x22B-v0.1
  • wavecoder-ultra-6.7b covering four general code-related tasks: code generation, code summary, code translation, and code repair
  • GemMoE An 8x8 Mixture Of Experts based on Gemma
  • Granite family of Code Models from IBM with 3b, 8b, 20b, 34b, base and instruct models for code completion and chat
  • DeepSeek-V2 21B Strong, Economical, and Efficient Mixture-of-Experts Language Model
  • Yuan2-M32 Mixture of Experts with Attention Router, 32 experts, 2 active, 40B total parameters, 3.7B active, and a max length of 16K
  • CodeStral-22B Coding model trained on 80+ languages with instruct and Fill in the Middle tasks, 32k max context
  • Mistral-7b-instruct-v0.3 with function calling, new tokenizer and 32k max context
  • Aya-23 8B and 35B instruction-tuned multilingual models focusing on 23 languages
  • Mamba-Codestral by mistral based on the Mamba2 architecture performing on par with SOTA transformer based code models
  • CodeGeeX4 9B multilingual code generation model for chat and instruct with a 128k context length
  • Mistral Nemo a 12B model by Mistral and NVIDIA with a 128k context window, available as instruct and base models
  • Nuextract a structured-extraction model based on phi-3-mini; it is instructed with a JSON template that the model fills in from the provided unstructured text (see the extraction sketch after this list)
  • Llama-3.1 Meta's most advanced models, providing 8b, 70b and 405b base and instruction-tuned variants with a 128k context window and quality on par with current SOTA closed-source models
  • Mistral-Large a 123B sized model beating llama-3.1 and gpt-4o in several categories with a focus on multilinguality, coding, agentic tasks and reasoning.
  • InternLM2.5 7B base and chat models focusing on reasoning, math and tool use, with a 1M context window
  • Yi-1.5 models focusing on multilingual text understanding, available in 9B and 34B variants
  • Phi Microsoft's small language and vision models with small and medium parameter sizes, short and long context lengths and great performance
  • Qwen2 English and Chinese models in 0.5b, 1.5b, 7b, and 72b sizes with great performance and 128k context windows for the 7b and 72b models
  • codeqwen1.5 base and chat models with 7B parameters and good quality
  • granite IBM's code models, available in 3b, 8b and 20b sizes as base and instruct variants with up to 128k context size
  • codegemma Google's coding models in 2b base, 7b base and 7b instruct variants
  • DeepSeekCoderv2 16b and 236b mixture of experts coding models with 128k context length
  • gemma2 2b a small language model by Google achieving SOTA performance among sub-3b models on the LLM Leaderboard 2
  • llama-3.2 small and medium sized vision LLMs in 11b and 90b and text only 1b and 3b models by Meta
  • Pixtral 12B LLM with a 400M vision encoder for multi modal image and text inference and 128k sequence length by Mistral
  • reader-lm Jina AI's LLM to convert HTML to Markdown, making heuristics, cleanup and content identification an LLM task
  • Zamba2 a 7B SOTA SLM for running on-device, with 25% faster time to first token and a 20% higher tokens-per-second rate compared to other architectures, using Mamba2 blocks interleaved with shared attention blocks and a LoRA-shared MLP block
  • ichigo an open research project extending text-based llama3 to have native "listening" ability, using an early fusion technique, with improved multiturn capabilities and refusal to process inaudible queries
  • CursorCore Coding LLMs for use within CursorCore and CursorWeb
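
The Alpaca-style turn markers quoted in the Mistral-7B-german-assistant-v3 entry can be assembled into a prompt as in the minimal sketch below. This is a sketch only: the Hugging Face repo path, the example question, and the generation settings are assumptions rather than details from the list.

```python
# Minimal sketch of the "### User:" / "### Assistant:" prompt format described above.
# The repo ID is an assumed placeholder; check the model card for the exact path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flozi00/Mistral-7B-german-assistant-v3"  # assumed Hugging Face ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Alpaca-style turn markers as quoted in the list entry above.
prompt = "### User: Erkläre in einem Satz, was ein LLM ist.\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```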
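
The Nuextract entry describes template-driven extraction: a JSON template plus unstructured text go in, and the model returns the filled template. The sketch below illustrates that pattern under assumptions; the repo ID and the exact prompt layout are guesses, so consult the model card for the authoritative format.

```python
# Minimal sketch of JSON-template extraction as described in the Nuextract entry.
# Repo ID and prompt layout are assumptions; the model card documents the real format.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "numind/NuExtract"  # assumed Hugging Face ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# JSON template with empty fields the model is asked to fill.
template = {"model_name": "", "parameter_count": "", "context_length": ""}

text = "CodeGeeX4 is a 9B multilingual code generation model with a 128k context length."

# Assumed layout: template first, then the unstructured text, then an output marker.
prompt = (
    "### Template:\n" + json.dumps(template, indent=2) + "\n"
    "### Text:\n" + text + "\n"
    "### Output:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated portion (the filled-in JSON).
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```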