LLVim: Verifiable and Token-Efficient Text Extraction Using LLMs and Vim.

LLVim uses Large Language Models (LLMs) to operate on text documents through a Vim client. Because the model selects text with Vim commands instead of reproducing it, this approach ensures that model-extracted content exists in the source text, eliminating the hallucinations common in traditional LLM extraction. LLVim reduces token usage by over 95% compared to verbatim extraction methods, and it is robust on weakly supported languages that even frontier models struggle with. It drives a headless Neovim instance that executes the LLM-generated Vim commands, providing verifiable and efficient text extraction.
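
To make the idea concrete, here is a minimal sketch in pure Python. It is not LLVim's implementation: LLVim runs commands in a headless Neovim instance, whereas `run_commands` below is a hypothetical toy interpreter that understands only four Vim commands (`<n>G`, `V`, `<n>j`, `y`). It only illustrates why the output is verifiable: the model emits a short command string, and the extracted text is sliced out of the source, never generated.

```python
import re


def run_commands(text: str, commands: str) -> str:
    """Toy stand-in for a Vim session (hypothetical helper, not LLVim's API).

    Supports only: <n>G (go to line n), V (start linewise visual mode),
    <n>j (move down n lines), y (yank the selection and stop).
    """
    if not re.fullmatch(r"(?:\d*[GVjy])+", commands):
        raise ValueError(f"unsupported command string: {commands!r}")

    lines = text.split("\n")
    cursor = 0      # 0-based index of the current line
    anchor = None   # line where the visual selection started

    for count, op in re.findall(r"(\d*)([GVjy])", commands):
        n = int(count) if count else None
        if op == "G":
            # With a count, jump to that line; without one, jump to the last line.
            cursor = (n - 1) if n else len(lines) - 1
        elif op == "j":
            cursor += n or 1
        elif op == "V":
            anchor = cursor
        elif op == "y":
            if anchor is None:
                raise ValueError("'y' used without an active selection")
            lo, hi = sorted((anchor, cursor))
            # The result is a slice of the source, so it exists there verbatim.
            return "\n".join(lines[lo:hi + 1])
        # Keep the cursor inside the buffer, as Vim does.
        cursor = max(0, min(cursor, len(lines) - 1))

    raise ValueError("command string never yanked a selection")


document = "alpha\nbeta\ngamma\ndelta"
extracted = run_commands(document, "2GVjy")  # select lines 2-3 and yank them
assert extracted in document
```

The command string `2GVjy` is five characters long regardless of how long the selected lines are, which is the intuition behind the token savings: the cost of the model's output depends on the commands, not on the length of the extracted span.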

Roadmap

  • Vim emulator with helpers for LLM interaction.
  • End-to-end single-turn proof-of-concept, with Hamming's "You and Your Research".
  • Token savings metric. Aim to answer "how many tokens do we save by doing this?"
  • Plot token savings vs extracted length. (compared to verbatim extraction methods)
  • Plot pipeline latency vs extracted length. (compared to verbatim extraction methods)
  • Plot partial-ratio existence (verifiable extraction) vs extracted length. (compared to verbatim extraction methods)
  • Ablate Vim window size.
  • End-to-end multi-turn proof-of-concept (navigating a large document efficiently).
  • Replicate results on open-source models.
  • Concise & direct whitepaper to demonstrate findings.
  • [Maybe] synthetically bootstrap some finetuning data (output is easily verifiable, synthetic data is applicable).
  • [Maybe] fine-tune a lightweight open-source model on this.
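
The token-savings and partial-ratio items above could be measured along the following lines. This is a sketch under stated assumptions, not the project's evaluation code: `token_savings` and `partial_ratio` are hypothetical helper names, token counts are taken as given (a real run would get them from the model's tokenizer), and the partial ratio is approximated with the standard library's `difflib` using a sliding window rather than a dedicated fuzzy-matching package.

```python
from difflib import SequenceMatcher


def token_savings(command_tokens: int, verbatim_tokens: int) -> float:
    """Fraction of output tokens saved by emitting commands instead of the text."""
    if verbatim_tokens <= 0:
        raise ValueError("verbatim_tokens must be positive")
    return 1.0 - command_tokens / verbatim_tokens


def partial_ratio(extracted: str, source: str) -> float:
    """Best similarity (0.0 to 1.0) between `extracted` and any equally long
    window of `source`. A value of 1.0 means the extraction exists verbatim."""
    if not extracted or extracted in source:
        return 1.0
    width = len(extracted)
    if len(source) <= width:
        return SequenceMatcher(None, extracted, source, autojunk=False).ratio()
    best = 0.0
    # Brute-force sliding window: simple, but quadratic, so only for small inputs.
    for start in range(len(source) - width + 1):
        window = source[start:start + width]
        score = SequenceMatcher(None, extracted, window, autojunk=False).ratio()
        best = max(best, score)
    return best


source = "alpha beta gamma"
print(token_savings(command_tokens=5, verbatim_tokens=100))
print(partial_ratio("beta", source))   # verbatim span
print(partial_ratio("bxta", source))   # span with one altered character
```

Extraction driven by Vim commands should always score 1.0 on this check, since the text is copied from the buffer; a verbatim-generation baseline can score lower whenever the model alters or invents characters.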

Artifacts

Token savings vs. extracted length, compared to verbatim extraction methods. [image]

Cite this work

@misc{llvim,
  author = {Adham Elarabawy},
  title = {LLVim: Verifiable and Token-Efficient Text Extraction Using LLMs and Vim.},
  year = {2024},
  version = {0.1.0},
  url = {https://github.com/adham-elarabawy/llvim},
  doi = {10.5281/zenodo.13835827},
}

This work is licensed under a Creative Commons Attribution 4.0 International License. You must therefore provide proper attribution (see the citation above) when referencing, using, or deriving from this work.
