We introduce Streaming Infinite Retentive LLM (SirLLM), which utilizes the Token Entropy metric and a memory decay mechanism to filter key phrases, endowing LLMs with both long-lasting and flexible memory.
pip install torch torchvision torchaudio
pip install transformers==4.33.0 accelerate datasets evaluate wandb scikit-learn scipy sentencepiece
python setup.py develop
CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama_concate_question_new_eval.py \
--start_size 4 --data_root "data/grocery_keys" \
--model_name_or_path "01-ai/Yi-6B-Chat" \
--enable_streaming --token_entropy_size 1020 \
--recent_size 0 --enable_token_entropy \
--output_dir "outputs/keys" --decay_ratio 1
CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama_concate_question_new_eval.py \
--if_w_turns --start_size 4 \
--data_root "data/dailydialog" \
--model_name_or_path "01-ai/Yi-6B-Chat" \
--enable_streaming --token_entropy_size 508 \
--recent_size 0 --enable_token_entropy \
--output_dir "outputs/dailydialog" \
--decay_ratio 0.7
CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama_concate_question_new_eval.py \
--start_size 4 --data_root "data/rock_paper_scissors" \
--model_name_or_path "01-ai/Yi-6B-Chat" \
--enable_streaming --token_entropy_size 1020 \
--recent_size 0 --enable_token_entropy \
--output_dir "outputs/rock_paper_scissors" \
--decay_ratio 0.9
💐Many thanks to Streamllm. Some portions of this codebase were inspired by or directly borrowed from the Streamllm. Their contributions have been invaluable in the development of this project.
@article{yao2024sirllm,
title={SirLLM: Streaming Infinite Retentive LLM},
author={Yao, Yao and Li, Zuchao and Zhao, Hai},
journal={arXiv preprint arXiv:2405.12528},
year={2024}
}