Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models

📢 Latest News

Dec, 2024: We have released the dataset, trained model $\text{RankMistral}_{100}$ (download) and codes.

📋 Introduction

This repository contains the code for our paper Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models.

Large Language Models (LLMs) have shown exciting performance in listwise passage ranking. Due to the limited input length, existing methods often adopt the sliding window strategy. Such a strategy, though effective, is inefficient as it involves repetitive and serialized processing, which usually re-evaluates relevant passages multiple times. As a result, it incurs redundant API costs, which are proportional to the number of inference tokens. The development of long-context LLMs enables the full ranking of all passages within a single inference, avoiding redundant API costs. In this paper, we conduct a comprehensive study of long-context LLMs for ranking tasks in terms of efficiency and effectiveness. Surprisingly, our experiments reveal that full ranking with long-context LLMs can deliver superior performance in the supervised fine-tuning setting with a huge efficiency improvement. Furthermore, we identify two limitations of fine-tuning the full ranking model based on existing methods: (1) sliding window strategy fails to produce a full ranking list as a training label, and (2) the language modeling loss cannot emphasize top-ranked passage IDs in the label. To alleviate these issues, we propose a complete listwise label construction approach and a novel importance-aware learning objective for full ranking. Experiments show the superior performance of our method over baselines.

📦 Environment

Step 1: Create Conda Environment

conda create -n fullrank python=3.9
conda activate fullrank

Step 2: Install jdk

In our project, we utilize JDK version 11.0.8 (other versions may also be compatible).

Step 3: Install packages

bash env.sh

📝 How to reproduce the experimental results?

1. Effectiveness

For the evaluation of effectiveness, please run the following script:

bash run_rank_llm.sh

The evaluation script uses vllm for acceleration. Please place the open-source long-context LLM to be evaluated in llm/, and place our $\text{RankMistral}_{100}$ in trained_models/.

Note: If you want to call the OpenAI API, remember to create a file named .env.local in the root directory of the project and set a variable OPEN_AI_API_KEY={YOUR_KEY}.

2. Efficiency

For the evaluation of efficiency, please run the following script:

bash test_latency.sh

Note that we choose not to use vllm technique for a fair comparison. The following image shows the latency across different LLMs:

🚀 How to fine-tune a full ranking model?

The training data is constructed by multi-pass sliding window and the model is optimized with importance-aware loss. Here is the overall framework:

The training data is placed in training_data/, which can also be downloaded from here. The data is generated by performing multi-pass sliding windows based on GPT-4o-2024-08-06. Below defines a piece of training data:

{
  "qid": "689440",
  "initial_list": ["4600588", "722358", "6582134", "2071892", "7466269", "2071888", "6093666", "562673", "562665", "1758980", "562679", "7757482", "1758979", "393724", "159972", "3916807", "687135", "1758973", "1758974", "8133472", "8133471", "625649", "3825484", "5600557", "7174178", "3018593", "2071889", "458944", "4015452", "7687108", "1472705", "458945", "5876966", "7397074", "2275276", "6551342", "7218862", "6881961", "1028265", "302030", "323769", "4704236", "6363015", "6881962", "6881963", "1472707", "6881964", "1116098", "7718211", "562670", "2071893", "3018587", "323765", "392148", "7544711", "7055748", "4015454", "49880", "3431733", "6451002", "7239242", "1250186", "7710675", "908920", "2509285", "1127726", "6289406", "6722658", "4413752", "1224378", "1774822", "3396051", "6881965", "6138380", "6138381", "2699391", "4217197", "1372040", "1886768", "6294509", "6707614", "431934", "4542554", "6138382", "2076787", "3284239", "3449908", "625646", "625648", "973146", "7031192", "2215635", "4413747", "5427959", "2857247", "6592283", "6629150", "729281", "7761482", "2476063"],
  "final_list": ["2071889", "2476063", "2071888", "3449908", "6582134", "1886768", "2076787", "625646", "625648", "1472705", "458944", "49880", "625649", "2275276", "6881962", "6451002", "7544711", "6138380", "722358", "2071892", "6138382", "6138381", "2071893", "3396051", "458945", "3284239", "7466269", "562673", "562665", "1758980", "6592283", "4413752", "4413747", "6881964", "6551342", "6881965", "1472707", "7174178", "6881961", "3431733", "2509285", "6881963", "6289406", "4217197", "5427959", "4542554", "1250186", "7239242", "4600588", "6093666", "7031192", "431934", "687135", "159972", "973146", "1127726", "392148", "6363015", "6707614", "1224378", "4704236", "7710675", "7055748", "7718211", "2699391", "1028265", "6294509", "302030", "908920", "1116098", "323769", "323765", "1774822", "1372040", "7757482", "562679", "1758979", "393724", "8133472", "1758973", "1758974", "5876966", "6722658", "7687108", "7397074", "562670", "8133471", "3018593", "7218862", "4015452", "4015454", "5600557", "3825484", "3018587", "6629150", "2215635", "7761482", "3916807", "2857247", "729281"]
}

Field Explanations:

qid: This represents the query identifier.
initial_list: This is the list of passage IDs retrieved using the BM25 algorithm.
final_list: This is the reordered list of passage IDs after processing by the teacher reranker.

Run the following code to fine-tune a full ranking model:

cd training
bash run_train.sh

For training with standard language modeling loss, set the parameter weighted_loss=False.

📞 Contact

If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You also can email Wenhan Liu ([email protected]).

✨ Citation

If you find this repository useful, please consider giving a star ⭐ and citation

@article{liu2024sliding,
  title={Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models},
  author={Liu, Wenhan and Ma, Xinyu and Zhu, Yutao and Zhao, Ziliang and Wang, Shuaiqiang and Yin, Dawei and Dou, Zhicheng},
  journal={arXiv preprint arXiv:2412.14574},
  year={2024}
}

We also acknowledge the opens-source repo RankLLM, which is instrumental for this work.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
cache		cache
data/ms_marco		data/ms_marco
rerank		rerank
results		results
runs		runs
training		training
training_data		training_data
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
config.py		config.py
data.py		data.py
env.sh		env.sh
index_and_topics.py		index_and_topics.py
run_rank_llm.py		run_rank_llm.py
run_rank_llm.sh		run_rank_llm.sh
test_latency.sh		test_latency.sh
trec_eval.py		trec_eval.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models

📢 Latest News

📋 Introduction

📦 Environment

Step 1: Create Conda Environment

Step 2: Install jdk

Step 3: Install packages

📝 How to reproduce the experimental results?

1. Effectiveness

2. Efficiency

🚀 How to fine-tune a full ranking model?

📞 Contact

✨ Citation

About

Releases

Packages

Contributors 2

Languages

License

8421BCD/fullrank

Folders and files

Latest commit

History

Repository files navigation

Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models

📢 Latest News

📋 Introduction

📦 Environment

Step 1: Create Conda Environment

Step 2: Install jdk

Step 3: Install packages

📝 How to reproduce the experimental results?

1. Effectiveness

2. Efficiency

🚀 How to fine-tune a full ranking model?

📞 Contact

✨ Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages