8421BCD/fullrank

Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models

📢 Latest News

  • Dec 2024: We have released the dataset, the trained model $\text{RankMistral}_{100}$ (download), and the code.

📋 Introduction

This repository contains the code for our paper Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models.

Large Language Models (LLMs) have shown exciting performance in listwise passage ranking. Due to the limited input length, existing methods often adopt the sliding window strategy. Such a strategy, though effective, is inefficient as it involves repetitive and serialized processing, which usually re-evaluates relevant passages multiple times. As a result, it incurs redundant API costs, which are proportional to the number of inference tokens. The development of long-context LLMs enables the full ranking of all passages within a single inference, avoiding redundant API costs. In this paper, we conduct a comprehensive study of long-context LLMs for ranking tasks in terms of efficiency and effectiveness. Surprisingly, our experiments reveal that full ranking with long-context LLMs can deliver superior performance in the supervised fine-tuning setting with a huge efficiency improvement. Furthermore, we identify two limitations of fine-tuning the full ranking model based on existing methods: (1) sliding window strategy fails to produce a full ranking list as a training label, and (2) the language modeling loss cannot emphasize top-ranked passage IDs in the label. To alleviate these issues, we propose a complete listwise label construction approach and a novel importance-aware learning objective for full ranking. Experiments show the superior performance of our method over baselines.
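To make the redundancy of the sliding window strategy concrete, the sketch below counts how many model calls and passage slots one back-to-front sliding-window pass needs, compared with a single full-ranking call. The window size of 20 and step of 10 are assumed values (common defaults for listwise rerankers), not settings confirmed by this README, and `sliding_window_cost` is a hypothetical helper name.

```python
def sliding_window_cost(num_passages: int, window: int = 20, step: int = 10):
    """Return (LLM calls, passage slots sent to the model) for one sliding-window pass.

    The window starts at the end of the candidate list and moves toward the
    front by `step` passages until it reaches position 0.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if num_passages == 0:
        return 0, 0
    calls, slots = 0, 0
    end = num_passages
    while True:
        start = max(0, end - window)
        calls += 1
        slots += end - start  # every passage in the window is re-read by the model
        if start == 0:
            break
        end -= step
    return calls, slots


calls, slots = sliding_window_cost(100)
print(calls, slots)  # 9 180
print(slots / 100)   # 1.8x the passages read by one full-ranking call
```

Under these assumed settings, a single pass over 100 candidates reads 180 passage slots across 9 serialized calls, whereas full ranking reads each of the 100 passages once in one call.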

📦 Environment

Step 1: Create Conda Environment

conda create -n fullrank python=3.9
conda activate fullrank

Step 2: Install JDK

In our project, we utilize JDK version 11.0.8 (other versions may also be compatible).

Step 3: Install packages

bash env.sh

📝 How to reproduce the experimental results?

1. Effectiveness

For the evaluation of effectiveness, please run the following script:

bash run_rank_llm.sh

The evaluation script uses vLLM for acceleration. Place the open-source long-context LLM you want to evaluate in llm/, and place our $\text{RankMistral}_{100}$ in trained_models/.

Note: If you want to call the OpenAI API, create a file named .env.local in the root directory of the project and set the variable OPEN_AI_API_KEY={YOUR_KEY} in it.
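As an illustration of how such a file can be read, here is a minimal sketch that parses simple `KEY=VALUE` lines from `.env.local`. How the repository itself loads this file is not stated in this README, so `load_env_file` is a hypothetical helper, not the project's actual loader.

```python
from pathlib import Path


def load_env_file(path: str = ".env.local") -> dict:
    """Parse KEY=VALUE lines from an env file; return {} if the file is missing."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    api_key = load_env_file().get("OPEN_AI_API_KEY")
    print("key found" if api_key else "OPEN_AI_API_KEY is not set in .env.local")
```

Keeping the key in an untracked local file avoids hard-coding it in scripts; make sure .env.local is not committed.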

2. Efficiency

For the evaluation of efficiency, please run the following script:

bash test_latency.sh

Note that we do not use vLLM in this test, for a fair comparison. The following image shows the latency across different LLMs:

[Figure: latency across different LLMs]

🚀 How to fine-tune a full ranking model?

The training data is constructed with multi-pass sliding windows, and the model is optimized with an importance-aware loss. Here is the overall framework:

[Figure: overall framework]
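The sketch below shows one plausible reading of "multi-pass sliding window" label construction: each pass fixes the most reliable top passages, and the next pass reruns the sliding window on the remainder until the whole list is ordered. This README does not spell out the procedure, so the window size, step, and function names are assumptions, and the toy `oracle` ranker merely stands in for the teacher model.

```python
from typing import Callable, List, Sequence

RankWindow = Callable[[Sequence[str]], List[str]]


def sliding_window_pass(passages: Sequence[str], rank_window: RankWindow,
                        window: int = 20, step: int = 10) -> List[str]:
    """One back-to-front sliding-window pass; rank_window reorders one window."""
    ranked = list(passages)
    end = len(ranked)
    while True:
        start = max(0, end - window)
        ranked[start:end] = rank_window(ranked[start:end])
        if start == 0:
            break
        end -= step
    return ranked


def multi_pass_ranking(passages: Sequence[str], rank_window: RankWindow,
                       window: int = 20, step: int = 10) -> List[str]:
    """Repeat sliding-window passes, fixing the top (window - step) items each time."""
    if not 0 < step < window:
        raise ValueError("step must satisfy 0 < step < window")
    remaining, final = list(passages), []
    while remaining:
        ranked = sliding_window_pass(remaining, rank_window, window, step)
        # A list that fits in one window is fully ordered by a single call.
        keep = len(ranked) if len(ranked) <= window else window - step
        final.extend(ranked[:keep])
        remaining = ranked[keep:]
    return final


# Toy teacher (assumption): orders a window by a fixed, distinct relevance score.
scores = {f"p{i}": (i * 37) % 101 for i in range(100)}


def oracle(window_items: Sequence[str]) -> List[str]:
    return sorted(window_items, key=scores.get, reverse=True)


full_label = multi_pass_ranking(list(scores), oracle)
print(full_label[:3])
```

With a consistent ranker, a single pass only guarantees the head of the list, which is why one pass alone cannot serve as a complete training label for 100 passages.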

The training data is placed in training_data/ and can also be downloaded from here. It is generated by running multi-pass sliding windows with GPT-4o-2024-08-06. Below is an example of a single training record:

{
  "qid": "689440",
  "initial_list": ["4600588", "722358", "6582134", "2071892", "7466269", "2071888", "6093666", "562673", "562665", "1758980", "562679", "7757482", "1758979", "393724", "159972", "3916807", "687135", "1758973", "1758974", "8133472", "8133471", "625649", "3825484", "5600557", "7174178", "3018593", "2071889", "458944", "4015452", "7687108", "1472705", "458945", "5876966", "7397074", "2275276", "6551342", "7218862", "6881961", "1028265", "302030", "323769", "4704236", "6363015", "6881962", "6881963", "1472707", "6881964", "1116098", "7718211", "562670", "2071893", "3018587", "323765", "392148", "7544711", "7055748", "4015454", "49880", "3431733", "6451002", "7239242", "1250186", "7710675", "908920", "2509285", "1127726", "6289406", "6722658", "4413752", "1224378", "1774822", "3396051", "6881965", "6138380", "6138381", "2699391", "4217197", "1372040", "1886768", "6294509", "6707614", "431934", "4542554", "6138382", "2076787", "3284239", "3449908", "625646", "625648", "973146", "7031192", "2215635", "4413747", "5427959", "2857247", "6592283", "6629150", "729281", "7761482", "2476063"],
  "final_list": ["2071889", "2476063", "2071888", "3449908", "6582134", "1886768", "2076787", "625646", "625648", "1472705", "458944", "49880", "625649", "2275276", "6881962", "6451002", "7544711", "6138380", "722358", "2071892", "6138382", "6138381", "2071893", "3396051", "458945", "3284239", "7466269", "562673", "562665", "1758980", "6592283", "4413752", "4413747", "6881964", "6551342", "6881965", "1472707", "7174178", "6881961", "3431733", "2509285", "6881963", "6289406", "4217197", "5427959", "4542554", "1250186", "7239242", "4600588", "6093666", "7031192", "431934", "687135", "159972", "973146", "1127726", "392148", "6363015", "6707614", "1224378", "4704236", "7710675", "7055748", "7718211", "2699391", "1028265", "6294509", "302030", "908920", "1116098", "323769", "323765", "1774822", "1372040", "7757482", "562679", "1758979", "393724", "8133472", "1758973", "1758974", "5876966", "6722658", "7687108", "7397074", "562670", "8133471", "3018593", "7218862", "4015452", "4015454", "5600557", "3825484", "3018587", "6629150", "2215635", "7761482", "3916807", "2857247", "729281"]
}

Field Explanations:

  • qid: This represents the query identifier.
  • initial_list: This is the list of passage IDs retrieved using the BM25 algorithm.
  • final_list: This is the reordered list of passage IDs after processing by the teacher reranker.
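Since final_list is described as a reordering of initial_list, a quick sanity check on each record can catch malformed data before training. The sketch below uses a small made-up record; `validate_record` and `to_rank_positions` are hypothetical helpers, and expressing the label as positions in the initial list is an assumption about how a listwise target might be built, not the repository's confirmed format.

```python
import json
from typing import List


def validate_record(record: dict) -> None:
    """Raise ValueError unless final_list is a permutation of initial_list."""
    initial, final = record["initial_list"], record["final_list"]
    if len(set(initial)) != len(initial):
        raise ValueError("initial_list contains duplicate passage IDs")
    if sorted(initial) != sorted(final):
        raise ValueError("final_list is not a permutation of initial_list")


def to_rank_positions(record: dict) -> List[int]:
    """Map each passage in final_list to its 1-based position in initial_list."""
    position = {pid: i + 1 for i, pid in enumerate(record["initial_list"])}
    return [position[pid] for pid in record["final_list"]]


# Made-up record with the same fields as the training data.
record = json.loads(
    '{"qid": "q1", "initial_list": ["a", "b", "c", "d"],'
    ' "final_list": ["c", "a", "d", "b"]}'
)
validate_record(record)
print(to_rank_positions(record))  # [3, 1, 4, 2]
```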

Run the following code to fine-tune a full ranking model:

cd training
bash run_train.sh

For training with standard language modeling loss, set the parameter weighted_loss=False.
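To illustrate the idea behind a weighted loss versus the standard language modeling loss, here is a simplified pure-Python sketch. The exact weighting used by the authors is not given in this README; the `1 + 1/log2(rank + 1)` weight below is an assumption chosen for illustration, and the sketch treats each ranked passage ID as a single token with one log-probability.

```python
import math
from typing import List, Sequence


def rank_weights(num_ranks: int, weighted_loss: bool = True) -> List[float]:
    """Per-position weights; uniform weights recover the standard LM loss."""
    if not weighted_loss:
        return [1.0] * num_ranks
    # Hypothetical DCG-style weighting: earlier ranks get larger weights.
    return [1.0 + 1.0 / math.log2(rank + 1) for rank in range(1, num_ranks + 1)]


def weighted_nll(token_log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average negative log-likelihood over the label positions."""
    if len(token_log_probs) != len(weights):
        raise ValueError("token_log_probs and weights must have the same length")
    total = sum(w * -lp for lp, w in zip(token_log_probs, weights))
    return total / sum(weights)


# Same-size mistake made at the top of the ranking vs. at the bottom.
top_error = [-2.0, -0.1, -0.1]
bottom_error = [-0.1, -0.1, -2.0]

weights = rank_weights(3)
print(weighted_nll(top_error, weights) > weighted_nll(bottom_error, weights))  # True

uniform = rank_weights(3, weighted_loss=False)
print(weighted_nll(top_error, uniform) == weighted_nll(bottom_error, uniform))  # True
```

With uniform weights the two mistakes cost the same, while the weighted variant penalizes the mistake at the top-ranked position more heavily.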

📞 Contact

If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Wenhan Liu ([email protected]).

✨ Citation

If you find this repository useful, please consider giving it a star ⭐ and citing our paper:

@article{liu2024sliding,
  title={Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models},
  author={Liu, Wenhan and Ma, Xinyu and Zhu, Yutao and Zhao, Ziliang and Wang, Shuaiqiang and Yin, Dawei and Dou, Zhicheng},
  journal={arXiv preprint arXiv:2412.14574},
  year={2024}
}

We also acknowledge the open-source repo RankLLM, which was instrumental to this work.
