📃 Paper | 🤗 Huggingface | 📭 Contact
Welcome to the repository of MAPO, our cutting-edge framework designed to revolutionize multilingual reasoning capabilities in large language models (LLMs).
🚀 We propose a framework that enhances multilingual reasoning capabilities by aligning the reasoning processes in other languages with those in English. We use off-the-shelf translation models to estimate how well a reasoning process in another language aligns with the English one, and then optimize this alignment as a preference with popular preference optimization methods such as DPO or PPO (a minimal sketch of the scoring step is given after this overview).
📈 By utilizing our framework, you can effectively improve the consistency of multilingual reasoning and thereby enhance the multilingual reasoning capabilities of large models in a more generalizable manner. Our approach achieves substantial performance improvements, surpassing all baselines, including ChatGPT, and reaches state-of-the-art (SOTA) results.
🌐 Overall, our method demonstrates a novel way of improving the multilingual reasoning abilities of models without the need for extensive annotation of reasoning processes in other languages, enabling a more generalizable enhancement of multilingual reasoning capabilities.
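To make the alignment-as-preference idea concrete, here is a minimal sketch of PPL-based alignment scoring with an off-the-shelf translation model. It is an illustration only: the model choice (NLLB), the language codes, and the helper names are assumptions, not the repository's actual code (see `PreferenceEstimate.sh` for the real pipeline).

```python
# Sketch: score how well a non-English reasoning path aligns with the English one
# by measuring the translation model's perplexity when forced to "translate" the
# non-English reasoning into the English reasoning. Model name, language codes,
# and function names are illustrative assumptions.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MT_NAME = "facebook/nllb-200-distilled-600M"  # assumed off-the-shelf MT model
tok = AutoTokenizer.from_pretrained(MT_NAME, src_lang="zho_Hans", tgt_lang="eng_Latn")
mt = AutoModelForSeq2SeqLM.from_pretrained(MT_NAME).eval()

@torch.no_grad()
def alignment_score(non_en_reasoning: str, en_reasoning: str) -> float:
    """Higher = the non-English reasoning maps more readily onto the English
    reasoning (i.e., lower perplexity under the translation model)."""
    enc = tok(non_en_reasoning, return_tensors="pt", truncation=True)
    labels = tok(text_target=en_reasoning, return_tensors="pt", truncation=True).input_ids
    loss = mt(**enc, labels=labels).loss      # mean cross-entropy per target token
    return torch.exp(-loss).item()            # inverse perplexity, in (0, 1]
```

Two sampled answers to the same non-English question can then be ranked by this score, and the higher-scoring one treated as the preferred response when building DPO/PPO preference data.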
Below is the average accuracy across ten languages on three multilingual mathematical reasoning datasets. Our method improves the multilingual reasoning capabilities of LLMs by a large margin, achieving SOTA performance. We also hope that, in the future, more multilingual reasoning LLMs can build on our work to further enhance multilingual reasoning capabilities.
**7B models** (average accuracy across ten languages)

| System | MSVAMP | MGSM | MNumGLUESub |
|---|---|---|---|
| GPT-3.5-Turbo | 46.6 | 42.2 | 49.4 |
| MAmmoTH 7B | 26.3 | 21.3 | 24.2 |
| WizardMath 7B | 32.5 | 23.0 | 28.7 |
| MetaMath 7B | 46.2 | 37.0 | 43.2 |
| QAlign 7B | 57.2 | 49.6 | - |
| MathOctopus 7B | 41.2 | 39.5 | 37.1 |
| + MAPO-DPO (ours) 🔥 | 57.4 | 41.6 | 50.4 |
| MetaMathOctopus 7B | 53.0 | 45.5 | 39.2 |
| + MAPO-DPO (ours) 👑 | 64.7 | 51.6 | 52.9 |
| MistralMathOctopus 7B | 59.0 | 58.0 | 56.8 |
| + MAPO-DPO (ours) 👑 | 74.6 | 67.3 | 70.0 |
**13B models** (average accuracy across ten languages)

| System | MSVAMP | MGSM | MNumGLUESub |
|---|---|---|---|
| GPT-3.5-Turbo | 46.6 | 42.2 | 49.4 |
| MAmmoTH 13B | 38.6 | 28.9 | 29.5 |
| WizardMath 13B | 35.7 | 28.3 | 29.0 |
| MetaMath 13B | 46.2 | 43.9 | 43.3 |
| QAlign 13B | 62.6 | 57.1 | - |
| MathOctopus 13B | 51.8 | 46.0 | 40.3 |
| + MAPO-DPO (ours) 🔥 | 60.1 | 48.5 | 53.8 |
| MetaMathOctopus 13B | 56.3 | 51.4 | 49.5 |
| + MAPO-DPO (ours) 👑 | 67.0 | 58.0 | 59.8 |
We report the PPL-based alignment score (left) and ACR (right), which respectively assess the consistency of the reasoning processes and of the final answers. MAPO achieves significant improvements in the consistency of both the reasoning processes and the answers of LLMs across various languages.
Preference optimization data preparation:
- Generation: `bash sampling.sh`
- Preference estimation: `bash PreferenceEstimate.sh`
- Format paired data: `python3 extract_dpo_data.py` (a sketch of the resulting paired data is shown below)
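For orientation, the sketch below shows the kind of paired record this step conceptually produces: for each question, the sampled answer with the higher alignment score becomes `chosen` and the other `rejected`. The field names follow the common DPO convention and, like the file names, are assumptions rather than the exact output of `extract_dpo_data.py`.

```python
# Hypothetical pairing of sampled answers by alignment score.
# Input/output formats and file names are assumptions; see extract_dpo_data.py
# for the repository's actual logic.
import json

def make_pairs(samples):
    """samples: list of dicts like
    {"prompt": ..., "answer_a": ..., "score_a": ..., "answer_b": ..., "score_b": ...}"""
    for s in samples:
        if s["score_a"] >= s["score_b"]:
            chosen, rejected = s["answer_a"], s["answer_b"]
        else:
            chosen, rejected = s["answer_b"], s["answer_a"]
        yield {"prompt": s["prompt"], "chosen": chosen, "rejected": rejected}

with open("sampled_with_scores.json") as f:       # hypothetical file name
    pairs = list(make_pairs(json.load(f)))
with open("dpo_pairs.json", "w") as f:            # hypothetical file name
    json.dump(pairs, f, ensure_ascii=False, indent=2)
```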
Training:
- DPO: `bash dpo.sh yourconfig.json` (use `bash dpo13b.sh yourconfig.json` for the 13B models); a minimal TRL-based sketch of this step follows below
- PPO: `bash ppo_lora.sh yourconfig.json`
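For readers who want to see what the DPO step boils down to, here is a minimal sketch using the TRL library on `prompt`/`chosen`/`rejected` pairs. The base model path, data file, and hyperparameters are assumptions and do not reproduce the configuration in `dpo.sh`.

```python
# Minimal DPO training sketch with TRL on prompt/chosen/rejected pairs.
# Base model path, data file, and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE = "path/to/your/multilingual-sft-model"   # e.g. a MathOctopus-style checkpoint
model = AutoModelForCausalLM.from_pretrained(BASE)
tokenizer = AutoTokenizer.from_pretrained(BASE)

dataset = load_dataset("json", data_files="dpo_pairs.json", split="train")

args = DPOConfig(
    output_dir="mapo-dpo",
    beta=0.1,                       # DPO temperature; assumed value
    per_device_train_batch_size=2,
    num_train_epochs=1,
)
# Note: older trl releases expect `tokenizer=` instead of `processing_class=`.
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```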
Evaluation: `bash run.sh`
For more details about training and evaluation, please refer to the `Alignment` and `Evaluation` directories.
If you find this repository helpful, feel free to cite our paper:
@misc{she2024mapo,
title={MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization},
author={Shuaijie She and Wei Zou and Shujian Huang and Wenhao Zhu and Xiang Liu and Xiang Geng and Jiajun Chen},
year={2024},
eprint={2401.06838},
archivePrefix={arXiv},
primaryClass={cs.CL}
}