You can download our data as data.zip and unzip it using the following command:
unzip data.zip
Note: the data download link is not available yet; we will release it as soon as possible.
You can construct the Idk dataset for a given Ik threshold using the following command:
python process_sft_data.py --model_name llama-2-7b-chat --threshold 1.0
You need to process the preference data for reward modeling using the following command:
python process_preference_data.py
First, construct Idk datasets with thresholds ranging from 0.1 to 1.0. Then relabel these Idk datasets using the following command:
python process_hir_data.py --root_dir sft_data/llama-2-7b-chat/
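The threshold sweep that has to run before the relabeling step could be scripted roughly as follows. This is only a sketch: the step size of 0.1 is an assumption (the text gives only the range 0.1 to 1.0), and the helper names `thresholds` and `build_idk_datasets` as well as the `DRY_RUN` variable are hypothetical, not part of the repository. It also assumes that process_sft_data.py writes its output under sft_data/<model_name>/, as the --root_dir above suggests.

```shell
# Hypothetical helper: print the thresholds 0.1, 0.2, ..., 1.0, one per line.
# The 0.1 step is an assumption; adjust it to the thresholds you actually need.
thresholds() {
  for i in 1 2 3 4 5 6 7 8 9; do
    echo "0.$i"
  done
  echo "1.0"
}

# Hypothetical helper: build one Idk dataset per threshold.
# $1 is the model name, e.g. llama-2-7b-chat.
# Set DRY_RUN=1 to print the commands instead of running them.
build_idk_datasets() {
  for t in $(thresholds); do
    ${DRY_RUN:+echo} python process_sft_data.py --model_name "$1" --threshold "$t"
  done
}

# Example call (commented out so that sourcing this snippet runs nothing):
# build_idk_datasets llama-2-7b-chat
```

Running `DRY_RUN=1 build_idk_datasets llama-2-7b-chat` first lets you check the generated commands before starting the actual sweep; once all datasets exist, run process_hir_data.py as shown above.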