DPO code #2

HaoshengZou · 2024-08-14T09:11:43Z

Any plans on releasing the DPO code, or a brief intro of how you conducted long-context DPO?

bys0318 · 2024-08-15T11:04:19Z

Hi! Please refer to Section 4.2 of our paper for details of DPO. We use the same codebase as ChatGLM-RLHF. We currently do not have plans to release the code and data for DPO.

HaoshengZou · 2024-08-15T13:08:59Z

Thanks for the reply!

ChatGLM-RLHF doesn't have code released. Did you modify Megatron-LM for long-context DPO, use NeMo-Aligner, or use another implementation?

bys0318 · 2024-08-15T15:15:14Z

Hi, our DPO code is based on Megatron-LM.

heyzude · 2024-09-10T06:28:07Z

Hi, and thanks for sharing your work!

Could you elaborate on which specific part of Megatron-LM (https://github.com/NVIDIA/Megatron-LM) you used?
