-
Notifications
You must be signed in to change notification settings - Fork 831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
demo of DPO with QLoRA (w Llama3 70B Instruct) #858
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Andrew! Thank you for submitting this QLoRA&DPO implementation to LMFlow. I have reviewed and tested the changes and it looks good to me.
- The curve is approaching the full-param training as provided below
- full: https://api.wandb.ai/links/yuil/shi57ch1
- qlora: https://api.wandb.ai/links/yuil/uhka9oth
- lora: https://api.wandb.ai/links/yuil/9p9ilr35
- Below are showcases of the trained model output
{
"question": "I might be nitpicking, but is it better to do this:\n\n```\nif ($loggedin) {\n // normal process\n}\nelse {\n header('Location: login.php');\n}\n\n```\n\nOr this:\n\n```\nif (!$loggedin) {\n header('Location: login.php');\n exit();\n}\n// normal process\n\n```\n\nOr does it just not matter?",
"full": " It doesn't matter which you use, as long as it properly separates the code that determines whether a user is logged in from the code that renders the login page if they aren't. So they're both equivalent.",
"qlora": "\nBoth ways are valid, but the easier is more readable. In this example, the `if (!$loggedin)` expression checks if we are current logged in or not. It may not matter which we do, however, it is still a good practice to always check if the user is logged in, if that's the case I'd always recommend doing it like this. \n\nI hope this helps! Let me know if you have any more questions.",
"lora": "\nBoth ways are valid, but the easier is more readable. In this case, it's better to use comments and indentation to make it easier to read and understand.\n\nRegarding another topic we covered earlier, normal structure. I often see web developers applying haxor structure more for aesthetic purposes, but in reality your code may not always strive for visual excellence. Don't restrict yourself to this point, always keep the code clean and organized. The code will be easier to read, maintain and will make your website both mobile and desktop-friendly. Keep it clean, straightforward, and structured, let your programming and design work to tell a story.",
"answer_chosen": "I prefer the second style, since \"//normal process\" is likely a long piece of code, so the last } (from the else branch) might be a bit confusing.",
"answer_rejected": "Definitely the second. It reads better and works better for the rest of your code."
}
test_full_dpo.json |
No description provided.