Definition
Fine-tuning methods, such as DPO and RLHF, that align a model with human preferences by training on data contrasting preferred and non-preferred outputs.
Detailed Explanation
Preference-based fine-tuning methods such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) align model behavior with human preferences by training on pairs of outputs labeled as preferred versus rejected. RLHF typically fits a reward model to the preference data and then optimizes the policy with reinforcement learning, whereas DPO optimizes a contrastive objective on the preference pairs directly, without an explicit reward model; a sketch of the DPO objective is shown below.
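To make the idea concrete, here is a minimal, hypothetical sketch of the DPO loss in PyTorch (not any library's official implementation). It assumes you already have summed per-response log-probabilities for the preferred ("chosen") and non-preferred ("rejected") outputs under both the policy being trained and a frozen reference model; the function names and the beta value are illustrative assumptions.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Minimal DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probabilities
    for the chosen (preferred) or rejected response, under either the
    policy being trained or the frozen reference model.
    """
    # Implicit "rewards": how far the policy has moved from the
    # reference model on each response, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Logistic loss that pushes the chosen response to score higher
    # than the rejected one.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards)
    return loss.mean()

# Toy usage with random log-probabilities (batch of 4 preference pairs).
torch.manual_seed(0)
policy_chosen = torch.randn(4)
policy_rejected = torch.randn(4)
ref_chosen = torch.randn(4)
ref_rejected = torch.randn(4)
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))

In practice the log-probabilities come from running the two models over the prompt-plus-response token sequences; the sketch only shows the preference objective itself.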
Use Cases
Aligning LLMs to be more helpful, honest, and harmless; improving chatbot response quality based on user feedback; steering model outputs toward what humans actually find useful.
