Alignment, Preference Optimization, RLHF
Triple Preference Optimization: Achieving Better Alignment with Less Data in a Single Step Optimization