Post 18
Yet Another New Multimodal Fine-Tuning Recipe π₯§
π§βπ³ In this @HuggingFace Face Cookbook notebook, we demonstrate how to align a multimodal model (VLM) using Mixed Preference Optimization (MPO) using trl.
π‘ This recipe is powered by the new MPO support in trl, enabled through a recent upgrade to the DPO trainer!
We align the multimodal model using multiple optimization objectives (losses), guided by a preference dataset (chosen vs. rejected multimodal pairs).
Check it out! β‘οΈ https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo
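For a sense of what this looks like in practice, here is a minimal sketch (not the notebook itself) of how MPO can be enabled through trl's DPO trainer by passing a list of loss types with matching weights. The model id and dataset below are illustrative placeholders, not necessarily the ones used in the cookbook:

```python
# Minimal MPO sketch with trl's DPOTrainer.
# Model and dataset ids are placeholders; the cookbook may differ.
from datasets import load_dataset
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceM4/idefics2-8b"  # placeholder VLM
model = AutoModelForVision2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Preference dataset of chosen vs. rejected multimodal pairs (placeholder id)
dataset = load_dataset("openbmb/RLAIF-V-Dataset", split="train")

# MPO = a weighted mix of objectives: sigmoid (DPO preference loss),
# bco_pair (quality loss), and sft (generation loss).
training_args = DPOConfig(
    output_dir="vlm-mpo",
    loss_type=["sigmoid", "bco_pair", "sft"],
    loss_weights=[0.8, 0.2, 1.0],
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=processor,
)
trainer.train()
```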
π§βπ³ In this @HuggingFace Face Cookbook notebook, we demonstrate how to align a multimodal model (VLM) using Mixed Preference Optimization (MPO) using trl.
π‘ This recipe is powered by the new MPO support in trl, enabled through a recent upgrade to the DPO trainer!
We align the multimodal model using multiple optimization objectives (losses), guided by a preference dataset (chosen vs. rejected multimodal pairs).
Check it out! β‘οΈ https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo