Soeren Moeller Christensen
Very cool use of RL for rerankers!!
Wanted to share an alternative approach I built in October 2024, when no information was available on multimodal rerankers (https://github.com/huggingface/transformers/pull/34086).
I trained a multimodal reranker based on Qwen2-VL (later retrained on Qwen2.5-VL) and experimented with the classification layer a bit! We used it as a binary classifier for relevance rather than for actual ranking, though.
First I tried the LM-head style that you also used here, which I was inspired to try by this blog post: https://www.lighton.ai/lighton-blogs/monoqwen-vision. Basically, I just trained it with focal loss on the "yes"/"no" tokens.
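The LM-head style can be sketched roughly like this (a minimal sketch with random tensors standing in for the VLM's outputs; the token IDs and function names are hypothetical, not from the actual model):

```python
import torch
import torch.nn.functional as F

# Hypothetical vocab ids for the "yes" / "no" decision tokens.
YES_ID, NO_ID = 9693, 2152

def focal_loss(logits_2, labels, gamma=2.0):
    # logits_2: (batch, 2) logits over ["no", "yes"]; labels: (batch,) in {0, 1}.
    log_p = F.log_softmax(logits_2, dim=-1)
    log_pt = log_p.gather(1, labels.unsqueeze(1)).squeeze(1)  # log-prob of true class
    pt = log_pt.exp()
    # Focal loss down-weights easy examples via the (1 - pt)^gamma factor.
    return (-((1 - pt) ** gamma) * log_pt).mean()

def lm_head_relevance_loss(last_hidden, lm_head_weight, labels):
    # last_hidden: (batch, hidden) hidden state at the final position.
    # Full-vocab projection shown for clarity, even though only two logits are used.
    vocab_logits = last_hidden @ lm_head_weight.T        # (batch, vocab)
    logits_2 = vocab_logits[:, [NO_ID, YES_ID]]          # keep only the decision tokens
    return focal_loss(logits_2, labels)

# Toy usage with random tensors in place of the real model's states.
hidden, vocab = 64, 32000
h = torch.randn(8, hidden)
W = torch.randn(vocab, hidden)
y = torch.randint(0, 2, (8,))
loss = lm_head_relevance_loss(h, W, y)
```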
Then I tried replacing the LM head with a binary classifier head, also trained with focal loss.
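The binary-head variant replaces the vocab projection with a single relevance logit and uses the sigmoid form of focal loss (again a sketch with made-up sizes, not the actual training code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryRelevanceHead(nn.Module):
    """Tiny head producing one relevance logit per query-document pair."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, last_hidden):
        return self.proj(last_hidden).squeeze(-1)  # (batch,)

def binary_focal_loss(logits, labels, gamma=2.0):
    # Sigmoid focal loss: BCE scaled by (1 - pt)^gamma to focus on hard pairs.
    p = torch.sigmoid(logits)
    pt = torch.where(labels.bool(), p, 1 - p)  # prob assigned to the true class
    bce = F.binary_cross_entropy_with_logits(
        logits, labels.float(), reduction="none"
    )
    return ((1 - pt) ** gamma * bce).mean()

# Toy usage.
head = BinaryRelevanceHead(64)
h = torch.randn(8, 64)
y = torch.randint(0, 2, (8,))
loss = binary_focal_loss(head(h), y)
```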
The binary classifier head was cheaper to train and run inference on, because we didn't have to materialize the whole vocabulary just to get the two decision tokens (although you could use the slicing trick you also wrote about here). It was faster, and performance was slightly better (5% higher F1, if I remember correctly).
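For reference, the slicing trick mentioned above amounts to indexing the two decision-token rows out of the LM-head weight before the projection, so only a (batch, 2) tensor is ever materialized (token IDs hypothetical):

```python
import torch

YES_ID, NO_ID = 9693, 2152   # hypothetical ids for "yes" / "no"
hidden, vocab = 64, 32000
W = torch.randn(vocab, hidden)   # full LM-head weight matrix
h = torch.randn(8, hidden)       # final-position hidden states

# Naive: project to the full vocab, then pick two columns.
full = (h @ W.T)[:, [NO_ID, YES_ID]]      # materializes (8, vocab) first

# Slicing trick: project only through the two relevant weight rows.
sliced = h @ W[[NO_ID, YES_ID]].T         # only (8, 2) ever materialized
```

Both give the same logits; the sliced version just skips the (batch, vocab) intermediate.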