|
--- |
|
language: |
|
- en |
|
base_model: |
|
- mair-lab/sft-simple |
|
--- |
|
|
|
# EARL - RL Fine-tuned (S + C) (8B) |
|
|
|
**Model Name:** `mair-lab/sft-simple.rl-simple-n-complex` |
|
**Model Size:** 8B parameters |
|
**Base Checkpoint:** [`mair-lab/sft-simple`](https://huggingface.co/mair-lab/sft-simple) |
|
**Training Method:** Supervised Fine-Tuning (SFT) on Simple Edits → Reinforcement Learning (RL) on Simple + Complex Edits
|
**Datasets:** Simple Edit (S), Complex Edit (C) |
|
|
|
This model is part of the EARL benchmark study: |
|
[EARL: The Promise of RL for Autoregressive Image Editing](https://arxiv.org/abs/2508.01119)
|
|
|
## Model Summary |
|
|
|
This RL fine-tuned model builds on the SFT-simple checkpoint, using reinforcement learning to improve performance on both simple and complex edit tasks. It's optimized using a human-aligned reward function across diverse editing instructions.
|
|
|
➡️ **Inference instructions:** [GitHub Repo](https://github.com/saba96/EARL?tab=readme-ov-file)
|
|
|
## Full Benchmark Results |
|
|
|
| Model | Base Model | OmniEdit | EmuEdit | AURORA | MagicBrush | VisMin | I2EBench | **AVG** |
|
|---------------------------|------------|----------|---------|--------|------|--------|----------|---------| |
|
| Magicbrush | SD v1.5 | 3.43 | 3.28 | 3.01 | 3.64 | 3.48 | 3.06 | 3.32 | |
|
| InstructPix2Pix | SD v1.5 | 3.97 | 3.24 | 3.05 | 3.12 | 2.94 | 3.23 | 3.26 | |
|
| Aurora | SD v1.5 | 4.50 | 4.40 | 4.12 | 4.62 | 3.82 | 3.58 | 4.17 | |
|
| Omnigen* | - | 5.68 | 5.00 | 4.10 | 4.68 | 4.09 | 4.68 | 4.70 | |
|
| **SFT (S)** | Emu3 | 5.73 | 3.66 | 3.58 | 3.19 | 3.57 | 3.59 | 3.88 | |
|
| **EARL SFT (S) → RL (S+C)** | SFT (S) | **6.39** | 4.47 | **4.27** | 4.52 | 4.93 | 4.19 | **4.80** |
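As a quick sanity check on the table, the **AVG** column can be recomputed as the unweighted mean of the six per-benchmark scores (this assumes no benchmark weighting, which matches the reported value for the RL row):

```python
# Recompute the reported AVG for the "EARL SFT (S) -> RL (S+C)" row
# from its six per-benchmark scores:
# OmniEdit, EmuEdit, AURORA, MagicBrush, VisMin, I2EBench.
scores = [6.39, 4.47, 4.27, 4.52, 4.93, 4.19]
avg = sum(scores) / len(scores)

# The unweighted mean agrees with the table's 4.80 to within rounding.
assert abs(avg - 4.80) < 0.01
print(avg)
```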
|
|
|
> **Highlight:** Our RL model outperforms all supervised and diffusion baselines shown above, achieving the best average score across the evaluated benchmarks with **4.80 AVG**.
|
|
|
## Use Cases |
|
- Simple edits: object, attribute, style, and environment changes

- Complex edits: counting, spatial-relation, and action changes

- Instruction-following visual transformations