---
language:
  - en
base_model:
  - mair-lab/sft-simple
---

# EARL - RL Fine-tuned (S + C) (8B)

- **Model Name:** mair-lab/sft-simple.rl-simple-n-complex
- **Model Size:** 8B parameters
- **Base Checkpoint:** mair-lab/sft-simple
- **Training Method:** Supervised Fine-Tuning (SFT) on Simple Edits → Reinforcement Learning (RL) on Simple + Complex Edits
- **Datasets:** Simple Edit (S), Complex Edit (C)

This model is part of the EARL benchmark study:
📄 EARL: The Promise of RL for Autoregressive Image Editing

## Model Summary

This model starts from the SFT-simple checkpoint and applies reinforcement learning to improve performance on both simple and complex edit tasks. It is optimized with a human-aligned reward function across diverse editing instructions.

➡️ Inference instructions: GitHub Repo
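The authoritative inference pipeline is the one in the GitHub repo. As a rough orientation only, the sketch below shows how such a checkpoint is typically loaded with the Hugging Face `transformers` API; the model id comes from this card, but the model class, processor, and prompt format are assumptions, so follow the repo's instructions for actual use.

```python
# Hypothetical loading sketch -- see the EARL GitHub repo for the real
# inference script; the exact model class and processing steps may differ.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "mair-lab/sft-simple.rl-simple-n-complex"

# Assumption: the checkpoint ships custom modeling code (Emu3-style
# autoregressive image editing), hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# How the source image and edit instruction are packed into the prompt, and
# how generated tokens are decoded back into an image, is model-specific;
# use the repo's inference utilities for these steps.
```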

## Full Benchmark Results

| Model | Base Model | OmniEdit | EmuEdit | AURORA | MB | VisMin | I2EBench | AVG |
|---|---|---|---|---|---|---|---|---|
| MagicBrush | SD v1.5 | 3.43 | 3.28 | 3.01 | 3.64 | 3.48 | 3.06 | 3.32 |
| InstructPix2Pix | SD v1.5 | 3.97 | 3.24 | 3.05 | 3.12 | 2.94 | 3.23 | 3.26 |
| AURORA | SD v1.5 | 4.50 | 4.40 | 4.12 | 4.62 | 3.82 | 3.58 | 4.17 |
| OmniGen* | - | 5.68 | 5.00 | 4.10 | 4.68 | 4.09 | 4.68 | 4.70 |
| SFT (S) | Emu3 | 5.73 | 3.66 | 3.58 | 3.19 | 3.57 | 3.59 | 3.88 |
| **EARL SFT (S) → RL (S+C)** (this model) | SFT (S) | 6.39 | 4.47 | 4.27 | 4.52 | 4.93 | 4.19 | **4.80** |

🚀 **Highlight:** Our RL fine-tuned model achieves the highest average score on the EARL benchmark (4.80 AVG), outperforming all supervised and diffusion baselines on average.

## Use Cases

- Simple edits: object, attribute, style, and environment changes
- Complex edits: counting, spatial-relation, and action changes (illustrative instructions for both regimes are sketched below)
- Instruction-following visual transformations
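To make the simple/complex distinction concrete, the snippet below lists illustrative instruction strings for each regime. The wording is invented for illustration and is not drawn from the training data.

```python
# Illustrative edit instructions for the two regimes this model covers.
simple_edits = [
    "replace the cat with a dog",              # object change
    "change the color of the car to red",      # attribute change
    "make the photo look like a watercolor",   # style change
]
complex_edits = [
    "add two more apples to the bowl",         # counting
    "move the lamp to the left of the sofa",   # spatial relation
    "make the person raise their right hand",  # action change
]
```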