---
language:
  - en
base_model:
  - mair-lab/sft-simple
---

# EARL - RL Fine-tuned (S + C) (8B)

- **Model Name:** mair-lab/sft-simple.rl-simple-n-complex
- **Model Size:** 8B parameters
- **Base Checkpoint:** mair-lab/sft-simple
- **Training Method:** Supervised Fine-Tuning (SFT) on Simple Edits → Reinforcement Learning (RL) on Simple + Complex Edits
- **Datasets:** Simple Edit (S), Complex Edit (C)

This model is part of the EARL benchmark study:
📄 EARL: The Promise of RL for Autoregressive Image Editing

## Model Summary

This model starts from the SFT-simple checkpoint and applies reinforcement learning to improve performance on both simple and complex edit tasks. It is optimized with a human-aligned reward function across diverse editing instructions.

➡️ Inference instructions: GitHub Repo
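The authoritative inference pipeline is the one in the GitHub repo. As a rough orientation only, the sketch below shows how such a checkpoint is typically loaded with the Hugging Face `transformers` API; the model id comes from this card, but the model class, processor, and prompt format are assumptions, so follow the repo's instructions for actual use.

```python
# Hypothetical loading sketch -- see the EARL GitHub repo for the real
# inference script; the exact model class and processing steps may differ.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "mair-lab/sft-simple.rl-simple-n-complex"

# Assumption: the checkpoint ships custom modeling code (Emu3-style
# autoregressive image editing), hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# How the source image and edit instruction are packed into the prompt, and
# how generated tokens are decoded back into an image, is model-specific;
# use the repo's inference utilities for these steps.
```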

## Full Benchmark Results

| Model | Base Model | OmniEdit | EmuEdit | AURORA | MB | VisMin | I2EBench | AVG |
|---|---|---|---|---|---|---|---|---|
| MagicBrush | SD v1.5 | 3.43 | 3.28 | 3.01 | 3.64 | 3.48 | 3.06 | 3.32 |
| InstructPix2Pix | SD v1.5 | 3.97 | 3.24 | 3.05 | 3.12 | 2.94 | 3.23 | 3.26 |
| AURORA | SD v1.5 | 4.50 | 4.40 | 4.12 | 4.62 | 3.82 | 3.58 | 4.17 |
| OmniGen* | - | 5.68 | 5.00 | 4.10 | 4.68 | 4.09 | 4.68 | 4.70 |
| SFT (S) | Emu3 | 5.73 | 3.66 | 3.58 | 3.19 | 3.57 | 3.59 | 3.88 |
| **EARL SFT (S) → RL (S+C)** (this model) | SFT (S) | 6.39 | 4.47 | 4.27 | 4.52 | 4.93 | 4.19 | **4.80** |

🚀 **Highlight:** Our RL fine-tuned model achieves the highest average score on the EARL benchmark (4.80 AVG), outperforming all supervised and diffusion baselines on average.

## Use Cases

- Simple edits: object, attribute, style, and environment changes
- Complex edits: counting, spatial-relation, and action changes (illustrative instructions for both regimes are sketched below)
- Instruction-following visual transformations
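To make the simple/complex distinction concrete, the snippet below lists illustrative instruction strings for each regime. The wording is invented for illustration and is not drawn from the training data.

```python
# Illustrative edit instructions for the two regimes this model covers.
simple_edits = [
    "replace the cat with a dog",              # object change
    "change the color of the car to red",      # attribute change
    "make the photo look like a watercolor",   # style change
]
complex_edits = [
    "add two more apples to the bowl",         # counting
    "move the lamp to the left of the sofa",   # spatial relation
    "make the person raise their right hand",  # action change
]
```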