---
language:
- en
base_model:
- mair-lab/sft-simple
---

# EARL - RL Fine-tuned (S + C) (8B)

**Model Name:** `mair-lab/sft-simple.rl-simple-n-complex`  
**Model Size:** 8B parameters  
**Base Checkpoint:** [`mair-lab/sft-simple`](https://huggingface.co/mair-lab/sft-simple)  
**Training Method:** Supervised Fine-Tuning (SFT) on Simple Edits → Reinforcement Learning (RL) on Simple + Complex Edits  
**Datasets:** Simple Edit (S), Complex Edit (C)  

This model is part of the EARL benchmark study:  
📄 [EARL: The Promise of RL for Autoregressive Image Editing](https://arxiv.org/abs/2508.01119)

## Model Summary

This RL fine-tuned model builds on the SFT-simple checkpoint, applying reinforcement learning to improve performance on both simple and complex edit tasks. It is optimized with a human-aligned reward function across a diverse set of editing instructions.

➡️ **Inference instructions:** [GitHub Repo](https://github.com/saba96/EARL?tab=readme-ov-file)

## Full Benchmark Results

| Model                       | Base Model | OmniEdit | EmuEdit | AURORA   | MagicBrush | VisMin | I2EBench | **AVG**  |
|-----------------------------|------------|----------|---------|----------|------------|--------|----------|----------|
| MagicBrush                  | SD v1.5    | 3.43     | 3.28    | 3.01     | 3.64       | 3.48   | 3.06     | 3.32     |
| InstructPix2Pix             | SD v1.5    | 3.97     | 3.24    | 3.05     | 3.12       | 2.94   | 3.23     | 3.26     |
| AURORA                      | SD v1.5    | 4.50     | 4.40    | 4.12     | 4.62       | 3.82   | 3.58     | 4.17     |
| OmniGen*                    | -          | 5.68     | 5.00    | 4.10     | 4.68       | 4.09   | 4.68     | 4.70     |
| **SFT (S)**                 | Emu3       | 5.73     | 3.66    | 3.58     | 3.19       | 3.57   | 3.59     | 3.88     |
| **EARL SFT (S) → RL (S+C)** | SFT (S)    | **6.39** | 4.47    | **4.27** | 4.52       | 4.93   | 4.19     | **4.80** |

> 🚀 **Highlight:** Our RL model outperforms all supervised and diffusion baselines on average, achieving the best overall score on the EARL benchmark with **4.80 AVG**.
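As a quick sanity check, the reported AVG is the unweighted mean of the six per-benchmark scores in the table (a small Python sketch; the numbers are copied from the EARL row):

```python
# Per-benchmark scores for EARL SFT (S) -> RL (S+C), copied from the table.
# "MB" in the table header stands for MagicBrush.
earl_scores = {
    "OmniEdit": 6.39,
    "EmuEdit": 4.47,
    "AURORA": 4.27,
    "MagicBrush": 4.52,
    "VisMin": 4.93,
    "I2EBench": 4.19,
}

# AVG is the unweighted mean across the six evaluation sets.
avg = sum(earl_scores.values()) / len(earl_scores)
print(avg)  # ~4.795, reported as 4.80 in the table
```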

## Use Cases
- Simple edits: object, attribute, style, and environment changes
- Complex edits: counting, spatial relation, and action changes
- Instruction-following visual transformations