| --- |
| license: apache-2.0 |
| datasets: |
| - ssssmark/AesCoT |
| metrics: |
| - spearmanr |
| - pearsonr |
| base_model: |
| - Qwen/Qwen2.5-VL-7B-Instruct |
| pipeline_tag: reinforcement-learning |
| --- |
| |
|
|
| <div align="center"> |
|
|
| # Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization |
| <a href="https://arxiv.org/pdf/2509.21871" target="_blank"> |
| <img alt="arXiv" src="https://img.shields.io/badge/arXiv-Aes--R1-red?logo=arxiv" height="25" /> |
| </a> |
| <a href="https://huggingface.co/ssssmark/Aes-R1" target="_blank"> |
| <img alt="HF Model: Aes-R1" src="https://img.shields.io/badge/%F0%9F%A4%97%20Model-Aes--R1-ffc107" height="25" /> |
| </a> |
| <a href="https://huggingface.co/datasets/ssssmark/AesCoT" target="_blank"> |
| <img alt="HF Dataset : Aes-CoT" src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Aes--CoT-ffc107" height="25" /> |
| </a> |
| </div> |
| |
|
|
| > A novel and effective reinforcement learning framework designed for Image Aesthetic Assessment and general open-ended preference evaluation. |
|
|
|
|
| # 🖥️ Training |
| ## Preparation |
| 1. Download the IAA datasets (AVA, TAD66K, AADB, PARA, ...) and place them all in a single folder. |
| 2. Construct your image-score dataset in the following format: |
| ```json |
| { |
| "messages": [ |
| { |
| "content": "prompt here", |
| "role": "user" |
| }, |
| { |
| "content": "response here", |
| "role": "assistant" |
| } |
| ], |
| "images": "image_path_1" |
| } |
| ``` |
| We provide an example dataset in the `AesR1/data` folder. |
| 3. Download the pre-trained model weights from [here](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) and place them in `AesR1/models`. |
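
As a rough illustration of step 2, the sketch below turns (image path, score) pairs into records with the `messages` / `images` layout shown above. The prompt text, the response wording, the helper names `build_record` and `build_dataset`, and the output file name are all placeholders chosen for illustration, not the ones used to train Aes-R1. The `<image>` placeholder in the user turn follows the convention of LLaMA-Factory's multimodal demo data; treat it as an assumption and check it against the example in `AesR1/data`.

```python
import json
from pathlib import Path

# Hypothetical prompt; replace it with the prompt you actually train with.
PROMPT = "<image>Rate the aesthetic quality of this image."


def build_record(image_path, score):
    """Wrap one image-score pair in the messages/images layout shown above."""
    return {
        "messages": [
            {"content": PROMPT, "role": "user"},
            # Placeholder response format: the score with two decimals.
            {"content": f"{score:.2f}", "role": "assistant"},
        ],
        "images": image_path,
    }


def build_dataset(pairs, out_file):
    """Write all records as one JSON list and return them."""
    records = [build_record(path, score) for path, score in pairs]
    Path(out_file).write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return records
```

For example, `build_dataset([("images/0001.jpg", 5.43)], "aes_scores_demo.json")` writes a one-record file; the path and score here are made up.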
|
|
| ## Cold-start |
| We use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) to train the SFT model. |
|
|
| 1. Clone the [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) repository and install the dependencies. |
|
|
| ```bash |
| git clone https://github.com/hiyouga/LLaMA-Factory.git |
| conda create -n coldstart python=3.11.10 |
| conda activate coldstart |
| cd LLaMA-Factory |
| pip install -e ".[torch,metrics]" |
| ``` |
| 2. Add your CoT dataset entry to `LLaMA-Factory/data/dataset_info.json` and move `qwen_aescot.yaml` into `LLaMA-Factory/examples/train_full`. |
| 3. Run the following command to train the SFT model: |
|
|
| ```bash |
| llamafactory-cli train examples/train_full/qwen_aescot.yaml |
| ``` |
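
For step 2 above, a minimal sketch of registering the dataset might look like the following. It assumes the sharegpt-style entry layout that LLaMA-Factory uses for its multimodal demo data (`formatting`, `columns`, `tags`); the helper name `register_dataset`, the dataset name, and the file name are hypothetical. The dataset name presumably has to match the `dataset` field in `qwen_aescot.yaml`.

```python
import json
from pathlib import Path


def register_dataset(info_path, name, file_name):
    """Add (or overwrite) one sharegpt-style multimodal entry in dataset_info.json."""
    info_file = Path(info_path)
    info = {}
    if info_file.exists():
        info = json.loads(info_file.read_text(encoding="utf-8"))
    info[name] = {
        "file_name": file_name,
        "formatting": "sharegpt",
        # Map LLaMA-Factory's column names onto the keys used in the data file.
        "columns": {"messages": "messages", "images": "images"},
        # The data file uses role/content keys with user/assistant roles.
        "tags": {
            "role_tag": "role",
            "content_tag": "content",
            "user_tag": "user",
            "assistant_tag": "assistant",
        },
    }
    info_file.write_text(
        json.dumps(info, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return info[name]
```

A call such as `register_dataset("LLaMA-Factory/data/dataset_info.json", "aescot", "aescot.json")` would add the entry while leaving existing entries in place; both names are illustrative.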
|
|
| ## RAPO |
| First, set up the environment for RAPO training. |
| ```bash |
| conda create -n rapo python=3.11.10 |
| conda activate rapo |
| bash setup.sh |
| ``` |
| After modifying the training script for your environment, run the matching command to train the RAPO model. |
| ```bash |
| # For single-node training |
| bash train/rapo/src/open-r1-multimodal/run_scripts/Aes/aes_onenode.sh |
| |
| # For multi-node training |
| bash train/rapo/src/open-r1-multimodal/run_scripts/Aes/aes_multinode.sh |
| ``` |
|
|
| # Inference |
| After training, you can run inference with the scripts provided in LLaMA-Factory. |
|
|
| ```bash |
| # Install vLLM |
| pip install vllm |
| |
| # Run inference |
| python scripts/vllm_infer.py \ |
| --model_name_or_path [path/to/your/model] \ |
| --dataset [dataset_name] \ |
| --template qwen2_vl \ |
| --save_name result.jsonl \ |
| --temperature 0.6 |
| ``` |
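
To score the predictions with the two metrics listed in the model card (Spearman and Pearson correlation), something along these lines could work. It is a sketch under several assumptions: that each line of `result.jsonl` is a JSON object with `predict` and `label` text fields, and that the score is the last number in each field. The parsing rule `last_number` and the other helper names are hypothetical, so adapt them to your actual output format. The correlations are implemented in plain Python to keep the example dependency-free; `scipy.stats.spearmanr` and `scipy.stats.pearsonr` are the usual alternatives.

```python
import json
import math
import re


def last_number(text):
    """Return the last non-negative number in text, or None (hypothetical rule)."""
    matches = re.findall(r"\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(jsonl_path):
    """Read predictions, skip rows without a parsable score, return (SRCC, PLCC)."""
    preds, labels = [], []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            row = json.loads(line)
            p, t = last_number(row["predict"]), last_number(row["label"])
            if p is not None and t is not None:
                preds.append(p)
                labels.append(t)
    return spearman(preds, labels), pearson(preds, labels)
```

Calling `evaluate("result.jsonl")` returns the pair `(srcc, plcc)`. Both functions divide by the standard deviations, so they fail if all predictions or all labels are identical.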
|
|
| # 📚 Citation |
| If you find this repo useful, please consider citing our paper as follows: |
| ``` |
| @misc{liu2025unlockingessencebeautyadvanced, |
| title={Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization}, |
| author={Boyang Liu and Yifan Hu and Senjie Jin and Shihan Dou and Gonglei Shi and Jie Shao and Tao Gui and Xuanjing Huang}, |
| year={2025}, |
| eprint={2509.21871}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CV}, |
| url={https://arxiv.org/abs/2509.21871}, |
| } |
| ``` |