Improve model card: Add pipeline tag, library name, and GitHub link
This PR enhances the model card by:
* Adding the `pipeline_tag: image-text-to-text` to accurately reflect the model's functionality in processing visual (screenshots) and textual inputs to generate text (code).
* Including `library_name: transformers`, as the model is compatible with the Hugging Face `transformers` library, which enables the automated "How to use" widget.
* Adding a direct link to the GitHub repository (https://github.com/mnluzimu/WebGen-Agent) for improved discoverability of the codebase.
* Updating the image paths to point to the raw assets on the GitHub repository to ensure they render correctly on the Hugging Face Hub.
These changes improve the model's discoverability and usability for the community.
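For orientation, the two metadata keys named above would sit in the README's YAML front matter roughly as follows. This is a sketch assembled from the PR description and the dataset entries visible in the diff, not a copy of the merged file: the key order is an assumption, and the card may carry other keys that the diff does not show.

```yaml
---
datasets:
- luzimu/webgen-agent_train_step-grpo
- luzimu/webgen-agent_train_sft
# The two keys below are the ones this PR describes adding;
# their exact position in the front matter is assumed.
library_name: transformers
pipeline_tag: image-text-to-text
---
```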
README.md
CHANGED
@@ -1,16 +1,20 @@
 ---
-
 datasets:
 - luzimu/webgen-agent_train_step-grpo
 - luzimu/webgen-agent_train_sft
-
-
 ---
 
 # WebGen-Agent
 
 WebGen-Agent is an advanced website generation agent designed to autonomously create websites from natural language instructions. It was introduced in the paper [WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning](https://arxiv.org/pdf/2509.22644v1).
 
+Code: https://github.com/mnluzimu/WebGen-Agent
+
 ## Project Overview
 
 WebGen-Agent combines state-of-the-art language models with specialized training techniques to create a powerful website generation tool. The agent can understand natural language instructions specifying appearance and functional requirements, iteratively generate website codebases, and refine them using visual and functional feedback.
@@ -35,25 +39,25 @@ Links to the data and model parameters are as follows:
 
 WebGen-Agent follows an iterative, multi-step paradigm for website generation:
 
-1.
-2.
-3.
+1. **Code Generation**: The agent generates code to create or edit website files based on natural language instructions
+2. **Code Execution**: Dependencies are installed and the website service is started
+3. **Feedback Gathering**:
 - A screenshot of the website is captured
 - A Visual Language Model (VLM) provides appearance feedback and scores
 - A GUI-agent tests the website functionality and provides functional feedback
-4.
+4. **Refinement**: Based on the feedback, the agent continues to improve the website until it meets requirements
 
-.
+
 
 ## Step-GRPO with Screenshot and GUI-agent Feedback
 
 The Step-GRPO with Screenshot and GUI-agent Feedback approach uses the screenshot and GUI-agent scores inherently produced in the WebGen-Agent workflow as step-level rewards:
+- **Screenshot Score**: Quantifies the visual appeal and aesthetics of the website
+- **GUI-agent Score**: Measures how well the website meets functional requirements
 
 These dual rewards provide dense, reliable process supervision that significantly improves the model's ability to generate high-quality websites.
 
+
 
 ## Citation
 