File size: 7,943 Bytes
cbb8847 fb31ac1 cbb8847 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 |
---
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
datasets:
- ServiceNow/BigDocs-Sketch2Flow
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---
# Model Card for ServiceNow/Qwen2.5-VL-7B-Instruct-StarFlow
Qwen2.5-VL-7B-Instruct-StarFlow is a vision-language model finetuned for **structured workflow generation from sketch images**. It translates hand-drawn or computer-generated workflow diagrams into structured JSON workflows, including triggers, flow logic, and actions.
---
## Model Details
### Model Description
Qwen2.5-VL-7B-Instruct-StarFlow is part of the **StarFlow** framework for automating workflow creation. It extends Qwen2.5-VL-7B-Instruct with domain-specific finetuning on workflow diagrams, enabling accurate sketch-to-workflow generation.
* **Developed by:** ServiceNow Research
* **Model type:** Transformer-based Vision-Language Model (VLM)
* **Language(s) (NLP):** English
* **License:** Apache 2.0
* **Finetuned from model:** [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
### Model Sources
* **Repository:** [ServiceNow/Qwen2.5-VL-7B-Instruct-StarFlow](https://huggingface.co/ServiceNow/Qwen2.5-VL-7B-Instruct-StarFlow)
* **Paper:** [StarFlow: Generating Structured Workflow Outputs From Sketch Images](https://arxiv.org/abs/2503.21889);
---
## Uses
### Direct Use
* Translating **sketches of workflows** (hand-drawn, whiteboard, or digital diagrams) into **JSON structured workflows**.
* Supporting **workflow automation** in enterprise platforms by removing the need for manual low-code configuration.
### Downstream Use
* Integration into **enterprise low-code platforms** for rapid prototyping of workflows by users.
* Used in **automation migration pipelines**, e.g., converting legacy workflow screenshots into JSON representations.
### Out-of-Scope Use
* General-purpose vision-language tasks (e.g., image captioning, OCR).
* Use on domains outside workflow automation (e.g., arbitrary diagram-to-code).
* Real-time handwriting recognition (StarFlow focuses on structured workflow translation, not raw OCR).
---
## Bias, Risks, and Limitations
* **Limited generalization**: Finetuned models perform poorly on out-of-distribution diagrams from unfamiliar platforms.
* **Sensitivity to input style**: Whiteboard/handwritten sketches degrade performance compared to digital or UI-rendered workflows.
* **Component naming mismatches**: Model may mispredict action definitions (e.g., “create\_user” vs. “create\_a\_user”), leading to execution errors.
* **Evaluation gap**: Current metrics don’t always reflect execution correctness of generated workflows.
### Recommendations
Users should:
* Validate outputs before deployment.
* Be cautious with **handwritten/ambiguous sketches**.
* Consider supplementing with **retrieval-augmented generation (RAG)** or **tool grounding** for robustness.
---
## How to Get Started with the Model
```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image
processor = AutoProcessor.from_pretrained("ServiceNow/Qwen2.5-VL-7B-Instruct-StarFlow")
model = Qwen2_5_VLForConditionalGeneration.from_pretrained("ServiceNow/Qwen2.5-VL-7B-Instruct-StarFlow")
image = Image.open("workflow_sketch.png")
inputs = processor(images=image, text="Generate workflow JSON", return_tensors="pt")
outputs = model.generate(**inputs, max_length=4096)
workflow_json = processor.decode(outputs[0], skip_special_tokens=True)
print(workflow_json)
```
---
## Training Details
### Training Data
The model was trained using the [ServiceNow/BigDocs-Sketch2Flow](https://huggingface.co/datasets/ServiceNow/BigDocs-Sketch2Flow) dataset, which includes the following data distribution:
* **Synthetic** (12,376 Graphviz-generated diagrams)
* **Manual** (3,035 sketches hand-drawn by annotators)
* **Digital** (2,613 diagrams drawn using software)
* **Whiteboard** (484 sketches drawn on whiteboard / blackboard)
* **User Interface** (373 screenshots from ServiceNow Flow Designer)
### Training Procedure
#### Preprocessing
* Synthetic workflows generated via **heuristics** (Scheduled Loop, IF/ELSE, FOREACH, etc.).
* Annotators recreated flows in digital, manual, and whiteboard formats.
#### Training Hyperparameters
* Optimizer: **AdamW** with β=(0.95,0.999), lr=2e-5, weight decay=1e-6.
* Scheduler: **cosine learning rate** with 30 warmup steps.
* Early stopping based on validation loss.
* Precision: **bf16 mixed-precision**.
* Sequence length: up to **32k tokens**.
#### Speeds, Sizes, Times
* Trained with **16× NVIDIA H100 80GB GPUs** across two nodes.
* Full Sharded Data Parallel (FSDP) training, no CPU offloading.
---
## Evaluation
### Testing Data
Same dataset distribution as training: synthetic, manual, digital, whiteboard, UI-rendered workflows.
### Factors
* **Source of sample** (synthetic, manual, UI, etc.)
* **Orientation** (portrait vs. landscape diagrams)
* **Resolution** (small <400k pixels, medium, large >1M pixels)
### Metrics
All Evaluation metrics can be found in the official [StarFlow repo](https://github.com/ServiceNow/StarFlow).
* **Flow Similarity (FlowSim)** – tree edit distance similarity.
* **TreeBLEU** – structural recall of subtrees.
* **Trigger Match (TM)** – accuracy of workflow triggers.
* **Component Match (CM)** – overlap of predicted vs. gold components.
### Results
* Proprietary models (GPT-4o, Claude-3.7, Gemini 2.0) outperform open-weights **without finetuning**.
* **Finetuned Pixtral-12B achieves SOTA**:
* FlowSim w/ inputs: **0.919**
* TreeBLEU w/ inputs: **0.950**
* Trigger Match: **0.753**
* Component Match: **0.930**
#### Summary
Finetuning yields **large gains over base Pixtral-12B and GPT-4o**, particularly in matching workflow components and triggers.
---
## Model Examination
* Finetuned models capture **naming conventions** and structured execution logic better.
* Failure modes include **missing ELSE branches** or **generic table names**.
---
## Technical Specifications
### Model Architecture and Objective
* Base: **Qwen2.5-VL-7B-Instruct**, a multimodal transformer (7B parameters).
* Objective: **Image-to-JSON structured workflow generation**.
### Compute Infrastructure
* **Hardware:** 16× NVIDIA H100 80GB (2 nodes)
* **Software:** FSDP, bf16 mixed precision, PyTorch/Transformers
---
## Citation
**BibTeX:**
```bibtex
@article{bechard2025starflow,
title={StarFlow: Generating Structured Workflow Outputs from Sketch Images},
author={B{\'e}chard, Patrice and Wang, Chao and Abaskohi, Amirhossein and Rodriguez, Juan and Pal, Christopher and Vazquez, David and Gella, Spandana and Rajeswar, Sai and Taslakian, Perouz},
journal={arXiv preprint arXiv:2503.21889},
year={2025}
}
```
**APA:**
Béchard, P., Wang, C., Abaskohi, A., Rodriguez, J., Pal, C., Vazquez, D., Gella, S., Rajeswar, S., & Taslakian, P. (2025). **StarFlow: Generating Structured Workflow Outputs from Sketch Images**. *arXiv preprint arXiv:2503.21889*.
---
## Glossary
* **FlowSim**: Metric based on tree edit distance for workflows.
* **TreeBLEU**: BLEU-like score using tree structures.
* **Trigger Match**: Correctness of predicted workflow trigger.
* **Component Match**: Correctness of predicted components (order-agnostic).
---
## More Information
* [ServiceNow Flow Designer](https://www.servicenow.com/products/platform-flow-designer.html)
* [StarFlow Blog](https://www.servicenow.com/blogs/2025/starflow-ai-turns-sketches-into-workflows)
---
## The StarFlow Team
* Patrice Béchard, Chao Wang, Amirhossein Abaskohi, Juan Rodriguez, Christopher Pal, David Vazquez, Spandana Gella, Sai Rajeswar, Perouz Taslakian
---
## Model Card Contact
* Patrice Bechard - [patrice.bechard@servicenow.com](mailto:patrice.bechard@servicenow.com)
* ServiceNow Research – [research.servicenow.com](https://research.servicenow.com)
|