SigLIP 2 - Fine-tuned for Spectrum Icons
This repository hosts a fine-tuned checkpoint derived from google/siglip2-base-patch16-naflex. The model keeps the SigLIP 2 architecture and tokenizer from the base checkpoint and is optimized for image-text retrieval and caption alignment on Spectrum iconography.
Model Sources
- Base model: google/siglip2-base-patch16-naflex
- Fine-tuned checkpoint: JianLiao/siglip2-spectrum-icons-naflex
Training Data
- Spectrum icon set captions (internal).
Training Configuration
Phase 1
- num_train_epochs: 32.0
- learning_rate: 3e-05
- per_device_train_batch_size: 144
- gradient_accumulation_steps: 1
- warmup_ratio: 0.05
- weight_decay: 0.05
- save_strategy: steps
- eval_strategy: steps
Phase 2
- num_train_epochs: 8.0
- learning_rate: 1e-05
- per_device_train_batch_size: 144
- gradient_accumulation_steps: 1
- warmup_ratio: 0.02
- weight_decay: 0.05
- save_strategy: steps
- eval_strategy: steps
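For reference, the Phase 2 settings above translate into a Hugging Face TrainingArguments configuration roughly as follows. This is a minimal sketch, not the released training script: the output directory and the save/eval step intervals are illustrative placeholders.

from transformers import TrainingArguments

# Sketch of the Phase 2 hyperparameters listed above. output_dir,
# save_steps, and eval_steps are placeholders, not release values.
phase2_args = TrainingArguments(
    output_dir="siglip2-spectrum-icons-phase2",  # placeholder path
    num_train_epochs=8.0,
    learning_rate=1e-5,
    per_device_train_batch_size=144,
    gradient_accumulation_steps=1,
    warmup_ratio=0.02,
    weight_decay=0.05,
    save_strategy="steps",
    eval_strategy="steps",
    save_steps=500,  # illustrative interval
    eval_steps=500,  # illustrative interval
)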
How to Use
import torch
from transformers import AutoModel, AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("JianLiao/siglip2-spectrum-icons-naflex", use_fast=False)
model = AutoModel.from_pretrained("JianLiao/siglip2-spectrum-icons-naflex", dtype=torch.float16, attn_implementation="sdpa")

image = Image.open("./image.png").convert("RGB")
inputs = processor(
    text=[
        "display forecast",
        "Crystal ball with a small sparkle",
        "show prediction",
        "Minimalist fortune-telling orb on a stand",
        "Monochrome magic globe with star accent",
    ],
    images=[image],
    return_tensors="pt",
    padding="max_length",
    max_num_patches=256,
)

with torch.no_grad():
    outputs = model(**inputs)

# Pooled embeddings for the image and each caption.
image_embeds = outputs.vision_model_output.pooler_output
text_embeds = outputs.text_model_output.pooler_output

# L2-normalize, then compute cosine similarity of every caption against the icon.
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
similarity = text_embeds @ image_embeds.T
print(similarity)
CPU example output:
tensor([[0.1677],
        [0.0732],
        [0.1676],
        [0.1084],
        [0.1381]], dtype=torch.float16)
Captions 1 and 3 ("display forecast", "show prediction") rank highest for the icon; caption 5 remains competitive while staying descriptive.
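To turn these raw cosine similarities into the sigmoid scores used by the SigLIP training objective, you can apply the model's learned logit scale and bias via the pairwise logits. This is a small sketch that assumes the output exposes logits_per_text, as in the base SigLIP 2 model:

# Continues the example above. logits_per_text already includes the learned
# scale and bias, so a sigmoid yields a per-caption matching probability.
probs = torch.sigmoid(outputs.logits_per_text)  # shape: (num_captions, num_images)
print(probs.squeeze(-1).tolist())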
Limitations
- Tuned for icon imagery; performance on natural images is not evaluated.
- Captions are domain-specific and concise; long-form text may not align well.
Intended Use
- Icon search and retrieval: rank Spectrum-style icons by text queries (design intent or UI labels).
- Caption verification: check alignment between proposed captions and icon visuals in QA pipelines.
- Embedding export: produce text/image embeddings for downstream vector search in design tooling (a sketch follows below).
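The snippet below sketches that embedding-export path: it encodes a text query and a small batch of icons with the same calls as in How to Use, then ranks the icons by cosine similarity in a tiny in-memory index. The icon file names and the numpy-based index are illustrative stand-ins for a real vector store.

import numpy as np
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

processor = AutoProcessor.from_pretrained("JianLiao/siglip2-spectrum-icons-naflex", use_fast=False)
model = AutoModel.from_pretrained("JianLiao/siglip2-spectrum-icons-naflex", dtype=torch.float16, attn_implementation="sdpa")

icon_paths = ["icon_a.png", "icon_b.png", "icon_c.png"]  # placeholder files
images = [Image.open(p).convert("RGB") for p in icon_paths]
query = ["crystal ball with a sparkle"]

inputs = processor(
    text=query,
    images=images,
    return_tensors="pt",
    padding="max_length",
    max_num_patches=256,
)
with torch.no_grad():
    outputs = model(**inputs)

# L2-normalized pooled embeddings, as in the How to Use example.
image_embeds = outputs.vision_model_output.pooler_output
text_embeds = outputs.text_model_output.pooler_output
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

# Tiny in-memory "vector index": cosine similarity plus argsort.
index = image_embeds.float().numpy()             # (num_icons, dim)
scores = index @ text_embeds.float().numpy().T   # (num_icons, 1)
ranking = np.argsort(-scores[:, 0])
for rank, idx in enumerate(ranking, start=1):
    print(rank, icon_paths[idx], float(scores[idx, 0]))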
Changelog
- 2025-11-26: Initial upload fine-tuned from google/siglip2-base-patch16-naflex.