# PixAI Tagger v0.9
A practical anime multi-label tagger. Not trying to win benchmarks; trying to be useful.
High recall, updated character coverage, trained on a fresh Danbooru snapshot (2025-01).
We’ll keep shipping: v1.0 (with updated tags) is next.
## TL;DR
- ~13.5k Danbooru-style tags (general, character, copyright)
- Headline: strong character performance; recall-leaning defaults
- Built for search, dataset curation, caption assistance, and text-to-image conditioning
## What it is (in one breath)
`pixai-tagger-v0.9` is a multi-label image classifier for anime images. It predicts Danbooru-style tags and leans toward recall: it aims to find more of the right tags so you can filter downstream. We continued training the classification head of EVA02 (from WD v3) on a newer dataset, and used embedding-space MixUp to help calibration.
- Last trained: 2025-04
- Data snapshot: Danbooru IDs 1–8,600,750 (2025-01)
- Finetuned from: SmilingWolf/wd-eva02-large-tagger-v3 (encoder frozen)
- License (weights): Apache 2.0 (Note: Danbooru content has its own licenses.)
## Why you might care
- Newer data. Catches more recent IPs/characters.
- Recall-first defaults. Good for search and curation; dial thresholds for precision.
- Character focus. We spent time here; it shows up in evals.
- Simple to run. Works as an endpoint or locally; small set of knobs.
## Quickstart
Recommended defaults (balanced):

```
top_k = 128
threshold_general = 0.30
threshold_character = 0.75
```

Coverage preset (recall-heavier): `threshold_general = 0.10` (expect more false positives).
### 1) Inference Endpoint

Deploy the model as a Hugging Face Inference Endpoint and test it with:
```shell
# Replace with your own endpoint URL
curl "https://YOUR_ENDPOINT_URL.huggingface.cloud" \
  -X POST \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {"url": "https://your.cdn/image.jpg"},
    "parameters": {
      "top_k": 128,
      "threshold_general": 0.10,
      "threshold_character": 0.75
    }
  }'
```
### 2) Python (InferenceClient)
```python
from huggingface_hub import InferenceClient

client = InferenceClient("https://YOUR_ENDPOINT_URL.huggingface.cloud")
out = client.post(json={
    "inputs": {"url": "https://your.cdn/image.jpg"},
    "parameters": {"top_k": 128, "threshold_general": 0.10, "threshold_character": 0.75},
})
# out: [{"tag": "1girl", "score": 0.97, "group": "general"},
#       {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"}, ...]
```
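In practice you will usually want to bucket the returned tags by group and apply a score cutoff. A minimal sketch, assuming the response shape shown in the comment above (`group_tags` and `min_score` are our own names, not part of the API):

```python
from collections import defaultdict

def group_tags(predictions, min_score=0.0):
    """Bucket predicted tags by group, keeping only scores >= min_score."""
    grouped = defaultdict(list)
    for p in predictions:
        if p["score"] >= min_score:
            grouped[p["group"]].append((p["tag"], p["score"]))
    # Sort each group by descending score for easy display/filtering.
    for tags in grouped.values():
        tags.sort(key=lambda t: t[1], reverse=True)
    return dict(grouped)

preds = [
    {"tag": "1girl", "score": 0.97, "group": "general"},
    {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"},
    {"tag": "smile", "score": 0.41, "group": "general"},
]
print(group_tags(preds, min_score=0.5))
# {'general': [('1girl', 0.97)], 'character': [('mika_(blue_archive)', 0.92)]}
```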
### 3) Local Deployment

- Minimal script: see `handler.py` under Files.
- Demo UI: our Hugging Face Space above, or this Hugging Face Space from DeepGHS.
- `pip` + direct weights: TBD (planned for v1.0).

This tagger can also be used via the imgutils tool.
## Training notes (short version)
- Source: Danbooru (IDs 1–8,600,750; snapshot 2025-01)
- Tag set: ~13,461 tags (≥600 occurrences); grouped as general/character/copyright
- Filtering: remove images with <10 general tags (WD v3 heuristic)
- Setup: EVA02 encoder frozen; classification head continued training
- Input: 448×448; standard Danbooru tag normalization
- Augment: MixUp in embedding space (α=200)
- Optim: Adam 1e-5, cycle schedule; batch 2048; full precision
- Compute: ~1 day on a single 8×H100 node
- (Explored full-backbone training; deferred—head-only was more stable and faster for data iteration.)
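The embedding-space MixUp step above can be sketched as follows. This is a hypothetical reconstruction, not the training code: the α = 200 Beta parameter comes from the notes, while the random-permutation pairing and multi-hot label mixing are standard MixUp conventions we are assuming.

```python
import numpy as np

def embedding_mixup(embeddings, labels, alpha=200.0, rng=None):
    """Mix pairs of precomputed encoder embeddings and their multi-hot labels.

    With a large alpha, Beta(alpha, alpha) concentrates near 0.5, so each
    mixed sample is close to an even blend of a random pair.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)             # mixing coefficient in (0, 1)
    perm = rng.permutation(len(embeddings))  # random pairing of samples
    mixed_emb = lam * embeddings + (1 - lam) * embeddings[perm]
    mixed_lab = lam * labels + (1 - lam) * labels[perm]
    return mixed_emb, mixed_lab

emb = np.random.default_rng(1).normal(size=(4, 8)).astype(np.float32)
lab = np.eye(4, 6, dtype=np.float32)  # toy multi-hot labels
m_emb, m_lab = embedding_mixup(emb, lab)
print(m_emb.shape, m_lab.shape)  # (4, 8) (4, 6)
```

Because the encoder is frozen, embeddings can be precomputed once and mixed cheaply every epoch, which is one reason head-only training iterates quickly.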
## Evaluation (what to expect)

Metrics use the fixed thresholds above and are micro-averaged unless noted.
- All-tags (13k) micro-F1: ~0.60 (recall-leaning)
- Character subset (4k) micro-F1: 0.865 @ `t_char = 0.75`
- Reference: WD v3 SwinV2 character F1 ≈ 0.608 (same protocol)
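For clarity on the metric: micro-averaged F1 pools true/false positives over every (image, tag) cell rather than averaging per-tag scores. A minimal NumPy sketch (our own illustration, not the evaluation harness):

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (image, tag) cell."""
    tp = np.logical_and(y_true, y_pred).sum()
    fp = np.logical_and(~y_true, y_pred).sum()
    fn = np.logical_and(y_true, ~y_pred).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example: 2 images x 3 tags
y_true = np.array([[1, 0, 1], [0, 1, 1]], dtype=bool)
y_pred = np.array([[1, 1, 1], [0, 1, 0]], dtype=bool)
print(micro_f1(y_true, y_pred))  # 0.75
```

Micro averaging weights frequent tags more heavily than macro averaging, which is why the two should not be mixed when comparing numbers.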
### Internal “accuracy/coverage” snapshot
| Model | Coverage-F1 | Accuracy-F1 | Acc-Recall | Acc-Precision | Cov-Precision | Cov-Recall |
|---|---|---|---|---|---|---|
| PixAI v0.9 | 0.4910 | 0.4403 | 0.6654 | 0.3634 | 0.4350 | 0.6547 |
| WD-v3-EVA02 | 0.4155 | 0.4608 | 0.4465 | 0.5248 | 0.4580 | 0.4083 |
| WD-v3-SwinV2 | 0.3349 | 0.3909 | 0.3603 | 0.4821 | 0.3906 | 0.3171 |
| Camie-70k | 0.4877 | 0.4800 | 0.5743 | 0.4123 | 0.4288 | 0.5930 |
Notes:
- Character metrics use `t ≈ 0.75`; coverage metrics often use `t ≈ 0.10`.
- Keep micro vs. macro averaging consistent when updating numbers.
Note: plots show internal candidate versions (v2.x). The current release is equivalent to `pixai-tagger-v0.9` (formerly `v2.4.1`). A follow-up version is in progress.
## Quick comparisons
A fast feel for where v0.9 sits. Numbers are from our protocol and may differ from others’.
| Topic | PixAI Tagger v0.9 | WD v3 (EVA02 / SwinV2) | What it means in practice |
|---|---|---|---|
| Data snapshot | Danbooru to 2025-01 | Danbooru to 2024-02 | Better coverage of newer IPs |
| Tag vocabulary | ~13.5k tags | ~10.8k tags | More labels to catch the long tail |
| Character F1 | ≈0.865 (@ 0.75 threshold) | ~0.61 (SwinV2 ref) | Stronger character recognition |
| Default posture | Recall-leaning (tune down for precision) | Often more balanced | Good for search/curation; more false positives; set your own thresholds |
| Model size | ~1.27 GB checkpoint | Similar ballpark | Easy to host; endpoint-friendly |
| Training strategy | Head-only; encoder frozen (EVA02) | Depends on release | Faster iteration on data updates |
## Intended use
You can:
- Auto-tag anime images with Danbooru-style tags
- Build tag-search indices
- Assist caption generation (merge tags with NL captions)
- Feed tags into text-to-image pipelines (alone or alongside text)
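For the text-to-image use case, tags are typically joined into a prompt string. A minimal sketch, assuming common prompt conventions (underscore-to-space conversion and parenthesis escaping are our assumptions, not part of this model; adjust for your pipeline):

```python
def tags_to_prompt(tags, escape=True):
    """Convert Danbooru-style tags to a comma-separated prompt string."""
    parts = []
    for tag in tags:
        t = tag.replace("_", " ")  # "long_hair" -> "long hair"
        if escape:
            # Escape parentheses so prompt parsers don't read them as weights.
            t = t.replace("(", r"\(").replace(")", r"\)")
        parts.append(t)
    return ", ".join(parts)

print(tags_to_prompt(["1girl", "mika_(blue_archive)", "smile"]))
# 1girl, mika \(blue archive\), smile
```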
Please don’t rely on it for:
- Legal/safety moderation or age verification
- Non-anime imagery (performance will drop)
- Fine-grained counting/attributes without human review
## Limitations & risks
- NSFW & sensitive tags. The dataset contains them; outputs may too.
- Recall vs precision. Low thresholds increase false positives.
- Hallucinations. Number-sensitive or visually similar tags can be mispredicted.
- Representation bias. Mirrors Danbooru’s styles, tropes, and demographics.
- IP/character names. Can be wrong or incomplete; use allow/deny lists and co-occurrence rules.
## Tuning tips
- Set different thresholds for general vs character tags.
- Consider allow/deny lists for your domain.
- Add simple co-occurrence rules to suppress contradictions.
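The three tips above can be combined into one small post-processing pass. The deny list, thresholds, and contradiction pairs below are hypothetical placeholders; the prediction format follows the Quickstart example:

```python
DENY = {"lowres"}  # deny list (assumption: tune for your domain)
THRESHOLDS = {"general": 0.30, "character": 0.75, "copyright": 0.50}
# Co-occurrence rule: drop the lower-scored tag of a contradictory pair.
CONTRADICTIONS = [("1girl", "no_humans")]

def postprocess(preds):
    """Apply per-group thresholds, a deny list, and contradiction rules."""
    kept = {p["tag"]: p for p in preds
            if p["tag"] not in DENY
            and p["score"] >= THRESHOLDS.get(p["group"], 0.5)}
    for a, b in CONTRADICTIONS:
        if a in kept and b in kept:
            loser = a if kept[a]["score"] < kept[b]["score"] else b
            del kept[loser]
    return sorted(kept.values(), key=lambda p: p["score"], reverse=True)

preds = [
    {"tag": "1girl", "score": 0.9, "group": "general"},
    {"tag": "no_humans", "score": 0.35, "group": "general"},
    {"tag": "lowres", "score": 0.8, "group": "general"},
]
print([p["tag"] for p in postprocess(preds)])  # ['1girl']
```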
## Authors / Contributors
- Linso — primary contributor (training, data processing)
- narugo1992 — contributions
- AngelBottomless (PixAI) — contributions
- trojblue (PixAI) — contributions
- The rest of the PixAI team — further development support and testing
We also appreciate the broader anime image generation community. Several ideas, discussions, and experiments from outside PixAI helped shape this release.
## Maintenance
- We plan future releases with updated snapshots.
- v1.0 will include updated tags + packaging improvements.
- Changelog will live in the repo.
## Other

- An ONNX version of this tagger is provided by DeepGHS. Thanks!