---
license: apache-2.0
pipeline_tag: image-classification
tags:
  - multi-label
  - anime
  - danbooru
---



PixAI Tagger v0.9

A practical anime multi-label tagger. Not trying to win benchmarks; trying to be useful.
High recall, updated character coverage, trained on a fresh Danbooru snapshot (2025-01).
We’ll keep shipping: v1.0 (with updated tags) is next.

TL;DR

  • ~13.5k Danbooru-style tags (general, character, copyright)
  • Headline: strong character performance; recall-leaning defaults
  • Built for search, dataset curation, caption assistance, and text-to-image conditioning

What it is (in one breath)

pixai-tagger-v0.9 is a multi-label image classifier for anime images. It predicts Danbooru-style tags and aims to find more of the right stuff (recall) so you can filter later. We continued training the classification head of EVA02 (from WD v3) on a newer dataset, and used embedding-space MixUp to help calibration.

  • Last trained: 2025-04
  • Data snapshot: Danbooru IDs 1–8,600,750 (2025-01)
  • Finetuned from: SmilingWolf/wd-eva02-large-tagger-v3 (encoder frozen)
  • License (weights): Apache 2.0 (Note: Danbooru content has its own licenses.)

Why you might care

  • Newer data. Catches more recent IPs/characters.
  • Recall-first defaults. Good for search and curation; dial thresholds for precision.
  • Character focus. We spent time here; it shows up in evals.
  • Simple to run. Works as an endpoint or locally; small set of knobs.

Quickstart

Recommended defaults (balanced):

  • top_k = 128
  • threshold_general = 0.30
  • threshold_character = 0.75

Coverage preset (recall-heavier): threshold_general = 0.10 (expect more false positives)
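As a sketch, here is how these per-group thresholds might be applied to raw predictions client-side (the prediction format follows the example response shown in the Python quickstart below; the function name is ours, not part of the model's API):

```python
# Hypothetical post-processing: keep a tag only if its score clears
# the threshold for its group (general vs. character vs. copyright).
THRESHOLDS = {"general": 0.30, "character": 0.75, "copyright": 0.30}

def filter_tags(predictions, top_k=128):
    """predictions: list of {"tag": str, "score": float, "group": str}."""
    kept = [p for p in predictions
            if p["score"] >= THRESHOLDS.get(p["group"], 0.30)]
    kept.sort(key=lambda p: p["score"], reverse=True)
    return kept[:top_k]

preds = [
    {"tag": "1girl", "score": 0.97, "group": "general"},
    {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"},
    {"tag": "solo", "score": 0.12, "group": "general"},  # below 0.30, dropped
]
print(filter_tags(preds))
```

For the coverage preset, lower `THRESHOLDS["general"]` to 0.10 and filter harder downstream.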

1) Inference Endpoint

Deploy as an HF Inference Endpoint and test with the following command:

# Replace with your own endpoint URL
curl "https://YOUR_ENDPOINT_URL.huggingface.cloud" \
  -X POST \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {"url": "https://your.cdn/image.jpg"},
    "parameters": {
      "top_k": 128,
      "threshold_general": 0.10,
      "threshold_character": 0.75
    }
  }'

2) Python (InferenceClient)

import json

from huggingface_hub import InferenceClient

client = InferenceClient("https://YOUR_ENDPOINT_URL.huggingface.cloud")
resp = client.post(json={
    "inputs": {"url": "https://your.cdn/image.jpg"},
    "parameters": {"top_k": 128, "threshold_general": 0.10, "threshold_character": 0.75}
})
out = json.loads(resp)  # client.post returns raw bytes; decode the JSON payload
# out: [{"tag": "1girl", "score": 0.97, "group": "general"}, {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"}, ...]
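The endpoint returns a flat list; for indexing or captioning it is often handier bucketed by group. A small sketch (response shape assumed from the example above; the helper is ours):

```python
from collections import defaultdict

def group_tags(response):
    """Bucket the endpoint's flat tag list by group (general/character/copyright)."""
    grouped = defaultdict(list)
    for item in response:
        grouped[item["group"]].append((item["tag"], item["score"]))
    return dict(grouped)

response = [
    {"tag": "1girl", "score": 0.97, "group": "general"},
    {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"},
]
print(group_tags(response))
```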

3) Local Deployment

Alternatively, this tagger can be used locally via the imgutils library.


Training notes (short version)

  • Source: Danbooru (IDs 1–8,600,750; snapshot 2025-01)
  • Tag set: 13,461 tags (each with ≥600 occurrences); grouped as general/character/copyright
  • Filtering: remove images with <10 general tags (WD v3 heuristic)
  • Setup: EVA02 encoder frozen; classification head continued training
  • Input: 448×448; standard Danbooru tag normalization
  • Augment: MixUp in embedding space (α=200)
  • Optim: Adam 1e-5, cycle schedule; batch 2048; full precision
  • Compute: ~1 day on a single 8×H100 node
  • (Explored full-backbone training; deferred—head-only was more stable and faster for data iteration.)
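The embedding-space MixUp mentioned above can be illustrated with a minimal sketch (this is our illustration, not the training code; only α = 200 and the "mix in embedding space" idea come from the notes). With α that large, Beta(α, α) concentrates tightly around 0.5, so mixes are near-even:

```python
import numpy as np

def embedding_mixup(embeddings, labels, alpha=200.0, rng=None):
    """Mix frozen-encoder embeddings (and their multi-hot label vectors)
    within a batch, as in standard MixUp but applied after the encoder."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # near 0.5 when alpha is large
    perm = rng.permutation(len(embeddings))
    mixed_x = lam * embeddings + (1 - lam) * embeddings[perm]
    mixed_y = lam * labels + (1 - lam) * labels[perm]
    return mixed_x, mixed_y

# Toy batch: 4 embeddings of dim 8, multi-hot labels over 5 tags
x = np.random.randn(4, 8).astype(np.float32)
y = np.random.randint(0, 2, size=(4, 5)).astype(np.float32)
mx, my = embedding_mixup(x, y)
```

Because the encoder is frozen, mixing embeddings (rather than pixels) keeps augmentation cheap: embeddings can be precomputed once and mixed on the fly while training the head.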

Evaluation (what to expect)

Metric style: Fixed thresholds (above). Reported as micro-averaged unless noted.

  • All-tags (13k) micro-F1: ~0.60 (recall-leaning)
  • Character subset (4k) micro-F1: 0.865 @ t_char=0.75
  • Reference: WD v3 SwinV2 character F1 ≈ 0.608 (same protocol)
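For reference, micro-averaged F1 at a fixed threshold pools true/false positives across every (image, tag) cell before computing precision and recall (unlike macro, which averages per-tag scores). A small sketch of the computation:

```python
import numpy as np

def micro_f1(y_true, y_prob, threshold=0.30):
    """y_true: (N, T) binary matrix; y_prob: (N, T) scores in [0, 1].
    Micro-averaging pools TP/FP/FN over all image-tag pairs."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = int(((y_pred == 1) & (y_true == 1)).sum())
    fp = int(((y_pred == 1) & (y_true == 0)).sum())
    fn = int(((y_pred == 0) & (y_true == 1)).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Tiny example: 2 images, 3 tags
y_true = np.array([[1, 0, 1], [0, 1, 1]])
y_prob = np.array([[0.9, 0.2, 0.4], [0.1, 0.8, 0.1]])
print(micro_f1(y_true, y_prob))  # tp=3, fp=0, fn=1 -> F1 = 6/7 ≈ 0.857
```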

Internal “accuracy/coverage” snapshot

| Model        | Coverage-F1 | Accuracy-F1 | Acc-Recall | Acc-Precision | Cov-Precision | Cov-Recall |
|--------------|-------------|-------------|------------|---------------|---------------|------------|
| PixAI v0.9   | 0.4910      | 0.4403      | 0.6654     | 0.3634        | 0.4350        | 0.6547     |
| WD-v3-EVA02  | 0.4155      | 0.4608      | 0.4465     | 0.5248        | 0.4580        | 0.4083     |
| WD-v3-SwinV2 | 0.3349      | 0.3909      | 0.3603     | 0.4821        | 0.3906        | 0.3171     |
| Camie-70k    | 0.4877      | 0.4800      | 0.5743     | 0.4123        | 0.4288        | 0.5930     |

Notes:

  • Character metrics use t≈0.75; coverage metrics use t≈0.10.
  • Keep micro vs. macro averaging consistent when updating numbers.

(figure: evaluation plots for internal candidate versions)

Note: Plots show internal candidate versions (v2.x). The current release corresponds to pixai-tagger-v0.9 (previously v2.4.1 internally). A follow-up version is in progress.


Quick comparisons

A fast feel for where v0.9 sits. Numbers are from our protocol and may differ from others’.

| Topic             | PixAI Tagger v0.9                        | WD v3 (EVA02 / SwinV2) | What it means in practice                                               |
|-------------------|------------------------------------------|------------------------|-------------------------------------------------------------------------|
| Data snapshot     | Danbooru to 2025-01                      | Danbooru to 2024-02    | Better coverage of newer IPs                                            |
| Tag vocabulary    | ~13.5k tags                              | ~10.8k tags            | More labels to catch the long tail                                      |
| Character F1      | ≈0.865 (@ 0.75 threshold)                | ~0.61 (SwinV2 ref)     | Stronger character recognition                                          |
| Default posture   | Recall-leaning (tune down for precision) | Often more balanced    | Good for search/curation; more false positives; set your own thresholds |
| Model size        | ~1.27 GB checkpoint                      | Similar ballpark       | Easy to host; endpoint-friendly                                         |
| Training strategy | Head-only; encoder frozen (EVA02)        | Depends on release     | Faster iteration on data updates                                        |

Intended use

You can:

  • Auto-tag anime images with Danbooru-style tags
  • Build tag-search indices
  • Assist caption generation (merge tags with NL captions)
  • Feed tags into text-to-image pipelines (alone or alongside text)

Please don’t rely on it for:

  • Legal/safety moderation or age verification
  • Non-anime imagery (performance will drop)
  • Fine-grained counting/attributes without human review

Limitations & risks

  • NSFW & sensitive tags. The dataset contains them; outputs may too.
  • Recall vs precision. Low thresholds increase false positives.
  • Hallucinations. Number-sensitive or visually similar tags can be mispredicted.
  • Representation bias. Mirrors Danbooru’s styles, tropes, and demographics.
  • IP/character names. Can be wrong or incomplete; use allow/deny lists and co-occurrence rules.

Tuning tips

  • Set different thresholds for general vs character tags.
  • Consider allow/deny lists for your domain.
  • Add simple co-occurrence rules to suppress contradictions.
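A minimal sketch of the deny-list and co-occurrence ideas (the rules and tag names below are illustrative only; nothing like this ships with the model):

```python
# Illustrative post-filters; adapt the rules to your own domain.
DENY = {"lowres", "jpeg_artifacts"}
# If the key tag is present with a higher score, suppress contradicting tags.
CONTRADICTIONS = {"1girl": {"2girls", "multiple_girls"}}

def apply_rules(tags):
    """tags: dict of tag -> score. Returns a filtered copy."""
    out = {t: s for t, s in tags.items() if t not in DENY}
    for key, rivals in CONTRADICTIONS.items():
        if key in out:
            for rival in rivals:
                # Drop the contradiction only when the key tag is stronger.
                if rival in out and out[rival] < out[key]:
                    del out[rival]
    return out

print(apply_rules({"1girl": 0.95, "2girls": 0.40, "lowres": 0.33, "solo": 0.9}))
```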

Authors / Contributors

  • Linso — primary contributor (training, data processing)
  • narugo1992 — contributions
  • AngelBottomless (PixAI) — contributions
  • trojblue (PixAI) — contributions
  • The rest of the PixAI team — further development support and testing

We also appreciate the broader anime image generation community. Several ideas, discussions, and experiments from outside PixAI helped shape this release.


Maintenance

  • We plan future releases with updated snapshots.
  • v1.0 will include updated tags + packaging improvements.
  • Changelog will live in the repo.
