---
license: apache-2.0
pipeline_tag: image-classification
tags:
  - multi-label
  - anime
  - danbooru
---



PixAI Tagger v0.9

A practical anime multi-label tagger. Not trying to win benchmarks; trying to be useful.
High recall, updated character coverage, trained on a fresh Danbooru snapshot (2025-01).
We’ll keep shipping: v1.0 (with updated tags) is next.

TL;DR

  • ~13.5k Danbooru-style tags (general, character, copyright)
  • Headline: strong character performance; recall-leaning defaults
  • Built for search, dataset curation, caption assistance, and text-to-image conditioning

What it is (in one breath)

pixai-tagger-v0.9 is a multi-label image classifier for anime images. It predicts Danbooru-style tags and aims to find more of the right stuff (recall) so you can filter later. We continued training the classification head of EVA02 (from WD v3) on a newer dataset, and used embedding-space MixUp to help calibration.

  • Last trained: 2025-04
  • Data snapshot: Danbooru IDs 1–8,600,750 (2025-01)
  • Finetuned from: SmilingWolf/wd-eva02-large-tagger-v3 (encoder frozen)
  • License (weights): Apache 2.0 (Note: Danbooru content has its own licenses.)

Why you might care

  • Newer data. Catches more recent IPs/characters.
  • Recall-first defaults. Good for search and curation; dial thresholds for precision.
  • Character focus. We spent time here; it shows up in evals.
  • Simple to run. Works as an endpoint or locally; small set of knobs.

Quickstart

Recommended defaults (balanced):

  • top_k = 128
  • threshold_general = 0.30
  • threshold_character = 0.75

Coverage preset (recall-heavier): threshold_general = 0.10 (expect more false positives)
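As a sketch, here is how these per-group thresholds might be applied to raw predictions client-side (the prediction format follows the example response shown in the Python quickstart below; the function name is ours, not part of the model's API):

```python
# Hypothetical post-processing: keep a tag only if its score clears
# the threshold for its group (general vs. character vs. copyright).
THRESHOLDS = {"general": 0.30, "character": 0.75, "copyright": 0.30}

def filter_tags(predictions, top_k=128):
    """predictions: list of {"tag": str, "score": float, "group": str}."""
    kept = [p for p in predictions
            if p["score"] >= THRESHOLDS.get(p["group"], 0.30)]
    kept.sort(key=lambda p: p["score"], reverse=True)
    return kept[:top_k]

preds = [
    {"tag": "1girl", "score": 0.97, "group": "general"},
    {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"},
    {"tag": "solo", "score": 0.12, "group": "general"},  # below 0.30, dropped
]
print(filter_tags(preds))
```

For the coverage preset, lower `THRESHOLDS["general"]` to 0.10 and filter harder downstream.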

1) Inference Endpoint

Deploy as an HF Inference Endpoint and test with the following command:

# Replace with your own endpoint URL
curl "https://YOUR_ENDPOINT_URL.huggingface.cloud" \
  -X POST \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {"url": "https://your.cdn/image.jpg"},
    "parameters": {
      "top_k": 128,
      "threshold_general": 0.10,
      "threshold_character": 0.75
    }
  }'

2) Python (InferenceClient)

import json

from huggingface_hub import InferenceClient

client = InferenceClient("https://YOUR_ENDPOINT_URL.huggingface.cloud")
resp = client.post(json={
    "inputs": {"url": "https://your.cdn/image.jpg"},
    "parameters": {"top_k": 128, "threshold_general": 0.10, "threshold_character": 0.75}
})
out = json.loads(resp)  # client.post returns raw bytes; decode the JSON payload
# out: [{"tag": "1girl", "score": 0.97, "group": "general"}, {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"}, ...]
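The endpoint returns a flat list; for indexing or captioning it is often handier bucketed by group. A small sketch (response shape assumed from the example above; the helper is ours):

```python
from collections import defaultdict

def group_tags(response):
    """Bucket the endpoint's flat tag list by group (general/character/copyright)."""
    grouped = defaultdict(list)
    for item in response:
        grouped[item["group"]].append((item["tag"], item["score"]))
    return dict(grouped)

response = [
    {"tag": "1girl", "score": 0.97, "group": "general"},
    {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"},
]
print(group_tags(response))
```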

3) Local Deployment

Alternatively, this tagger can be used locally via the imgutils library.


Training notes (short version)

  • Source: Danbooru (IDs 1–8,600,750; snapshot 2025-01)
  • Tag set: 13,461 tags (each with ≥600 occurrences); grouped as general/character/copyright
  • Filtering: remove images with <10 general tags (WD v3 heuristic)
  • Setup: EVA02 encoder frozen; classification head continued training
  • Input: 448×448; standard Danbooru tag normalization
  • Augment: MixUp in embedding space (α=200)
  • Optim: Adam 1e-5, cycle schedule; batch 2048; full precision
  • Compute: ~1 day on a single 8×H100 node
  • (Explored full-backbone training; deferred—head-only was more stable and faster for data iteration.)
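The embedding-space MixUp mentioned above can be illustrated with a minimal sketch (this is our illustration, not the training code; only α = 200 and the "mix in embedding space" idea come from the notes). With α that large, Beta(α, α) concentrates tightly around 0.5, so mixes are near-even:

```python
import numpy as np

def embedding_mixup(embeddings, labels, alpha=200.0, rng=None):
    """Mix frozen-encoder embeddings (and their multi-hot label vectors)
    within a batch, as in standard MixUp but applied after the encoder."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # near 0.5 when alpha is large
    perm = rng.permutation(len(embeddings))
    mixed_x = lam * embeddings + (1 - lam) * embeddings[perm]
    mixed_y = lam * labels + (1 - lam) * labels[perm]
    return mixed_x, mixed_y

# Toy batch: 4 embeddings of dim 8, multi-hot labels over 5 tags
x = np.random.randn(4, 8).astype(np.float32)
y = np.random.randint(0, 2, size=(4, 5)).astype(np.float32)
mx, my = embedding_mixup(x, y)
```

Because the encoder is frozen, mixing embeddings (rather than pixels) keeps augmentation cheap: embeddings can be precomputed once and mixed on the fly while training the head.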

Evaluation (what to expect)

Metric style: Fixed thresholds (above). Reported as micro-averaged unless noted.

  • All-tags (13k) micro-F1: ~0.60 (recall-leaning)
  • Character subset (4k) micro-F1: 0.865 @ t_char=0.75
  • Reference: WD v3 SwinV2 character F1 ≈ 0.608 (same protocol)
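For reference, micro-averaged F1 at a fixed threshold pools true/false positives across every (image, tag) cell before computing precision and recall (unlike macro, which averages per-tag scores). A small sketch of the computation:

```python
import numpy as np

def micro_f1(y_true, y_prob, threshold=0.30):
    """y_true: (N, T) binary matrix; y_prob: (N, T) scores in [0, 1].
    Micro-averaging pools TP/FP/FN over all image-tag pairs."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = int(((y_pred == 1) & (y_true == 1)).sum())
    fp = int(((y_pred == 1) & (y_true == 0)).sum())
    fn = int(((y_pred == 0) & (y_true == 1)).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Tiny example: 2 images, 3 tags
y_true = np.array([[1, 0, 1], [0, 1, 1]])
y_prob = np.array([[0.9, 0.2, 0.4], [0.1, 0.8, 0.1]])
print(micro_f1(y_true, y_prob))  # tp=3, fp=0, fn=1 -> F1 = 6/7 ≈ 0.857
```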

Internal “accuracy/coverage” snapshot

| Model        | Coverage-F1 | Accuracy-F1 | Acc-Recall | Acc-Precision | Cov-Precision | Cov-Recall |
|--------------|-------------|-------------|------------|---------------|---------------|------------|
| PixAI v0.9   | 0.4910      | 0.4403      | 0.6654     | 0.3634        | 0.4350        | 0.6547     |
| WD-v3-EVA02  | 0.4155      | 0.4608      | 0.4465     | 0.5248        | 0.4580        | 0.4083     |
| WD-v3-SwinV2 | 0.3349      | 0.3909      | 0.3603     | 0.4821        | 0.3906        | 0.3171     |
| Camie-70k    | 0.4877      | 0.4800      | 0.5743     | 0.4123        | 0.4288        | 0.5930     |

Notes:

  • Character metrics use t≈0.75; coverage metrics use t≈0.10.
  • Keep micro vs. macro averaging consistent when updating numbers.

(figure: evaluation plots for internal candidate versions)

Note: Plots show internal candidate versions (v2.x). The current release corresponds to pixai-tagger-v0.9 (previously v2.4.1 internally). A follow-up version is in progress.


Quick comparisons

A fast feel for where v0.9 sits. Numbers are from our protocol and may differ from others’.

| Topic             | PixAI Tagger v0.9                        | WD v3 (EVA02 / SwinV2) | What it means in practice                                               |
|-------------------|------------------------------------------|------------------------|-------------------------------------------------------------------------|
| Data snapshot     | Danbooru to 2025-01                      | Danbooru to 2024-02    | Better coverage of newer IPs                                            |
| Tag vocabulary    | ~13.5k tags                              | ~10.8k tags            | More labels to catch the long tail                                      |
| Character F1      | ≈0.865 (@ 0.75 threshold)                | ~0.61 (SwinV2 ref)     | Stronger character recognition                                          |
| Default posture   | Recall-leaning (tune down for precision) | Often more balanced    | Good for search/curation; more false positives; set your own thresholds |
| Model size        | ~1.27 GB checkpoint                      | Similar ballpark       | Easy to host; endpoint-friendly                                         |
| Training strategy | Head-only; encoder frozen (EVA02)        | Depends on release     | Faster iteration on data updates                                        |

Intended use

You can:

  • Auto-tag anime images with Danbooru-style tags
  • Build tag-search indices
  • Assist caption generation (merge tags with NL captions)
  • Feed tags into text-to-image pipelines (alone or alongside text)

Please don’t rely on it for:

  • Legal/safety moderation or age verification
  • Non-anime imagery (performance will drop)
  • Fine-grained counting/attributes without human review

Limitations & risks

  • NSFW & sensitive tags. The dataset contains them; outputs may too.
  • Recall vs precision. Low thresholds increase false positives.
  • Hallucinations. Number-sensitive or visually similar tags can be mispredicted.
  • Representation bias. Mirrors Danbooru’s styles, tropes, and demographics.
  • IP/character names. Can be wrong or incomplete; use allow/deny lists and co-occurrence rules.

Tuning tips

  • Set different thresholds for general vs character tags.
  • Consider allow/deny lists for your domain.
  • Add simple co-occurrence rules to suppress contradictions.
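A minimal sketch of the deny-list and co-occurrence ideas (the rules and tag names below are illustrative only; nothing like this ships with the model):

```python
# Illustrative post-filters; adapt the rules to your own domain.
DENY = {"lowres", "jpeg_artifacts"}
# If the key tag is present with a higher score, suppress contradicting tags.
CONTRADICTIONS = {"1girl": {"2girls", "multiple_girls"}}

def apply_rules(tags):
    """tags: dict of tag -> score. Returns a filtered copy."""
    out = {t: s for t, s in tags.items() if t not in DENY}
    for key, rivals in CONTRADICTIONS.items():
        if key in out:
            for rival in rivals:
                # Drop the contradiction only when the key tag is stronger.
                if rival in out and out[rival] < out[key]:
                    del out[rival]
    return out

print(apply_rules({"1girl": 0.95, "2girls": 0.40, "lowres": 0.33, "solo": 0.9}))
```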

Authors / Contributors

  • Linso — primary contributor (training, data processing)
  • narugo1992 — contributions
  • AngelBottomless (PixAI) — contributions
  • trojblue (PixAI) — contributions
  • The rest of the PixAI team — further development support and testing

We also appreciate the broader anime image generation community. Several ideas, discussions, and experiments from outside PixAI helped shape this release.


Maintenance

  • We plan future releases with updated snapshots.
  • v1.0 will include updated tags + packaging improvements.
  • Changelog will live in the repo.
