|
--- |
|
license: apache-2.0 |
|
pipeline_tag: image-classification |
|
tags: |
|
- multi-label |
|
- anime |
|
- danbooru |
|
--- |
|
|
|
<p align="center"> |
|
<img src="./banner_09_cropped.jpg" style="height:240px;" /> |
|
</p> |
|
|
|
|
|
<p align="center"> |
|
<a href="https://huggingface.co/pixai-labs/pixai-tagger-v0.9"><strong>Model</strong></a> · |
|
<a href="https://huggingface.co/spaces/pixai-labs/pixai-tagger-demo"><strong>Demo</strong></a> · |
|
<a href="#quickstart"><strong>Quickstart</strong></a> · |
|
<a href="#quick-comparisons"><strong>Quick comparisons</strong></a> |
|
</p> |
|
|
|
|
|
<br> |
|
|
|
# PixAI Tagger v0.9 |
|
|
|
A practical anime **multi-label tagger**. Not trying to win benchmarks; trying to be useful. |
|
**High recall**, updated **character coverage**, trained on a fresh Danbooru snapshot (2025-01). |
|
We’ll keep shipping: **v1.0** (with updated tags) is next. |
|
|
|
> TL;DR |
|
> |
|
> - ~**13.5k** Danbooru-style tags (**general**, **character**, **copyright**) |
|
> - Headline: strong **character** performance; recall-leaning defaults |
|
> - Built for search, dataset curation, caption assistance, and text-to-image conditioning |
|
|
|
--- |
|
|
|
## What it is (in one breath) |
|
|
|
`pixai-tagger-v0.9` is a multi-label image classifier for anime images. It predicts Danbooru-style tags and aims to **find more of the right stuff** (recall) so you can filter later. We continued training the **classification head** of EVA02 (from WD v3) on a newer dataset, and used **embedding-space MixUp** to help calibration. |
|
|
|
- **Last trained:** 2025-04 |
|
- **Data snapshot:** Danbooru IDs 1–8,600,750 (2025-01) |
|
- **Finetuned from:** `SmilingWolf/wd-eva02-large-tagger-v3` (encoder frozen) |
|
- **License (weights):** Apache 2.0 *(Note: Danbooru content has its own licenses.)* |
|
|
|
--- |
|
|
|
## Why you might care |
|
|
|
- **Newer data.** Catches more recent IPs/characters. |
|
- **Recall-first defaults.** Good for search and curation; dial thresholds for precision. |
|
- **Character focus.** We spent time here; it shows up in evals. |
|
- **Simple to run.** Works as an endpoint or locally; small set of knobs. |
|
|
|
--- |
|
|
|
## Quickstart |
|
|
|
**Recommended defaults (balanced):** |
|
|
|
- `top_k = 128` |
|
- `threshold_general = 0.30` |
|
- `threshold_character = 0.75` |
|
|
|
**Coverage preset (recall-heavier):** `threshold_general = 0.10` (expect more false positives) |
|
|
|
### 1) Inference Endpoint |
|
|
|
Deploy as an HF Inference Endpoint and test with the following command: |
|
|
|
```bash |
|
# Replace with your own endpoint URL |
|
curl "https://YOUR_ENDPOINT_URL.huggingface.cloud" \ |
|
-X POST \ |
|
-H "Accept: application/json" \ |
|
-H "Content-Type: application/json" \ |
|
-d '{ |
|
"inputs": {"url": "https://your.cdn/image.jpg"}, |
|
"parameters": { |
|
"top_k": 128, |
|
"threshold_general": 0.10, |
|
"threshold_character": 0.75 |
|
} |
|
}' |
|
``` |
|
|
|
### 2) Python (InferenceClient) |
|
|
|
```python
import json

from huggingface_hub import InferenceClient

client = InferenceClient("https://YOUR_ENDPOINT_URL.huggingface.cloud")
resp = client.post(json={
    "inputs": {"url": "https://your.cdn/image.jpg"},
    "parameters": {"top_k": 128, "threshold_general": 0.10, "threshold_character": 0.75}
})
out = json.loads(resp)  # post() returns raw JSON bytes; decode into a list of tag dicts
# out: [{"tag": "1girl", "score": 0.97, "group": "general"}, {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"}, ...]
```
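If you query with the recall-heavy preset (`threshold_general = 0.10`), you can tighten results client-side afterwards. A minimal sketch, assuming the `{"tag", "score", "group"}` response schema shown above; the threshold values are the balanced defaults from the Quickstart:

```python
# Client-side re-thresholding; assumes the response schema shown above.
THRESHOLDS = {"general": 0.30, "character": 0.75}  # balanced defaults

def filter_tags(preds, top_k=128):
    """Keep tags whose score clears their group's threshold, best-first."""
    kept = [p for p in preds if p["score"] >= THRESHOLDS.get(p["group"], 0.30)]
    return sorted(kept, key=lambda p: p["score"], reverse=True)[:top_k]
```

This lets you fetch once with loose thresholds and experiment with stricter cutoffs without re-querying the endpoint.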
|
|
|
### 3) Local Deployment |
|
|
|
- **Minimal script**: see [`handler.py`](https://huggingface.co/pixai-labs/pixai-tagger-v0.9/blob/main/handler.py) under **Files**.

- **Demo UI**: our [Hugging Face Space](https://huggingface.co/spaces/pixai-labs/pixai-tagger-demo) linked above, or the [Space from DeepGHS](https://huggingface.co/spaces/deepghs/pixai-tagger-v0.9-demo).
|
- `pip` + direct weights: **TBD** (planned for v1.0). |
|
|
|
This tagger can also be used via the [imgutils library](https://dghs-imgutils.deepghs.org/main/api_doc/tagging/pixai.html) from DeepGHS.
|
|
|
|
|
---
|
|
|
## Training notes (short version) |
|
|
|
- **Source:** Danbooru (IDs 1–8,600,750; snapshot 2025-01) |
|
- **Tag set:** ~**13,461** tags (≥600 occurrences); grouped as general/character/copyright |
|
- **Filtering:** remove images with **<10 general tags** (WD v3 heuristic) |
|
- **Setup:** EVA02 encoder **frozen**; classification head **continued training** |
|
- **Input:** 448×448; standard Danbooru tag normalization |
|
- **Augment:** **MixUp in embedding space** (α=200) |
|
- **Optim:** Adam 1e-5, cycle schedule; batch 2048; full precision |
|
- **Compute:** ~**1 day** on one **8×H100** node
|
- *(Explored full-backbone training; deferred—head-only was more stable and faster for data iteration.)* |
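The embedding-space MixUp step can be sketched as follows. This is a minimal illustration, not the actual training code: it mixes pairs of frozen-encoder embeddings and their multi-hot label vectors with a `Beta(α, α)` weight; with `α = 200` the weight concentrates near 0.5, so mixed samples sit roughly midway between the two originals.

```python
import numpy as np

def embedding_mixup(embs, labels, alpha=200.0, rng=None):
    """Mix random pairs of (embedding, multi-hot label) rows within a batch.

    embs: (batch, dim) array of encoder embeddings.
    labels: (batch, num_tags) multi-hot label matrix.
    """
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)       # mixing weight; near 0.5 for large alpha
    perm = rng.permutation(len(embs))  # random pairing within the batch
    mixed_embs = lam * embs + (1 - lam) * embs[perm]
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed_embs, mixed_labels
```

Because the encoder is frozen, mixing in embedding space is cheap: embeddings can be precomputed once and augmented on the fly while training the head.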
|
|
|
---
|
|
|
## Evaluation (what to expect) |
|
|
|
**Metric style:** Fixed thresholds (above). Reported as **micro-averaged** unless noted. |
|
|
|
- **All-tags (13k) micro-F1:** ~**0.60** (recall-leaning) |
|
- **Character subset (4k) micro-F1:** **0.865** @ `t_char=0.75` |
|
- Reference: **WD v3 SwinV2** character F1 ≈ **0.608** (same protocol) |
|
|
|
**Internal “accuracy/coverage” snapshot** |
|
|
|
| Model | Coverage-F1 | Accuracy-F1 | Acc-Recall | Acc-Precision | Cov-Precision | Cov-Recall | |
|
| -------------- | ----------- | ----------- | ---------- | ------------- | ------------- | ---------- | |
|
| **PixAI v0.9** | **0.4910** | 0.4403 | 0.6654 | 0.3634 | 0.4350 | 0.6547 | |
|
| WD-v3-EVA02 | 0.4155 | 0.4608 | 0.4465 | **0.5248** | 0.4580 | 0.4083 | |
|
| WD-v3-SwinV2 | 0.3349 | 0.3909 | 0.3603 | 0.4821 | 0.3906 | 0.3171 | |
|
| Camie-70k | 0.4877 | 0.4800 | 0.5743 | 0.4123 | 0.4288 | 0.5930 | |
|
> Notes |
|
> • Character uses `t≈0.75`; coverage often uses `t≈0.10`. |
|
> • Keep micro vs macro consistent when updating numbers. |
|
|
|
|
|
|
|
|
|
 |
|
|
|
> Note: Plots show internal candidate versions (v2.x). The current release, `pixai-tagger-v0.9`, corresponds to internal candidate `v2.4.1`. A follow-up version is in progress.
|
|
|
---
|
|
|
## Quick comparisons |
|
|
|
A fast feel for where v0.9 sits. Numbers are from our protocol and may differ from others’. |
|
|
|
| Topic | PixAI Tagger v0.9 | WD v3 (EVA02 / SwinV2) | What it means in practice | |
|
| --------------------- | ---------------------------------------- | ---------------------- | ------------------------------------------------------------ | |
|
| **Data snapshot** | Danbooru to **2025-01** | Danbooru to 2024-02 | Better coverage of newer IPs | |
|
| **Tag vocabulary** | ~**13.5k** tags | ~10.8k tags | More labels to catch long-tail | |
|
| **Character F1** | **≈0.865** (@ 0.75 threshold) | ~0.61 (SwinV2 ref) | Stronger character recognition | |
|
| **Default posture** | Recall-leaning (tune down for precision) | Often more balanced | Good for search/curation; more false positives; set your own thresholds | |
|
| **Model size** | **~1.27 GB** checkpoint | Similar ballpark | Easy to host; endpoint-friendly | |
|
| **Training strategy** | Head-only; encoder frozen (EVA02) | Depends on release | Faster iteration on data updates | |
|
|
|
---
|
|
|
## Intended use |
|
|
|
**You can:** |
|
|
|
- Auto-tag anime images with Danbooru-style tags |
|
- Build tag-search indices |
|
- Assist caption generation (merge tags with NL captions) |
|
- Feed tags into **text-to-image** pipelines (alone or alongside text) |
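For the last two uses, a minimal sketch of turning predictions into a prompt string; the tag dicts and caption text are illustrative:

```python
def tags_to_prompt(tags, caption=""):
    """Join predicted tag names into a comma-separated prompt string.

    Danbooru underscores become spaces; an optional natural-language
    caption is prepended.
    """
    tag_str = ", ".join(t["tag"].replace("_", " ") for t in tags)
    return f"{caption}, {tag_str}" if caption else tag_str
```

Usage: `tags_to_prompt(preds, caption="a girl by the sea")` yields a string you can pass to a text-to-image pipeline as-is or merge with a longer caption.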
|
|
|
**Please don’t rely on it for:** |
|
|
|
- Legal/safety moderation or age verification |
|
- Non-anime imagery (performance will drop) |
|
- Fine-grained counting/attributes without human review |
|
|
|
---
|
|
|
## Limitations & risks |
|
|
|
- **NSFW & sensitive tags.** The dataset contains them; outputs may too. |
|
- **Recall vs precision.** Low thresholds increase false positives. |
|
- **Hallucinations.** Number-sensitive or visually similar tags can be mispredicted. |
|
- **Representation bias.** Mirrors Danbooru’s styles, tropes, and demographics. |
|
- **IP/character names.** Can be wrong or incomplete; use allow/deny lists and co-occurrence rules. |
|
|
|
**Tuning tips** |
|
|
|
- Set **different thresholds** for general vs character tags. |
|
- Consider **allow/deny lists** for your domain. |
|
- Add simple **co-occurrence rules** to suppress contradictions. |
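The last two tips can be sketched as simple post-processing. The deny list and contradiction pairs below are illustrative placeholders, not rules shipped with the model; pick your own for your domain:

```python
# Hypothetical domain rules; adjust for your use case.
DENY = {"lowres"}                          # tags to always drop
CONTRADICTIONS = [("1girl", "no_humans")]  # keep the higher-scoring tag of each pair

def postprocess(preds):
    """preds: list of {"tag": str, "score": float} dicts, as in the Quickstart."""
    kept = {p["tag"]: p for p in preds if p["tag"] not in DENY}
    for a, b in CONTRADICTIONS:
        if a in kept and b in kept:
            loser = a if kept[a]["score"] < kept[b]["score"] else b
            del kept[loser]
    return sorted(kept.values(), key=lambda p: p["score"], reverse=True)
```

Even a handful of such rules can remove most of the extra false positives introduced by recall-leaning thresholds.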
|
|
|
---
|
|
|
## Authors / Contributors |
|
|
|
- **[Linso](https://huggingface.co/richard-guyunqi)** — primary contributor (training, data processing) |
|
- **[narugo1992](https://huggingface.co/narugo1992)** — contributions |
|
- **[AngelBottomless](https://huggingface.co/AngelBottomless)** (PixAI) — contributions |
|
- **[trojblue](https://huggingface.co/trojblue)** (PixAI) — contributions |
|
- The rest of the PixAI team — further development support and testing |
|
|
|
**We also appreciate the broader anime image generation community.** Several ideas, discussions, and experiments from outside PixAI helped shape this release. |
|
|
|
--- |
|
|
|
## Maintenance |
|
|
|
- We plan **future releases** with updated snapshots. |
|
- v1.0 will include updated tags + packaging improvements. |
|
- Changelog will live in the repo. |
|
|
|
## Other |
|
- There is an [ONNX version of this tagger provided by DeepGHS](https://huggingface.co/deepghs/pixai-tagger-v0.9-onnx). Thanks!