|
--- |
|
license: apache-2.0 |
|
pipeline_tag: image-classification |
|
tags: |
|
- multi-label |
|
- anime |
|
- danbooru |
|
--- |
|
|
|
<p align="center"> |
|
<img src="./banner_09_cropped.jpg" style="height:240px;" /> |
|
</p> |
|
|
|
|
|
<p align="center"> |
|
<a href="https://huggingface.co/pixai-labs/pixai-tagger-v0.9"><strong>Model</strong></a> · |
|
<a href="https://huggingface.co/spaces/pixai-labs/pixai-tagger-demo"><strong>Demo</strong></a> · |
|
<a href="#quickstart"><strong>Quickstart</strong></a> · |
|
<a href="#quick-comparisons"><strong>Quick comparisons</strong></a> |
|
</p> |
|
|
|
|
|
<br> |
|
|
|
# PixAI Tagger v0.9 |
|
|
|
A practical anime **multi-label tagger**. Not trying to win benchmarks; trying to be useful. |
|
**High recall**, updated **character coverage**, trained on a fresh Danbooru snapshot (2025-01). |
|
We’ll keep shipping: **v1.0** (with updated tags) is next. |
|
|
|
> TL;DR |
|
> |
|
> - ~**13.5k** Danbooru-style tags (**general**, **character**, **copyright**) |
|
> - Headline: strong **character** performance; recall-leaning defaults |
|
> - Built for search, dataset curation, caption assistance, and text-to-image conditioning |
|
|
|
--- |
|
|
|
## What it is (in one breath) |
|
|
|
`pixai-tagger-v0.9` is a multi-label image classifier for anime images. It predicts Danbooru-style tags and aims to **find more of the right stuff** (recall) so you can filter later. We continued training the **classification head** of EVA02 (from WD v3) on a newer dataset, and used **embedding-space MixUp** to help calibration. |
|
|
|
- **Last trained:** 2025-04 |
|
- **Data snapshot:** Danbooru IDs 1–8,600,750 (2025-01) |
|
- **Finetuned from:** `SmilingWolf/wd-eva02-large-tagger-v3` (encoder frozen) |
|
- **License (weights):** Apache 2.0 *(Note: Danbooru content has its own licenses.)* |
|
|
|
--- |
|
|
|
## Why you might care |
|
|
|
- **Newer data.** Catches more recent IPs/characters. |
|
- **Recall-first defaults.** Good for search and curation; dial thresholds for precision. |
|
- **Character focus.** We spent time here; it shows up in evals. |
|
- **Simple to run.** Works as an endpoint or locally; small set of knobs. |
|
|
|
--- |
|
|
|
## Quickstart |
|
|
|
**Recommended defaults (balanced):** |
|
|
|
- `top_k = 128` |
|
- `threshold_general = 0.30` |
|
- `threshold_character = 0.75` |
|
|
|
**Coverage preset (recall-heavier):** `threshold_general = 0.10` (expect more false positives) |
|
|
|
### 1) Inference Endpoint |
|
|
|
Deploy as an HF Inference Endpoint and test with the following command: |
|
|
|
```bash |
|
# Replace with your own endpoint URL |
|
curl "https://YOUR_ENDPOINT_URL.huggingface.cloud" \ |
|
-X POST \ |
|
-H "Accept: application/json" \ |
|
-H "Content-Type: application/json" \ |
|
-d '{ |
|
"inputs": {"url": "https://your.cdn/image.jpg"}, |
|
"parameters": { |
|
"top_k": 128, |
|
"threshold_general": 0.10, |
|
"threshold_character": 0.75 |
|
} |
|
}' |
|
``` |
|
|
|
### 2) Python (InferenceClient) |
|
|
|
```python
import json

from huggingface_hub import InferenceClient

client = InferenceClient("https://YOUR_ENDPOINT_URL.huggingface.cloud")
resp = client.post(json={
    "inputs": {"url": "https://your.cdn/image.jpg"},
    "parameters": {"top_k": 128, "threshold_general": 0.10, "threshold_character": 0.75}
})
out = json.loads(resp)  # post() returns raw JSON bytes; decode into a list of tag dicts
# out: [{"tag": "1girl", "score": 0.97, "group": "general"}, {"tag": "mika_(blue_archive)", "score": 0.92, "group": "character"}, ...]
```
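If you query with the recall-heavy preset (`threshold_general = 0.10`), you can tighten results client-side afterwards. A minimal sketch, assuming the `{"tag", "score", "group"}` response schema shown above; the threshold values are the balanced defaults from the Quickstart:

```python
# Client-side re-thresholding; assumes the response schema shown above.
THRESHOLDS = {"general": 0.30, "character": 0.75}  # balanced defaults

def filter_tags(preds, top_k=128):
    """Keep tags whose score clears their group's threshold, best-first."""
    kept = [p for p in preds if p["score"] >= THRESHOLDS.get(p["group"], 0.30)]
    return sorted(kept, key=lambda p: p["score"], reverse=True)[:top_k]
```

This lets you fetch once with loose thresholds and experiment with stricter cutoffs without re-querying the endpoint.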
|
|
|
### 3) Local Deployment |
|
|
|
- **Minimal script**: see [`handler.py`](https://huggingface.co/pixai-labs/pixai-tagger-v0.9/blob/main/handler.py) under **Files**.

- **Demo UI**: our [Hugging Face Space](https://huggingface.co/spaces/pixai-labs/pixai-tagger-demo) linked above, or the [Space from DeepGHS](https://huggingface.co/spaces/deepghs/pixai-tagger-v0.9-demo).
|
- `pip` + direct weights: **TBD** (planned for v1.0). |
|
|
|
This tagger can also be used via the [imgutils library](https://dghs-imgutils.deepghs.org/main/api_doc/tagging/pixai.html) from DeepGHS.
|
|
|
|
|
---
|
|
|
## Training notes (short version) |
|
|
|
- **Source:** Danbooru (IDs 1–8,600,750; snapshot 2025-01) |
|
- **Tag set:** ~**13,461** tags (≥600 occurrences); grouped as general/character/copyright |
|
- **Filtering:** remove images with **<10 general tags** (WD v3 heuristic) |
|
- **Setup:** EVA02 encoder **frozen**; classification head **continued training** |
|
- **Input:** 448×448; standard Danbooru tag normalization |
|
- **Augment:** **MixUp in embedding space** (α=200) |
|
- **Optim:** Adam 1e-5, cycle schedule; batch 2048; full precision |
|
- **Compute:** ~**1 day** on one **8×H100** node
|
- *(Explored full-backbone training; deferred—head-only was more stable and faster for data iteration.)* |
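The embedding-space MixUp step can be sketched as follows. This is a minimal illustration, not the actual training code: it mixes pairs of frozen-encoder embeddings and their multi-hot label vectors with a `Beta(α, α)` weight; with `α = 200` the weight concentrates near 0.5, so mixed samples sit roughly midway between the two originals.

```python
import numpy as np

def embedding_mixup(embs, labels, alpha=200.0, rng=None):
    """Mix random pairs of (embedding, multi-hot label) rows within a batch.

    embs: (batch, dim) array of encoder embeddings.
    labels: (batch, num_tags) multi-hot label matrix.
    """
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)       # mixing weight; near 0.5 for large alpha
    perm = rng.permutation(len(embs))  # random pairing within the batch
    mixed_embs = lam * embs + (1 - lam) * embs[perm]
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed_embs, mixed_labels
```

Because the encoder is frozen, mixing in embedding space is cheap: embeddings can be precomputed once and augmented on the fly while training the head.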
|
|
|
---
|
|
|
## Evaluation (what to expect) |
|
|
|
**Metric style:** Fixed thresholds (above). Reported as **micro-averaged** unless noted. |
|
|
|
- **All-tags (13k) micro-F1:** ~**0.60** (recall-leaning) |
|
- **Character subset (4k) micro-F1:** **0.865** @ `t_char=0.75` |
|
- Reference: **WD v3 SwinV2** character F1 ≈ **0.608** (same protocol) |
|
|
|
**Internal “accuracy/coverage” snapshot** |
|
|
|
| Model | Coverage-F1 | Accuracy-F1 | Acc-Recall | Acc-Precision | Cov-Precision | Cov-Recall | |
|
| -------------- | ----------- | ----------- | ---------- | ------------- | ------------- | ---------- | |
|
| **PixAI v0.9** | **0.4910** | 0.4403 | 0.6654 | 0.3634 | 0.4350 | 0.6547 | |
|
| WD-v3-EVA02 | 0.4155 | 0.4608 | 0.4465 | **0.5248** | 0.4580 | 0.4083 | |
|
| WD-v3-SwinV2 | 0.3349 | 0.3909 | 0.3603 | 0.4821 | 0.3906 | 0.3171 | |
|
| Camie-70k | 0.4877 | 0.4800 | 0.5743 | 0.4123 | 0.4288 | 0.5930 | |
|
> Notes |
|
> • Character uses `t≈0.75`; coverage often uses `t≈0.10`. |
|
> • Keep micro vs macro consistent when updating numbers. |
|
|
|
|
|
|
|
|
|
 |
|
|
|
> Note: Plots show internal candidate versions (v2.x). The current release, `pixai-tagger-v0.9`, corresponds to internal candidate `v2.4.1`. A follow-up version is in progress.
|
|
|
---
|
|
|
## Quick comparisons |
|
|
|
A fast feel for where v0.9 sits. Numbers are from our protocol and may differ from others’. |
|
|
|
| Topic | PixAI Tagger v0.9 | WD v3 (EVA02 / SwinV2) | What it means in practice | |
|
| --------------------- | ---------------------------------------- | ---------------------- | ------------------------------------------------------------ | |
|
| **Data snapshot** | Danbooru to **2025-01** | Danbooru to 2024-02 | Better coverage of newer IPs | |
|
| **Tag vocabulary** | ~**13.5k** tags | ~10.8k tags | More labels to catch long-tail | |
|
| **Character F1** | **≈0.865** (@ 0.75 threshold) | ~0.61 (SwinV2 ref) | Stronger character recognition | |
|
| **Default posture** | Recall-leaning (tune down for precision) | Often more balanced | Good for search/curation; more false positives; set your own thresholds | |
|
| **Model size** | **~1.27 GB** checkpoint | Similar ballpark | Easy to host; endpoint-friendly | |
|
| **Training strategy** | Head-only; encoder frozen (EVA02) | Depends on release | Faster iteration on data updates | |
|
|
|
---
|
|
|
## Intended use |
|
|
|
**You can:** |
|
|
|
- Auto-tag anime images with Danbooru-style tags |
|
- Build tag-search indices |
|
- Assist caption generation (merge tags with NL captions) |
|
- Feed tags into **text-to-image** pipelines (alone or alongside text) |
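For the last two uses, a minimal sketch of turning predictions into a prompt string; the tag dicts and caption text are illustrative:

```python
def tags_to_prompt(tags, caption=""):
    """Join predicted tag names into a comma-separated prompt string.

    Danbooru underscores become spaces; an optional natural-language
    caption is prepended.
    """
    tag_str = ", ".join(t["tag"].replace("_", " ") for t in tags)
    return f"{caption}, {tag_str}" if caption else tag_str
```

Usage: `tags_to_prompt(preds, caption="a girl by the sea")` yields a string you can pass to a text-to-image pipeline as-is or merge with a longer caption.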
|
|
|
**Please don’t rely on it for:** |
|
|
|
- Legal/safety moderation or age verification |
|
- Non-anime imagery (performance will drop) |
|
- Fine-grained counting/attributes without human review |
|
|
|
---
|
|
|
## Limitations & risks |
|
|
|
- **NSFW & sensitive tags.** The dataset contains them; outputs may too. |
|
- **Recall vs precision.** Low thresholds increase false positives. |
|
- **Hallucinations.** Number-sensitive or visually similar tags can be mispredicted. |
|
- **Representation bias.** Mirrors Danbooru’s styles, tropes, and demographics. |
|
- **IP/character names.** Can be wrong or incomplete; use allow/deny lists and co-occurrence rules. |
|
|
|
**Tuning tips** |
|
|
|
- Set **different thresholds** for general vs character tags. |
|
- Consider **allow/deny lists** for your domain. |
|
- Add simple **co-occurrence rules** to suppress contradictions. |
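The last two tips can be sketched as simple post-processing. The deny list and contradiction pairs below are illustrative placeholders, not rules shipped with the model; pick your own for your domain:

```python
# Hypothetical domain rules; adjust for your use case.
DENY = {"lowres"}                          # tags to always drop
CONTRADICTIONS = [("1girl", "no_humans")]  # keep the higher-scoring tag of each pair

def postprocess(preds):
    """preds: list of {"tag": str, "score": float} dicts, as in the Quickstart."""
    kept = {p["tag"]: p for p in preds if p["tag"] not in DENY}
    for a, b in CONTRADICTIONS:
        if a in kept and b in kept:
            loser = a if kept[a]["score"] < kept[b]["score"] else b
            del kept[loser]
    return sorted(kept.values(), key=lambda p: p["score"], reverse=True)
```

Even a handful of such rules can remove most of the extra false positives introduced by recall-leaning thresholds.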
|
|
|
---
|
|
|
## Authors / Contributors |
|
|
|
- **[Linso](https://huggingface.co/richard-guyunqi)** — primary contributor (training, data processing) |
|
- **[narugo1992](https://huggingface.co/narugo1992)** — contributions |
|
- **[AngelBottomless](https://huggingface.co/AngelBottomless)** (PixAI) — contributions |
|
- **[trojblue](https://huggingface.co/trojblue)** (PixAI) — contributions |
|
- The rest of the PixAI team — further development support and testing |
|
|
|
**We also appreciate the broader anime image generation community.** Several ideas, discussions, and experiments from outside PixAI helped shape this release. |
|
|
|
--- |
|
|
|
## Maintenance |
|
|
|
- We plan **future releases** with updated snapshots. |
|
- v1.0 will include updated tags + packaging improvements. |
|
- Changelog will live in the repo. |
|
|
|
## Other |
|
- There is an [ONNX version of this tagger provided by DeepGHS](https://huggingface.co/deepghs/pixai-tagger-v0.9-onnx). Thanks!