davanstrien/eval-extraction-ner-v2

Token classifier trained on bootstrap NER labels from davanstrien/eval-mentions-bootstrap-v2. It demonstrates the bootstrap-labels skill workflow: GLiNER bootstraps coarse labels, then a small task-specific model is trained on them.

Training data

Source: davanstrien/eval-mentions-bootstrap-v2
Bootstrap model: GLiNER (via uv-scripts/gliner)
Score threshold: 0.8 (entities scoring below this were dropped)
Span blacklist: ['learning_rate', 'eval_batch_size', 'epsilon', 'lr_scheduler_warmup_ratio', 'lr_scheduler_type', 'epoch', 'batch_size', 'optimizer', 'gradient_accumulation_steps', 'warmup_ratio', 'seed', 'weight_decay', 'model', 'dataset', 'transformers', 'training dataset', 'training data', 'unknown dataset', 'f1']
Train rows: 306
Val rows: 35
Token-label distribution (excluding O):
- BENCHMARK_NAME: 3663
- EVALUATION_METRIC: 719
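
The score threshold and span blacklist listed above can be sketched as a simple filter. This is a minimal illustration, not the script that produced the dataset: the entity dict keys (`text`, `label`, `score`), the `filter_entities` name, and the case-insensitive blacklist match are assumptions.

```python
from typing import Dict, List

SCORE_THRESHOLD = 0.8  # value stated in this card

# Blacklist copied from this card.
SPAN_BLACKLIST = {
    "learning_rate", "eval_batch_size", "epsilon", "lr_scheduler_warmup_ratio",
    "lr_scheduler_type", "epoch", "batch_size", "optimizer",
    "gradient_accumulation_steps", "warmup_ratio", "seed", "weight_decay",
    "model", "dataset", "transformers", "training dataset", "training data",
    "unknown dataset", "f1",
}


def filter_entities(entities: List[Dict], threshold: float = SCORE_THRESHOLD) -> List[Dict]:
    """Keep entities that meet the threshold and are not blacklisted.

    Assumption: each entity is a dict with "text", "label" and "score" keys,
    and blacklist matching ignores case and surrounding whitespace.
    """
    kept = []
    for ent in entities:
        if ent["score"] < threshold:
            continue  # low-confidence bootstrap prediction
        if ent["text"].strip().lower() in SPAN_BLACKLIST:
            continue  # hyperparameter names and generic words
        kept.append(ent)
    return kept


# Hand-written sample input, not real GLiNER output.
sample = [
    {"text": "MMLU", "label": "BENCHMARK_NAME", "score": 0.93},
    {"text": "learning_rate", "label": "EVALUATION_METRIC", "score": 0.95},
    {"text": "HellaSwag", "label": "BENCHMARK_NAME", "score": 0.55},
]
print([ent["text"] for ent in filter_entities(sample)])
```

Note that under this case-insensitive assumption the `f1` entry would also drop mentions of the F1 metric itself.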

Eval results

Metric	Value
F1	0.0000
Precision	0.0000
Recall	0.0000
Accuracy	0.9756

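(Note: evaluated on a held-out 10% of the bootstrap labels. These are silver labels, not human-reviewed gold, so the numbers reflect agreement with GLiNER rather than absolute accuracy. Zero precision, recall, and F1 mean that no predicted entity matched a silver label; the high accuracy most likely reflects the majority O class.)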
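
To see how high accuracy and zero F1 can coexist, here is a toy sketch that scores an all-O prediction against a sequence containing one entity. It assumes BIO tags and exact-match entity-level scoring; the card does not state which scheme or metric library was used, and `bio_to_spans` and `scores` are hypothetical helper names.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from BIO tags (stray I- tags are ignored)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def scores(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Entity-level precision/recall/F1 plus token-level accuracy."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy data: 38 O tokens and one two-token entity; the prediction is all O.
gold = ["O"] * 38 + ["B-BENCHMARK_NAME", "I-BENCHMARK_NAME"]
pred = ["O"] * 40
print(scores(gold, pred))  # f1 is 0.0 while accuracy is 0.95
```

The toy numbers are illustrative only and are not meant to reproduce the table above.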

Caveats

This is a V0 model trained on bootstrap labels with no human review pass. Expect it to inherit GLiNER's failure modes.
The intended use is as the V1 in an active-learning loop: deploy it as a Label Studio ML backend, route its disagreements with GLiNER to human annotators, and retrain on their corrections. See the bootstrap-labels skill for the full workflow.
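
The disagreement-routing step could look roughly like the sketch below. It is not the bootstrap-labels skill's implementation: the record layout (`id`, `gliner`, `model`), the label keys, and the exact-match rule for deciding disagreement are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start char, end char, label)


def to_spans(entities: Iterable[Dict], label_key: str) -> Set[Span]:
    """Normalize entity dicts to comparable (start, end, label) tuples."""
    return {(e["start"], e["end"], e[label_key]) for e in entities}


def route_for_review(records: List[Dict]) -> List[str]:
    """Return ids of records where the two models' span sets differ.

    Assumption: each record holds GLiNER entities under "gliner" (label key
    "label") and classifier entities under "model" (label key "entity_group").
    """
    flagged = []
    for rec in records:
        gliner_spans = to_spans(rec["gliner"], "label")
        model_spans = to_spans(rec["model"], "entity_group")
        if gliner_spans != model_spans:
            flagged.append(rec["id"])
    return flagged


# Hand-written records, not real predictions.
records = [
    {
        "id": "doc-1",
        "gliner": [{"start": 28, "end": 32, "label": "BENCHMARK_NAME"}],
        "model": [{"start": 28, "end": 32, "entity_group": "BENCHMARK_NAME"}],
    },
    {
        "id": "doc-2",
        "gliner": [{"start": 10, "end": 18, "label": "EVALUATION_METRIC"}],
        "model": [],
    },
]
print(route_for_review(records))  # only doc-2 goes to a human
```

Exact span matching is the strictest choice; a looser overlap-based rule would send fewer records to review.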

Usage

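from transformers import pipeline

# Load the fine-tuned token classifier; "simple" aggregation merges word pieces into entity spans.
ner = pipeline("token-classification", model="davanstrien/eval-extraction-ner-v2", aggregation_strategy="simple")

# Each result is a dict with entity_group, score, word, start, and end keys.
for entity in ner("This model was evaluated on MMLU and HellaSwag."):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))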
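
As a possible follow-up, the pipeline output can be grouped by label. The sketch below uses the `start` and `end` offsets to slice the original text, since an uncased base model returns lowercased `word` values. The `group_mentions` name and the `min_score` cutoff of 0.5 are hypothetical, and the predictions are hand-written stand-ins in the pipeline's output format, not actual output from this model.

```python
from collections import defaultdict
from typing import Dict, List


def group_mentions(text: str, predictions: List[Dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group predicted mentions by entity label, keeping original casing."""
    grouped = defaultdict(list)
    for pred in predictions:
        if float(pred["score"]) < min_score:
            continue  # hypothetical confidence cutoff
        # Slice the source text so "mmlu" is reported as "MMLU".
        grouped[pred["entity_group"]].append(text[pred["start"]:pred["end"]])
    return dict(grouped)


text = "This model was evaluated on MMLU and HellaSwag."

# Mock predictions for illustration only.
mock_predictions = [
    {"entity_group": "BENCHMARK_NAME", "score": 0.91, "word": "mmlu", "start": 28, "end": 32},
    {"entity_group": "BENCHMARK_NAME", "score": 0.88, "word": "hellaswag", "start": 37, "end": 46},
    {"entity_group": "EVALUATION_METRIC", "score": 0.12, "word": "evaluated", "start": 15, "end": 24},
]
print(group_mentions(text, mock_predictions))
```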

Model details

Model size: 66.4M params
Tensor type: F32 (Safetensors)
Base model: distilbert/distilbert-base-uncased