davanstrien/eval-extraction-ner-v0

Token classifier trained on bootstrap NER labels from davanstrien/eval-mentions-bootstrap. It demonstrates the bootstrap-labels skill workflow: GLiNER bootstraps coarse labels, then a small task-specific model is trained on them.

Training data

Source: davanstrien/eval-mentions-bootstrap
Bootstrap model: GLiNER (via uv-scripts/gliner)
Score threshold: 0.8 (entities scoring below this were dropped)
Span blacklist: ['learning_rate', 'eval_batch_size', 'epsilon', 'lr_scheduler_warmup_ratio', 'lr_scheduler_type', 'epoch', 'batch_size', 'optimizer', 'gradient_accumulation_steps', 'warmup_ratio', 'seed', 'weight_decay', 'model', 'dataset', 'transformers', 'training dataset', 'training data', 'unknown dataset', 'f1']
Train rows: 1194
Val rows: 133
Token-label distribution (excluding O):
- EVALUATION_METRIC: 7537
- BENCHMARK_NAME: 3104
- EVALUATION_DATASET: 1918
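
The score threshold and span blacklist above could be applied along these lines. This is a minimal sketch, not the actual uv-scripts/gliner code: the entity dict shape (`text`, `label`, `score`), the function name `filter_entities`, the case-insensitive matching, and the shortened blacklist are all assumptions made for illustration.

```python
SCORE_THRESHOLD = 0.8

# Abbreviated subset of the blacklist listed above (assumption: matched
# case-insensitively against the whole span text).
SPAN_BLACKLIST = {"learning_rate", "batch_size", "model", "dataset", "f1"}


def filter_entities(entities, threshold=SCORE_THRESHOLD, blacklist=SPAN_BLACKLIST):
    """Keep entities at or above the threshold whose text is not blacklisted."""
    kept = []
    for ent in entities:
        if ent["score"] < threshold:
            continue  # low-confidence bootstrap prediction
        if ent["text"].strip().lower() in blacklist:
            continue  # generic span such as a hyperparameter name
        kept.append(ent)
    return kept


# Hand-written example entities, not real GLiNER output.
example = [
    {"text": "MMLU", "label": "BENCHMARK_NAME", "score": 0.93},
    {"text": "accuracy", "label": "EVALUATION_METRIC", "score": 0.61},
    {"text": "dataset", "label": "EVALUATION_DATASET", "score": 0.95},
]
filtered = filter_entities(example)
print([e["text"] for e in filtered])
```

Only the first entity survives here: the second falls below the threshold and the third is a blacklisted generic span.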

Eval results

Metric	Value
F1	0.5573
Precision	0.5838
Recall	0.5332
Accuracy	0.9870

(Note: evaluated on a held-out 10% of the bootstrap labels. These are silver labels, not human-reviewed gold, so the numbers reflect agreement with GLiNER, not absolute accuracy.)
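
As a quick sanity check, the reported figures are internally consistent: F1 is the harmonic mean of precision and recall, and the row counts match a 10% hold-out. The sketch below only recomputes values from the numbers on this card; it does not rerun the evaluation. One hedged observation: if accuracy is computed per token (an assumption, since the card does not say), the dominant O class would explain why it sits far above F1.

```python
# Values copied from this model card.
precision = 0.5838
recall = 0.5332
reported_f1 = 0.5573
train_rows = 1194
val_rows = 133

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Fraction of all rows held out for validation.
val_fraction = val_rows / (train_rows + val_rows)

print(f"recomputed F1: {f1:.4f} (reported {reported_f1})")
print(f"validation fraction: {val_fraction:.3f}")
```

The recomputed F1 differs from the reported value only in the last digit, which is what rounding precision and recall to four places would produce.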

Caveats

This is a V0 model trained on bootstrap labels with no human review pass. Expect it to inherit GLiNER's failure modes.
The intended use is as the V1 in an active-learning loop: deploy it as a Label Studio ML backend, route disagreements with GLiNER to humans, and retrain on the corrections. See the bootstrap-labels skill for the full workflow.
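
The disagreement-routing step of that loop might look roughly like this. It is a sketch under stated assumptions, not the bootstrap-labels skill's implementation: the helper names (`span_set`, `needs_review`, `route`), the example record shape, and the exact-match rule on `(start, end, label)` are all hypothetical, and the Label Studio integration itself is not shown.

```python
def span_set(entities):
    """Reduce a list of entity dicts to a comparable set of (start, end, label)."""
    return {(e["start"], e["end"], e["label"]) for e in entities}


def needs_review(model_entities, gliner_entities):
    """Flag an example when the two models disagree on any span or label."""
    return span_set(model_entities) != span_set(gliner_entities)


def route(examples):
    """Split example ids into a human-review queue and an auto-accept list."""
    to_review, auto_accept = [], []
    for ex in examples:
        if needs_review(ex["model"], ex["gliner"]):
            to_review.append(ex["id"])
        else:
            auto_accept.append(ex["id"])
    return to_review, auto_accept


# Hand-written predictions for illustration only.
examples = [
    {
        "id": "a",
        "model": [{"start": 28, "end": 32, "label": "BENCHMARK_NAME"}],
        "gliner": [{"start": 28, "end": 32, "label": "BENCHMARK_NAME"}],
    },
    {
        "id": "b",
        "model": [{"start": 28, "end": 32, "label": "EVALUATION_DATASET"}],
        "gliner": [{"start": 28, "end": 32, "label": "BENCHMARK_NAME"}],
    },
]
to_review, auto_accept = route(examples)
print(to_review, auto_accept)
```

Exact span matching is the strictest possible rule; a looser criterion such as overlap-based matching would send fewer examples to reviewers.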

Usage

from transformers import pipeline

# aggregation_strategy="simple" groups adjacent tokens that share a label into one entity span
ner = pipeline("token-classification", model="davanstrien/eval-extraction-ner-v0", aggregation_strategy="simple")
# Returns a list of dicts with entity_group, score, word, start, and end
ner("This model was evaluated on MMLU and HellaSwag.")
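
With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. One way to collect those results per label is sketched below. The helper `group_by_label` and its `min_score` default are hypothetical, and `mock_results` is hand-written to mimic the output shape; it is not real output from this model.

```python
from collections import defaultdict


def group_by_label(results, min_score=0.5):
    """Collect predicted entity strings per label, dropping low-score spans."""
    grouped = defaultdict(list)
    for r in results:
        if r["score"] >= min_score:
            grouped[r["entity_group"]].append(r["word"])
    return dict(grouped)


# Mock pipeline output (made-up scores), shaped like the token-classification
# pipeline's aggregated results.
mock_results = [
    {"entity_group": "BENCHMARK_NAME", "score": 0.91, "word": "mmlu", "start": 28, "end": 32},
    {"entity_group": "BENCHMARK_NAME", "score": 0.88, "word": "hellaswag", "start": 37, "end": 46},
    {"entity_group": "EVALUATION_METRIC", "score": 0.32, "word": "evaluated", "start": 15, "end": 24},
]
grouped = group_by_label(mock_results)
print(grouped)
```

In practice `mock_results` would be replaced by the return value of the `ner(...)` call above.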

Model size: 66.4M params (Safetensors, tensor type F32)

Base model: distilbert/distilbert-base-uncased
