davanstrien/eval-extraction-ner-v2

Token classifier trained on bootstrap NER labels from davanstrien/eval-mentions-bootstrap-v2. It demonstrates the bootstrap-labels skill workflow: GLiNER bootstraps coarse labels, then a small task-specific model is trained on them.

Training data

  • Source: davanstrien/eval-mentions-bootstrap-v2
  • Bootstrap model: GLiNER (via uv-scripts/gliner)
  • Score threshold: 0.8 (entities scoring below this were dropped)
  • Span blacklist: ['learning_rate', 'eval_batch_size', 'epsilon', 'lr_scheduler_warmup_ratio', 'lr_scheduler_type', 'epoch', 'batch_size', 'optimizer', 'gradient_accumulation_steps', 'warmup_ratio', 'seed', 'weight_decay', 'model', 'dataset', 'transformers', 'training dataset', 'training data', 'unknown dataset', 'f1']
  • Train rows: 306
  • Val rows: 35
  • Token-label distribution (excluding O):
    • BENCHMARK_NAME: 3663
    • EVALUATION_METRIC: 719
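
The score-threshold and span-blacklist filters listed above can be sketched roughly as follows. This is a minimal illustration, not the actual training script: the entity dict shape (`text`, `label`, `score`) and the helper name `filter_entities` are assumptions, the blacklist is abbreviated, and the card does not say whether matching is case-insensitive or whether a score of exactly 0.8 is kept.

```python
SCORE_THRESHOLD = 0.8
# Abbreviated for the example; the full blacklist is given in the card.
SPAN_BLACKLIST = {"learning_rate", "batch_size", "seed", "model", "dataset", "f1"}


def filter_entities(entities, threshold=SCORE_THRESHOLD, blacklist=SPAN_BLACKLIST):
    """Keep entities at or above the threshold whose text is not blacklisted.

    Assumes each entity is a dict with "text", "label" and "score" keys
    (a hypothetical shape chosen for this sketch).
    """
    kept = []
    for ent in entities:
        if ent["score"] < threshold:
            continue  # low-confidence bootstrap label
        if ent["text"].strip().lower() in blacklist:
            continue  # hyperparameter names and generic words
        kept.append(ent)
    return kept


# Made-up entities, for illustration only.
example = [
    {"text": "MMLU", "label": "BENCHMARK_NAME", "score": 0.93},
    {"text": "dataset", "label": "BENCHMARK_NAME", "score": 0.91},
    {"text": "accuracy", "label": "EVALUATION_METRIC", "score": 0.55},
]
print([ent["text"] for ent in filter_entities(example)])
```

In this sketch "dataset" is dropped by the blacklist despite its high score, and "accuracy" is dropped by the threshold.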

Eval results

Metric     Value
F1         0.0000
Precision  0.0000
Recall     0.0000
Accuracy   0.9756

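(Note: evaluated on a held-out 10% of the bootstrap labels. These are silver labels, not human-reviewed gold, so the numbers reflect agreement with GLiNER, not absolute accuracy. F1, precision, and recall of 0 alongside high accuracy mean that no entity was predicted correctly on this split; the accuracy figure is likely dominated by O tokens.)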
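
The card does not state how these metrics were computed. Assuming span-level precision, recall, and F1 (in the style of seqeval) and token-level accuracy, the toy example below shows how the two can diverge when a model predicts only O. The helpers `extract_spans`, `span_f1`, and `token_accuracy` are written for this sketch and simplify BIO handling (stray I- tags are ignored).

```python
def extract_spans(tags):
    """Return a set of (start, end, label) spans from a list of BIO tags."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold_tags, pred_tags):
    """Exact-match span precision, recall and F1."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def token_accuracy(gold_tags, pred_tags):
    """Fraction of positions where the predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold_tags, pred_tags)) / len(gold_tags)


# Toy data: 40 tokens, one two-token entity, and a model that predicts only O.
gold = ["O"] * 38 + ["B-BENCHMARK_NAME", "I-BENCHMARK_NAME"]
pred = ["O"] * 40
print(span_f1(gold, pred))
print(token_accuracy(gold, pred))
```

Here every span metric is zero while token accuracy stays high, because most tokens are O in both the labels and the predictions.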

Caveats

  • This is a V0 model trained on bootstrap labels with no human review pass. Expect it to inherit GLiNER's failure modes.
  • The intended use is as the V1 in an active-learning loop: deploy as Label Studio ML backend, route disagreements with GLiNER to humans, retrain on corrections. See the bootstrap-labels skill for the full workflow.
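
The disagreement-routing step in that loop might look something like the sketch below. It is an illustration under stated assumptions, not the bootstrap-labels skill's implementation: the record shape (`id`, `model`, `gliner`), the entity keys (`start`, `end`, `label`), and the exact-match definition of "disagreement" are all hypothetical, and no Label Studio API is used.

```python
def spans_of(entities):
    """Reduce a list of entity dicts to a comparable set of (start, end, label)."""
    return {(e["start"], e["end"], e["label"]) for e in entities}


def route(records):
    """Split records into (agreed, needs_review) by model/GLiNER span agreement."""
    agreed, needs_review = [], []
    for rec in records:
        if spans_of(rec["model"]) == spans_of(rec["gliner"]):
            agreed.append(rec["id"])
        else:
            needs_review.append(rec["id"])  # send to a human annotator
    return agreed, needs_review


# Made-up records, for illustration only.
records = [
    {
        "id": "a",
        "model": [{"start": 28, "end": 32, "label": "BENCHMARK_NAME"}],
        "gliner": [{"start": 28, "end": 32, "label": "BENCHMARK_NAME"}],
    },
    {
        "id": "b",
        "model": [],
        "gliner": [{"start": 10, "end": 18, "label": "EVALUATION_METRIC"}],
    },
]
print(route(records))
```

Exact span matching is the strictest choice; a real loop might prefer overlap-based matching or score margins to keep the review queue manageable.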

Usage

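from transformers import pipeline

# Load the fine-tuned token classifier; "simple" aggregation merges sub-word tokens into entity spans.
ner = pipeline("token-classification", model="davanstrien/eval-extraction-ner-v2", aggregation_strategy="simple")
# Returns a list of dicts with entity_group, score, word, start, and end keys.
ner("This model was evaluated on MMLU and HellaSwag.")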
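
To post-process the pipeline output, a small helper can group the aggregated spans by label. The sketch below assumes the standard aggregated output format of the token-classification pipeline; `group_mentions` is a hypothetical helper, and `sample_results` is hand-written illustrative data, not real output from this model (which, given the eval results above, may return few or no entities).

```python
from collections import defaultdict


def group_mentions(results, min_score=0.0):
    """Group aggregated pipeline results by entity_group, keeping surface words."""
    grouped = defaultdict(list)
    for r in results:
        if r["score"] >= min_score:
            grouped[r["entity_group"]].append(r["word"])
    return dict(grouped)


# Hand-written stand-in for ner(...) output; values are illustrative only.
sample_results = [
    {"entity_group": "BENCHMARK_NAME", "score": 0.98, "word": "MMLU", "start": 28, "end": 32},
    {"entity_group": "BENCHMARK_NAME", "score": 0.97, "word": "HellaSwag", "start": 37, "end": 46},
]
print(group_mentions(sample_results))
```

With a real pipeline, `group_mentions(ner(text))` would be called in the same way.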

Model size: 66.4M params (Safetensors, tensor type F32)