davanstrien/eval-extraction-ner-v0
Token classifier trained on bootstrap NER labels from davanstrien/eval-mentions-bootstrap. Demonstrates the bootstrap-labels skill workflow: GLiNER bootstraps coarse labels, then a small task-specific model is trained on them.
Training data
- Source: davanstrien/eval-mentions-bootstrap
- Bootstrap model: GLiNER (via uv-scripts/gliner)
- Score threshold: 0.8 (entities below this dropped)
- Span blacklist: ['learning_rate', 'eval_batch_size', 'epsilon', 'lr_scheduler_warmup_ratio', 'lr_scheduler_type', 'epoch', 'batch_size', 'optimizer', 'gradient_accumulation_steps', 'warmup_ratio', 'seed', 'weight_decay', 'model', 'dataset', 'transformers', 'training dataset', 'training data', 'unknown dataset', 'f1']
- Train rows: 1194
- Val rows: 133
- Token-label distribution (excluding O):
  - EVALUATION_METRIC: 7537
  - BENCHMARK_NAME: 3104
  - EVALUATION_DATASET: 1918
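The score threshold and span blacklist listed above can be read as a simple filter over the bootstrapped entities. The following is a minimal sketch, not the actual training script: the entity record shape (`text`, `label`, `score`), the helper names, and the case-insensitive blacklist match are all assumptions made for illustration, and only a subset of the blacklist is shown.

```python
# Hypothetical entity record: {"text": str, "label": str, "score": float}
SCORE_THRESHOLD = 0.8  # value stated in this card

# Subset of the span blacklist from this card, for brevity
SPAN_BLACKLIST = {"learning_rate", "epoch", "model", "dataset", "f1"}


def keep_entity(entity, threshold=SCORE_THRESHOLD, blacklist=SPAN_BLACKLIST):
    """Return True if a bootstrapped entity passes both filters."""
    # Entities scoring below the threshold are dropped
    if entity["score"] < threshold:
        return False
    # Assumption: blacklist matching ignores case and surrounding whitespace
    return entity["text"].strip().lower() not in blacklist


def filter_entities(entities):
    """Keep only the entities that pass keep_entity."""
    return [e for e in entities if keep_entity(e)]


example = [
    {"text": "MMLU", "label": "BENCHMARK_NAME", "score": 0.93},
    {"text": "f1", "label": "EVALUATION_METRIC", "score": 0.95},      # blacklisted span
    {"text": "HellaSwag", "label": "BENCHMARK_NAME", "score": 0.61},  # below threshold
]
kept = filter_entities(example)
```

In this toy input only the `MMLU` entity survives: `f1` is removed by the blacklist and `HellaSwag` by the score threshold.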
Eval results
| Metric | Value |
|---|---|
| F1 | 0.5573 |
| Precision | 0.5838 |
| Recall | 0.5332 |
| Accuracy | 0.9870 |
(Note: evaluated on a held-out 10% of the bootstrap labels. These are silver labels, not human-reviewed gold, so the numbers reflect agreement with GLiNER, not absolute accuracy.)
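As a quick consistency check on the table, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported values, assuming all three scores come from the same averaging scheme (which this card does not state). The result is about 0.557, matching the reported F1 up to rounding of the inputs.

```python
# Values copied from the eval results table
precision = 0.5838
recall = 0.5332


def f1_score(p, r):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


f1 = f1_score(precision, recall)
print(f"F1 recomputed from P and R: {f1:.3f}")
```

Accuracy is much higher than F1. One likely reason, stated here as an assumption, is that accuracy is computed per token and most tokens carry the `O` label.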
Caveats
- This is a V0 model trained on bootstrap labels with no human review pass. Expect it to inherit GLiNER's failure modes.
- The intended use is as the V1 in an active-learning loop: deploy as Label Studio ML backend, route disagreements with GLiNER to humans, retrain on corrections. See the bootstrap-labels skill for the full workflow.
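The disagreement-routing step in the loop above could look roughly like the following sketch. It is not taken from the bootstrap-labels skill. The normalized span shape (`start`, `end`, `label`) and the function names are hypothetical, and exact-match comparison of spans is an assumption (a real setup might tolerate partial overlaps).

```python
def span_set(entities):
    """Reduce entity dicts to a comparable set of (start, end, label) tuples."""
    return {(e["start"], e["end"], e["label"]) for e in entities}


def disagreements(model_entities, gliner_entities):
    """Return the spans predicted by only one of the two systems."""
    model_spans = span_set(model_entities)
    gliner_spans = span_set(gliner_entities)
    return {
        "model_only": sorted(model_spans - gliner_spans),
        "gliner_only": sorted(gliner_spans - model_spans),
    }


def needs_review(model_entities, gliner_entities):
    """Route a text to a human annotator when the two systems differ at all."""
    diff = disagreements(model_entities, gliner_entities)
    return bool(diff["model_only"] or diff["gliner_only"])


# Toy predictions for "This model was evaluated on MMLU and HellaSwag."
model_pred = [{"start": 28, "end": 32, "label": "BENCHMARK_NAME"}]
gliner_pred = [
    {"start": 28, "end": 32, "label": "BENCHMARK_NAME"},
    {"start": 37, "end": 46, "label": "BENCHMARK_NAME"},
]
diff = disagreements(model_pred, gliner_pred)
flagged = needs_review(model_pred, gliner_pred)
```

Here the text would be flagged for review because GLiNER found a second span that the small model missed.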
Usage
from transformers import pipeline

# aggregation_strategy="simple" merges word pieces into whole entity spans
ner = pipeline("token-classification", model="davanstrien/eval-extraction-ner-v0", aggregation_strategy="simple")
ner("This model was evaluated on MMLU and HellaSwag.")
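With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. The sketch below groups such output by label. The helper name `group_by_label` is hypothetical, and the sample predictions are made up for illustration so the snippet runs without downloading the model. They are not real output from this checkpoint.

```python
from collections import defaultdict


def group_by_label(predictions, min_score=0.0):
    """Collect predicted entity strings under their entity_group label."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# Illustrative stand-in for the pipeline's return value (not real model output)
sample_predictions = [
    {"entity_group": "BENCHMARK_NAME", "score": 0.91, "word": "mmlu", "start": 28, "end": 32},
    {"entity_group": "BENCHMARK_NAME", "score": 0.42, "word": "hellaswag", "start": 37, "end": 46},
]

all_entities = group_by_label(sample_predictions)
confident_entities = group_by_label(sample_predictions, min_score=0.5)
```

Since the base model is uncased, the returned `word` values are lowercased. The `start` and `end` offsets can be used to slice the original text if the original casing is needed.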
Model tree for davanstrien/eval-extraction-ner-v0
Base model
distilbert/distilbert-base-uncased