# Entity Tag Guide

This document describes the annotation tags you will see in the NER / merged NER output. Each entity corresponds to a labeled span in the text.
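
For orientation, here is a minimal sketch of what a single labeled record might look like. The field names (`text`, `spans`, `start`, `end`, `label`) and the `span` helper are illustrative assumptions, not the tool's actual output schema.

```python
# Hypothetical example record; field names are assumptions, not the app's schema.
text = "We use the 2019 Rwanda Demographic and Health Survey (DHS) microdata."

def span(substring: str, label: str) -> dict:
    """Build a span dict with character offsets located via str.index (illustrative helper)."""
    start = text.index(substring)
    return {"start": start, "end": start + len(substring), "label": label}

record = {
    "text": text,
    "spans": [
        span("Rwanda Demographic and Health Survey", "match_named"),  # model and ground truth agree
        span("microdata", "pred_vague"),                              # model-only vague mention
    ],
}
print(record)
```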

| Entity | Meaning |
| --- | --- |
| `match_named` | Model and ground truth agree on an explicit, uniquely named dataset span. |
| `actual_named` | A named dataset span present in the ground truth but missed by the model. |
| `pred_named` | A named dataset span predicted by the model but not in the ground truth. |
| `match_unnamed` | Model and ground truth agree on a clearly described but unnamed dataset span. |
| `actual_unnamed` | An unnamed dataset span present in the ground truth but missed by the model. |
| `pred_unnamed` | An unnamed dataset span predicted by the model but not in the ground truth. |
| `match_vague` | Model and ground truth agree on a vague dataset mention (one lacking specific identifying details). |
| `actual_vague` | A vague dataset mention present in the ground truth but missed by the model. |
| `pred_vague` | A vague dataset mention predicted by the model but not in the ground truth. |
| `<span> <> acronym` | Relation: marks the dataset’s acronym (e.g. `RUV <> acronym`). |
| `<span> <> data description` | Relation: describes what the dataset contains or how it was collected. |
| `<span> <> data geography` | Relation: indicates the geographic coverage of the dataset (e.g. country, region). |
| `<span> <> data source` | Relation: links to the original source or repository of the data. |
| `<span> <> data type` | Relation: specifies the type of data (e.g. survey, census, register). |
| `<span> <> geography` | Relation: connects the dataset to its referenced geography (may duplicate `data geography`). |
| `<span> <> publication year` | Relation: the year the dataset (or its documentation) was published. |
| `<span> <> publisher` | Relation: the organization or entity that published the dataset. |
| `<span> <> reference year` | Relation: the year the data were collected or to which they refer. |
| `<span> <> version` | Relation: the version identifier of the dataset (e.g. “v5”, “Version 2”). |

Use this guide when reviewing model predictions to quickly identify correct matches, false positives, and false negatives, as well as any extracted relations.
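
When tallying review results, the `match_`/`pred_`/`actual_` prefixes map directly onto true positives, false positives, and false negatives for each category (named, unnamed, vague). The sketch below is one way to turn a list of tags into per-category precision and recall; it is not the app's own evaluation code.

```python
# Minimal sketch: count tag prefixes to get precision/recall per category.
from collections import Counter

def prf(labels: list[str], category: str) -> tuple[float, float]:
    """Precision and recall for one category ('named', 'unnamed', or 'vague')."""
    counts = Counter(labels)
    tp = counts[f"match_{category}"]   # model and ground truth agree
    fp = counts[f"pred_{category}"]    # predicted but not in ground truth
    fn = counts[f"actual_{category}"]  # in ground truth but missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels = ["match_named", "pred_named", "actual_named", "match_vague"]
print(prf(labels, "named"))  # (0.5, 0.5)
```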