# Entity Tag Guide

This document describes the annotation tags that appear in the NER / merged NER output. Each **entity** tag corresponds to a labeled span in the text; the rows marked "Relation" cover the relation labels extracted alongside those spans.

| Tag                              | Meaning |
| -------------------------------- | ------- |
| **`match_named`**                | The model and the ground truth agree on an explicit, uniquely named dataset span. |
| **`actual_named`**               | A named dataset span present in the ground truth but missed by the model. |
| **`pred_named`**                 | A named dataset span predicted by the model but absent from the ground truth. |
| **`match_unnamed`**              | The model and the ground truth agree on a clearly described but unnamed dataset span. |
| **`actual_unnamed`**             | An unnamed dataset span present in the ground truth but missed by the model. |
| **`pred_unnamed`**               | An unnamed dataset span predicted by the model but absent from the ground truth. |
| **`match_vague`**                | The model and the ground truth agree on a vague dataset mention (one lacking specific identifying details). |
| **`actual_vague`**               | A vague dataset mention present in the ground truth but missed by the model. |
| **`pred_vague`**                 | A vague dataset mention predicted by the model but absent from the ground truth. |
| **`<span> <> acronym`**          | Relation: marks the dataset’s acronym (e.g. `RUV <> acronym`). |
| **`<span> <> data description`** | Relation: describes what the dataset contains or how it was collected. |
| **`<span> <> data geography`**   | Relation: indicates the geographic coverage of the dataset (e.g. country, region). |
| **`<span> <> data source`**      | Relation: links to the original source or repository of the data. |
| **`<span> <> data type`**        | Relation: specifies the type of data (e.g. survey, census, register). |
| **`<span> <> geography`**        | Relation: connects the dataset to its referenced geography (may duplicate `data geography`). |
| **`<span> <> publication year`** | Relation: the year the dataset (or its documentation) was published. |
| **`<span> <> publisher`**        | Relation: the organization or entity that published the dataset. |
| **`<span> <> reference year`**   | Relation: the year in which the data were collected or to which they refer. |
| **`<span> <> version`**          | Relation: the version identifier of the dataset (e.g. “v5”, “Version 2”). |

Use this guide when reviewing model predictions to quickly identify correct matches (`match_*`), false positives (`pred_*`), and false negatives (`actual_*`), as well as any extracted relations (`<span> <> ...`).
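
The prefix convention in the table can also be applied programmatically. The following is a minimal sketch, not part of the annotation tool itself: `parse_tag`, `ParsedTag`, and the outcome names are hypothetical helpers introduced here for illustration, and the sketch assumes that tags arrive as plain strings formatted exactly as in the table (entity tags as `prefix_type`, relation labels as two parts separated by ` <> `).

```python
from typing import NamedTuple, Optional

# Prefix -> review outcome, following the table in this guide.
# The outcome names are illustrative, not identifiers used by the tool.
OUTCOME_BY_PREFIX = {
    "match": "true positive",    # model and ground truth agree
    "actual": "false negative",  # in the ground truth, missed by the model
    "pred": "false positive",    # predicted by the model, not in the ground truth
}
MENTION_TYPES = {"named", "unnamed", "vague"}
RELATION_SEPARATOR = " <> "  # assumed separator, as in `RUV <> acronym`


class ParsedTag(NamedTuple):
    kind: str                    # "entity" or "relation"
    outcome: Optional[str]       # set for entity tags only
    mention_type: Optional[str]  # "named", "unnamed", or "vague"
    span: Optional[str]          # text before the separator (relations only)
    relation: Optional[str]      # text after the separator (relations only)


def parse_tag(tag: str) -> ParsedTag:
    """Split an annotation tag into its parts (hypothetical helper)."""
    if RELATION_SEPARATOR in tag:
        span, relation = tag.split(RELATION_SEPARATOR, 1)
        return ParsedTag("relation", None, None, span.strip(), relation.strip())

    prefix, sep, mention_type = tag.partition("_")
    if sep and prefix in OUTCOME_BY_PREFIX and mention_type in MENTION_TYPES:
        return ParsedTag("entity", OUTCOME_BY_PREFIX[prefix], mention_type, None, None)

    raise ValueError(f"Unrecognized tag: {tag!r}")


if __name__ == "__main__":
    for tag in ["match_named", "pred_vague", "RUV <> acronym"]:
        print(tag, "->", parse_tag(tag))
```

Grouping the parsed tags by `outcome` gives per-type counts of matches, false positives, and false negatives. Unknown tags raise an error instead of being silently ignored, so a change in the tag scheme is noticed during review.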