# Entity Tag Guide

This document describes the annotation tags you will see in the NER / merged NER output. Each entity corresponds to a labeled span in the text.
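
For orientation, here is a minimal sketch of what a single labeled record might look like. The field names (`text`, `spans`, `start`, `end`, `label`) and the `span` helper are illustrative assumptions, not the tool's actual output schema.

```python
# Hypothetical example record; field names are assumptions, not the app's schema.
text = "We use the 2019 Rwanda Demographic and Health Survey (DHS) microdata."

def span(substring: str, label: str) -> dict:
    """Build a span dict with character offsets located via str.index (illustrative helper)."""
    start = text.index(substring)
    return {"start": start, "end": start + len(substring), "label": label}

record = {
    "text": text,
    "spans": [
        span("Rwanda Demographic and Health Survey", "match_named"),  # model and ground truth agree
        span("microdata", "pred_vague"),                              # model-only vague mention
    ],
}
print(record)
```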

| Entity | Meaning |
| --- | --- |
| `match_named` | Model and ground truth agree on an explicit, uniquely named dataset span. |
| `actual_named` | A named dataset span present in the ground truth but missed by the model. |
| `pred_named` | A named dataset span predicted by the model but not in the ground truth. |
| `match_unnamed` | Model and ground truth agree on a clearly described but unnamed dataset span. |
| `actual_unnamed` | An unnamed dataset span present in the ground truth but missed by the model. |
| `pred_unnamed` | An unnamed dataset span predicted by the model but not in the ground truth. |
| `match_vague` | Model and ground truth agree on a vague dataset mention (one lacking specific identifying details). |
| `actual_vague` | A vague dataset mention present in the ground truth but missed by the model. |
| `pred_vague` | A vague dataset mention predicted by the model but not in the ground truth. |
| `<span> <> acronym` | Relation: marks the dataset’s acronym (e.g. `RUV <> acronym`). |
| `<span> <> data description` | Relation: describes what the dataset contains or how it was collected. |
| `<span> <> data geography` | Relation: indicates the geographic coverage of the dataset (e.g. country, region). |
| `<span> <> data source` | Relation: links to the original source or repository of the data. |
| `<span> <> data type` | Relation: specifies the type of data (e.g. survey, census, register). |
| `<span> <> geography` | Relation: connects the dataset to its referenced geography (may duplicate `data geography`). |
| `<span> <> publication year` | Relation: the year the dataset (or its documentation) was published. |
| `<span> <> publisher` | Relation: the organization or entity that published the dataset. |
| `<span> <> reference year` | Relation: the year the data were collected or to which they refer. |
| `<span> <> version` | Relation: the version identifier of the dataset (e.g. “v5”, “Version 2”). |

Use this guide when reviewing model predictions to quickly identify correct matches, false positives, and false negatives, as well as any extracted relations.
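
When tallying review results, the `match_`/`pred_`/`actual_` prefixes map directly onto true positives, false positives, and false negatives for each category (named, unnamed, vague). The sketch below is one way to turn a list of tags into per-category precision and recall; it is not the app's own evaluation code.

```python
# Minimal sketch: count tag prefixes to get precision/recall per category.
from collections import Counter

def prf(labels: list[str], category: str) -> tuple[float, float]:
    """Precision and recall for one category ('named', 'unnamed', or 'vague')."""
    counts = Counter(labels)
    tp = counts[f"match_{category}"]   # model and ground truth agree
    fp = counts[f"pred_{category}"]    # predicted but not in ground truth
    fn = counts[f"actual_{category}"]  # in ground truth but missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels = ["match_named", "pred_named", "actual_named", "match_vague"]
print(prf(labels, "named"))  # (0.5, 0.5)
```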