---
license: apache-2.0
language:
- en
base_model:
- microsoft/deberta-v3-large
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: token-classification
tags:
- NER
- encoder
- decoder
- GLiNER
- information-extraction
library_name: gliner
---

**GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner.

This architecture combines:

* An **encoder** for representing entity spans
* A **decoder** for generating label names

This hybrid approach enables new use cases such as **entity linking** and expands GLiNER’s capabilities.

By integrating large modern decoders trained on vast datasets, GLiNER can leverage their **richer knowledge capacity** while maintaining competitive inference speed.

---

## Key Features

* **Open ontology**: Works when the label set is unknown
* **Multi-label entity recognition**: Assign multiple labels to a single entity
* **Entity linking**: Handle large label sets via constrained generation
* **Knowledge expansion**: Gain from large decoder models
* **Efficient**: Minimal speed reduction on GPU compared to single-encoder GLiNER

---

## Installation

Update to the latest version of GLiNER:

```bash
# until the new pip release, install from main to use the new architecture
pip install git+https://github.com/urchade/GLiNER.git
```
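
As a quick sanity check that the install picked up correctly (a minimal sketch; `__version__` is assumed to be exposed by the package):

```python
# Quick import check after installation.
import gliner

print(getattr(gliner, "__version__", "unknown"))
```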

---

## Usage

For open-ontology entity extraction, use the tag `label` in the list of labels, as shown in the example below:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")

text = "Hugging Face is a company that advances and democratizes artificial intelligence through open source and science."

labels = ["label"]

model.predict_entities(text, labels, threshold=0.3, num_gen_sequences=1)
```
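
The call returns a list of entity dictionaries. A minimal sketch of inspecting them, reusing the `model`, `text`, and `labels` from the snippet above (the field names are assumed to match the example output shown further down):

```python
# Inspect the predictions; "label" is the matched tag, while "generated labels"
# holds the entity type names produced by the decoder.
entities = model.predict_entities(text, labels, threshold=0.3, num_gen_sequences=1)

for entity in entities:
    print(entity["text"], "=>", entity.get("generated labels", []), f'({entity["score"]:.2f})')
```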

To run the model on many texts and/or with a fixed set of labels, use `run`, as shown in the example below:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")

text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
    "develop and sell Wozniak's Apple I personal computer."
)

labels = ["person", "company", "date"]

model.run([text], labels, threshold=0.3, num_gen_sequences=1)
```

---

### Example Output

```json
[
  [
    {
      "start": 21,
      "end": 26,
      "text": "Apple",
      "label": "company",
      "score": 0.6795641779899597,
      "generated labels": ["Organization"]
    },
    {
      "start": 47,
      "end": 60,
      "text": "April 1, 1976",
      "label": "date",
      "score": 0.44296327233314514,
      "generated labels": ["Date"]
    },
    {
      "start": 65,
      "end": 78,
      "text": "Steve Wozniak",
      "label": "person",
      "score": 0.9934439659118652,
      "generated labels": ["Person"]
    },
    {
      "start": 80,
      "end": 90,
      "text": "Steve Jobs",
      "label": "person",
      "score": 0.9725918769836426,
      "generated labels": ["Person"]
    },
    {
      "start": 107,
      "end": 119,
      "text": "Ronald Wayne",
      "label": "person",
      "score": 0.9964536428451538,
      "generated labels": ["Person"]
    }
  ]
]
```
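
Each entity carries a `generated labels` list with the names produced by the decoder. To obtain several candidate names per entity (multi-label entity recognition), request more generated sequences. A minimal sketch, reusing the `model`, `text`, and `labels` from the example above:

```python
# Ask the decoder for several label candidates per detected span.
results = model.run([text], labels, threshold=0.3, num_gen_sequences=3)

for entity in results[0]:
    # "generated labels" now holds multiple decoder-generated names per entity.
    print(entity["text"], "->", entity["generated labels"])
```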

---

### Restricting the Decoder

You can limit the decoder to generating labels only from a predefined set:

```python
model.run(
    text, labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=[
        "organization", "organization type", "city",
        "technology", "date", "person"
    ]
)
```
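
Constrained generation is also how the model can be used for entity linking against a large catalogue of names. A sketch under the assumption that the candidates live in a plain-text file (`candidate_entities.txt` is a hypothetical path with one name per line):

```python
# Entity linking sketch: constrain generation to a large catalogue of candidate names.
# "candidate_entities.txt" is a hypothetical file, one entity name per line.
with open("candidate_entities.txt", encoding="utf-8") as f:
    candidate_names = [line.strip() for line in f if line.strip()]

linked = model.run(
    [text], labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=candidate_names,
)

for entity in linked[0]:
    # Generated labels are now drawn only from the candidate catalogue.
    print(entity["text"], "->", entity["generated labels"])
```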

---

## Performance Tips

Two label trie implementations are available.
For a **faster, memory-efficient C++ version**, install **Cython**:

```bash
pip install cython
```

This can significantly improve performance and reduce memory usage, especially with millions of labels.
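
To check whether the faster trie pays off in your setting, a simple timing harness is enough. A sketch, assuming a large `gen_constraints` list such as the `candidate_names` built in the entity-linking example above:

```python
import time

# Rough timing of a constrained run; compare before and after installing Cython.
start = time.perf_counter()
model.run([text], labels, threshold=0.3, num_gen_sequences=1, gen_constraints=candidate_names)
print(f"constrained run took {time.perf_counter() - start:.2f}s")
```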