---
license: apache-2.0
language:
- en
base_model:
- microsoft/deberta-v3-large
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: token-classification
tags:
- NER
- encoder
- decoder
- GLiNER
- information-extraction
library_name: gliner
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/V5nB1X_qdyTtyTUZHYYHk.png)
**GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner.
This architecture combines:
* An **encoder** for representing entity spans
* A **decoder** for generating label names
This hybrid approach enables new use cases such as **entity linking** and expands GLiNER’s capabilities.
By integrating large modern decoders—trained on vast datasets—GLiNER can leverage their **richer knowledge capacity** while maintaining competitive inference speed.
---
## Key Features
* **Open ontology**: Works when the label set is unknown
* **Multi-label entity recognition**: Assign multiple labels to a single entity
* **Entity linking**: Handle large label sets via constrained generation
* **Knowledge expansion**: Gain from large decoder models
* **Efficient**: Minimal speed reduction on GPU compared to single-encoder GLiNER
---
## Installation
Update to the latest version of GLiNER:
```bash
# until the new pip release, install from main to use the new architecture
pip install git+https://github.com/urchade/GLiNER.git
```
---
## Usage
For open-ontology entity extraction (when the label set is unknown in advance), pass the special tag `label` as the list of labels, as in the example below:
```python
from gliner import GLiNER
model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")
text = "Hugging Face is a company that advances and democratizes artificial intelligence through open source and science."
labels = ["label"]
entities = model.predict_entities(text, labels, threshold=0.3, num_gen_sequences=1)
print(entities)
```
To run the model on multiple texts and/or constrain the set of labels, use `run`, as in the example below:
```python
from gliner import GLiNER
model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")
text = (
"Apple was founded as Apple Computer Company on April 1, 1976, "
"by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
"develop and sell Wozniak's Apple I personal computer."
)
labels = ["person", "company", "date"]
entities = model.run([text], labels, threshold=0.3, num_gen_sequences=1)
print(entities)
```
---
### Example Output
```json
[
[
{
"start": 21,
"end": 26,
"text": "Apple",
"label": "company",
"score": 0.6795641779899597,
"generated labels": ["Organization"]
},
{
"start": 47,
"end": 60,
"text": "April 1, 1976",
"label": "date",
"score": 0.44296327233314514,
"generated labels": ["Date"]
},
{
"start": 65,
"end": 78,
"text": "Steve Wozniak",
"label": "person",
"score": 0.9934439659118652,
"generated labels": ["Person"]
},
{
"start": 80,
"end": 90,
"text": "Steve Jobs",
"label": "person",
"score": 0.9725918769836426,
"generated labels": ["Person"]
},
{
"start": 107,
"end": 119,
"text": "Ronald Wayne",
"label": "person",
"score": 0.9964536428451538,
"generated labels": ["Person"]
}
]
]
```
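The return value is a list with one inner list of entity dictionaries per input text. Below is a minimal post-processing sketch that assumes only the structure shown above (the sample data is adapted from the example output); it groups entity mentions by their generated label, which can serve as a starting point for entity linking. With `num_gen_sequences` above 1, the `"generated labels"` list would presumably hold several candidates per entity.

```python
from collections import defaultdict

# Sample predictions in the format shown above (one inner list per input text).
predictions = [
    [
        {"start": 21, "end": 26, "text": "Apple", "label": "company",
         "score": 0.68, "generated labels": ["Organization"]},
        {"start": 65, "end": 78, "text": "Steve Wozniak", "label": "person",
         "score": 0.99, "generated labels": ["Person"]},
        {"start": 107, "end": 119, "text": "Ronald Wayne", "label": "person",
         "score": 1.00, "generated labels": ["Person"]},
    ]
]

# Group entity mentions by the label the decoder generated for them.
mentions_by_generated_label = defaultdict(list)
for doc_entities in predictions:
    for entity in doc_entities:
        for generated_label in entity["generated labels"]:
            mentions_by_generated_label[generated_label].append(entity["text"])

print(dict(mentions_by_generated_label))
# {'Organization': ['Apple'], 'Person': ['Steve Wozniak', 'Ronald Wayne']}
```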
---
### Restricting the Decoder
You can restrict the decoder so that it generates labels only from a predefined set:
```python
entities = model.run(
    [text], labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=[
        "organization", "organization type", "city",
        "technology", "date", "person"
    ]
)
print(entities)
```
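Since the constraint list can be large, this is a natural fit for the entity-linking use case mentioned above: the candidates can come from a knowledge-base label inventory. The sketch below illustrates this under that assumption; the `kb_labels` list is hypothetical, and `gen_constraints` is used exactly as in the call above.

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")

# Hypothetical knowledge-base inventory; in practice this could contain
# thousands (or millions) of candidate labels.
kb_labels = [
    "Apple Inc.", "Apple (fruit)", "Apple I",
    "Steve Jobs", "Steve Wozniak", "Ronald Wayne",
]

text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne."
)

entities = model.run(
    [text], ["person", "company", "product"],
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=kb_labels,  # generated labels are restricted to these candidates
)
print(entities)
```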
---
## Performance Tips
Two label trie implementations are available.
For a **faster, memory-efficient C++ version**, install **Cython**:
```bash
pip install cython
```
This can significantly improve performance and reduce memory usage, especially with millions of labels.
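A quick way to check whether Cython is available in your environment (this only verifies that the package can be found; it assumes GLiNER falls back to the pure-Python trie when it is not):

```python
import importlib.util

# True if Cython is installed; otherwise run `pip install cython`.
print(importlib.util.find_spec("Cython") is not None)
```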