---
license: apache-2.0
language:
- en
base_model:
- microsoft/deberta-v3-large
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: token-classification
tags:
- NER
- encoder
- decoder
- GLiNER
- information-extraction
library_name: gliner
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/V5nB1X_qdyTtyTUZHYYHk.png)

**GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner.
This architecture combines:

* An **encoder** for representing entity spans
* A **decoder** for generating label names

This hybrid approach enables new use cases such as **entity linking** and expands GLiNER’s capabilities.
By integrating large modern decoders—trained on vast datasets—GLiNER can leverage their **richer knowledge capacity** while maintaining competitive inference speed.

---

## Key Features

* **Open ontology**: Works when the label set is unknown
* **Multi-label entity recognition**: Assign multiple labels to a single entity
* **Entity linking**: Handle large label sets via constrained generation
* **Knowledge expansion**: Gain from large decoder models
* **Efficient**: Minimal speed reduction on GPU compared to single-encoder GLiNER

---

## Installation

Update to the latest version of GLiNER:

```bash
# until the new pip release, install from main to use the new architecture
pip install git+https://github.com/urchade/GLiNER.git
```

---

## Usage
For open-ontology entity extraction, use the tag `label` in the list of labels, as in the example below:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")

text = "Hugging Face is a company that advances and democratizes artificial intelligence through open source and science."

labels = ["label"]

entities = model.predict_entities(text, labels, threshold=0.3, num_gen_sequences=1)
print(entities)
```

If you need to run the model on many texts and/or constrain the generated labels, see the example below:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")

text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
    "develop and sell Wozniak's Apple I personal computer."
)

labels = ["person", "company", "date"]

entities = model.run([text], labels, threshold=0.3, num_gen_sequences=1)
print(entities)
```

---

### Example Output

```json
[
  [
    {
      "start": 21,
      "end": 26,
      "text": "Apple",
      "label": "company",
      "score": 0.6795641779899597,
      "generated labels": ["Organization"]
    },
    {
      "start": 47,
      "end": 60,
      "text": "April 1, 1976",
      "label": "date",
      "score": 0.44296327233314514,
      "generated labels": ["Date"]
    },
    {
      "start": 65,
      "end": 78,
      "text": "Steve Wozniak",
      "label": "person",
      "score": 0.9934439659118652,
      "generated labels": ["Person"]
    },
    {
      "start": 80,
      "end": 90,
      "text": "Steve Jobs",
      "label": "person",
      "score": 0.9725918769836426,
      "generated labels": ["Person"]
    },
    {
      "start": 107,
      "end": 119,
      "text": "Ronald Wayne",
      "label": "person",
      "score": 0.9964536428451538,
      "generated labels": ["Person"]
    }
  ]
]
```
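
The *Key Features* list above mentions multi-label entity recognition. A plausible way to obtain several candidate labels per span is to raise `num_gen_sequences`; treating that parameter as the number of label sequences the decoder generates per entity is an assumption, so the snippet below is a sketch rather than confirmed behavior:

```python
# Sketch: ask the decoder for several candidate labels per entity
# (assumes num_gen_sequences controls how many label sequences are generated).
entities = model.run([text], labels, threshold=0.3, num_gen_sequences=3)

for ent in entities[0]:
    # "generated labels" is a list, as shown in the example output above
    print(ent["text"], ent["label"], ent["generated labels"])
```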

---

### Restricting the Decoder

You can limit the decoder to generate labels only from a predefined set:

```python
entities = model.run(
    [text], labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=[
        "organization", "organization type", "city",
        "technology", "date", "person"
    ]
)
```
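
Because `gen_constraints` can hold a large label set, the same mechanism can act as a lightweight form of entity linking: pass the canonical names from your knowledge base as constraints, and the decoder will only generate labels from that set. This is a sketch; the candidate names below are illustrative and not part of the model card:

```python
# Illustrative candidate set; in practice this could be millions of knowledge-base entries.
kb_entities = ["Apple Inc.", "Apple Records", "Steve Jobs", "Steve Wozniak", "Ronald Wayne"]

linked = model.run(
    [text], ["company", "person"],
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=kb_entities,
)
print(linked)
```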

---

## Performance Tips

Two label trie implementations are available.
For a **faster, memory-efficient C++ version**, install **Cython**:

```bash
pip install cython
```

This can significantly improve performance and reduce memory usage, especially with millions of labels.