---
license: apache-2.0
language:
- en
base_model:
- microsoft/deberta-v3-large
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: token-classification
tags:
- NER
- encoder
- decoder
- GLiNER
- information-extraction
library_name: gliner
---

**GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner.

This architecture combines:

* An **encoder** for representing entity spans
* A **decoder** for generating label names

This hybrid approach enables new use cases such as **entity linking** and expands GLiNER’s capabilities.

By integrating large modern decoders trained on vast datasets, GLiNER can leverage their **richer knowledge capacity** while maintaining competitive inference speed.

---

## Key Features

* **Open ontology**: Works when the label set is unknown
* **Multi-label entity recognition**: Assign multiple labels to a single entity
* **Entity linking**: Handle large label sets via constrained generation
* **Knowledge expansion**: Gain from large decoder models
* **Efficient**: Minimal speed reduction on GPU compared to single-encoder GLiNER

---

## Installation

Update to the latest version of GLiNER:

```bash
# until the new pip release, install from main to use the new architecture
pip install git+https://github.com/urchade/GLiNER.git
```
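
As a quick sanity check that the install picked up correctly (a minimal sketch; `__version__` is assumed to be exposed by the package):

```python
# Quick import check after installation.
import gliner

print(getattr(gliner, "__version__", "unknown"))
```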

---

## Usage

For open-ontology entity extraction, use the tag `label` in the list of labels, as shown in the example below:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")

text = "Hugging Face is a company that advances and democratizes artificial intelligence through open source and science."

labels = ["label"]

model.predict_entities(text, labels, threshold=0.3, num_gen_sequences=1)
```
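
The call returns a list of entity dictionaries. A minimal sketch of inspecting them, reusing the `model`, `text`, and `labels` from the snippet above (the field names are assumed to match the example output shown further down):

```python
# Inspect the predictions; "label" is the matched tag, while "generated labels"
# holds the entity type names produced by the decoder.
entities = model.predict_entities(text, labels, threshold=0.3, num_gen_sequences=1)

for entity in entities:
    print(entity["text"], "=>", entity.get("generated labels", []), f'({entity["score"]:.2f})')
```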

To run the model on many texts and/or with a fixed set of labels, use `run`, as shown in the example below:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")

text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
    "develop and sell Wozniak's Apple I personal computer."
)

labels = ["person", "company", "date"]

model.run([text], labels, threshold=0.3, num_gen_sequences=1)
```

---

### Example Output

```json
[
  [
    {
      "start": 21,
      "end": 26,
      "text": "Apple",
      "label": "company",
      "score": 0.6795641779899597,
      "generated labels": ["Organization"]
    },
    {
      "start": 47,
      "end": 60,
      "text": "April 1, 1976",
      "label": "date",
      "score": 0.44296327233314514,
      "generated labels": ["Date"]
    },
    {
      "start": 65,
      "end": 78,
      "text": "Steve Wozniak",
      "label": "person",
      "score": 0.9934439659118652,
      "generated labels": ["Person"]
    },
    {
      "start": 80,
      "end": 90,
      "text": "Steve Jobs",
      "label": "person",
      "score": 0.9725918769836426,
      "generated labels": ["Person"]
    },
    {
      "start": 107,
      "end": 119,
      "text": "Ronald Wayne",
      "label": "person",
      "score": 0.9964536428451538,
      "generated labels": ["Person"]
    }
  ]
]
```
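
Each entity carries a `generated labels` list with the names produced by the decoder. To obtain several candidate names per entity (multi-label entity recognition), request more generated sequences. A minimal sketch, reusing the `model`, `text`, and `labels` from the example above:

```python
# Ask the decoder for several label candidates per detected span.
results = model.run([text], labels, threshold=0.3, num_gen_sequences=3)

for entity in results[0]:
    # "generated labels" now holds multiple decoder-generated names per entity.
    print(entity["text"], "->", entity["generated labels"])
```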

---

### Restricting the Decoder

You can limit the decoder to generating labels only from a predefined set:

```python
model.run(
    text, labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=[
        "organization", "organization type", "city",
        "technology", "date", "person"
    ]
)
```
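
Constrained generation is also how the model can be used for entity linking against a large catalogue of names. A sketch under the assumption that the candidates live in a plain-text file (`candidate_entities.txt` is a hypothetical path with one name per line):

```python
# Entity linking sketch: constrain generation to a large catalogue of candidate names.
# "candidate_entities.txt" is a hypothetical file, one entity name per line.
with open("candidate_entities.txt", encoding="utf-8") as f:
    candidate_names = [line.strip() for line in f if line.strip()]

linked = model.run(
    [text], labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=candidate_names,
)

for entity in linked[0]:
    # Generated labels are now drawn only from the candidate catalogue.
    print(entity["text"], "->", entity["generated labels"])
```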

---

## Performance Tips

Two label trie implementations are available.
For a **faster, memory-efficient C++ version**, install **Cython**:

```bash
pip install cython
```

This can significantly improve performance and reduce memory usage, especially with millions of labels.
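
To check whether the faster trie pays off in your setting, a simple timing harness is enough. A sketch, assuming a large `gen_constraints` list such as the `candidate_names` built in the entity-linking example above:

```python
import time

# Rough timing of a constrained run; compare before and after installing Cython.
start = time.perf_counter()
model.run([text], labels, threshold=0.3, num_gen_sequences=1, gen_constraints=candidate_names)
print(f"constrained run took {time.perf_counter() - start:.2f}s")
```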