---
license: mit
language:
- pl
base_model:
- EuroBERT/EuroBERT-210m
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- transformers
- question answering
datasets:
- KRLabsOrg/ragtruth-pl-translated
---
# LettuceDetect: Polish Hallucination Detection Model
**Model Name:** lettucedect-210m-eurobert-pl-v1
**Organization:** KRLabsOrg
**Github:** https://github.com/KRLabsOrg/LettuceDetect
## Overview
LettuceDetect is a transformer-based model for hallucination detection on context and answer pairs, designed for multilingual Retrieval-Augmented Generation (RAG) applications. This model is built on **EuroBERT-210M**, which has been specifically chosen for its extended context support (up to **8192 tokens**) and strong multilingual capabilities. This long-context capability is critical for tasks where detailed and extensive documents need to be processed to accurately determine if an answer is supported by the provided context.
**This is our Polish base model, built on the EuroBERT-210M architecture.**
## Model Details
- **Architecture:** EuroBERT-210M with extended context support (up to 8192 tokens)
- **Task:** Token Classification / Hallucination Detection
- **Training Dataset:** RAGTruth-PL (translated from the original RAGTruth dataset)
- **Language:** Polish
## How It Works
The model is trained to identify tokens in the Polish answer text that are not supported by the given context. During inference, the model returns token-level predictions which are then aggregated into spans. This allows users to see exactly which parts of the answer are considered hallucinated.
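The span-aggregation step can be sketched as follows. This is a simplified, hypothetical illustration (the function name and the 0/1 label convention are assumptions for exposition, not the library's actual internals): consecutive tokens flagged as hallucinated are merged into a single character-level span.

```python
def tokens_to_spans(tokens, labels):
    """Merge consecutive hallucinated tokens (label 1) into character spans.

    `tokens` are (text, start, end) triples over the answer string;
    `labels` are 0 (supported) or 1 (hallucinated), one per token.
    """
    spans, current = [], None
    for (text, start, end), label in zip(tokens, labels):
        if label == 1:
            if current is None:
                current = {"start": start, "end": end}  # open a new span
            else:
                current["end"] = end  # extend the open span
        elif current is not None:
            spans.append(current)  # close the span at a supported token
            current = None
    if current is not None:
        spans.append(current)
    return spans

# Toy example: the last two tokens are flagged as hallucinated.
tokens = [("Teoria", 0, 6), ("zakłada", 7, 14), ("NASA", 15, 19), ("1543", 20, 24)]
print(tokens_to_spans(tokens, [0, 0, 1, 1]))  # [{'start': 15, 'end': 24}]
```

In the real model, the per-token labels come from the token-classification head, and the resulting spans are mapped back onto the answer text so users can see exactly which fragments lack support.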
## Usage
### Installation
Install the `lettucedetect` package:
```bash
pip install lettucedetect
```
### Using the model
```python
from lettucedetect.models.inference import HallucinationDetector

# For a transformer-based approach:
detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-210m-eurobert-pl-v1",
    lang="pl",
    trust_remote_code=True,
)
contexts = ["Kopernikanizm to teoria astronomiczna opracowana przez Mikołaja Kopernika, zgodnie z którą Słońce znajduje się w centrum Układu Słonecznego, a Ziemia i inne planety krążą wokół niego. Teoria ta została opublikowana w dziele 'O obrotach sfer niebieskich' w 1543 roku."]
question = "Na czym polega teoria kopernikańska i kiedy została opublikowana?"
answer = "Teoria kopernikańska zakłada, że Ziemia jest jednym z wielu ciał niebieskich krążących wokół Słońca. Kopernik opracował również zaawansowane równania matematyczne opisujące ruch satelitów, które zostały wykorzystane w XX wieku w programie kosmicznym NASA. Teoria została opublikowana w 1543 roku."
# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Przewidywania:", predictions)
# Example output (span offsets and confidence are illustrative):
# [{'start': 119, 'end': 255, 'confidence': 0.93, 'text': ' również zaawansowane równania matematyczne opisujące ruch satelitów, które zostały wykorzystane w XX wieku w programie kosmicznym NASA.'}]
```
## Performance
**Results on Translated RAGTruth-PL**
We evaluate our Polish models on translated versions of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset. The EuroBERT-210M Polish model achieves an F1 score of 66.46%, significantly outperforming prompt-based baselines such as GPT-4.1-mini (59.27%).
For detailed performance metrics across different languages, see the table below:
| Language | Model | Precision (%) | Recall (%) | F1 (%) | GPT-4.1-mini F1 (%) | Δ F1 (%) |
|----------|-----------------|---------------|------------|--------|---------------------|----------|
| Polish | EuroBERT-210M | 63.62 | 69.57 | 66.46 | 59.27 | +7.19 |
| Polish | EuroBERT-610M | 77.16 | 69.36 | 73.05 | 59.27 | +13.78 |
While the 610M variant achieves higher performance, the 210M model offers a good balance between accuracy and computational efficiency, processing examples approximately 3× faster.
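As a quick sanity check, the F1 values in the table follow from the precision and recall columns via the standard harmonic-mean formula:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall, in percent.
    return 2 * precision * recall / (precision + recall)

print(round(f1(63.62, 69.57), 2))  # EuroBERT-210M -> 66.46
print(round(f1(77.16, 69.36), 2))  # EuroBERT-610M -> 73.05
```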
### Manual Validation
We performed additional validation on a manually reviewed set of 100 examples covering diverse task types (QA, summarization, data-to-text). The EuroBERT-210M Polish model maintained strong performance with an F1 score of 68.32% on this curated dataset.
| Model | Precision (%) | Recall (%) | F1 (%) |
|---------------|---------------|------------|--------|
| EuroBERT-210M | 68.32 | 68.32 | 68.32 |
## Citing
If you use the model or the tool, please cite the following paper:
```bibtex
@misc{Kovacs:2025,
title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
author={Ádám Kovács and Gábor Recski},
year={2025},
eprint={2502.17125},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.17125},
}
```