---
license: mit
language:
- pl
base_model:
- EuroBERT/EuroBERT-210m
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- transformers
- question answer
datasets:
- KRLabsOrg/ragtruth-pl-translated
---

# LettuceDetect: Polish Hallucination Detection Model

LettuceDetect Logo

**Model Name:** lettucedect-210m-eurobert-pl-v1
**Organization:** KRLabsOrg
**GitHub:** https://github.com/KRLabsOrg/LettuceDetect

## Overview

LettuceDetect is a transformer-based model for hallucination detection on context and answer pairs, designed for multilingual Retrieval-Augmented Generation (RAG) applications. The model is built on **EuroBERT-210M**, chosen for its extended context support (up to **8192 tokens**) and strong multilingual capabilities. This long-context capability is critical when detailed, extensive documents must be processed to determine accurately whether an answer is supported by the provided context.

**This is our Polish base model, built on the EuroBERT-210M architecture.**

## Model Details

- **Architecture:** EuroBERT-210M with extended context support (up to 8192 tokens)
- **Task:** Token Classification / Hallucination Detection
- **Training Dataset:** RagTruth-PL (translated from the original RAGTruth dataset)
- **Language:** Polish

## How It Works

The model is trained to identify tokens in the Polish answer text that are not supported by the given context. During inference, it returns token-level predictions, which are then aggregated into spans. This lets users see exactly which parts of the answer are considered hallucinated.

## Usage

### Installation

Install the `lettucedetect` package:

```bash
pip install lettucedetect
```

### Using the model

```python
from lettucedetect.models.inference import HallucinationDetector

# For a transformer-based approach:
detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-210m-eurobert-pl-v1",
    lang="pl",
    trust_remote_code=True,
)

# Polish context, question and answer about the Copernican theory; the answer adds
# an unsupported claim about NASA using Copernicus' equations in the 20th century.
contexts = ["Kopernikanizm to teoria astronomiczna opracowana przez Mikołaja Kopernika, zgodnie z którą Słońce znajduje się w centrum Układu Słonecznego, a Ziemia i inne planety krążą wokół niego. Teoria ta została opublikowana w dziele 'O obrotach sfer niebieskich' w 1543 roku."]
question = "Na czym polega teoria kopernikańska i kiedy została opublikowana?"
answer = "Teoria kopernikańska zakłada, że Ziemia jest jednym z wielu ciał niebieskich krążących wokół Słońca. Kopernik opracował również zaawansowane równania matematyczne opisujące ruch satelitów, które zostały wykorzystane w XX wieku w programie kosmicznym NASA. Teoria została opublikowana w 1543 roku."

# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)

# Predictions: [{'start': 9, 'end': 19, 'confidence': 0.93, 'text': ' również zaawansowane równania matematyczne opisujące ruch satelitów, które zostały wykorzystane w XX wieku w programie kosmicznym NASA.'}]
```
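As described in How It Works, the span output above is produced by aggregating token-level predictions. If you need the raw per-token decisions, the detector can also return them directly. The following is a minimal sketch, assuming the `output_format="tokens"` option from the LettuceDetect repository; the exact fields of each returned entry may vary between versions:

```python
# Reusing `detector`, `contexts`, `question` and `answer` from the example above.
# NOTE: output_format="tokens" and the exact fields of each entry are assumptions
# based on the LettuceDetect repository and may differ between versions.
token_predictions = detector.predict(
    context=contexts,
    question=question,
    answer=answer,
    output_format="tokens",
)

# Each entry corresponds to one answer token together with its hallucination score;
# the span output shown earlier is built by merging consecutive flagged tokens.
for token_pred in token_predictions:
    print(token_pred)
```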
## Performance

**Results on Translated RAGTruth-PL**

We evaluate our Polish models on translated versions of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset. The EuroBERT-210M Polish model achieves an F1 score of 66.46%, significantly outperforming prompt-based methods such as GPT-4.1-mini (59.27%).

For detailed performance metrics across model sizes, see the table below:

| Language | Model         | Precision (%) | Recall (%) | F1 (%) | GPT-4.1-mini F1 (%) | Δ F1 (%) |
|----------|---------------|---------------|------------|--------|---------------------|----------|
| Polish   | EuroBERT-210M | 63.62         | 69.57      | 66.46  | 59.27               | +7.19    |
| Polish   | EuroBERT-610M | 77.16         | 69.36      | 73.05  | 59.27               | +13.78   |

While the 610M variant achieves higher accuracy, the 210M model offers a good balance between accuracy and computational efficiency, processing examples approximately 3× faster.

### Manual Validation

We performed additional validation on a manually reviewed set of 100 examples covering diverse task types (QA, summarization, data-to-text). The EuroBERT-210M Polish model maintained strong performance on this curated set, with an F1 score of 68.32%.

| Model         | Precision (%) | Recall (%) | F1 (%) |
|---------------|---------------|------------|--------|
| EuroBERT-210M | 68.32         | 68.32      | 68.32  |

## Citing

If you use the model or the tool, please cite the following paper:

```bibtex
@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125},
}
```