|
### RAG Techniques Evaluated |
|
|
|
**1. RAG with Context in System Prompt** |
|
The document is embedded inside the system prompt, and the user sends only the question: |
|
```text |
|
[System]: You are an assistant for question-answering tasks. |
|
Given the QUESTION and DOCUMENT you must answer the QUESTION using the information in the DOCUMENT. |
|
You must not offer new information beyond the context provided in the DOCUMENT. Do not add any external knowledge. |
|
The ANSWER also must not contradict information provided in the DOCUMENT. |
|
If the DOCUMENT does not contain the facts to answer the QUESTION or you do not know the answer, you truthfully say that you do not know. |
|
You have access to information provided by the user as DOCUMENT to answer the QUESTION, and nothing else. |
|
Use three sentences maximum and keep the answer concise. |
|
DOCUMENT: <context> |
|
|
|
[User]: <prompt> |
|
``` |
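
For concreteness, this layout can be sketched in Python as a plain chat message list. `SYSTEM_TEMPLATE` stands in for the full instruction text quoted above (ending with the DOCUMENT placeholder); the names are illustrative, not the exact evaluation harness:

```python
# Minimal sketch of the "context in system prompt" layout.
SYSTEM_TEMPLATE = (
    "You are an assistant for question-answering tasks.\n"
    "... (remaining instructions as quoted above) ...\n"
    "DOCUMENT: {context}"
)

def build_messages_system_context(context: str, question: str) -> list[dict]:
    """Embed the document in the system prompt; the user turn carries only the question."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(context=context)},
        {"role": "user", "content": question},
    ]
```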
|
|
|
**2. RAG with Context and Question in a Single Turn**
|
Both the document and question are concatenated in a single user message: |
|
```text |
|
[System]: You are an assistant for question-answering tasks. |
|
Given the QUESTION and DOCUMENT you must answer the QUESTION using the information in the DOCUMENT. |
|
You must not offer new information beyond the context provided in the DOCUMENT. Do not add any external knowledge. |
|
The ANSWER also must not contradict information provided in the DOCUMENT. |
|
If the DOCUMENT does not contain the facts to answer the QUESTION or you do not know the answer, you truthfully say that you do not know. |
|
You have access to information provided by the user as DOCUMENT to answer the QUESTION, and nothing else. |
|
Use three sentences maximum and keep the answer concise. |
|
|
|
[User]: |
|
DOCUMENT: <context> |
|
QUESTION: <prompt> |
|
|
|
``` |
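
A corresponding sketch keeps the instructions in the system prompt and folds the document and question into one user message. Here `SYSTEM_INSTRUCTIONS` stands for the instruction text quoted above (without the DOCUMENT line), again as an illustrative name:

```python
SYSTEM_INSTRUCTIONS = (
    "You are an assistant for question-answering tasks.\n"
    "... (remaining instructions as quoted above) ..."
)

def build_messages_single_turn(context: str, question: str) -> list[dict]:
    """Send the document and the question together in one user message."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTIONS},
        {"role": "user", "content": f"DOCUMENT: {context}\nQUESTION: {question}"},
    ]
```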
|
|
|
**3. RAG with Context and Question in Two Turns**
|
The document and question are sent in separate user messages: |
|
```text |
|
[System]: You are an assistant for question-answering tasks. |
|
Given the QUESTION and DOCUMENT you must answer the QUESTION using the information in the DOCUMENT. |
|
You must not offer new information beyond the context provided in the DOCUMENT. Do not add any external knowledge. |
|
The ANSWER also must not contradict information provided in the DOCUMENT. |
|
If the DOCUMENT does not contain the facts to answer the QUESTION or you do not know the answer, you truthfully say that you do not know. |
|
You have access to information provided by the user as DOCUMENT to answer the QUESTION, and nothing else. |
|
Use three sentences maximum and keep the answer concise. |
|
|
|
[User]: DOCUMENT: <context> |
|
[User]: QUESTION: <prompt> |
|
``` |
|
*Note: This method did **not** work on Gemma 3 27B with the default chat template due to its restriction on consecutive user messages without an intervening assistant response.* |
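
The two-turn variant splits the user content across consecutive messages, reusing the illustrative `SYSTEM_INSTRUCTIONS` constant from the previous sketch. As the note above indicates, chat templates that forbid back-to-back user messages will reject this layout:

```python
def build_messages_two_turns(context: str, question: str) -> list[dict]:
    """Send the document and the question as two consecutive user messages."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTIONS},
        {"role": "user", "content": f"DOCUMENT: {context}"},
        # Some chat templates (e.g. Gemma 3's default) reject a second user
        # message without an intervening assistant turn.
        {"role": "user", "content": f"QUESTION: {question}"},
    ]
```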
|
|
|
### Dataset |
|
We evaluate all three prompting strategies on the **HaluEval QA** benchmark, a large-scale collection of RAG question-answer examples. |
|
- **Source**: [HaluEval QA](https://huggingface.co/datasets/pminervini/HaluEval/viewer/qa?views%5B%5D=qa) |
|
- **Size**: 10,000 question-document pairs |
|
- **Content**: Each example contains a short passage (extracted primarily from Wikipedia-style articles) and an accompanying question that can be answered **only** from that passage. |
|
- **Use case**: Designed to measure whether an LLM can remain faithful to supplied context without inventing new facts. |
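
For illustration, the benchmark can be pulled with the Hugging Face `datasets` library. The config name `qa` follows the viewer link above, while the split and field names (`knowledge`, `question`) are assumptions based on the public dataset card and may need adjusting:

```python
from datasets import load_dataset

# Load the HaluEval QA config; inspect the returned DatasetDict to confirm
# the split name before iterating.
halueval = load_dataset("pminervini/HaluEval", "qa")
print(halueval)  # shows available splits and columns

split = next(iter(halueval.values()))
example = split[0]
context, question = example["knowledge"], example["question"]  # assumed field names
```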
|
|
|
All completions are generated with *temperature = 0* to remove sampling randomness, so that differences in hallucination rate stem solely from the prompt format.
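
A minimal generation call under this setting might look like the following, assuming an OpenAI-compatible chat completions endpoint; the client configuration and model name are placeholders, not the exact setup used in the evaluation:

```python
from openai import OpenAI

client = OpenAI()  # point base_url / api_key at the serving endpoint being evaluated

def answer(messages: list[dict]) -> str:
    """Generate a deterministic answer for one prompt layout."""
    response = client.chat.completions.create(
        model="MODEL_NAME",   # placeholder for the model under evaluation
        messages=messages,
        temperature=0,        # deterministic decoding, as described above
    )
    return response.choices[0].message.content
```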
|
|
|
### Metric |
|
|
|
The values in the table indicate the **hallucination rate (%)**: the percentage of answers deemed factually incorrect or ungrounded given the provided context.
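
For reference, the rate itself is simply the share of answers flagged as hallucinated; a minimal aggregation over per-example verdicts (True = hallucinated), however those verdicts are obtained, looks like this:

```python
def hallucination_rate(verdicts: list[bool]) -> float:
    """Percentage of answers flagged as ungrounded or contradicting the document."""
    return 100.0 * sum(verdicts) / len(verdicts)
```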
|
|
|
Hallucination rates are automatically computed using **[Verify](https://platform.kluster.ai/verify)** by [kluster.ai](https://kluster.ai/), the [leading](https://www.kluster.ai/blog/introducing-verify-by-kluster-ai-the-missing-trust-layer-in-your-ai-stack) AI-powered hallucination detection API that cross-checks model claims against the source document. |