---
title: TruthLens
emoji: 🔎
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# TruthLens

TruthLens is an open‑source, no‑hallucination context engine that extracts multiple perspectives on a claim. Given a short textual assertion, it searches Wikipedia, ranks candidate sentences by semantic similarity to the claim, classifies each sentence as supporting, contradicting or neutral, and then presents a context card with quotations and citations. The goal is to help users quickly understand the range of evidence and viewpoint diversity around a topic without relying on opaque, generative summaries.

## Features

* **Verbatim evidence** – All sentences are quoted exactly as they appear in their sources; there is no paraphrasing or synthesis.
* **Citation first** – Each bullet in the context card links back to the original Wikipedia article so you can verify the information.
* **Multi‑perspective** – Evidence is grouped into *support*, *contradict* and *neutral* sections to highlight different viewpoints.
* **Lightweight models** – Uses a small sentence‑transformer model (MiniLM) for ranking and a cross‑encoder (RoBERTa) for NLI classification. Falls back to TF‑IDF ranking and heuristic classification if the deep learning models cannot be loaded (a sketch of this fallback follows the list).
* **Fully open source** – Code, model usage and design decisions are public, enabling transparency, reproducibility and community contribution.

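
To make the fallback concrete, here is a minimal sketch of the graceful degradation described in the *Lightweight models* bullet. It assumes `sentence-transformers` for the primary path and scikit-learn for the TF‑IDF fallback; the actual logic lives in `src/ranker.py` (and analogously in `src/classifier.py`) and may be structured differently.

```python
# Sketch only: prefer the MiniLM bi-encoder, drop to TF-IDF if it cannot be loaded.
# The real fallback lives in src/ranker.py and may differ in detail.
def load_ranker():
    try:
        from sentence_transformers import SentenceTransformer
        return "minilm", SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    except Exception:
        # Package missing, no network, or the download failed: fall back to TF-IDF.
        from sklearn.feature_extraction.text import TfidfVectorizer
        return "tfidf", TfidfVectorizer()
```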
## Installation

To install and run TruthLens locally you will need Python 3.9 or newer. Clone this repository and install the dependencies from `requirements.txt`:

```bash
git clone https://github.com/yourusername/truthlens.git
cd truthlens
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Depending on your environment, the first run may download pre‑trained models from Hugging Face (for the sentence‑transformer and cross‑encoder). Subsequent runs will reuse the cached models.

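
If you want to warm that cache ahead of time (for example, before going offline), the models can be downloaded explicitly. This is an optional sketch using the standard `sentence-transformers` API; the model names are the ones referenced in the architecture overview below.

```python
# warm_cache.py - optional: pre-download both models so the first app run is fast.
from sentence_transformers import CrossEncoder, SentenceTransformer

SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # ranking bi-encoder
CrossEncoder("cross-encoder/nli-roberta-base")                 # NLI cross-encoder

print("Models are now in the local Hugging Face cache.")
```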
## Running the App

Start the Gradio web server by executing:

```bash
python app.py
```

By default the app listens on `http://localhost:7860`. Open this URL in your browser, enter a claim such as `"Electric vehicles reduce CO₂ emissions vs gasoline cars"` and press **Generate Context Card**. The interface will populate three columns for supporting, contradictory and neutral evidence and show a table summarising source contributions.

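
If port 7860 is already in use, Gradio can be told to listen elsewhere via `launch()`. The snippet below is a minimal, self-contained sketch; the real `app.py` builds a richer interface, but the same options can be passed wherever it calls `launch()`.

```python
import gradio as gr

# Minimal stand-in for the real TruthLens UI, just to illustrate the launch options.
demo = gr.Interface(fn=lambda claim: claim, inputs="text", outputs="text")

demo.launch(
    server_name="0.0.0.0",  # listen on all interfaces, useful inside containers
    server_port=7861,       # any free port; Gradio defaults to 7860
)
```

Alternatively, Gradio honours the `GRADIO_SERVER_PORT` environment variable, which avoids editing the code at all.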
## Deployment on Hugging Face Spaces

TruthLens is designed to run seamlessly on [Hugging Face Spaces](https://huggingface.co/spaces). To deploy your own instance:

1. Create a new **Space** and select **Gradio** as the SDK.
2. Upload the contents of this repository (including `src/`, `app.py`, `requirements.txt`, etc.) to the Space. You can push via the web interface or connect your Space to a GitHub repository.
3. The Space builder will install the dependencies and launch the app automatically. The first build may take several minutes while the models download; subsequent launches will be faster.

If you wish to retrieve evidence from sources beyond Wikipedia (e.g. via the Bing or Google News APIs), add the appropriate API clients to `src/retriever.py` and pass the keys via environment variables. This repository contains only the Wikipedia retriever for simplicity.

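
As an illustration of that pattern (the variable names below are hypothetical, not something this repository reads today), an extra retriever would typically pick up its keys from the environment, so that on Spaces they can be supplied as Space secrets rather than committed to the code:

```python
# Hypothetical sketch for src/retriever.py: optional keys come from the environment.
# BING_SEARCH_API_KEY and NEWS_API_KEY are illustrative names only.
import os

BING_API_KEY = os.environ.get("BING_SEARCH_API_KEY")
NEWS_API_KEY = os.environ.get("NEWS_API_KEY")

def extra_sources_enabled() -> bool:
    """True when at least one optional, non-Wikipedia retriever is configured."""
    return bool(BING_API_KEY or NEWS_API_KEY)

if __name__ == "__main__":
    print("Optional retrievers enabled:", extra_sources_enabled())
```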
## Architecture Overview

TruthLens consists of four primary components:

1. **Retriever (`src/retriever.py`)** – Searches Wikipedia for the claim, fetches page content and splits it into sentences. Each returned item is a `(sentence, source_url)` pair. The module leverages the `wikipedia` library and falls back to a naive sentence splitter if NLTK is unavailable.
2. **Ranker (`src/ranker.py`)** – Ranks candidate sentences by their semantic similarity to the claim. It first attempts to use a pre‑trained sentence‑transformer model (`all-MiniLM-L6-v2`) and falls back to TF‑IDF cosine similarity if transformers are unavailable.
3. **Classifier (`src/classifier.py`)** – Determines whether each candidate sentence supports, contradicts or is neutral with respect to the claim. It uses the `cross-encoder/nli-roberta-base` model for NLI and reverts to a simple lexical heuristic when transformers cannot be loaded.
4. **Pipeline (`src/pipeline.py`)** – Orchestrates the retrieval, ranking and classification steps, groups evidence by category and aggregates counts per source. This is the primary function invoked by the Gradio interface (see the sketch after this list).

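
The listing below is a minimal sketch of that orchestration, not the actual contents of `src/pipeline.py`: the helpers `retrieve`, `rank` and `classify` are stand-ins for whatever the real modules expose, and the exact thresholds and grouping logic may differ.

```python
# Minimal sketch of the retrieve -> rank -> classify -> group flow described above.
# The callables are stand-ins for the real functions in src/retriever.py,
# src/ranker.py and src/classifier.py, whose names and signatures may differ.
from collections import Counter
from typing import Callable, Iterable

Sentence = tuple[str, str]  # (sentence, source_url), as produced by the retriever


def build_context_card(
    claim: str,
    retrieve: Callable[[str], Iterable[Sentence]],
    rank: Callable[[str, list], list],
    classify: Callable[[str, str], str],  # -> "support", "contradict" or "neutral"
    top_k: int = 20,
) -> dict:
    """Group the top-ranked evidence by stance and count contributions per source."""
    candidates = list(retrieve(claim))
    ranked = rank(claim, candidates)[:top_k]

    grouped = {"support": [], "contradict": [], "neutral": []}
    per_source = Counter()
    for sentence, url in ranked:
        grouped[classify(claim, sentence)].append((sentence, url))
        per_source[url] += 1

    return {"claim": claim, "evidence": grouped, "sources": dict(per_source)}
```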
The `app.py` file ties everything together, exposing a friendly UI with Gradio. Evidence is presented as markdown bullet lists with superscript citation numbers that link back to the original sources.

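
For reference, that bullet format amounts to ordinary Markdown with an HTML superscript wrapped around the linked citation number. A hedged sketch of the idea, not the exact helper used in `app.py`:

```python
# Illustrative only: how one evidence bullet with a superscript citation might be built.
def format_bullet(sentence: str, url: str, ref: int) -> str:
    return f'* "{sentence}" <sup>[[{ref}]]({url})</sup>'

print(format_bullet("An example sentence quoted verbatim from its source.",
                    "https://en.wikipedia.org/wiki/Example", 1))
```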
## Contributing

Pull requests are welcome! If you have ideas for improvements – such as additional retrievers, new metrics, multilingual support or robustness tests – feel free to open an issue or submit a PR. Please include unit tests for new functionality where applicable.

## License

This project is licensed under the MIT License – see the [LICENSE](LICENSE) file for details.