---
title: TRUTHLENS
emoji: 🔎
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
# TruthLens
TruthLens is an open‑source, no‑hallucination context engine that
extracts multiple perspectives on a claim. Given a short textual
assertion, it searches Wikipedia, ranks candidate sentences by
semantic similarity to the claim, classifies each sentence as
supporting, contradicting or neutral, and then presents a context card
with quotations and citations. The goal is to help users quickly
understand the range of evidence and viewpoint diversity around a
topic without relying on opaque, generative summaries.
## Features
* **Verbatim evidence** – All sentences are quoted exactly as they
appear in their sources; there is no paraphrasing or synthesis.
* **Citation first** – Each bullet in the context card links back to
the original Wikipedia article so you can verify the information.
* **Multi‑perspective** – Evidence is grouped into *support*,
*contradict* and *neutral* sections to highlight different
viewpoints.
* **Lightweight models** – Uses a small sentence‑transformer model
  (MiniLM) for ranking and a cross‑encoder (RoBERTa) for NLI
  classification, falling back to TF‑IDF ranking and a heuristic
  classifier if the deep learning models cannot be loaded (see the
  sketch after this list).
* **Fully open source** – Code, model usage and design decisions are
public, enabling transparency, reproducibility and community
contribution.
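As a rough illustration of the TF‑IDF fallback, ranking by cosine
similarity can be sketched with scikit‑learn in a few lines. This is
a hedged sketch, not the exact code in `src/ranker.py`; the function
name `rank_by_tfidf` is illustrative.

```python
# Illustrative sketch of the TF-IDF fallback ranking; the real
# implementation lives in src/ranker.py and may differ in detail.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_by_tfidf(claim: str, sentences: list[str], top_k: int = 10) -> list[str]:
    """Rank candidate sentences by TF-IDF cosine similarity to the claim."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the claim plus all candidates so they share one vocabulary.
    matrix = vectorizer.fit_transform([claim] + sentences)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    ranked = sorted(zip(sentences, scores), key=lambda p: p[1], reverse=True)
    return [sentence for sentence, _ in ranked[:top_k]]
```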
## Installation
To install and run TruthLens locally, you will need Python 3.9 or
newer. Clone this repository and install the dependencies from
`requirements.txt`:
```bash
git clone https://github.com/yourusername/truthlens.git
cd truthlens
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Depending on your environment, the first run may download pre‑trained
models from Hugging Face (for the sentence‑transformer and
cross‑encoder). Subsequent runs will reuse the cached models.
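If you want to warm the cache ahead of time (for example in a
container build step), downloading both models up front should work.
The model names below are the ones listed in the Architecture
Overview:

```python
# Optional cache warm-up: download both models before the first run.
# Model names match those referenced in the Architecture Overview.
from sentence_transformers import CrossEncoder, SentenceTransformer

SentenceTransformer("all-MiniLM-L6-v2")         # ranking model
CrossEncoder("cross-encoder/nli-roberta-base")  # NLI classifier
```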
## Running the App
Start the Gradio web server by executing:
```bash
python app.py
```
By default the app listens on `http://localhost:7860`. Open this URL
in your browser, enter a claim such as
`"Electric vehicles reduce CO₂ emissions vs gasoline cars"` and
press **Generate Context Card**. The interface will populate
three columns for supporting, contradictory and neutral evidence and
show a table summarising source contributions.
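To serve on a different host or port, you can set Gradio's
`GRADIO_SERVER_NAME` and `GRADIO_SERVER_PORT` environment variables
before launching. Alternatively, assuming `app.py` exposes its
interface object as `demo` and guards its own launch call (an
assumption; check the file), you can launch it yourself:

```python
# Assumes app.py exposes its Gradio interface as `demo` and guards its
# own launch behind `if __name__ == "__main__"`; adjust as needed.
from app import demo

demo.launch(server_name="0.0.0.0", server_port=8080)  # bind all interfaces
```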
## Deployment on Hugging Face Spaces
TruthLens is designed to run seamlessly on [Hugging Face
Spaces](https://huggingface.co/spaces). To deploy your own instance:
1. Create a new **Space** and choose **Gradio** as the **SDK**.
2. Upload the contents of this repository (including `src/`,
`app.py`, `requirements.txt`, etc.) to the Space. You can push
via the web interface or connect your Space to a GitHub
repository.
3. The Space builder will install the dependencies and launch the app
automatically. The first build may take several minutes while
models download; subsequent launches will be faster.
If you wish to retrieve evidence from sources beyond Wikipedia
(e.g. the Bing or Google News APIs), you can add the appropriate API
clients to `retriever.py` and pass the keys via environment
variables, as in the sketch below. This repository contains only the
Wikipedia retriever for simplicity.
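As a hedged illustration only, a Bing‑backed retriever added to
`src/retriever.py` might read its key from the environment like
this; `BING_API_KEY` and `search_bing` are hypothetical names, not
part of this repository:

```python
# Hypothetical extension to src/retriever.py: BING_API_KEY and
# search_bing are illustrative names, not part of this repository.
import os

import requests

def search_bing(query: str, count: int = 10) -> list[dict]:
    """Query the Bing Web Search API with a key from the environment."""
    key = os.environ.get("BING_API_KEY")
    if not key:
        raise RuntimeError("Set BING_API_KEY to enable the Bing retriever.")
    resp = requests.get(
        "https://api.bing.microsoft.com/v7.0/search",
        headers={"Ocp-Apim-Subscription-Key": key},
        params={"q": query, "count": count},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("webPages", {}).get("value", [])
```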
## Architecture Overview
TruthLens consists of four primary components:
1. **Retriever (`src/retriever.py`)** – Searches Wikipedia for the
claim, fetches page content and splits it into sentences. Each
returned item is a `(sentence, source_url)` pair. The module
leverages the `wikipedia` library and falls back to a naive
sentence splitter if NLTK is unavailable.
2. **Ranker (`src/ranker.py`)** – Ranks candidate sentences by their
semantic similarity to the claim. It first attempts to use a
pre‑trained sentence‑transformer model (`all‑MiniLM‑L6‑v2`) and
falls back to TF‑IDF cosine similarity if transformers are
unavailable.
3. **Classifier (`src/classifier.py`)** – Determines whether each
candidate sentence supports, contradicts or is neutral with
respect to the claim. It uses the `cross‑encoder/nli‑roberta‑base`
model for NLI and reverts to a simple lexical heuristic when
transformers cannot be loaded.
4. **Pipeline (`src/pipeline.py`)** – Orchestrates the retrieval,
   ranking and classification steps, groups evidence by category and
   aggregates counts per source. This is the primary function invoked
   by the Gradio interface; a rough sketch of the flow follows this
   list.
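Here is that sketch of how the four modules compose. The function
names (`retrieve`, `rank`, `classify`) are assumptions about the
modules' interfaces, so check the source files for the actual
signatures:

```python
# Rough sketch of the pipeline flow; the imported function names are
# assumptions about the modules' interfaces, not the verified API.
from collections import Counter

from src.retriever import retrieve   # -> list of (sentence, source_url)
from src.ranker import rank          # orders candidates by similarity
from src.classifier import classify  # -> "support" | "contradict" | "neutral"

def build_context_card(claim: str, top_k: int = 15) -> dict:
    """Retrieve, rank and classify evidence, then group it by category."""
    candidates = retrieve(claim)
    ranked = rank(claim, candidates)[:top_k]
    card = {"support": [], "contradict": [], "neutral": []}
    sources = Counter()
    for sentence, url in ranked:
        label = classify(claim, sentence)
        card[label].append((sentence, url))
        sources[url] += 1
    return {"evidence": card, "source_counts": dict(sources)}
```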
The `app.py` file ties everything together, exposing a friendly UI
with Gradio. Evidence is presented as markdown bullet lists with
superscript citation numbers that link back to the original sources.
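For orientation, a stripped‑down version of that wiring might look
like the following; `generate_context_card` stands in for the real
pipeline entry point and returns placeholder markdown:

```python
# Minimal illustration of the Gradio wiring in app.py;
# generate_context_card stands in for the real pipeline entry point.
import gradio as gr

def generate_context_card(claim: str) -> tuple[str, str, str]:
    # The real app calls the pipeline and renders markdown bullet
    # lists with superscript citation links for each category.
    return "*support*", "*contradict*", "*neutral*"

demo = gr.Interface(
    fn=generate_context_card,
    inputs=gr.Textbox(label="Claim"),
    outputs=[gr.Markdown(label=l) for l in ("Support", "Contradict", "Neutral")],
    title="TruthLens",
)

if __name__ == "__main__":
    demo.launch()
```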
## Contributing
Pull requests are welcome! If you have ideas for improvements –
such as additional retrievers, new metrics, multilingual support or
robustness tests – feel free to open an issue or submit a PR. Please
include unit tests for new functionality where applicable.
## License
This project is licensed under the MIT License – see the
[LICENSE](LICENSE) file for details.