---
title: TRUTHLENS
emoji: 🔎
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# TruthLens

TruthLens is an open‑source, no‑hallucination context engine that
extracts multiple perspectives on a claim.  Given a short textual
assertion, it searches Wikipedia, ranks candidate sentences by
semantic similarity to the claim, classifies each sentence as
supporting, contradicting or neutral, and then presents a context card
with quotations and citations.  The goal is to help users quickly
understand the range of evidence and viewpoint diversity around a
topic without relying on opaque, generative summaries.

## Features

* **Verbatim evidence** – All sentences are quoted exactly as they
  appear in their sources; there is no paraphrasing or synthesis.
* **Citation first** – Each bullet in the context card links back to
  the original Wikipedia article so you can verify the information.
* **Multi‑perspective** – Evidence is grouped into *support*,
  *contradict* and *neutral* sections to highlight different
  viewpoints.
* **Lightweight models** – Uses a small sentence‑transformer model
  (MiniLM) for ranking and a cross‑encoder (RoBERTa) for NLI
  classification.  Falls back to TF‑IDF ranking and heuristic
  classification if deep learning models cannot be loaded.
* **Fully open source** – Code, model usage and design decisions are
  public, enabling transparency, reproducibility and community
  contribution.
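
The fallback behaviour described above can be pictured as a small backend-selection step. The following is a minimal sketch, not the project's actual loading code: `pick_backend` and the backend labels are hypothetical names, and the check only tests whether a package is importable (a failed model download would need separate handling).

```python
import importlib.util


def pick_backend(module_name: str, preferred: str, fallback: str) -> str:
    """Return `preferred` if `module_name` is importable, else `fallback`."""
    # find_spec returns None for a missing top-level module instead of raising.
    if importlib.util.find_spec(module_name) is not None:
        return preferred
    return fallback


# Hypothetical labels: the deep model when the library is present,
# otherwise the lightweight fallback named in the feature list.
ranker_backend = pick_backend("sentence_transformers", "minilm", "tfidf")
classifier_backend = pick_backend("transformers", "nli-cross-encoder", "heuristic")
print(ranker_backend, classifier_backend)
```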

## Installation

To install and run TruthLens locally you will need Python 3.9 or
newer.  Clone this repository and install the dependencies from
`requirements.txt`:

```bash
git clone https://github.com/yourusername/truthlens.git
cd truthlens
# Create and activate an isolated virtual environment
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Depending on your environment, the first run may download pre‑trained
models from Hugging Face (for the sentence‑transformer and
cross‑encoder).  Subsequent runs will reuse the cached models.

## Running the App

Start the Gradio web server by executing:

```bash
python app.py
```

By default the app listens on `http://localhost:7860`.  Open this URL
in your browser, enter a claim such as
`"Electric vehicles reduce CO₂ emissions vs gasoline cars"` and
press **Generate Context Card**.  The interface will populate
three columns for supporting, contradictory and neutral evidence and
show a table summarising source contributions.

## Deployment on Hugging Face Spaces

TruthLens is designed to run seamlessly on [Hugging Face
Spaces](https://huggingface.co/spaces).  To deploy your own instance:

1. Create a new **Space** and select **Gradio** as the **SDK**.
2. Upload the contents of this repository (including `src/`,
   `app.py`, `requirements.txt`, etc.) to the Space.  You can push
   via the web interface or connect your Space to a GitHub
   repository.
3. The Space builder will install the dependencies and launch the app
   automatically.  The first build may take several minutes while
   models download; subsequent launches will be faster.

To retrieve from sources beyond Wikipedia (e.g. Bing or Google News
APIs), add the appropriate API clients to `src/retriever.py` and pass
their keys via environment variables.  This repository ships only the
Wikipedia retriever, for simplicity.
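
One way to pass keys through environment variables is sketched below. It is an illustration only: the variable names `BING_API_KEY` and `GOOGLE_NEWS_API_KEY` and the helper `load_optional_keys` are assumptions and are not defined anywhere in this repository.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical variable names; the repository does not define them.
OPTIONAL_KEY_VARS = {
    "bing": "BING_API_KEY",
    "google_news": "GOOGLE_NEWS_API_KEY",
}


def load_optional_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return {provider: key} for each provider whose variable is set and non-empty."""
    env = os.environ if env is None else env
    keys = {}
    for provider, var in OPTIONAL_KEY_VARS.items():
        value = env.get(var, "").strip()
        if value:
            keys[provider] = value
    return keys


# Providers without a key are simply absent, so a retriever can skip them.
print(sorted(load_optional_keys()))
```

On Hugging Face Spaces such values would typically be stored as Space secrets, which are exposed to the app as environment variables.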

## Architecture Overview

TruthLens consists of four primary components:

1. **Retriever (`src/retriever.py`)** – Searches Wikipedia for the
   claim, fetches page content and splits it into sentences.  Each
   returned item is a `(sentence, source_url)` pair.  The module
   leverages the `wikipedia` library and falls back to a naive
   sentence splitter if NLTK is unavailable.

2. **Ranker (`src/ranker.py`)** – Ranks candidate sentences by their
   semantic similarity to the claim.  It first attempts to use a
   pre‑trained sentence‑transformer model (`all-MiniLM-L6-v2`) and
   falls back to TF‑IDF cosine similarity if the model cannot be
   loaded.

3. **Classifier (`src/classifier.py`)** – Determines whether each
   candidate sentence supports, contradicts or is neutral with
   respect to the claim.  It uses the `cross-encoder/nli-roberta-base`
   model for NLI and reverts to a simple lexical heuristic when
   the model cannot be loaded.

4. **Pipeline (`src/pipeline.py`)** – Orchestrates the retrieval,
   ranking and classification steps, groups evidence by category and
   aggregates counts per source.  This module is the entry point
   invoked by the Gradio interface.
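
The rank, classify and group flow above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the code in `src/`: `tfidf_rank` and `build_context` are hypothetical names, the TF‑IDF weighting is a plain smoothed variant, and the classifier is passed in as a function so that either the NLI model or a heuristic could be plugged in.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Evidence = Tuple[str, str]  # (sentence, source_url), as returned by the retriever


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(claim: str, candidates: List[Evidence], top_k: int = 5) -> List[Evidence]:
    """Order candidates by TF-IDF cosine similarity to the claim."""
    docs = [tokenize(claim)] + [tokenize(s) for s, _ in candidates]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms present in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vectorize(doc: List[str]) -> Dict[str, float]:
        tf = Counter(doc)
        return {t: tf[t] * idf[t] for t in tf}

    def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    vecs = [vectorize(d) for d in docs]
    scored = [(cosine(vecs[0], v), ev) for v, ev in zip(vecs[1:], candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ev for _, ev in scored[:top_k]]


def build_context(
    claim: str,
    candidates: List[Evidence],
    classify: Callable[[str, str], str],
    top_k: int = 5,
):
    """Rank, classify, group by label, and count evidence per source."""
    grouped = defaultdict(list)
    per_source = Counter()
    for sentence, url in tfidf_rank(claim, candidates, top_k):
        label = classify(claim, sentence)  # "support", "contradict" or "neutral"
        grouped[label].append((sentence, url))
        per_source[url] += 1
    return dict(grouped), per_source
```

Passing the classifier as a callable keeps the grouping logic independent of which model, if any, is available at runtime.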

The `app.py` file ties everything together, exposing a friendly UI
with Gradio.  Evidence is presented as markdown bullet lists with
superscript citation numbers that link back to the original sources.
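
A rough sketch of how such a bullet list could be produced is shown here. `render_bullets` is a hypothetical helper, not the function used in `app.py`, and it assumes the Markdown renderer accepts inline `<sup>` HTML.

```python
from typing import Dict, List, Tuple


def render_bullets(evidence: List[Tuple[str, str]]) -> str:
    """Render (sentence, source_url) pairs as bullets with linked superscript numbers."""
    numbers: Dict[str, int] = {}
    lines = []
    for sentence, url in evidence:
        # Reuse one citation number per distinct source URL.
        n = numbers.setdefault(url, len(numbers) + 1)
        lines.append(f'* "{sentence}" <sup>[{n}]({url})</sup>')
    return "\n".join(lines)


print(render_bullets([
    ("First quoted sentence.", "https://en.wikipedia.org/wiki/Example"),
    ("Second quoted sentence.", "https://en.wikipedia.org/wiki/Example"),
]))
```

Numbering by distinct URL means repeated quotations from the same article share one citation number.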

## Contributing

Pull requests are welcome!  If you have ideas for improvements –
such as additional retrievers, new metrics, multilingual support or
robustness tests – feel free to open an issue or submit a PR.  Please
include unit tests for new functionality where applicable.

## License

This project is licensed under the MIT License – see the
[LICENSE](LICENSE) file for details.