---
title: TruthLens
emoji: 🔎
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# TruthLens

TruthLens is an open‑source, no‑hallucination context engine that extracts multiple perspectives on a claim. Given a short textual assertion, it searches Wikipedia, ranks candidate sentences by semantic similarity to the claim, classifies each sentence as supporting, contradicting or neutral, and then presents a context card with quotations and citations. The goal is to help users quickly understand the range of evidence and viewpoint diversity around a topic without relying on opaque, generative summaries.

## Features

* **Verbatim evidence** – All sentences are quoted exactly as they appear in their sources; there is no paraphrasing or synthesis.
* **Citation first** – Each bullet in the context card links back to the original Wikipedia article so you can verify the information.
* **Multi‑perspective** – Evidence is grouped into *support*, *contradict* and *neutral* sections to highlight different viewpoints.
* **Lightweight models** – Uses a small sentence‑transformer model (MiniLM) for ranking and a cross‑encoder (RoBERTa) for NLI classification. Falls back to TF‑IDF ranking and heuristic classification if the deep learning models cannot be loaded (a sketch of this fallback follows the list).
* **Fully open source** – Code, model usage and design decisions are public, enabling transparency, reproducibility and community contribution.

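
To make the fallback concrete, here is a minimal sketch of the graceful degradation described in the *Lightweight models* bullet. It assumes `sentence-transformers` for the primary path and scikit-learn for the TF‑IDF fallback; the actual logic lives in `src/ranker.py` (and analogously in `src/classifier.py`) and may be structured differently.

```python
# Sketch only: prefer the MiniLM bi-encoder, drop to TF-IDF if it cannot be loaded.
# The real fallback lives in src/ranker.py and may differ in detail.
def load_ranker():
    try:
        from sentence_transformers import SentenceTransformer
        return "minilm", SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    except Exception:
        # Package missing, no network, or the download failed: fall back to TF-IDF.
        from sklearn.feature_extraction.text import TfidfVectorizer
        return "tfidf", TfidfVectorizer()
```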
## Installation

To install and run TruthLens locally you will need Python 3.9 or newer. Clone this repository and install the dependencies from `requirements.txt`:

```bash
git clone https://github.com/yourusername/truthlens.git
cd truthlens
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Depending on your environment, the first run may download pre‑trained models from Hugging Face (for the sentence‑transformer and cross‑encoder). Subsequent runs will reuse the cached models.

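
If you want to warm that cache ahead of time (for example, before going offline), the models can be downloaded explicitly. This is an optional sketch using the standard `sentence-transformers` API; the model names are the ones referenced in the architecture overview below.

```python
# warm_cache.py - optional: pre-download both models so the first app run is fast.
from sentence_transformers import CrossEncoder, SentenceTransformer

SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # ranking bi-encoder
CrossEncoder("cross-encoder/nli-roberta-base")                 # NLI cross-encoder

print("Models are now in the local Hugging Face cache.")
```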
## Running the App

Start the Gradio web server by executing:

```bash
python app.py
```

By default the app listens on `http://localhost:7860`. Open this URL in your browser, enter a claim such as `"Electric vehicles reduce CO₂ emissions vs gasoline cars"` and press **Generate Context Card**. The interface will populate three columns for supporting, contradictory and neutral evidence and show a table summarising source contributions.

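
If port 7860 is already in use, Gradio can be told to listen elsewhere via `launch()`. The snippet below is a minimal, self-contained sketch; the real `app.py` builds a richer interface, but the same options can be passed wherever it calls `launch()`.

```python
import gradio as gr

# Minimal stand-in for the real TruthLens UI, just to illustrate the launch options.
demo = gr.Interface(fn=lambda claim: claim, inputs="text", outputs="text")

demo.launch(
    server_name="0.0.0.0",  # listen on all interfaces, useful inside containers
    server_port=7861,       # any free port; Gradio defaults to 7860
)
```

Alternatively, Gradio honours the `GRADIO_SERVER_PORT` environment variable, which avoids editing the code at all.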
## Deployment on Hugging Face Spaces

TruthLens is designed to run seamlessly on [Hugging Face Spaces](https://huggingface.co/spaces). To deploy your own instance:

1. Create a new **Space** and select **Gradio** as the SDK.
2. Upload the contents of this repository (including `src/`, `app.py`, `requirements.txt`, etc.) to the Space. You can push via the web interface or connect your Space to a GitHub repository.
3. The Space builder will install the dependencies and launch the app automatically. The first build may take several minutes while the models download; subsequent launches will be faster.

If you wish to retrieve evidence from sources beyond Wikipedia (e.g. via the Bing or Google News APIs), add the appropriate API clients to `src/retriever.py` and pass the keys via environment variables. This repository contains only the Wikipedia retriever for simplicity.

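
As an illustration of that pattern (the variable names below are hypothetical, not something this repository reads today), an extra retriever would typically pick up its keys from the environment, so that on Spaces they can be supplied as Space secrets rather than committed to the code:

```python
# Hypothetical sketch for src/retriever.py: optional keys come from the environment.
# BING_SEARCH_API_KEY and NEWS_API_KEY are illustrative names only.
import os

BING_API_KEY = os.environ.get("BING_SEARCH_API_KEY")
NEWS_API_KEY = os.environ.get("NEWS_API_KEY")

def extra_sources_enabled() -> bool:
    """True when at least one optional, non-Wikipedia retriever is configured."""
    return bool(BING_API_KEY or NEWS_API_KEY)

if __name__ == "__main__":
    print("Optional retrievers enabled:", extra_sources_enabled())
```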
## Architecture Overview

TruthLens consists of four primary components:

1. **Retriever (`src/retriever.py`)** – Searches Wikipedia for the claim, fetches page content and splits it into sentences. Each returned item is a `(sentence, source_url)` pair. The module leverages the `wikipedia` library and falls back to a naive sentence splitter if NLTK is unavailable.
2. **Ranker (`src/ranker.py`)** – Ranks candidate sentences by their semantic similarity to the claim. It first attempts to use a pre‑trained sentence‑transformer model (`all-MiniLM-L6-v2`) and falls back to TF‑IDF cosine similarity if transformers are unavailable.
3. **Classifier (`src/classifier.py`)** – Determines whether each candidate sentence supports, contradicts or is neutral with respect to the claim. It uses the `cross-encoder/nli-roberta-base` model for NLI and reverts to a simple lexical heuristic when transformers cannot be loaded.
4. **Pipeline (`src/pipeline.py`)** – Orchestrates the retrieval, ranking and classification steps, groups evidence by category and aggregates counts per source. This is the primary function invoked by the Gradio interface (see the sketch after this list).

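
The listing below is a minimal sketch of that orchestration, not the actual contents of `src/pipeline.py`: the helpers `retrieve`, `rank` and `classify` are stand-ins for whatever the real modules expose, and the exact thresholds and grouping logic may differ.

```python
# Minimal sketch of the retrieve -> rank -> classify -> group flow described above.
# The callables are stand-ins for the real functions in src/retriever.py,
# src/ranker.py and src/classifier.py, whose names and signatures may differ.
from collections import Counter
from typing import Callable, Iterable

Sentence = tuple[str, str]  # (sentence, source_url), as produced by the retriever


def build_context_card(
    claim: str,
    retrieve: Callable[[str], Iterable[Sentence]],
    rank: Callable[[str, list], list],
    classify: Callable[[str, str], str],  # -> "support", "contradict" or "neutral"
    top_k: int = 20,
) -> dict:
    """Group the top-ranked evidence by stance and count contributions per source."""
    candidates = list(retrieve(claim))
    ranked = rank(claim, candidates)[:top_k]

    grouped = {"support": [], "contradict": [], "neutral": []}
    per_source = Counter()
    for sentence, url in ranked:
        grouped[classify(claim, sentence)].append((sentence, url))
        per_source[url] += 1

    return {"claim": claim, "evidence": grouped, "sources": dict(per_source)}
```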
The `app.py` file ties everything together, exposing a friendly UI with Gradio. Evidence is presented as markdown bullet lists with superscript citation numbers that link back to the original sources.

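
For reference, that bullet format amounts to ordinary Markdown with an HTML superscript wrapped around the linked citation number. A hedged sketch of the idea, not the exact helper used in `app.py`:

```python
# Illustrative only: how one evidence bullet with a superscript citation might be built.
def format_bullet(sentence: str, url: str, ref: int) -> str:
    return f'* "{sentence}" <sup>[[{ref}]]({url})</sup>'

print(format_bullet("An example sentence quoted verbatim from its source.",
                    "https://en.wikipedia.org/wiki/Example", 1))
```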
## Contributing

Pull requests are welcome! If you have ideas for improvements – such as additional retrievers, new metrics, multilingual support or robustness tests – feel free to open an issue or submit a PR. Please include unit tests for new functionality where applicable.

## License

This project is licensed under the MIT License – see the [LICENSE](LICENSE) file for details.