metadata

title: AB Testing RAG Agent
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: 3.14
app_port: 8501
pinned: false

AB Testing RAG Agent

This application is a Streamlit frontend for a question-answering (QA) system about AB Testing. It answers questions with retrieval-augmented generation (RAG), orchestrated as a LangGraph graph.

Features

QA system specialized in AB Testing topics
Intelligent query routing with LangGraph
Source citations for all answers
Streamlit interface for easy interaction

Setup for Development

Prerequisites

Python 3.9+
OpenAI API key
Hugging Face account and token (for deployment)

Environment Setup

Clone this repository

Create a .env file in the root directory with the following content:

OPENAI_API_KEY=your_openai_api_key_here
HF_TOKEN=your_huggingface_token_here
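A startup check can catch a missing key before the first API call fails. The following is a minimal sketch, assuming the variables have already been loaded into the process environment (for example by a .env loader); the helper name `missing_keys` is hypothetical and not part of this repository.

```python
import os

# Variable names taken from the .env file above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HF_TOKEN")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]
```

Calling `missing_keys()` with no arguments inspects `os.environ`; an empty list means both keys are present.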

Process the PDFs

Before running the app, you need to process the PDF files to create the vectorstore:

python process_data.py

This will:

Load PDFs from notebook_version/data/
Process, chunk, and embed the documents
Create a Qdrant vectorstore in data/processed_data/

Running the App Locally

Once the data is processed, you can run the Streamlit app:

streamlit run app/app.py
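Because the app depends on the processed vectorstore, a quick check before launching can save a confusing startup error. This is a minimal sketch, assuming the default output location data/processed_data/ named above; the function `vectorstore_ready` is hypothetical, and it only confirms that the directory is non-empty, not that the Qdrant data inside is valid.

```python
from pathlib import Path


def vectorstore_ready(path="data/processed_data"):
    """Return True if the directory exists and contains at least one entry."""
    directory = Path(path)
    return directory.is_dir() and any(directory.iterdir())
```

If this returns False, run python process_data.py first.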

Deployment to Hugging Face Spaces

Prerequisites for Deployment

Hugging Face account
Docker installed locally

Steps to Deploy

Process the PDFs locally: python process_data.py
Build the Docker image: docker build -t ab-testing-qa .
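Create a new Hugging Face Space (Docker-based)
Add your Hugging Face token and OpenAI API key as secrets in the Space
Push the project, including the Dockerfile, to the Space repository; Hugging Face Spaces builds and runs the image from that Dockerfile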

Hugging Face Spaces Configuration

The application is configured to use the following secrets:

OPENAI_API_KEY: Your OpenAI API key
HF_TOKEN: Your Hugging Face token

System Architecture

The AB Testing QA system is built as a LangGraph graph with three nodes:

Initial RAG Node: Retrieves documents and attempts to answer the query
Helpfulness Judge: Determines whether:
- The query is related to AB Testing
- The initial response is helpful enough
Agent Node: If needed, uses specialized tools to improve the answer:
- Standard retrieval tool
- Query-rephrasing retrieval tool
- ArXiv search tool
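The routing between these three nodes can be sketched in plain Python, without LangGraph, to make the decision explicit. All names here (`Verdict`, `answer_query`, and the callables passed in) are hypothetical stand-ins; the real nodes call an LLM and retrievers. The source does not say what happens to off-topic queries, so this sketch assumes they return the initial response without invoking the agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    on_topic: bool  # is the query related to AB Testing?
    helpful: bool   # is the initial response helpful enough?


def answer_query(
    query: str,
    initial_rag: Callable[[str], str],
    judge: Callable[[str, str], Verdict],
    agent: Callable[[str], str],
) -> str:
    """Route a query: initial RAG -> helpfulness judge -> optional agent."""
    draft = initial_rag(query)
    verdict = judge(query, draft)
    # Assumption: the agent runs only for on-topic queries whose draft
    # answer was judged unhelpful; every other case returns the draft.
    if verdict.on_topic and not verdict.helpful:
        return agent(query)
    return draft
```

Passing the nodes in as callables keeps the routing testable with simple stubs before wiring it to real models.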

Data Processing

The system processes PDFs in four steps:

Merges PDF pages while preserving page metadata
Splits the merged text into chunks with RecursiveCharacterTextSplitter
Embeds the chunks with OpenAI's text-embedding-3-small model
Stores the embeddings in a Qdrant vectorstore
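To illustrate how page metadata can survive merging and chunking, here is a dependency-free sketch. It uses a plain fixed-size splitter rather than RecursiveCharacterTextSplitter, and the `chunk_size` and `overlap` values are placeholders, not the project's actual parameters. The function names are hypothetical.

```python
from bisect import bisect_right


def merge_pages(pages):
    """Join page texts into one string and record each page's start offset."""
    starts, offset = [], 0
    for text in pages:
        starts.append(offset)
        offset += len(text) + 1  # +1 for the newline separator
    return "\n".join(pages), starts


def chunk_with_pages(pages, chunk_size=1000, overlap=100):
    """Split merged text into overlapping chunks tagged with 1-based pages."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and below chunk_size")
    merged, starts = merge_pages(pages)
    chunks = []
    step = chunk_size - overlap
    for begin in range(0, len(merged), step):
        text = merged[begin:begin + chunk_size]
        end = begin + len(text) - 1
        chunks.append({
            "text": text,
            # bisect_right maps a character offset to its 1-based page.
            "start_page": bisect_right(starts, begin),
            "end_page": bisect_right(starts, end),
        })
        if begin + chunk_size >= len(merged):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each chunk records the first and last page it touches, so a citation can point back to the source pages even when a chunk spans a page boundary.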