---
title: AB Testing RAG Agent
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: 3.14
app_port: 8501
pinned: false
---
# AB Testing RAG Agent
This application is a Streamlit frontend for an AB Testing question-answering (QA) system. The system answers queries with retrieval-augmented generation (RAG), orchestrated as a LangGraph graph.
## Features
- QA system specialized in AB Testing topics
- Intelligent query routing with LangGraph
- Source citations for all answers
- Streamlit interface for easy interaction
## Setup for Development
### Prerequisites
- Python 3.9+
- OpenAI API key
- Hugging Face account and token (for deployment)
### Environment Setup
1. Clone this repository
2. Create a `.env` file in the root directory with the following content:
```
OPENAI_API_KEY=your_openai_api_key_here
HF_TOKEN=your_huggingface_token_here
```
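As a quick sanity check before running anything, the sketch below reports which of the two settings are missing. It assumes the values from `.env` have already been loaded into the process environment (for example by a dotenv-style loader; which loader this project uses is not stated here), and the helper name `missing_keys` is hypothetical.

```python
import os

# The two settings named in the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HF_TOKEN")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are absent or blank in `env`.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to test. (Hypothetical helper, not part of the app.)
    """
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Running this before `process_data.py` surfaces a missing key early instead of partway through embedding.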
### Process the PDFs
Before running the app, you need to process the PDF files to create the vectorstore:
```bash
python process_data.py
```
This will:
1. Load PDFs from `notebook_version/data/`
2. Process, chunk, and embed the documents
3. Create a Qdrant vectorstore in `data/processed_data/`
### Running the App Locally
Once the data is processed, you can run the Streamlit app:
```bash
streamlit run app/app.py
```
## Deployment to Hugging Face Spaces
### Prerequisites for Deployment
1. Hugging Face account
2. Docker installed locally
### Steps to Deploy
1. Process the PDFs locally: `python process_data.py`
2. Build the Docker image: `docker build -t ab-testing-qa .`
3. Create a new Hugging Face Space (Docker-based)
4. Add your Hugging Face token and OpenAI API key as secrets in the Space
5. Push the Docker image to Hugging Face
### Hugging Face Spaces Configuration
The application is configured to use the following secrets:
- `OPENAI_API_KEY`: Your OpenAI API key
- `HF_TOKEN`: Your Hugging Face token
## System Architecture
The AB Testing QA system is built as a LangGraph graph with three stages:
1. **Initial RAG Node**: Retrieves documents and attempts to answer the query
2. **Helpfulness Judge**: Determines if:
- The query is related to AB Testing
- The initial response is helpful enough
3. **Agent Node**: If needed, uses specialized tools to improve the answer:
- Standard retrieval tool
- Query-rephrasing retrieval tool
- ArXiv search tool
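The routing decision after the Helpfulness Judge can be sketched in plain Python. This is an illustration only: the `Verdict` type, the node labels, and the exact rule (off-topic queries and already helpful answers end the run; everything else goes to the agent) are assumptions, not the app's actual LangGraph code.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """The two checks made by the Helpfulness Judge (hypothetical type)."""
    on_topic: bool  # is the query related to AB Testing?
    helpful: bool   # is the initial RAG response helpful enough?


def next_step(verdict: Verdict) -> str:
    """Pick the next node label: "end" or "agent" (hypothetical labels)."""
    if not verdict.on_topic:
        # Assumption: off-topic queries are not escalated to the agent.
        return "end"
    if verdict.helpful:
        # The initial RAG answer is good enough, so stop here.
        return "end"
    # On-topic but unhelpful: let the agent try its retrieval tools.
    return "agent"


if __name__ == "__main__":
    print(next_step(Verdict(on_topic=True, helpful=False)))
```

In a graph framework, a function of this shape would typically serve as the condition on the edge leaving the judge node.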
## Data Processing
The system processes PDFs in four steps:
1. Merges PDF pages while maintaining page metadata
2. Uses RecursiveCharacterTextSplitter with specific parameters
3. Embeds using OpenAI's text-embedding-3-small model
4. Stores in a Qdrant vectorstore
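To illustrate step 1, the sketch below merges page texts into one string while recording where each page starts, so that any chunk can be traced back to its source pages. It is a simplified stand-in: it uses fixed-size windows instead of RecursiveCharacterTextSplitter, and the function names and the `size`/`overlap` values are hypothetical rather than the project's real settings.

```python
from bisect import bisect_right


def merge_pages(pages):
    """Join page texts with newlines and record each page's start offset."""
    starts, offset = [], 0
    for text in pages:
        starts.append(offset)
        offset += len(text) + 1  # +1 for the newline separator
    return "\n".join(pages), starts


def page_of(position, starts):
    """Return the 1-based page number that contains a character offset."""
    return bisect_right(starts, position)


def chunk_with_pages(pages, size=500, overlap=50):
    """Cut the merged text into overlapping windows tagged with page numbers."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    merged, starts = merge_pages(pages)
    chunks = []
    for begin in range(0, len(merged), size - overlap):
        text = merged[begin:begin + size]
        end = begin + len(text) - 1  # offset of the chunk's last character
        first, last = page_of(begin, starts), page_of(end, starts)
        chunks.append({"text": text, "pages": list(range(first, last + 1))})
    return chunks


if __name__ == "__main__":
    for chunk in chunk_with_pages(["aaaa", "bbbb"], size=6, overlap=2):
        print(chunk)
```

Keeping the page list on every chunk is what allows answers to cite their sources by page after retrieval.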