---
title: AB Testing RAG Agent
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: 3.14
app_port: 8501
pinned: false
---

# AB Testing RAG Agent

This application is a Streamlit-based frontend for an AB Testing QA system built on a retrieval-augmented generation (RAG) pipeline with a LangGraph architecture.

## Features

- QA system specialized in AB Testing topics
- Intelligent query routing with LangGraph
- Source citations for all answers
- Streamlit interface for easy interaction

## Setup for Development

### Prerequisites

- Python 3.9+
- OpenAI API key
- Hugging Face account and token (for deployment)

### Environment Setup

1. Clone this repository
2. Create a `.env` file in the root directory with the following content:
   ```
   OPENAI_API_KEY=your_openai_api_key_here
   HF_TOKEN=your_huggingface_token_here
   ```
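How the app consumes these variables is not shown here; as a hedged illustration, a minimal stdlib-only sketch of loading a `.env` file into the process environment might look like the following (the `load_dotenv` helper is illustrative, not the app's actual code — many projects use the `python-dotenv` package instead):

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Parse KEY=value lines from a .env file into os.environ.

    Blank lines and # comments are skipped; existing environment
    variables are not overwritten.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

With this in place, the app can read `os.environ["OPENAI_API_KEY"]` and `os.environ["HF_TOKEN"]` at startup.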

### Process the PDFs

Before running the app, you need to process the PDF files to create the vectorstore:

```bash
python process_data.py
```

This will:
1. Load PDFs from `notebook_version/data/`
2. Process, chunk, and embed the documents
3. Create a Qdrant vectorstore in `data/processed_data/`

### Running the App Locally

Once the data is processed, you can run the Streamlit app:

```bash
streamlit run app/app.py
```

## Deployment to Hugging Face Spaces

### Prerequisites for Deployment

1. Hugging Face account
2. Docker installed locally

### Steps to Deploy

1. Process the PDFs locally: `python process_data.py`
2. Build the Docker image: `docker build -t ab-testing-qa .`
3. Create a new Hugging Face Space (Docker-based)
4. Add your Hugging Face token and OpenAI API key as secrets in the Space
5. Push the Docker image to Hugging Face

### Hugging Face Spaces Configuration

The application is configured to use the following secrets:
- `OPENAI_API_KEY`: Your OpenAI API key
- `HF_TOKEN`: Your Hugging Face token

## System Architecture

The AB Testing QA system routes each query through a LangGraph workflow:

1. **Initial RAG Node**: Retrieves documents and attempts to answer the query
2. **Helpfulness Judge**: Determines whether:
   - The query is related to AB Testing
   - The initial response is helpful enough
3. **Agent Node**: If needed, uses specialized tools to improve the answer:
   - Standard retrieval tool
   - Query-rephrasing retrieval tool
   - ArXiv search tool
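The control flow above can be sketched in plain Python. This is an illustrative stand-in only: the real app builds the graph with LangGraph's `StateGraph`, and the function names, signatures, and judge heuristic below are hypothetical.

```python
# Illustrative stand-in for the LangGraph routing described above.
# All node implementations here are placeholders, not the app's code.

def initial_rag(query: str) -> str:
    # Placeholder: the real node retrieves documents from the
    # vectorstore and drafts an answer with an LLM.
    return f"draft answer for: {query}"

def helpfulness_judge(query: str, answer: str) -> bool:
    # Placeholder heuristic: the real node asks an LLM whether the
    # query is on-topic and the draft answer is helpful enough.
    return "ab test" in query.lower()

def agent_node(query: str) -> str:
    # Placeholder: the real node chooses among standard retrieval,
    # query-rephrasing retrieval, and ArXiv search tools.
    return f"tool-assisted answer for: {query}"

def run_graph(query: str) -> str:
    """Initial RAG -> judge -> (done | agent with tools)."""
    answer = initial_rag(query)
    if helpfulness_judge(query, answer):
        return answer          # draft was good enough
    return agent_node(query)   # escalate to the agent node
```

The key design point is the conditional edge: the judge decides whether the initial RAG answer is returned directly or the agent node is invoked with its specialized tools.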

## Data Processing

The system processes PDFs as follows:
1. Merges PDF pages while preserving page metadata
2. Splits the text with `RecursiveCharacterTextSplitter`
3. Embeds the chunks with OpenAI's `text-embedding-3-small` model
4. Stores the embeddings in a Qdrant vectorstore
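For intuition, the splitting step can be approximated in pure Python. This is a simplified sketch of the recursive-character idea, not LangChain's implementation; the separators and `chunk_size` value are illustrative, and chunk overlap is omitted for brevity.

```python
def recursive_split(text, separators=("\n\n", "\n", " "), chunk_size=500):
    """Simplified recursive character splitting: break on the coarsest
    separator first, recurse on oversized pieces with finer separators,
    then greedily merge adjacent pieces back up to chunk_size."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: hard-cut the text.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        if len(part) <= chunk_size:
            pieces.append(part)
        else:
            pieces.extend(recursive_split(part, rest, chunk_size))
    # Greedily merge small pieces so chunks approach chunk_size.
    chunks, current = [], ""
    for piece in pieces:
        candidate = (current + sep + piece) if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Preferring paragraph breaks over line breaks over spaces keeps semantically related text together, which is the main reason RAG pipelines favor this style of splitter over fixed-width cuts.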