---
title: AB Testing RAG Agent
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: 3.14
app_port: 8501
pinned: false
---

# AB Testing RAG Agent

This application is a Streamlit frontend for an A/B testing question-answering (QA) system. The backend answers questions with retrieval-augmented generation (RAG) over a collection of A/B testing PDFs, orchestrated as a LangGraph graph.

## Features

- Question answering specialized in A/B testing topics
- Query routing with LangGraph: a judge decides whether the initial RAG answer is sufficient or should be improved by a tool-using agent
- Source citations for all answers
- Streamlit interface for easy interaction

## Setup for Development

### Prerequisites

- Python 3.9+
- OpenAI API key
- Hugging Face account and access token (for deployment)

### Environment Setup

1. Clone this repository.
2. Create a `.env` file in the root directory with the following content:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   HF_TOKEN=your_huggingface_token_here
   ```

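Before running anything, it can help to confirm that both keys are present and no longer hold the placeholder values. The following is a hypothetical, standard-library-only pre-flight check; the app itself may load `.env` differently (for example with python-dotenv), and `parse_env_text` and `missing_keys` are illustrative names, not part of the project.

```python
from pathlib import Path

# Keys the README asks for; values ending in "_here" are the unedited placeholders.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HF_TOKEN")
PLACEHOLDER_SUFFIX = "_here"


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional quotes around the value, e.g. KEY="abc".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].endswith(PLACEHOLDER_SUFFIX)
    ]


if __name__ == "__main__":
    env_file = Path(".env")
    found = parse_env_text(env_file.read_text()) if env_file.exists() else {}
    problems = missing_keys(found)
    print("All keys set" if not problems else f"Missing or placeholder: {problems}")
```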
### Process the PDFs

Before running the app, process the PDF files to build the vectorstore:

```bash
python process_data.py
```

This will:

1. Load PDFs from `notebook_version/data/`
2. Process, chunk, and embed the documents
3. Create a Qdrant vectorstore in `data/processed_data/`

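Conceptually, the vectorstore keeps one embedding per chunk alongside its metadata and returns the chunks whose embeddings are most similar to the query embedding. The toy class below illustrates that idea with brute-force cosine similarity. It is not Qdrant's API (Qdrant uses an approximate nearest-neighbour index rather than a full scan), and the 3-dimensional vectors and `source`/`page` metadata fields are made up for readability; `text-embedding-3-small` produces 1536-dimensional vectors by default.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ToyVectorStore:
    """Brute-force in-memory store of (text, metadata, vector) records."""

    records: list[tuple[str, dict, list[float]]] = field(default_factory=list)

    def add(self, text: str, metadata: dict, vector: list[float]) -> None:
        self.records.append((text, metadata, vector))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, dict, float]]:
        """Return the k records most similar to the query, best first."""
        scored = [(text, meta, cosine_similarity(query, vec)) for text, meta, vec in self.records]
        scored.sort(key=lambda item: item[2], reverse=True)
        return scored[:k]


# Usage with made-up 3-dimensional vectors standing in for real embeddings.
store = ToyVectorStore()
store.add("chunk about sample size", {"source": "doc_a.pdf", "page": 3}, [0.9, 0.1, 0.0])
store.add("chunk about peeking", {"source": "doc_b.pdf", "page": 7}, [0.1, 0.9, 0.2])
top = store.search([0.8, 0.2, 0.0], k=1)
print(top[0][0], top[0][1])
```

Storing the metadata next to each vector is what lets retrieved chunks carry the source and page information needed for citations.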
### Running the App Locally

Once the data is processed, run the Streamlit app:

```bash
streamlit run app/app.py
```

## Deployment to Hugging Face Spaces

### Prerequisites for Deployment

1. Hugging Face account
2. Docker installed locally

### Steps to Deploy

1. Process the PDFs locally: `python process_data.py`
2. Build and test the Docker image locally: `docker build -t ab-testing-qa .`
3. Create a new Docker-based Hugging Face Space.
4. Add your Hugging Face token and OpenAI API key as secrets in the Space settings.
5. Push the repository, including the `Dockerfile` and the processed data in `data/processed_data/`, to the Space's Git repository. Hugging Face builds and runs the image from the `Dockerfile`.

### Hugging Face Spaces Configuration

The application expects the following secrets:

- `OPENAI_API_KEY`: your OpenAI API key
- `HF_TOKEN`: your Hugging Face token

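In a Docker Space, secrets are exposed to the running container as environment variables, so one lookup can serve both environments, assuming the app loads `.env` into the environment when running locally. A minimal fail-fast sketch; `load_secrets` is a hypothetical helper, not a function from this project:

```python
import os
from typing import Mapping

# Secret names listed above; both must be non-empty.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "HF_TOKEN")


def load_secrets(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required secrets, or raise with a message naming the missing ones."""
    missing = [name for name in REQUIRED_SECRETS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required secrets: "
            + ", ".join(missing)
            + ". Add them in the Space settings, or to .env when running locally."
        )
    return {name: environ[name] for name in REQUIRED_SECRETS}
```

Failing at startup with the names of the missing secrets is easier to diagnose in the Space logs than an authentication error raised on the first query.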
## System Architecture

The QA system is a LangGraph graph with three stages:

1. **Initial RAG node**: retrieves documents and attempts to answer the query.
2. **Helpfulness judge**: determines whether:
   - the query is related to A/B testing
   - the initial response is helpful enough
3. **Agent node**: if needed, uses specialized tools to improve the answer:
   - standard retrieval tool
   - query-rephrasing retrieval tool
   - arXiv search tool

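The control flow of these three stages can be sketched in plain Python. This is not LangGraph code (in LangGraph the same branching would typically be expressed with conditional edges), and the verdict labels, the `QAState` fields, the off-topic reply, and the stub nodes are all hypothetical stand-ins for the real components.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical verdict labels for the helpfulness judge.
OFF_TOPIC = "off_topic"
HELPFUL = "helpful"
NEEDS_AGENT = "needs_agent"


@dataclass
class QAState:
    question: str
    answer: str = ""
    trace: list[str] = field(default_factory=list)


def run_pipeline(
    state: QAState,
    initial_rag: Callable[[QAState], str],
    judge: Callable[[QAState], str],
    agent: Callable[[QAState], str],
) -> QAState:
    """Initial RAG node, then the helpfulness judge, then the agent only when needed."""
    state.answer = initial_rag(state)
    state.trace.append("initial_rag")

    verdict = judge(state)
    state.trace.append(f"judge:{verdict}")

    if verdict == OFF_TOPIC:
        # Placeholder reply; how the real system handles off-topic queries is not specified here.
        state.answer = "This question does not appear to be about A/B testing."
    elif verdict == NEEDS_AGENT:
        # The real agent can call retrieval, query-rephrasing retrieval, or arXiv search tools.
        state.answer = agent(state)
        state.trace.append("agent")
    return state


# Stub nodes for illustration only.
def fake_rag(state: QAState) -> str:
    return "draft answer"


def fake_judge(state: QAState) -> str:
    return NEEDS_AGENT if "arxiv" in state.question.lower() else HELPFUL


def fake_agent(state: QAState) -> str:
    return "improved answer"


result = run_pipeline(QAState("Any recent arXiv papers on sequential testing?"), fake_rag, fake_judge, fake_agent)
print(result.answer, result.trace)
```

Keeping the judge as a separate step means questions that the initial RAG pass already answers well skip the slower tool-using agent entirely.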
## Data Processing

`process_data.py` prepares the documents in four steps:

1. Merges PDF pages while preserving page metadata
2. Splits the text with LangChain's `RecursiveCharacterTextSplitter`, using project-specific chunking parameters
3. Embeds the chunks with OpenAI's `text-embedding-3-small` model
4. Stores the embeddings in a Qdrant vectorstore
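The first two steps can be sketched with the standard library alone. This is a simplified stand-in, not LangChain's `RecursiveCharacterTextSplitter` (which also supports chunk overlap and other options): it uses the same default separator order, but joining pages with a newline, the `page_start`/`page_end` keys, and the `chunk_size=200` default are hypothetical choices, not the project's actual parameters.

```python
from bisect import bisect_right

# Same default separator order as LangChain's RecursiveCharacterTextSplitter.
SEPARATORS = ("\n\n", "\n", " ", "")


def merge_pages(pages: list[str]) -> tuple[str, list[int]]:
    """Join page texts with newlines and record each page's starting offset."""
    starts, offset = [], 0
    for page_text in pages:
        starts.append(offset)
        offset += len(page_text) + 1  # +1 for the "\n" joiner
    return "\n".join(pages), starts


def page_for_offset(starts: list[int], offset: int) -> int:
    """1-based page number that contains the given character offset."""
    return bisect_right(starts, offset)


def recursive_split(text: str, chunk_size: int, separators: tuple[str, ...] = SEPARATORS) -> list[str]:
    """Simplified recursive character splitting without chunk overlap."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    # Use the coarsest separator present; "" means split into single characters.
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            finer = separators[i + 1:]
            break
    else:
        sep, finer = "", ()
    pieces = text.split(sep) if sep else list(text)

    chunks: list[str] = []
    buffer: list[str] = []

    def flush() -> None:
        merged = sep.join(buffer)
        if merged.strip():
            chunks.append(merged)
        buffer.clear()

    for piece in pieces:
        if len(piece) > chunk_size:
            flush()  # emit what we have, then split the oversized piece more finely
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        if buffer and len(sep.join(buffer + [piece])) > chunk_size:
            flush()
        buffer.append(piece)
    flush()
    return chunks


def chunk_pages(pages: list[str], chunk_size: int = 200) -> list[dict]:
    """Split merged pages into chunks tagged with the pages they span."""
    merged, starts = merge_pages(pages)
    results, cursor = [], 0
    for chunk in recursive_split(merged, chunk_size):
        begin = merged.find(chunk, cursor)  # chunks come out in order, so search forward
        end = begin + len(chunk)
        results.append({
            "text": chunk,
            "page_start": page_for_offset(starts, begin),
            "page_end": page_for_offset(starts, end - 1),
        })
        cursor = end
    return results
```

Recording each page's start offset once and mapping chunk offsets back with a binary search keeps page citations available after the pages are merged, so a chunk that straddles a page break can cite both pages.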