# Vietnamese Legal Chatbot - Presentation Setup Guide
## Quick Start for Presentation
### Prerequisites
1. **Python 3.8+** installed
2. **Docker** installed and running
3. **Google API Key** for Gemini (optional for demo)
### Step 1: Install Dependencies
```bash
pip install -r requirements.txt
```
### Step 2: Start Qdrant Database
```bash
python start_qdrant.py
```
This will:
- Pull the Qdrant Docker image
- Start Qdrant on http://localhost:6333
- Create persistent storage in `qdrant_data/` folder
### Step 3: Set Up the System
```bash
python setup_system.py
```
This will:
- Load 3,271 legal documents
- Create 61,068 document chunks
- Build vector and BM25 indices
- Set up the RAG system
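To give a feel for how a few thousand documents can turn into tens of thousands of chunks, here is a minimal sketch of overlapping word-window chunking. This guide does not describe how `setup_system.py` actually splits documents, so the function name `chunk_words` and the `size` and `overlap` values are assumptions chosen for illustration only.

```python
def chunk_words(text, size=200, overlap=50):
    """Split text into chunks of up to `size` whitespace-separated words.

    Consecutive chunks share `overlap` words. Both values are illustrative
    assumptions, not settings taken from this project.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "Điều 1 quy định phạm vi điều chỉnh của luật này"
print(chunk_words(sample, size=5, overlap=2))
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of a larger index.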
### Step 4: Run the Application
```bash
python app.py
```
This will:
- Start the Gradio web interface
- Serve it at http://localhost:7860
- Show initialization progress
## Demo Questions for Presentation
### Sample Legal Questions to Try:
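1. **"Điều kiện thành lập doanh nghiệp là gì?"** ("What are the conditions for establishing a business?")
   - Tests basic legal knowledge retrieval
2. **"Quy định về thời gian làm việc tối đa trong ngày?"** ("What are the regulations on maximum daily working hours?")
   - Tests labor law knowledge
3. **"Thủ tục đăng ký kết hôn cần những gì?"** ("What is required to register a marriage?")
   - Tests civil law procedures
4. **"Mức phạt vi phạm giao thông đường bộ?"** ("What are the penalties for road traffic violations?")
   - Tests administrative law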
## Presentation Structure (10-12 minutes)
### 1. Introduction (2 min)
- **Problem**: Legal information access in Vietnam
- **Solution**: AI-powered legal assistant using RAG
- **Technology**: Hybrid search (BM25 + Vector) + LLM
### 2. Technical Architecture (3 min)
- Show the system components
- Explain hybrid retrieval approach
- Highlight Vietnamese-specific optimizations
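When explaining the hybrid retrieval approach, a small fusion example can help. The sketch below uses reciprocal rank fusion (RRF), one common way to merge a BM25 ranking with a vector ranking. This guide does not say which fusion rule the project uses, so treat it as an assumption: the function name `rrf_fuse`, the constant `k=60`, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a value
    taken from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword results
vector_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical semantic results
fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, which is the intuition behind combining keyword and semantic search.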
### 3. Live Demo (3 min)
- Show the web interface
- Ask sample questions
- Demonstrate response quality and citations
### 4. Performance Results (2 min)
- Show performance table from `results_table.txt`
- Highlight the best MRR of 60.82% (Hybrid 2 + Reranking)
- Compare different methods
### 5. Future Work (1 min)
- Expand legal corpus
- Mobile app development
- Integration with legal services
## Troubleshooting
### If Qdrant fails to start:
```bash
# Check Docker status
docker ps
# Restart Qdrant
python start_qdrant.py stop
python start_qdrant.py
```
### If setup fails:
```bash
# Stop Qdrant, delete its stored data (this wipes all indexed vectors), then retry
python start_qdrant.py stop
rm -rf qdrant_data/
python start_qdrant.py
python setup_system.py
```
### If app fails to start:
- Check whether the Google API key is set (only needed for Gemini; optional for the demo)
- Ensure Qdrant is running on port 6333
- Check console for error messages
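A quick way to confirm that something is listening on the Qdrant port is a plain TCP check. This is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical and not part of the project.

```python
import socket


def port_is_open(host="localhost", port=6333, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


status = "reachable" if port_is_open() else "not reachable"
print(f"Port 6333 on localhost is {status}")
```

This only shows that a process accepts connections on the port; it does not prove that the process is Qdrant or that the collections were built.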
## Key Features to Highlight
1. **Hybrid Search**: Combines keyword (BM25) and semantic (vector) search
2. **Vietnamese-Specific**: Uses specialized Vietnamese embedding models
3. **Reranking**: Advanced document re-ranking for better relevance
4. **Real-time Interface**: Gradio web interface with progress indicators
5. **Source Attribution**: Always cites specific legal documents
6. **Fallback System**: Can fall back to Google search if local documents are insufficient
## Performance Metrics
- **Best Method**: Hybrid 2 + Reranking
- **MRR**: 60.82%
- **Coverage**: 88.99%
- **Response Time**: ~0.6 seconds
- **Documents**: 3,271 legal documents, 61,068 chunks
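If the audience asks what MRR means, it is the mean over all test queries of 1/rank of the first relevant document, counting 0 when no relevant document is retrieved. The sketch below computes it on made-up toy data; the project's actual evaluation script is not shown in this guide, so the function name and inputs are assumptions.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Compute MRR.

    ranked_results: one ranked list of document IDs per query.
    relevant: one set of relevant document IDs per query, in the same order.
    """
    if len(ranked_results) != len(relevant):
        raise ValueError("need one relevance set per query")
    if not ranked_results:
        return 0.0
    total = 0.0
    for ranking, gold in zip(ranked_results, relevant):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in gold:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_results)


# Toy data (hypothetical): hits at rank 1, rank 3, and no hit at all.
rankings = [["d1", "d2"], ["d3", "d4", "d5"], ["d6"]]
gold = [{"d1"}, {"d5"}, {"d9"}]
mrr = mean_reciprocal_rank(rankings, gold)
print(f"MRR = {mrr:.4f}")
```

On this toy data the value is (1 + 1/3 + 0) / 3, so an MRR of about 0.61 means the first relevant document typically appears near the top of the results.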