# Vietnamese Legal Chatbot - Presentation Setup Guide
## Quick Start for Presentation
### Prerequisites
1. **Python 3.8+** installed
2. **Docker** installed and running
3. **Google API Key** for Gemini (optional for the demo)
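You can check the first two prerequisites before the presentation. The sketch below assumes a standalone helper script; `check_prerequisites` is a hypothetical name and is not part of the repository. It confirms the Python version and that the `docker` CLI is on `PATH`. It does not confirm that the Docker daemon is actually running, so `docker ps` is still the definitive check.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper, not part of the repository.
    """
    problems = []
    # Compare only (major, minor) against the documented minimum (Python 3.8+).
    if sys.version_info[:2] < min_python:
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    # shutil.which only proves the docker CLI exists, not that the daemon is up.
    if shutil.which("docker") is None:
        problems.append("docker CLI not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Prerequisites look OK")
```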
### Step 1: Install Dependencies
```bash
pip install -r requirements.txt
```
### Step 2: Start Qdrant Database
```bash
python start_qdrant.py
```
This will:
- Pull the Qdrant Docker image
- Start Qdrant at http://localhost:6333
- Create persistent storage in the `qdrant_data/` folder
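This guide does not show what `start_qdrant.py` does internally. As a minimal sketch, assume it wraps a single `docker run` call. The helper names and the container name `qdrant` are hypothetical. `/qdrant/storage` is the data directory used in Qdrant's own Docker instructions, and `docker run` pulls `qdrant/qdrant` automatically if the image is not present locally.

```python
import subprocess
from pathlib import Path


def qdrant_docker_command(storage_dir="qdrant_data", port=6333, name="qdrant"):
    """Build a `docker run` argument list for a persistent local Qdrant (hypothetical helper)."""
    storage = Path(storage_dir).resolve()
    return [
        "docker", "run", "-d",          # -d: run the container in the background
        "--name", name,                 # assumed container name
        "-p", f"{port}:6333",           # expose Qdrant's REST port on the host
        "-v", f"{storage}:/qdrant/storage",  # persist data in ./qdrant_data
        "qdrant/qdrant",
    ]


def start_qdrant(storage_dir="qdrant_data", port=6333, name="qdrant"):
    # Make sure the host-side storage folder exists before mounting it.
    Path(storage_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if docker exits with a non-zero status.
    subprocess.run(qdrant_docker_command(storage_dir, port, name), check=True)
```

If a container with the same name already exists, `docker run` refuses to start a second one. This is why a stop step like `python start_qdrant.py stop` is useful before restarting.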
### Step 3: Set Up the System
```bash
python setup_system.py
```
This will:
- Load 3,271 legal documents
- Create 61,068 document chunks
- Build vector and BM25 indices
- Set up the RAG system
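The chunking and BM25 settings used by `setup_system.py` are not documented here. The sketch below is not the repository's implementation. It assumes word-window chunking (200 words with a 50-word overlap, both hypothetical values) and plain Okapi BM25 (`k1=1.5`, `b=0.75`) over lowercased syllable tokens. A production Vietnamese pipeline may use a word segmenter instead of `\w+`. The vector index, which lives in Qdrant, is not shown.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=200, overlap=50):
    """Split text into overlapping windows of `size` words (assumed parameters)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap  # requires size > overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def tokenize(text):
    # \w matches Vietnamese letters with diacritics in Python 3 str patterns.
    return re.findall(r"\w+", text.lower())


class BM25:
    """Minimal Okapi BM25 index over a list of chunk strings."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tf = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / max(len(self.docs), 1)
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        # The +1 inside the log keeps IDF non-negative for very common terms.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query, k=5):
        """Return indices of the top-k chunks with a positive score."""
        scored = sorted(((self.score(query, i), i) for i in range(len(self.docs))), reverse=True)
        return [i for s, i in scored[:k] if s > 0]
```

For example, `BM25(["thời gian làm việc tối đa", "đăng ký kết hôn"]).search("kết hôn")` ranks the second chunk first.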
### Step 4: Run the Application
```bash
python app.py
```
This will:
- Start the Gradio web interface
- Open at http://localhost:7860
- Show initialization progress
## Demo Questions for Presentation
### Sample Legal Questions to Try:
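1. **"Điều kiện thành lập doanh nghiệp là gì?"** ("What are the requirements for establishing a business?")
   - Tests basic legal knowledge retrieval
2. **"Quy định về thời gian làm việc tối đa trong ngày?"** ("What are the rules on maximum daily working hours?")
   - Tests labor law knowledge
3. **"Thủ tục đăng ký kết hôn cần những gì?"** ("What does the marriage registration procedure require?")
   - Tests civil law procedures
4. **"Mức phạt vi phạm giao thông đường bộ?"** ("What are the fines for road traffic violations?")
   - Tests administrative law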
## Presentation Structure (10-12 minutes)
### 1. Introduction (2 min)
- **Problem**: Legal information access in Vietnam
- **Solution**: AI-powered legal assistant using RAG
- **Technology**: Hybrid search (BM25 + Vector) + LLM
### 2. Technical Architecture (3 min)
- Show the system components
- Explain hybrid retrieval approach
- Highlight Vietnamese-specific optimizations (e.g. Vietnamese embedding models)
### 3. Live Demo (3 min)
- Show the web interface
- Ask sample questions
- Demonstrate response quality and citations
### 4. Performance Results (2 min)
- Show the performance table from `results_table.txt`
- Highlight the 60.82% MRR of the best method (Hybrid 2 + Reranking)
- Compare the retrieval methods
### 5. Future Work (1 min)
- Expand legal corpus
- Mobile app development
- Integration with legal services
## Troubleshooting
### If Qdrant fails to start:
```bash
# Check Docker status
docker ps
# Restart Qdrant
python start_qdrant.py stop
python start_qdrant.py
```
### If setup fails:
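```bash
# Stop Qdrant, clean up its storage, and retry
# (this deletes all indexed data in qdrant_data/)
python start_qdrant.py stop
rm -rf qdrant_data/
python start_qdrant.py
python setup_system.py
```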
### If app fails to start:
- Check whether the Google API key is set (optional; only needed for Gemini)
- Ensure Qdrant is running on port 6333
- Check the console output for error messages
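Part of this checklist can be automated. The sketch below assumes the API key is read from an environment variable named `GOOGLE_API_KEY`. That name is hypothetical, since this guide does not specify it. The sketch also treats "Qdrant is running" as "something accepts TCP connections on port 6333", which is a necessary condition but not proof that the service is Qdrant.

```python
import os
import socket


def port_open(host="localhost", port=6333, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(api_key_var="GOOGLE_API_KEY"):
    """Collect hints for the checklist above (env var name is an assumption)."""
    hints = []
    if not os.environ.get(api_key_var):
        hints.append(f"{api_key_var} is not set (optional; only needed for Gemini)")
    if not port_open(port=6333):
        hints.append("Nothing is listening on localhost:6333; start Qdrant first")
    return hints


if __name__ == "__main__":
    for hint in diagnose():
        print("Hint:", hint)
```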
## Key Features to Highlight
1. **Hybrid Search**: Combines keyword (BM25) and semantic (vector) search
2. **Vietnamese-Specific**: Uses specialized Vietnamese embedding models
3. **Reranking**: Re-ranks the retrieved documents to improve relevance
4. **Real-time Interface**: Gradio web interface with progress indicators
5. **Source Attribution**: Always cites specific legal documents
6. **Fallback System**: Can fall back to Google Search when the local documents are insufficient
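This guide does not say how "Hybrid 2" merges the BM25 and vector result lists. One common way to combine keyword and semantic rankings is reciprocal rank fusion (RRF). The sketch below uses RRF as a stand-in technique and should not be taken as the project's actual method. It takes ranked lists of chunk IDs from each retriever, and the constant `k=60` is the commonly used default. In the full pipeline, a reranker would then re-score the fused top results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids with RRF: score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["a", "b", "c"]    # hypothetical keyword (BM25) ranking
vector_hits = ["b", "c", "d"]  # hypothetical semantic (vector) ranking
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # ['b', 'c', 'a', 'd']
```

Documents found by both retrievers (`b`, `c`) rise above documents found by only one. RRF uses only rank positions, so it needs no calibration between BM25 scores and cosine similarities.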
## Performance Metrics
- **Best Method**: Hybrid 2 + Reranking
- **MRR**: 60.82%
- **Coverage**: 88.99%
- **Response Time**: ~0.6 seconds
- **Documents**: 3,271 legal documents, 61,068 chunks
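MRR (mean reciprocal rank) averages, over all evaluation queries, the value 1/rank of the first relevant result, and counts 0 when nothing relevant is retrieved. A figure of 60.82% corresponds to an MRR of 0.6082. The sketch below shows the standard computation. The function name and the input format (one ranked list and one set of relevant IDs per query) are assumptions, not the project's evaluation code.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Standard MRR over queries (hypothetical helper; input format is assumed)."""
    if len(ranked_lists) != len(relevant_sets):
        raise ValueError("need exactly one relevance set per query")
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_lists)


# Query 1 hits at rank 1, query 2 at rank 2, query 3 misses: (1 + 0.5 + 0) / 3
print(mean_reciprocal_rank([["a", "b"], ["c", "d"], ["e"]], [{"a"}, {"d"}, {"x"}]))  # 0.5
```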