# Vietnamese Legal Chatbot - Presentation Setup Guide

## Quick Start for Presentation

### Prerequisites

1. **Python 3.8+** installed
2. **Docker** installed and running
3. **Google API Key** for Gemini (optional for demo)

### Step 1: Install Dependencies

```bash
pip install -r requirements.txt
```

### Step 2: Start Qdrant Database

```bash
python start_qdrant.py
```

This will:

- Pull the Qdrant Docker image
- Start Qdrant on http://localhost:6333
- Create persistent storage in the `qdrant_data/` folder
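
Before moving on to the next step, it can help to confirm that something is actually listening on port 6333. The sketch below is not part of the project: `wait_for_port` is a hypothetical helper that only checks whether a TCP connection succeeds, and it does not verify that the listener is Qdrant.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing accepts the connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 6333)` right after `python start_qdrant.py` returns `True` once the port accepts connections, or `False` after 30 seconds.
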

### Step 3: Set Up the System

```bash
python setup_system.py
```

This will:

- Load 3,271 legal documents
- Create 61,068 document chunks
- Build vector and BM25 indices
- Set up the RAG system
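
To make the chunking step concrete, here is a minimal sketch of splitting a document into overlapping word windows. It is an illustration only: `chunk_words`, the chunk size, and the overlap are assumptions, and `setup_system.py` may split documents differently (for example by article or clause).

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size words, sharing overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the current window already reached the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of more chunks to index.
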

### Step 4: Run the Application

```bash
python app.py
```

This will:

- Start the Gradio web interface
- Open at http://localhost:7860
- Show initialization progress

## Demo Questions for Presentation

### Sample Legal Questions to Try:

1. **"Điều kiện thành lập doanh nghiệp là gì?"** ("What are the conditions for establishing a business?")
   - Tests basic legal knowledge retrieval

2. **"Quy định về thời gian làm việc tối đa trong ngày?"** ("What are the regulations on maximum working hours per day?")
   - Tests labor law knowledge

3. **"Thủ tục đăng ký kết hôn cần những gì?"** ("What is required for the marriage registration procedure?")
   - Tests civil law procedures

4. **"Mức phạt vi phạm giao thông đường bộ?"** ("What are the penalties for road traffic violations?")
   - Tests administrative law

## Presentation Structure (10-12 minutes)

### 1. Introduction (2 min)

- **Problem**: Access to legal information in Vietnam
- **Solution**: AI-powered legal assistant using RAG
- **Technology**: Hybrid search (BM25 + Vector) + LLM

### 2. Technical Architecture (3 min)

- Show the system components
- Explain the hybrid retrieval approach
- Highlight Vietnamese-specific optimizations

### 3. Live Demo (3 min)

- Show the web interface
- Ask the sample questions
- Demonstrate response quality and citations

### 4. Performance Results (2 min)

- Show the performance table from `results_table.txt`
- Highlight the 60.82% MRR result
- Compare the different retrieval methods

### 5. Future Work (1 min)

- Expand the legal corpus
- Mobile app development
- Integration with legal services

## Troubleshooting

### If Qdrant fails to start:

```bash
# Check Docker status
docker ps

# Restart Qdrant
python start_qdrant.py stop
python start_qdrant.py
```

### If setup fails:

```bash
# Clean up and retry (this deletes the stored Qdrant data)
python start_qdrant.py stop
rm -rf qdrant_data/
python start_qdrant.py
python setup_system.py
```

### If app fails to start:

- Check whether the Google API key is set (optional for the demo)
- Ensure Qdrant is running on port 6333
- Check the console for error messages
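
The API key check can be scripted. This is a minimal sketch, not project code: `api_key_status` is a hypothetical helper, and the variable name `GOOGLE_API_KEY` is an assumption. Use whichever name `app.py` actually reads.

```python
import os
from typing import Mapping, Optional


def api_key_status(env: Optional[Mapping[str, str]] = None,
                   name: str = "GOOGLE_API_KEY") -> str:
    """Report whether the (assumed) API key variable holds a non-empty value."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    return "set" if value else "missing"
```

Running `print(api_key_status())` in the same shell that launches the app shows whether that process will see the key.
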

## Key Features to Highlight

1. **Hybrid Search**: Combines keyword (BM25) and semantic (vector) search
2. **Vietnamese-Specific**: Uses specialized Vietnamese embedding models
3. **Reranking**: Advanced document re-ranking for better relevance
4. **Real-time Interface**: Gradio web interface with progress indicators
5. **Source Attribution**: Always cites specific legal documents
6. **Fallback System**: Can fall back to Google search if local documents are insufficient
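
One common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below illustrates that idea only. This guide does not say how the project merges the two result lists, so the function name, the document IDs, and the constant `k = 60` (a commonly used default) are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs; higher fused score comes first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks near the top.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]    # hypothetical keyword results
vector_hits = ["b", "d", "a"]  # hypothetical semantic results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF uses only rank positions, so BM25 scores and vector similarities do not need to be normalized to a common scale before merging.
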

## Performance Metrics

- **Best Method**: Hybrid 2 + Reranking
- **MRR**: 60.82%
- **Coverage**: 88.99%
- **Response Time**: ~0.6 seconds
- **Documents**: 3,271 legal documents, 61,068 chunks
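
For reference, MRR (mean reciprocal rank) averages `1 / rank` of the first relevant document over all queries. The sketch below computes it on toy data. The `hit_rate` function reflects one possible reading of "Coverage" (the share of queries with at least one relevant document retrieved); that reading is an assumption, since this guide does not define the metric. All function names and data are illustrative.

```python
from typing import List, Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results: List[Sequence[str]],
                         relevants: List[Set[str]]) -> float:
    """Average the reciprocal rank over all queries."""
    if not results:
        return 0.0
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(results, relevants))
    return total / len(results)


def hit_rate(results: List[Sequence[str]], relevants: List[Set[str]]) -> float:
    """Fraction of queries with at least one relevant document retrieved."""
    if not results:
        return 0.0
    hits = sum(1 for r, rel in zip(results, relevants) if set(r) & rel)
    return hits / len(results)


# Toy data: three queries, their retrieved IDs, and their relevant IDs.
retrieved = [["a", "b"], ["c", "d"], ["e"]]
gold = [{"b"}, {"c"}, {"z"}]
mrr = mean_reciprocal_rank(retrieved, gold)
coverage = hit_rate(retrieved, gold)
```

On this toy data the reciprocal ranks are 1/2, 1, and 0, so the MRR is 0.5, and two of the three queries retrieve a relevant document.
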