# Vietnamese Legal Chatbot - Presentation Setup Guide

## Quick Start for Presentation

### Prerequisites
1. **Python 3.8+** installed
2. **Docker** installed and running
3. **Google API key** for Gemini (optional for the demo)
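A quick way to confirm the first two prerequisites before the presentation is a small check script. This is a minimal sketch and is not part of the repository. It assumes that `docker info` exiting with status 0 means the Docker daemon is reachable. The Gemini key is left out here because it is optional.

```python
import shutil
import subprocess
import sys


def docker_daemon_running() -> bool:
    """True if the Docker CLI is on PATH and `docker info` succeeds."""
    if shutil.which("docker") is None:
        return False
    try:
        # `docker info` fails when the CLI cannot reach the daemon.
        result = subprocess.run(["docker", "info"], capture_output=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_prerequisites() -> dict:
    """Map each prerequisite to True (satisfied) or False (missing)."""
    return {
        "python_3_8_plus": sys.version_info >= (3, 8),
        "docker_running": docker_daemon_running(),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(("OK      " if ok else "MISSING ") + name)
```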

### Step 1: Install Dependencies
```bash
pip install -r requirements.txt
```

### Step 2: Start Qdrant Database
```bash
python start_qdrant.py
```
This will:
- Pull the Qdrant Docker image
- Start Qdrant on http://localhost:6333
- Create persistent storage in the `qdrant_data/` folder
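The guide does not show what `start_qdrant.py` runs internally. The sketch below is a plausible equivalent, stated as an assumption rather than the script's actual code. It uses the official `qdrant/qdrant` image, maps the REST port 6333, and mounts `qdrant_data/` onto `/qdrant/storage`, the image's default storage directory. The container name `qdrant` is a hypothetical choice.

```python
import subprocess
from pathlib import Path

QDRANT_IMAGE = "qdrant/qdrant"  # official Qdrant image on Docker Hub


def build_qdrant_command(data_dir: str = "qdrant_data", port: int = 6333,
                         name: str = "qdrant") -> list:
    """Build a `docker run` argument list that persists storage in data_dir."""
    storage = Path(data_dir).resolve()  # Docker needs an absolute host path
    return [
        "docker", "run", "-d",
        "--name", name,                       # hypothetical container name
        "-p", f"{port}:6333",                 # REST API port inside the container
        "-v", f"{storage}:/qdrant/storage",   # default storage path in the image
        QDRANT_IMAGE,
    ]


def start_qdrant(data_dir: str = "qdrant_data") -> None:
    """Pull the image, then start Qdrant in the background."""
    Path(data_dir).mkdir(exist_ok=True)
    subprocess.run(["docker", "pull", QDRANT_IMAGE], check=True)
    subprocess.run(build_qdrant_command(data_dir), check=True)
```

Because the data lives on the host, stopping and re-creating the container keeps the indexed vectors. Note that `docker run --name qdrant` fails if a container with that name already exists.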

### Step 3: Set Up the System
```bash
python setup_system.py
```
This will:
- Load 3,271 legal documents
- Create 61,068 document chunks
- Build vector and BM25 indices
- Set up the RAG system
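The reported numbers work out to about 61,068 / 3,271 ≈ 18.7 chunks per document. The guide does not say how `setup_system.py` splits documents. A common approach is overlapping fixed-size windows, sketched below with hypothetical sizes (200 tokens, 50 overlap). Whitespace splitting yields Vietnamese syllables rather than words, so a real pipeline might run a Vietnamese word segmenter first.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size tokens; consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    tokens = text.split()  # whitespace split: Vietnamese syllables, not words
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap means a sentence cut at a window boundary still appears whole in at least one chunk, provided the sentence is shorter than the overlap.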

### Step 4: Run the Application
```bash
python app.py
```
This will:
- Start the Gradio web interface
- Serve the interface at http://localhost:7860 (open this URL in a browser)
- Show initialization progress

## Demo Questions for Presentation

### Sample Legal Questions to Try:
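1. **"Điều kiện thành lập doanh nghiệp là gì?"** ("What are the requirements for establishing a business?")
   - Tests basic legal knowledge retrieval

2. **"Quy định về thời gian làm việc tối đa trong ngày?"** ("What is the maximum number of working hours per day?")
   - Tests labor law knowledge

3. **"Thủ tục đăng ký kết hôn cần những gì?"** ("What is required to register a marriage?")
   - Tests civil law procedures

4. **"Mức phạt vi phạm giao thông đường bộ?"** ("What are the fines for road traffic violations?")
   - Tests administrative law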

## Presentation Structure (10-12 minutes)

### 1. Introduction (2 min)
- **Problem**: Difficulty accessing legal information in Vietnam
- **Solution**: AI-powered legal assistant using RAG
- **Technology**: Hybrid search (BM25 + Vector) + LLM

### 2. Technical Architecture (3 min)
- Show the system components
- Explain hybrid retrieval approach
- Highlight Vietnamese-specific optimizations
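The guide does not specify how the hybrid methods merge BM25 and vector results. Reciprocal Rank Fusion (RRF) is one widely used option and is shown here as an illustration, not necessarily the project's method. Each document scores the sum of 1 / (k + rank) over the rankings it appears in, with k = 60 as the conventional constant.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]    # keyword ranking
vector_hits = ["b", "c", "d"]  # semantic ranking
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

RRF only needs ranks, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales.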

### 3. Live Demo (3 min)
- Show the web interface
- Ask sample questions
- Demonstrate response quality and citations

### 4. Performance Results (2 min)
- Show the performance table from `results_table.txt`
- Highlight the best result: 60.82% MRR with Hybrid 2 + Reranking
- Compare the retrieval methods listed in the table
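Mean Reciprocal Rank (MRR) averages, over all test queries, 1 / (rank of the first relevant result). A query with no relevant result retrieved contributes 0, so 60.82% corresponds to an MRR of 0.6082. The sketch below uses the standard definition. The project's evaluation script may differ in details such as a rank cutoff, which the guide does not state.

```python
from typing import List, Set


def mean_reciprocal_rank(results: List[List[str]], relevant: List[Set[str]]) -> float:
    """results[i] is the ranked doc-ID list for query i; relevant[i] holds its gold doc IDs."""
    if len(results) != len(relevant):
        raise ValueError("results and relevant must have the same length")
    if not results:
        return 0.0
    total = 0.0
    for ranking, gold in zip(results, relevant):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in gold:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(results)
```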

### 5. Future Work (1 min)
- Expand legal corpus
- Mobile app development
- Integration with legal services

## Troubleshooting

### If Qdrant fails to start:
```bash
# Check that Docker is running (lists running containers)
docker ps

# Restart Qdrant
python start_qdrant.py stop
python start_qdrant.py
```
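To check from Python whether Qdrant is answering, you can query its REST endpoint `GET /collections`, which returns JSON shaped like `{"result": {"collections": [{"name": ...}]}, "status": "ok", ...}`. This sketch uses only the standard library and is not part of the project's scripts.

```python
import json
import urllib.request
from typing import List, Optional


def parse_collection_names(payload: str) -> List[str]:
    """Extract collection names from a Qdrant GET /collections JSON body."""
    data = json.loads(payload)
    return [c["name"] for c in data["result"]["collections"]]


def list_qdrant_collections(url: str = "http://localhost:6333",
                            timeout: float = 3.0) -> Optional[List[str]]:
    """Return collection names, or None if Qdrant cannot be reached."""
    try:
        with urllib.request.urlopen(f"{url}/collections", timeout=timeout) as resp:
            return parse_collection_names(resp.read().decode("utf-8"))
    except OSError:  # URLError, connection refused and timeouts are all OSError
        return None


if __name__ == "__main__":
    print(list_qdrant_collections())
```

`None` means Qdrant is not reachable, so restart it as above. An empty list suggests Qdrant is up but no collection has been created yet, which usually means `setup_system.py` still needs to run.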

### If setup fails:
```bash
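# Stop Qdrant, delete its stored data, then rebuild from scratch
# (warning: this removes all indexed data in qdrant_data/)
python start_qdrant.py stop
rm -rf qdrant_data/
python start_qdrant.py
python setup_system.py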
```

### If app fails to start:
- Check whether the Google API key is set (optional; only needed for Gemini)
- Ensure Qdrant is running on port 6333
- Check the console for error messages
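The first two checks can be scripted as a preflight before `python app.py`. The environment variable name `GOOGLE_API_KEY` is an assumption, since the guide does not say which variable the app reads. The port check only confirms that something accepts TCP connections on 6333.

```python
import os
import socket


def port_open(host: str = "localhost", port: int = 6333, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(api_key_var: str = "GOOGLE_API_KEY") -> dict:
    """Start-up checks for the app; api_key_var is an assumed variable name."""
    return {
        "gemini_key_set": bool(os.environ.get(api_key_var)),  # optional for the demo
        "qdrant_port_open": port_open("localhost", 6333),
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'yes' if ok else 'no'}")
```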

## Key Features to Highlight

1. **Hybrid Search**: Combines keyword (BM25) and semantic (vector) search
2. **Vietnamese-Specific**: Uses specialized Vietnamese embedding models
3. **Reranking**: Re-scores the retrieved documents to improve relevance
4. **Real-time Interface**: Gradio web interface with progress indicators
5. **Source Attribution**: Always cites specific legal documents
6. **Fallback System**: Can search Google if the local documents are insufficient
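The guide does not say how the system decides that local documents are "insufficient". One simple rule is a score threshold on the retrieved hits, sketched below. The threshold `min_score`, the function names and the callback signatures are all hypothetical. In the real app, `web_search` would wrap the Google search step.

```python
from typing import Callable, List, Tuple

# A hit is (chunk text, relevance score); higher scores mean more relevant.
Hit = Tuple[str, float]
Search = Callable[[str], List[Hit]]


def retrieve_with_fallback(
    query: str,
    local_search: Search,
    web_search: Search,
    min_score: float = 0.5,  # hypothetical relevance threshold
    min_hits: int = 1,
) -> Tuple[str, List[Hit]]:
    """Return ("local", hits) if enough local hits clear min_score, else ("web", hits)."""
    confident = [hit for hit in local_search(query) if hit[1] >= min_score]
    if len(confident) >= min_hits:
        return "local", confident
    # Local corpus judged insufficient: defer to the web search callback.
    return "web", web_search(query)
```

Returning the source label alongside the hits lets the interface tell the user whether an answer cites the local legal corpus or web results.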

## Performance Metrics
- **Best Method**: Hybrid 2 + Reranking
- **MRR**: 60.82%
- **Coverage**: 88.99%
- **Response Time**: ~0.6 seconds
- **Documents**: 3,271 legal documents, 61,068 chunks