Vietnamese Legal Chatbot - Presentation Setup Guide

Quick Start for Presentation

Prerequisites

  1. Python 3.8+ installed
  2. Docker installed and running
  3. Google API Key for Gemini (optional for demo)

Step 1: Install Dependencies

pip install -r requirements.txt

Step 2: Start Qdrant Database

python start_qdrant.py

This will:

  • Pull Qdrant Docker image
  • Start Qdrant on http://localhost:6333
  • Create persistent storage in qdrant_data/ folder
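Before moving on to Step 3, it can help to confirm that Qdrant is actually answering requests. The following is a minimal sketch, not part of the repository: `qdrant_is_up` and `wait_for_qdrant` are hypothetical helper names, and the check assumes Qdrant's REST API is exposed without an API key at the address listed above.

```python
import time
import urllib.error
import urllib.request


def qdrant_is_up(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if a GET on Qdrant's /collections endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as "not ready".
        return False


def wait_for_qdrant(
    base_url: str = "http://localhost:6333",
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Poll Qdrant up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if qdrant_is_up(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_qdrant()` right after `python start_qdrant.py` returns `True` once the container is serving requests, or `False` if it never becomes reachable within the retry budget.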

Step 3: Set Up the System

python setup_system.py

This will:

  • Load 3,271 legal documents
  • Create 61,068 document chunks
  • Build vector and BM25 indices
  • Set up the RAG system
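The chunking step could look like the following minimal sketch. It assumes fixed-size word windows with overlap; `chunk_words`, `chunk_size`, and `overlap` are hypothetical names and values, and `setup_system.py` may split documents differently (for example by article or clause). For scale, 61,068 chunks over 3,271 documents is about 18.7 chunks per document on average.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `chunk_size` words.

    Consecutive windows share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk would then be embedded for the vector index and tokenized for the BM25 index.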

Step 4: Run the Application

python app.py

This starts the Gradio web interface; the console prints the local URL to open in a browser.

Demo Questions for Presentation

Sample Legal Questions to Try:

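  1. "Điều kiện thành lập doanh nghiệp là gì?" ("What are the conditions for establishing a business?")
    • Tests basic legal knowledge retrieval
  2. "Quy định về thời gian làm việc tối đa trong ngày?" ("What are the regulations on maximum daily working hours?")
    • Tests labor law knowledge
  3. "Thủ tục đăng ký kết hôn cần những gì?" ("What is required to register a marriage?")
    • Tests civil law procedures
  4. "Mức phạt vi phạm giao thông đường bộ?" ("What are the penalties for road traffic violations?")
    • Tests administrative law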

Presentation Structure (10-12 minutes)

1. Introduction (2 min)

  • Problem: Legal information access in Vietnam
  • Solution: AI-powered legal assistant using RAG
  • Technology: Hybrid search (BM25 + Vector) + LLM

2. Technical Architecture (3 min)

  • Show the system components
  • Explain hybrid retrieval approach
  • Highlight Vietnamese-specific optimizations

3. Live Demo (3 min)

  • Show the web interface
  • Ask sample questions
  • Demonstrate response quality and citations

4. Performance Results (2 min)

  • Show the performance table from results_table.txt
  • Highlight the 60.82% MRR of the best method
  • Compare the retrieval methods against each other

5. Future Work (1 min)

  • Expand legal corpus
  • Mobile app development
  • Integration with legal services

Troubleshooting

If Qdrant fails to start:

# Check Docker status
docker ps

# Restart Qdrant
python start_qdrant.py stop
python start_qdrant.py

If setup fails:

# Clean up and retry
rm -rf qdrant_data/
python start_qdrant.py
python setup_system.py

If app fails to start:

  • Check if Google API key is set (optional)
  • Ensure Qdrant is running on port 6333
  • Check console for error messages
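The first two checks can be scripted as a quick preflight before a presentation. This is a sketch under stated assumptions: the environment variable name `GOOGLE_API_KEY` is a guess (the guide does not say which variable `app.py` reads), and `port_is_open` and `preflight` are hypothetical helpers rather than part of the project.

```python
import os
import socket


def port_is_open(host: str = "localhost", port: int = 6333, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(env_var: str = "GOOGLE_API_KEY", host: str = "localhost", port: int = 6333) -> dict:
    """Report whether the API key variable is set and whether Qdrant's port accepts connections."""
    return {
        # The variable name is an assumption; adjust it to whatever app.py expects.
        "api_key_set": bool(os.environ.get(env_var)),
        "qdrant_port_open": port_is_open(host, port),
    }
```

Printing `preflight()` before launching `app.py` shows at a glance which of the two prerequisites is missing.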

Key Features to Highlight

  1. Hybrid Search: Combines keyword (BM25) and semantic (vector) search
  2. Vietnamese-Specific: Uses specialized Vietnamese embedding models
  3. Reranking: Advanced document re-ranking for better relevance
  4. Real-time Interface: Gradio web interface with progress indicators
  5. Source Attribution: Always cites specific legal documents
  6. Fallback System: Can fall back to a Google search if the local documents are insufficient
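To illustrate the hybrid search idea for the audience, one common way to merge a BM25 ranking with a vector ranking is Reciprocal Rank Fusion (RRF). The sketch below is an illustration only: this guide does not state how the project combines the two result lists, so RRF, the function name, and the constant `k = 60` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in;
    documents ranked highly by several retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]    # hypothetical keyword results
vector_hits = ["b", "d", "a"]  # hypothetical semantic results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

In this toy example the fused order is `b, a, d, c`: documents found by both retrievers outrank those found by only one.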

Performance Metrics

  • Best Method: Hybrid 2 + Reranking
  • MRR: 60.82%
  • Coverage: 88.99%
  • Response Time: ~0.6 seconds
  • Documents: 3,271 legal documents, 61,068 chunks
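For readers unfamiliar with the metrics, the sketch below shows how MRR and coverage are typically computed. MRR (mean reciprocal rank) averages 1/rank of the first relevant document per query. The guide does not define "Coverage", so the code assumes it means the share of queries for which at least one relevant document is retrieved; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple


def first_relevant_rank(retrieved: Sequence[str], relevant: Set[str]) -> Optional[int]:
    """Return the 1-based rank of the first relevant document, or None if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return rank
    return None


def mrr_and_coverage(
    results: List[Sequence[str]], gold: List[Set[str]]
) -> Tuple[float, float]:
    """Compute (MRR, coverage) over paired per-query results and relevant sets."""
    ranks = [first_relevant_rank(r, g) for r, g in zip(results, gold)]
    if not ranks:
        return 0.0, 0.0
    # Queries with no relevant hit contribute 0 to the MRR sum.
    mrr = sum(1.0 / r for r in ranks if r is not None) / len(ranks)
    coverage = sum(r is not None for r in ranks) / len(ranks)
    return mrr, coverage
```

Under these definitions, an MRR of 60.82% means the mean of 1/rank across test queries is about 0.61.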