# ===============================
# 📦 Embedding + Vector Search
# ===============================
chromadb
sentence-transformers   # Compatible with huggingface-hub 0.30.1
torch                   # Stable with sentence-transformers

# ===============================
# 🤖 LLM-Based QA
# ===============================
transformers            # Works well with huggingface-hub 0.30.1
accelerate
huggingface-hub         # Compatible with transformers 4.37.2

# ===============================
# 📄 PDF Parsing
# ===============================
pymupdf                 # PyMuPDF for full-page text extraction
pdfminer.six            # Optional: structured layout extraction

# ===============================
# 🖼️ OCR + Image Handling
# ===============================
pytesseract             # Requires separate install of Tesseract binary
Pillow

# ===============================
# 🌐 UI Interface
# ===============================
gradio                  # Gradio 4+ for modern UI
requests

# ===============================
# 🛠 Utilities and Fixes
# ===============================
beautifulsoup4          # Parsing for HTML-in-PDFs (e.g., diagrams/tables)
pydantic                # Chromadb is not yet compatible with pydantic 2.x
numpy                   # Ensures compatibility with chromadb and transformers
tqdm                    # Progress bar (used in embedding scripts)

# ===============================
# 🔤 Natural Language Toolkit
# ===============================
nltk