---
title: SmartManuals-AI
emoji: 🧠
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 4.29.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - RAG
  - LLM
  - Chroma
  - Gradio
  - OCR
  - HuggingFace
  - PDF
  - Word
  - SemanticSearch
  - SmartManualsAI
---

# ✅ SmartManuals-AI for Hugging Face Spaces

SmartManuals-AI is a local-first document QA system that uses **retrieval-augmented generation (RAG)**, **OCR**, and **semantic embeddings** to answer technical questions from equipment manuals, service guides, and parts catalogs. This app is optimized for Hugging Face Spaces and **requires no user uploads**: just preload your manuals in a `Manuals/` folder.

---

## 🔧 Features

- 🧠 Ask **natural-language questions** against your own manuals
- 📄 Supports both **PDF** and **Word (.docx)** documents
- 🔍 Uses `sentence-transformers` for semantic search
- 🗃️ Indexes chunks in **ChromaDB** (stored locally)
- 💬 Generates answers via Hugging Face models (default: **Meta Llama 3.1 8B Instruct**)
- 🖥️ Clean **Gradio interface** for querying

---

## 📁 Folder Structure

```
SmartManuals-AI/
├── app.py             # Main Hugging Face app
├── Manuals/           # Place your PDF and DOCX manuals here
│   ├── OM_Treadmill.pdf
│   └── Parts_Bike.docx
├── chroma_store/      # Vector database (auto-generated)
├── requirements.txt   # Dependencies
└── README.md          # This file
```

---

## 🚀 Usage on Hugging Face Spaces

### 🔐 Environment Variable

Add this secret in your Space settings:

| Name | Value |
|-----------|----------------------|
| `HF_TOKEN` | Your Hugging Face token |

> **Note**: You must accept the model license on the [Hugging Face Hub](https://huggingface.co/meta-llama) before using gated models such as `Llama-3.1-8B-Instruct`.

---

### 📤 Uploading Manuals

- Upload your **PDF and Word documents** directly to the `Manuals/` folder in your Space repository.
- No file uploads via the interface are needed.
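At startup the app presumably reads this secret from the environment before downloading the gated model. A minimal sketch of that check (the helper name is illustrative, not the app's actual code):

```python
import os


def get_hf_token(env=None):
    """Return the HF_TOKEN secret, or None if it is not configured.

    Illustrative helper only; app.py may read the token differently
    (e.g. passing it to huggingface_hub at model-download time).
    """
    env = os.environ if env is None else env
    return env.get("HF_TOKEN")
```

If this returns `None`, gated models such as `Llama-3.1-8B-Instruct` cannot be downloaded, so setting the secret in the Space settings is required.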
---

### 🧠 How It Works

- On app startup:
  - Text is extracted from **PDFs (with OCR fallback)** and `.docx` Word files
  - Sentences are cleaned, chunked, and embedded with `all-MiniLM-L6-v2`
  - Chunks are stored in a local **ChromaDB vector database**
- At query time:
  - Your question is embedded and semantically compared against the stored chunks
  - The most relevant chunks are passed to the LLM
  - The **LLM (Llama 3.1)** generates a focused answer from the retrieved context only

---

## 🤖 Default Model

- This app uses **`meta-llama/Llama-3.1-8B-Instruct`**
- More models are supported behind the scenes (e.g. Mistral, Gemma)
- **No need to manually pick** models, document types, or categories

---

## 🧩 Supported File Types

- ✅ PDF (`.pdf`), with OCR fallback using Tesseract
- ✅ Word documents (`.docx`)

---

## 🧪 Local Development

Clone and run locally:

```bash
git clone https://github.com/damoojeje/SmartManuals-AI.git
cd SmartManuals-AI
pip install -r requirements.txt
python app.py
```

> 📝 Place your manuals inside the `Manuals/` directory before running.

---

## 👨🏽‍💻 Created By

**Damilare Eniolabi**
📧 [damilareeniolabi@gmail.com](mailto:damilareeniolabi@gmail.com)
🔗 GitHub: [@damoojeje](https://github.com/damoojeje)

---

## 🔖 Tags

`RAG` · `LLM` · `Gradio` · `ChromaDB` · `OCR` · `SemanticSearch` · `PDF` · `Word` · `SmartManualsAI` · `EquipmentQA`
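As a closing aside for the curious: the chunking step described under "How It Works" can be sketched in a few lines of Python. The function name, chunk size, and overlap below are illustrative assumptions, not the app's actual settings:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character chunks, a common RAG
    preprocessing step before embedding. The chunk_size and overlap
    values here are assumptions for illustration only."""
    chunks = []
    start = 0
    step = chunk_size - overlap  # overlap keeps context across boundaries
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Each chunk shares its first `overlap` characters with the tail of the previous chunk, so sentences that straddle a boundary still appear intact in at least one chunk that the embedder sees.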