---
title: SmartManuals-AI
emoji: 🧠
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 4.29.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - RAG
  - LLM
  - Chroma
  - Gradio
  - OCR
  - HuggingFace
  - PDF
  - Word
  - SemanticSearch
  - SmartManualsAI
---

# ✅ SmartManuals-AI for Hugging Face Spaces

SmartManuals-AI is a local-first document QA system that uses **retrieval-augmented generation (RAG)**, **OCR**, and **semantic embeddings** to answer technical questions from equipment manuals, service guides, and parts catalogs. This app is optimized for Hugging Face Spaces and **requires no user uploads**: just preload your manuals in a `Manuals/` folder.

---

## 🔧 Features

- 🧠 Ask **natural-language questions** against your own manuals
- 📄 Supports both **PDF** and **Word (.docx)** documents
- 🔍 Uses `sentence-transformers` for semantic search
- 🗃️ Indexes chunks in **ChromaDB** (stored locally)
- 💬 Generates answers via Hugging Face models (default: **Meta Llama 3.1 8B Instruct**)
- 🖥️ Clean **Gradio interface** for querying

---

## 📁 Folder Structure

```
SmartManuals-AI/
├── app.py             # Main Hugging Face app
├── Manuals/           # Place your PDF and DOCX manuals here
│   ├── OM_Treadmill.pdf
│   └── Parts_Bike.docx
├── chroma_store/      # Vector database (auto-generated)
├── requirements.txt   # Dependencies
└── README.md          # This file
```

---

## 🚀 Usage on Hugging Face Spaces

### 🔐 Environment Variable

Add this secret in your Space settings:

| Name | Value |
|-----------|----------------------|
| `HF_TOKEN` | Your Hugging Face token |

> **Note**: You must accept the model license on the [Hugging Face Hub](https://huggingface.co/meta-llama) before using gated models such as `Llama-3.1-8B-Instruct`.

---

### 📤 Uploading Manuals

- Upload your **PDF and Word documents** directly to the `Manuals/` folder in your Space repository.
- No file uploads via the interface are needed.
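At startup the app presumably reads this secret from the environment before downloading the gated model. A minimal sketch of that check (the helper name is illustrative, not the app's actual code):

```python
import os


def get_hf_token(env=None):
    """Return the HF_TOKEN secret, or None if it is not configured.

    Illustrative helper only; app.py may read the token differently
    (e.g. passing it to huggingface_hub at model-download time).
    """
    env = os.environ if env is None else env
    return env.get("HF_TOKEN")
```

If this returns `None`, gated models such as `Llama-3.1-8B-Instruct` cannot be downloaded, so setting the secret in the Space settings is required.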
---

### 🧠 How It Works

- On app startup:
  - Text is extracted from **PDFs (with OCR fallback)** and `.docx` Word files
  - Sentences are cleaned, chunked, and embedded with `all-MiniLM-L6-v2`
  - Chunks are stored in a local **ChromaDB vector database**
- At query time:
  - Your question is embedded and semantically compared against the stored chunks
  - The most relevant chunks are passed to the LLM
  - The **LLM (Llama 3.1)** generates a focused answer from the retrieved context only

---

## 🤖 Default Model

- This app uses **`meta-llama/Llama-3.1-8B-Instruct`**
- More models are supported behind the scenes (e.g. Mistral, Gemma)
- **No need to manually pick** models, document types, or categories

---

## 🧩 Supported File Types

- ✅ PDF (`.pdf`), with OCR fallback using Tesseract
- ✅ Word documents (`.docx`)

---

## 🧪 Local Development

Clone and run locally:

```bash
git clone https://github.com/damoojeje/SmartManuals-AI.git
cd SmartManuals-AI
pip install -r requirements.txt
python app.py
```

> 📝 Place your manuals inside the `Manuals/` directory before running.

---

## 👨🏽‍💻 Created By

**Damilare Eniolabi**
📧 [damilareeniolabi@gmail.com](mailto:damilareeniolabi@gmail.com)
🔗 GitHub: [@damoojeje](https://github.com/damoojeje)

---

## 🔖 Tags

`RAG` · `LLM` · `Gradio` · `ChromaDB` · `OCR` · `SemanticSearch` · `PDF` · `Word` · `SmartManualsAI` · `EquipmentQA`
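As a closing aside for the curious: the chunking step described under "How It Works" can be sketched in a few lines of Python. The function name, chunk size, and overlap below are illustrative assumptions, not the app's actual settings:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character chunks, a common RAG
    preprocessing step before embedding. The chunk_size and overlap
    values here are assumptions for illustration only."""
    chunks = []
    start = 0
    step = chunk_size - overlap  # overlap keeps context across boundaries
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Each chunk shares its first `overlap` characters with the tail of the previous chunk, so sentences that straddle a boundary still appear intact in at least one chunk that the embedder sees.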