---
title: SmartManuals-AI
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- RAG
- LLM
- Chroma
- Gradio
- OCR
- HuggingFace
- PDF
- Word
- SemanticSearch
- SmartManualsAI
---

# ✅ SmartManuals-AI for Hugging Face Spaces

SmartManuals-AI is a local-first document QA system that uses **retrieval-augmented generation (RAG)**, **OCR**, and **semantic embeddings** to answer technical questions from equipment manuals, service guides, and parts catalogs.

This app is optimized for Hugging Face Spaces and **requires no user upload**: just preload your manuals in a `Manuals/` folder.

---

## 🔧 Features

- 🧠 Ask **natural-language questions** against your own manuals
- 📄 Supports both **PDF** and **Word (.docx)** documents
- 🔍 Uses `sentence-transformers` for semantic search
- 🗃️ Indexes chunks in **ChromaDB** (stored locally)
- 💬 Generates answers via Hugging Face models (default: **Meta LLaMA 3.1 8B Instruct**)
- 🖥️ Clean **Gradio interface** for querying

---

## 📁 Folder Structure

```
SmartManuals-AI/
├── app.py                 # Main Hugging Face app
├── Manuals/               # Place your PDF and DOCX manuals here
│   ├── OM_Treadmill.pdf
│   └── Parts_Bike.docx
├── chroma_store/          # Vector database (auto-generated)
├── requirements.txt       # Dependencies
└── README.md              # This file
```

---

## 🚀 Usage on Hugging Face Spaces

### 🔐 Environment Variable

Add this secret in your Space settings:

| Name      | Value                |
|-----------|----------------------|
| `HF_TOKEN` | Your Hugging Face token |

> **Note**: You must accept model licenses on [Hugging Face Hub](https://huggingface.co/meta-llama) before using gated models like `Llama-3.1-8B-Instruct`.
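
Spaces expose secrets to the app as environment variables, so the token can be read at startup. The following is a minimal sketch of that lookup; `get_hf_token` is a hypothetical helper for illustration and is not taken from `app.py`.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret in the Space settings."
        )
    return token
```

Failing early with an explicit message is usually easier to debug than a later authentication error from the model download.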

---

### 📤 Uploading Manuals

- Upload your **PDF and Word documents** directly to the `Manuals/` folder in your Space repository.
- No need for file uploads via the interface.
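
As an illustration of how preloaded manuals could be discovered, here is a small sketch that lists the supported files in the folder. The function name `find_manuals` and the constant `SUPPORTED_EXTENSIONS` are assumptions made for this example; the actual logic in `app.py` may differ.

```python
from pathlib import Path

# Assumption: only the two file types named in this README are indexed.
SUPPORTED_EXTENSIONS = {".pdf", ".docx"}


def find_manuals(folder="Manuals"):
    """Return supported manual files directly under `folder`, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Matching on the lowercased suffix means files such as `Parts_Bike.DOCX` are picked up as well.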

---

### 🧠 How It Works

- On app startup:
  - Text is extracted from **PDFs (with OCR fallback)** and `.docx` Word files
  - Sentences are cleaned, chunked, and embedded with `all-MiniLM-L6-v2`
  - Chunks are stored in a local **ChromaDB vector database**

- At query time:
  - Your question is embedded and semantically compared against chunks
  - The most relevant chunks are passed to the LLM
  - The **LLM (LLaMA 3.1)** generates a focused answer from context only
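
The two phases above can be sketched in plain Python. This is an illustration only, not the code in `app.py`: a toy bag-of-words vector stands in for the `all-MiniLM-L6-v2` embedding, a plain list stands in for ChromaDB, and every function name, the chunk sizes, and the prompt wording are assumptions.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text):
    """Toy stand-in for a sentence embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question, chunks, k=3):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble a prompt that asks the model to answer from context only."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

In the real app the embedding and storage steps are handled by `sentence-transformers` and ChromaDB; the sketch only shows the shape of the flow: chunk, embed, rank by similarity, then build a context-restricted prompt for the LLM.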

---

## 🤖 Default Model

- This app uses: **`meta-llama/Llama-3.1-8B-Instruct`**
- Other models (e.g. Mistral, Gemma) are supported behind the scenes
- **No need to manually pick** models, doc types, or categories

---

## 🧩 Supported File Types

- ✅ PDF (`.pdf`) with OCR fallback using Tesseract
- ✅ Word Documents (`.docx`)
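
One common way to implement an OCR fallback is to try a page's embedded text layer first and run OCR only when too little text comes back. The sketch below shows that decision in isolation. The callables `read_text_layer` and `run_ocr` are hypothetical placeholders (for example, thin wrappers around a PDF library and Tesseract), and the `min_chars` threshold is an assumption; none of these come from `app.py`.

```python
def extract_page_text(page, read_text_layer, run_ocr, min_chars=20):
    """Return a page's embedded text, or OCR output if too little text is found.

    `read_text_layer` and `run_ocr` are caller-supplied functions that take a
    page object and return a string (or None).
    """
    text = (read_text_layer(page) or "").strip()
    if len(text) >= min_chars:
        return text
    # Scanned pages often have an empty or near-empty text layer.
    return (run_ocr(page) or "").strip()
```

Because OCR is typically much slower than reading a text layer, running it only for pages that need it keeps startup indexing time down.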

---

## 🧪 Local Development

Clone and run locally:

```bash
git clone https://github.com/damoojeje/SmartManuals-AI.git
cd SmartManuals-AI
pip install -r requirements.txt
python app.py
```

> 📁 Place your manuals inside the `Manuals/` directory before running.

---

## 👨🏽‍💻 Created By

**Damilare Eniolabi**  
📧 [damilareeniolabi@gmail.com](mailto:damilareeniolabi@gmail.com)  
🔗 GitHub: [@damoojeje](https://github.com/damoojeje)

---

## 🔖 Tags

`RAG` · `LLM` · `Gradio` · `ChromaDB` · `OCR` · `SemanticSearch` · `PDF` · `Word` · `SmartManualsAI` · `EquipmentQA`