damoojeje commited on
Commit
cf15804
Β·
verified Β·
1 Parent(s): bc25066

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +88 -42
README.md CHANGED
@@ -1,87 +1,133 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  # βœ… SmartManuals-AI for Hugging Face Spaces
2
 
3
- SmartManuals-AI is a local-first document QA system that uses RAG (retrieval-augmented generation), OCR, and embedding search to answer technical questions from PDFs **and Word documents**.
 
 
4
 
5
  ---
6
 
7
  ## πŸ”§ Features
8
 
9
- - πŸ” **Ask natural-language questions** to your manuals
10
- - πŸ“„ Handles both **PDFs** and **Word `.docx`** files
11
- - 🧠 Uses **semantic search** with `sentence-transformers`
12
- - πŸ—ƒοΈ ChromaDB for fast local vector indexing
13
- - πŸ’¬ Answers generated by **Meta LLaMA 3.1 8B Instruct** (default)
14
- - πŸ“Š Gradio dashboard for interaction
15
 
16
  ---
17
 
18
  ## πŸ“ Folder Structure
 
19
  ```
20
  SmartManuals-AI/
21
- β”œβ”€β”€ app.py # Hugging Face Spaces main app
22
- β”œβ”€β”€ Manuals/ # πŸ“‚ Upload your PDF and Word manuals here
23
  β”‚ β”œβ”€β”€ OM_Treadmill.pdf
24
  β”‚ └── Parts_Bike.docx
25
- β”œβ”€β”€ chroma_store/ # ⛓️ ChromaDB vector DB (auto-generated)
26
- β”œβ”€β”€ requirements.txt # πŸ“¦ Dependencies
27
- └── README.md # πŸ“– This file
28
  ```
29
 
30
  ---
31
 
32
- ## πŸš€ Usage in Hugging Face Spaces
 
 
 
 
 
 
 
 
 
 
 
 
 
 
33
 
34
- ### πŸ” Environment Variables
35
- Add your Hugging Face token as a secret:
36
 
37
- - `HF_TOKEN`: Your Hugging Face access token (required for gated models)
38
 
39
- ### πŸ“€ Upload Your Files
40
- Put all your manuals (PDF and Word `.docx`) into the `Manuals/` folder.
41
 
42
- ### 🧠 App Behavior
43
- - On startup:
44
- - Extracts text (with OCR fallback) from PDFs
45
- - Extracts clean text from Word documents
46
- - Chunks and embeds content into ChromaDB
47
- - During inference:
48
- - Retrieves semantically relevant chunks
49
- - Sends them to LLaMA 3.1 Instruct for answer generation
50
 
51
- ### ❌ No User Upload
52
- This app is **designed to work without file uploads**. All processing is done on preloaded files in the `Manuals/` directory.
 
 
53
 
54
  ---
55
 
56
- ## 🧠 Default Model
57
- - Uses **`meta-llama/Llama-3.1-8B-Instruct`**
58
- - All question answering is **fully automatic**
59
- - User is **not required to pick a model, doc type, or filter** β€” the system decides based on question and content.
 
60
 
61
  ---
62
 
63
  ## 🧩 Supported File Types
64
- - `.pdf` (with OCR for scanned pages)
65
- - `.docx` (via `python-docx`)
 
66
 
67
  ---
68
 
69
  ## πŸ§ͺ Local Development
70
- Install dependencies:
 
 
71
  ```bash
 
 
72
  pip install -r requirements.txt
73
- ```
74
- Run locally:
75
- ```bash
76
  python app.py
77
  ```
78
 
 
 
79
  ---
80
 
81
- ## πŸ‘¨πŸ½β€πŸ’» Project by: [Damilare Eniolabi](mailto:damilareeniolabi@gmail.com)
82
- GitHub: [@damoojeje](https://github.com/damoojeje)
 
 
 
83
 
84
  ---
85
 
86
- ## πŸ“Œ Tags
87
- `RAG` `LLM` `Chroma` `OCR` `PDF` `Word` `Gradio` `HuggingFace` `SmartManualsAI`
 
 
1
+ ---
2
+ title: SmartManuals-AI
3
+ emoji: 🧠
4
+ colorFrom: indigo
5
+ colorTo: red
6
+ sdk: gradio
7
+ sdk_version: 4.29.0
8
+ app_file: app.py
9
+ pinned: false
10
+ license: apache-2.0
11
+ tags:
12
+ - RAG
13
+ - LLM
14
+ - Chroma
15
+ - Gradio
16
+ - OCR
17
+ - HuggingFace
18
+ - PDF
19
+ - Word
20
+ - SemanticSearch
21
+ - SmartManualsAI
22
+ ---
23
+
24
  # βœ… SmartManuals-AI for Hugging Face Spaces
25
 
26
+ SmartManuals-AI is a local-first document QA system that uses **retrieval-augmented generation (RAG)**, **OCR**, and **semantic embeddings** to answer technical questions from equipment manuals, service guides, and parts catalogs.
27
+
28
+ This app is optimized for Hugging Face Spaces and **requires no user upload** β€” just preload your manuals in a `Manuals/` folder.
29
 
30
  ---
31
 
32
  ## πŸ”§ Features
33
 
34
+ - 🧠 Ask **natural-language questions** against your own manuals
35
+ - πŸ“„ Supports both **PDF** and **Word (.docx)** documents
36
+ - πŸ” Uses `sentence-transformers` for semantic search
37
+ - πŸ—ƒοΈ Indexes chunks in **ChromaDB** (stored locally)
38
+ - πŸ’¬ Generates answers via Hugging Face models (default: **Meta LLaMA 3.1 8B Instruct**)
39
+ - πŸ–₯️ Clean **Gradio interface** for querying
40
 
41
  ---
42
 
43
  ## πŸ“ Folder Structure
44
+
45
  ```
46
  SmartManuals-AI/
47
+ β”œβ”€β”€ app.py # Main Hugging Face app
48
+ β”œβ”€β”€ Manuals/ # Place your PDF and DOCX manuals here
49
  β”‚ β”œβ”€β”€ OM_Treadmill.pdf
50
  β”‚ └── Parts_Bike.docx
51
+ β”œβ”€β”€ chroma_store/ # Vector database (auto-generated)
52
+ β”œβ”€β”€ requirements.txt # Dependencies
53
+ └── README.md # This file
54
  ```
55
 
56
  ---
57
 
58
+ ## πŸš€ Usage on Hugging Face Spaces
59
+
60
+ ### πŸ” Environment Variable
61
+
62
+ Add this secret in your Space settings:
63
+
64
+ | Name | Value |
65
+ |-----------|----------------------|
66
+ | `HF_TOKEN` | Your Hugging Face token |
67
+
68
+ > **Note**: You must accept model licenses on [Hugging Face Hub](https://huggingface.co/meta-llama) before using gated models like `Llama-3.1-8B-Instruct`.
69
+
70
+ ---
71
+
72
+ ### πŸ“€ Uploading Manuals
73
 
74
+ - Upload your **PDF and Word documents** directly to the `Manuals/` folder in your Space repository.
75
+ - No need for file uploads via the interface.
76
 
77
+ ---
78
 
79
+ ### 🧠 How It Works
 
80
 
81
+ - On app startup:
82
+ - Text is extracted from **PDFs (with OCR fallback)** and `.docx` Word files
83
+ - Sentences are cleaned, chunked, and embedded with `all-MiniLM-L6-v2`
84
+ - Chunks are stored in a local **ChromaDB vector database**
 
 
 
 
85
 
86
+ - At query time:
87
+ - Your question is embedded and semantically compared against chunks
88
+ - The most relevant chunks are passed to the LLM
89
+ - The **LLM (LLaMA 3.1)** generates a focused answer from context only
90
 
91
  ---
92
 
93
+ ## πŸ€– Default Model
94
+
95
+ - This app uses: **`meta-llama/Llama-3.1-8B-Instruct`**
96
+ - More models are supported behind-the-scenes (e.g. Mistral, Gemma)
97
+ - **No need to manually pick** models, doc types, or categories
98
 
99
  ---
100
 
101
  ## 🧩 Supported File Types
102
+
103
+ - βœ… PDF (`.pdf`) with OCR fallback using Tesseract
104
+ - βœ… Word Documents (`.docx`)
105
 
106
  ---
107
 
108
  ## πŸ§ͺ Local Development
109
+
110
+ Clone and run locally:
111
+
112
  ```bash
113
+ git clone https://github.com/damoojeje/SmartManuals-AI.git
114
+ cd SmartManuals-AI
115
  pip install -r requirements.txt
 
 
 
116
  python app.py
117
  ```
118
 
119
+ > πŸ“ Place your manuals inside the `Manuals/` directory before running.
120
+
121
  ---
122
 
123
+ ## πŸ‘¨πŸ½β€πŸ’» Created By
124
+
125
+ **Damilare Eniolabi**
126
+ πŸ“§ [damilareeniolabi@gmail.com](mailto:damilareeniolabi@gmail.com)
127
+ πŸ”— GitHub: [@damoojeje](https://github.com/damoojeje)
128
 
129
  ---
130
 
131
+ ## πŸ”– Tags
132
+
133
+ `RAG` Β· `LLM` Β· `Gradio` Β· `ChromaDB` Β· `OCR` Β· `SemanticSearch` Β· `PDF` Β· `Word` Β· `SmartManualsAI` Β· `EquipmentQA`