---
title: GraphiqueAcademia
emoji: 🐠
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 5.33.0
app_file: app.py
pinned: false
license: mit
tags:
  - agent-demo-track
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Sorry this is still a **WIP** 🛠️⚙️



# 🚀 Scientific Paper Assistant

This project is an **AI agent** for scientific papers, built with Gradio, Modal, and the SmolAgents framework. It uses a set of specialized tools to analyze PDFs (AI papers or other educational material), create mind maps, generate data visualizations, perform web searches, and extract text from images.

🌟 **Live demo:** \[link ]

---

## 🧠 What does it do?

* 👉 **Upload a PDF or image**
* 👉 **Ask a question** (e.g., "Summarize this section", "Create a mind map", "Generate Python code to implement this method")
* 👉 The agent dynamically chooses which specialized tool(s) to use to answer it!

---

## 📦 Key components

### 1️⃣ Core logic: `main_agent.py`

* Uses **SmolAgents** with a `CodeAgent` and a Hugging Face `InferenceClientModel`.
* Loads **custom tools**:

  * `PDFQATool`: answers questions about PDF content.
  * `MindMapTool`: generates mind maps from text.
  * `DataGraphTool`: creates data graphs and visualizations.
  * `ImageAnalysisTool`: extracts text from images (OCR).
  * `WebSearchTool`: performs real-time web searches.
* Lets the agent decide which tools to use!
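
As a rough illustration of the pattern above, here is a minimal sketch of wiring one custom tool into a `CodeAgent`. It is not the project's actual `main_agent.py`: `find_passages`, `PassageLookupTool`, and `build_agent` are hypothetical names, and the line-matching logic is only a toy stand-in for something like `PDFQATool`.

```python
def find_passages(document_text: str, keyword: str) -> str:
    """Return the lines of document_text that mention keyword (case-insensitive)."""
    hits = [
        line.strip()
        for line in document_text.splitlines()
        if keyword.lower() in line.lower()
    ]
    return "\n".join(hits) if hits else "No matching passage found."


def build_agent(document_text: str):
    """Wrap the toy helper in a smolagents Tool and hand it to a CodeAgent."""
    # Imported lazily so find_passages can be tried without smolagents installed.
    from smolagents import CodeAgent, InferenceClientModel, Tool

    class PassageLookupTool(Tool):  # hypothetical tool, not part of this repo
        name = "passage_lookup"
        description = "Returns lines of the loaded document that mention a keyword."
        inputs = {
            "keyword": {"type": "string", "description": "Term to look for."}
        }
        output_type = "string"

        def forward(self, keyword: str) -> str:
            return find_passages(document_text, keyword)

    # The agent reads each tool's name and description to decide what to call.
    return CodeAgent(tools=[PassageLookupTool()], model=InferenceClientModel())
```

Calling `build_agent(text).run("Which lines mention attention?")` would then let the model decide whether to use the tool; this needs `smolagents` installed and access to the Hugging Face Inference API.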

---

### 2️⃣ Tools

| Tool name             | File                     | Purpose                                                            |
| --------------------- | ------------------------ | ------------------------------------------------------------------ |
| **PDFQATool**         | `pdf_qa_tool.py`         | Answers questions about scientific PDFs.                           |
| **MindMapTool**       | `mind_map_tool.py`       | Converts concepts into mind maps for clearer understanding.        |
| **DataGraphTool**     | `data_graph_tool.py`     | Creates data visualizations to illustrate key points.              |
| **ImageAnalysisTool** | `image_analysis_tool.py` | Extracts text from images using OCR (Tesseract).                   |
| **WebSearchTool**     | `web_search_tool.py`     | Performs web searches using a **Modal-deployed FastAPI** endpoint. |

---

### 3️⃣ Modal Deployments

Two **FastAPI apps** are deployed on Modal:

* 🔍 **Web search** (`modal_web_search_app.py`)
* 🖼️ **Image analysis** (`modal_image_analyzer_app.py`)

These APIs handle the heavy lifting outside the main app, which calls them over HTTP.
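
A call from the main app to one of these services might look roughly like the sketch below. The `/search` route, the `q` and `max_results` parameters, and the JSON response are assumptions made for illustration; the real routes are whatever `modal_web_search_app.py` defines.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str, max_results: int = 5) -> str:
    """Build the request URL for a hypothetical /search route."""
    params = urlencode({"q": query, "max_results": max_results})
    return f"{base_url.rstrip('/')}/search?{params}"


def web_search(base_url: str, query: str, timeout: float = 30.0):
    """Call the deployed endpoint and decode its JSON body (assumed format)."""
    with urlopen(build_search_url(base_url, query), timeout=timeout) as response:
        return json.load(response)
```

Here `base_url` would be the URL Modal prints after deployment. Keeping URL construction in its own function makes it easy to check without touching the network.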

---

### 4️⃣ User Interface: `app.py`

* Built with **Gradio**.
* Lets users:

  * Upload PDFs (`pdf_upload`).
  * Upload images (`image_upload`).
  * Enter natural language questions (`user_input`).
* The agent **logically** decides whether to:

  * Use a tool directly (e.g., PDF analysis, mind map creation).
  * Use Modal services (e.g., web search, image analysis).
  * Or combine multiple tools for complex tasks!
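
The interface described above could be laid out along these lines. This is a sketch rather than the project's `app.py`: the component names `pdf_upload`, `image_upload`, and `user_input` come from the list above, while `build_prompt`, `build_ui`, and `answer_fn` are hypothetical helpers introduced for illustration.

```python
from typing import Callable, Optional


def build_prompt(
    question: str,
    pdf_path: Optional[str] = None,
    image_path: Optional[str] = None,
) -> str:
    """Combine the question with the paths of any uploaded files."""
    parts = [question.strip()]
    if pdf_path:
        parts.append(f"Uploaded PDF: {pdf_path}")
    if image_path:
        parts.append(f"Uploaded image: {image_path}")
    return "\n".join(parts)


def build_ui(answer_fn: Callable[[str], str]):
    """Create a Gradio Blocks app that forwards the prompt to answer_fn."""
    # Imported lazily so build_prompt can be tried without gradio installed.
    import gradio as gr

    with gr.Blocks() as demo:
        pdf_upload = gr.File(label="PDF", file_types=[".pdf"], type="filepath")
        image_upload = gr.Image(label="Image", type="filepath")
        user_input = gr.Textbox(label="Question")
        answer = gr.Textbox(label="Answer")
        ask = gr.Button("Ask")
        # answer_fn stands in for the agent call, e.g. agent.run.
        ask.click(
            fn=lambda pdf, image, question: answer_fn(
                build_prompt(question, pdf, image)
            ),
            inputs=[pdf_upload, image_upload, user_input],
            outputs=answer,
        )
    return demo
```

With an agent object at hand, `build_ui(agent.run).launch()` would start the app locally.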

---

## ⚙️ Installation & Usage

1️⃣ **Install dependencies** (adjust as needed for your local dev environment):

```bash
pip install -r requirements.txt
```

2️⃣ **Set up Modal deployments** (for example with `modal deploy <file>`) for:

* `modal_web_search_app.py`
* `modal_image_analyzer_app.py`

3️⃣ **Launch the app**:

```bash
python app.py
```

Or deploy it to **Hugging Face Spaces** for a live demo 🤗

---

## 💡 Example prompts

* ✅ “Summarize the uploaded paper.”
* ✅ “Generate a mind map of the main contributions.”
* ✅ “Plot a graph of the data trends discussed.”
* ✅ “Analyze this image (uploaded) for any text.”
* ✅ “Web search for related works on this topic.”
* ✅ “Generate code to implement the method in the paper.”

---

## 🛠️ Custom code generation

I also have plans to add a **Code Implementation Tool**. This will allow the agent to generate Python code snippets to clarify methods or experiments described in the papers!

---

## 📜 License

Open-source under the [MIT License](LICENSE).

---

## ✨ Acknowledgments

* Hugging Face
* Modal
* LangChain
* SmolAgents
* DuckDuckGo (for web search)
* PDFMiner, PyMuPDF (for PDF parsing)