onisj committed on
Commit f95c630 · 1 Parent(s): 9cd535d

docs(readme): readme updated

Files changed (6):
  1. .gitignore +5 -0
  2. README.md +131 -41
  3. app.py +38 -18
  4. requirements.txt +4 -1
  5. test.py +8 -5
  6. tools/search.py +4 -2
.gitignore CHANGED
@@ -47,6 +47,7 @@ coverage.xml
 *.log.*
 *.tmp
 temp/
+*.json
 
 # Dependency directories
 pip-wheel-metadata/
@@ -72,3 +73,7 @@ pip-wheel-metadata/
 *~
 *.bak
 *.old
+
+project_struct.txt
+test.py
+result.txt
README.md CHANGED
@@ -1,62 +1,81 @@
 ---
 title: JARVIS Gaia Agent
-emoji: 🐢
+emoji: 🦾
 colorFrom: indigo
 colorTo: green
-sdk: docker
+sdk: gradio
 pinned: false
 license: mit
 short_description: Enhanced JARVIS AI agent for GAIA benchmark
+models:
+- meta-llama/Llama-3.2-1B-Instruct
+- sentence-transformers/all-MiniLM-L6-v2
+datasets:
+- gaia-benchmark/GAIA
 ---
 
 # Evolved JARVIS Gaia Agent
 
-An advanced Python-based AI agent combining `langchain`, `smolagents`, SERPAPI, and OCR for web searches, file parsing, and data retrieval. Deployed as a Hugging Face Space for GAIA benchmark evaluation.
+An advanced Python-based AI agent built with `langchain`, `langgraph`, SERPAPI, and OCR capabilities for web searches, file parsing, image analysis, and data retrieval. Deployed as a Hugging Face Space (`onisj/jarvis_gaia_agent`) for evaluating performance on the GAIA benchmark, targeting a score >30% (6/20 correct).
+
+## Features
+
+- **Web Search**: Integrates SERPAPI and DuckDuckGo for robust, multi-hop searches.
+- **File Parsing**: Processes CSV, TXT, Excel, and PDF files for GAIA tasks.
+- **Image Parsing**: Uses OCR (`easyocr`) to extract text from images.
+- **Data Retrieval**: Includes a guest info retriever for structured queries.
+- **External APIs**: Supports weather data (OpenWeatherMap) and Hugging Face Hub stats.
+- **State Management**: Employs `langgraph` for multi-step reasoning workflows.
+- **Exact-Match Answers**: Optimized for GAIA Level 1 questions with precise formatting (e.g., USD to two decimals, comma-separated lists).
+- **Gradio Interface**: Provides a user-friendly UI for running evaluations and submitting answers.
+
+## Directory Structure
 
-#### Directory Structure
 ```
 jarvis_gaia_agent/
-├── app.py # Main application with Gradio interface and agent logic
-├── state.py # Defines JARVISState for state management
-├── retriever.py # Guest info retriever tool
+├── app.py # Main Gradio application with agent logic
+├── state.py # Defines JARVISState for LangGraph state management
+├── search.py # Web search tools (SERPAPI, multi-hop search)
 ├── tools/ # Directory for all tools
 │ ├── __init__.py # Exports all tools
-│ ├── search.py # Web search tools (SERPAPI-based)
-│ ├── file_parser.py # File parsing tool (CSV, TXT, PDF, Excel)
-│ ├── image_parser.py # Image parsing tool (OCR)
-│ ├── calculator.py # Calculator tool
-│ ├── document_retriever.py # Document retrieval tool
-│ ├── duckduckgo_search.py # DuckDuckGo search tool (from smolagents)
-│ ├── weather_info.py # Weather info tool (OpenWeatherMap)
-│ ├── hub_stats.py # Hugging Face Hub stats tool
-│ ├── guest_info.py # Guest info retriever tool (moved from retriever.py)
+│ ├── file_parser.py # Parses CSV, TXT, Excel, and PDF files
+│ ├── image_parser.py # OCR-based image parsing
+│ ├── calculator.py # Mathematical calculations
+│ ├── document_retriever.py # PDF document retrieval
+│ ├── duckduckgo_search.py # DuckDuckGo search integration
+│ ├── weather_info.py # Weather data via OpenWeatherMap
+│ ├── hub_stats.py # Hugging Face Hub statistics
+│ ├── guest_info.py # Guest information retrieval
 ├── requirements.txt # Python dependencies
-├── Dockerfile # Docker configuration
 ├── README.md # Project documentation
-├── .env # Environment variables (not committed)
+├── .gitignore # Excludes .env, temp/, etc.
+├── temp/ # Temporary directory for GAIA files (created at runtime)
 ```
 
-## Features
+## Models and Datasets
 
-- **Web Search**: SERPAPI and DuckDuckGo for robust searches.
-- **File Parsing**: Handles CSV, TXT, PDF, and Excel files.
-- **Image Parsing**: OCR with `easyocr` for image-based questions.
-- **Data Retrieval**: Guest info retriever for structured data.
-- **External APIs**: Weather (OpenWeatherMap), Hugging Face Hub stats.
-- **State Management**: `langgraph` for multi-step reasoning.
-- **Exact-Match Answers**: Optimized for GAIA Level 1 questions.
+- **Models**:
+- `meta-llama/Llama-3.2-1B-Instruct`: Primary LLM for reasoning and tool selection (Hugging Face Inference API or local).
+- `sentence-transformers/all-MiniLM-L6-v2`: Embedding model for text similarity tasks.
+- Note: Together AI models (`meta-llama/Llama-3.3-70B-Instruct-Turbo-Free`, `deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free`) are used via API but not hosted on Hugging Face, so they’re not listed in metadata.
+- **Datasets**:
+- `gaia-benchmark/GAIA`: Benchmark dataset for evaluating agent performance.
 
 ## Prerequisites
 
-- Python 3.11
-- Tesseract OCR (`brew install tesseract` on macOS)
-- API keys in `.env`:
-- `HUGGINGFACEHUB_API_TOKEN`
-- `SERPAPI_API_KEY`
-- `OPENWEATHERMAP_API_KEY`
-- `SPACE_ID`
+- **Python**: 3.9 or higher.
+- **Tesseract OCR**: Required for image parsing.
+- macOS: `brew install tesseract`
+- Ubuntu: `sudo apt-get install tesseract-ocr`
+- Windows: Install via [Tesseract Installer](https://github.com/UB-Mannheim/tesseract/wiki).
+- **API Keys**: Set in `.env` (local) or Hugging Face Space Secrets (deployment):
+- `HUGGINGFACEHUB_API_TOKEN`: Hugging Face token for model access.
+- `TOGETHER_API_KEY`: Together AI API key for LLM inference.
+- `SERPAPI_API_KEY`: SERPAPI key for web searches.
+- `OPENWEATHERMAP_API_KEY`: OpenWeatherMap key for weather queries.
+- `SPACE_ID`: `onisj/jarvis_gaia_agent`.
 
-## Setup
+## Setup and Local Testing
 
 1. **Clone the Repository**:
 ```bash
@@ -64,16 +83,87 @@ jarvis_gaia_agent/
 cd jarvis_gaia_agent
 ```
 
-2. **Set Up Environment Variables**:
-Create a `.env` file with your API keys.
+2. **Create Virtual Environment**:
+```bash
+python -m venv venv
+source venv/bin/activate # Windows: venv\Scripts\activate
+```
 
-3. **Run Locally**:
+3. **Install Dependencies**:
 ```bash
 pip install -r requirements.txt
+```
+
+4. **Configure Environment Variables**:
+Create a `.env` file:
+```text
+SPACE_ID=onisj/jarvis_gaia_agent
+HUGGINGFACEHUB_API_TOKEN=your_hf_token
+TOGETHER_API_KEY=your_together_api_key
+SERPAPI_API_KEY=your_serpapi_key
+OPENWEATHERMAP_API_KEY=your_openweather_key
+```
+
+5. **Test with Mock File** (optional):
+```bash
+mkdir temp
+echo "Item,Type,Sales\nBurger,Food,1000\nCola,Drink,500" > temp/7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx
+```
+
+6. **Run Locally**:
+```bash
 python app.py
 ```
+- Open `http://127.0.0.1:7860` (port may vary).
+- Log in with Hugging Face credentials.
+- Click “Run Evaluation & Submit All Answers” to test GAIA tasks.
+
+## Deployment to Hugging Face Space
+
+1. **Push Code**:
+```bash
+git add .
+git commit -m "Update JARVIS Gaia Agent with README metadata"
+git push origin main
+```
+
+2. **Set Space Secrets**:
+- Go to `https://huggingface.co/spaces/onisj/jarvis_gaia_agent` > Settings > Repository Secrets.
+- Add:
+- `SPACE_ID`: `onisj/jarvis_gaia_agent`
+- `HUGGINGFACEHUB_API_TOKEN`
+- `TOGETHER_API_KEY`
+- `SERPAPI_API_KEY`
+- `OPENWEATHERMAP_API_KEY`
+
+3. **Build and Run**:
+- Hugging Face auto-builds the Space after pushing.
+- Access the Gradio interface at `https://onisj-jarvis-gaia-agent.hf.space`.
+- Log in and click “Run Evaluation & Submit All Answers” to submit GAIA answers.
+
+4. **Verify Submission**:
+- Check `status_output` for:
+```
+Submission Successful!
+User: your_username
+Overall Score: XX% (Y/20 correct)
+Message: ...
+```
+- Aim for >30% (6/20 correct).
+
+## Troubleshooting
+
+- **Model Access (404)**: Verify API keys; test `initialize_llm` locally.
+- **SERPAPI Timeout**: Ensure `SERPAPI_API_KEY` is valid; check `search.py` logs.
+- **GAIA File Access**: Confirm `temp/` directory permissions; test `download_file`.
+- **Low GAIA Score**: Analyze `results_table` for errors; enhance `multi_hop_search_tool` or answer formatting.
+- **Logs**: Check Space > Settings > Logs for build/run errors.
+
+## License
+
+MIT License. See [LICENSE](LICENSE) for details.
+
+## Acknowledgements
 
-4. **Deploy to Hugging Face Space**:
-- Push code to your Space.
-- Set environment variables in Space settings.
-- Run evaluation via Gradio interface.
+- Built with `langchain`, `langgraph`, and Hugging Face tools.
+- Evaluated on the GAIA benchmark (`gaia-benchmark/GAIA`).
app.py CHANGED
@@ -15,6 +15,7 @@ import gradio as gr
 from dotenv import load_dotenv
 from huggingface_hub import InferenceClient
 from transformers import AutoTokenizer, AutoModelForCausalLM
+import together
 from state import JARVISState
 from tools import (
     search_tool, multi_hop_search_tool, file_parser_tool, image_parser_tool,
@@ -55,24 +56,23 @@ HF_MODEL = "meta-llama/Llama-3.2-1B-Instruct"
 
 # Initialize LLM clients
 def initialize_llm():
+    # Try Together AI models
     for model in TOGETHER_MODELS:
         try:
-            client = InferenceClient(
-                model=model,
-                api_key=TOGETHER_API_KEY,
-                base_url="https://api.together.ai/v1",
-                timeout=30
-            )
-            client.chat.completions.create(
+            together.api_key = TOGETHER_API_KEY
+            client = together.Together()
+            # Test the model
+            response = client.chat.completions.create(
                 model=model,
                 messages=[{"role": "user", "content": "Test"}],
-                max_tokens=10,
+                max_tokens=10
             )
             logger.info(f"Initialized Together AI model: {model}")
             return client, "together"
         except Exception as e:
-            logger.warning(f"Failed to initialize {model}: {e}")
+            logger.warning(f"Failed to initialize Together AI model {model}: {e}")
 
+    # Fallback to Hugging Face Inference API
     try:
         client = InferenceClient(
             model=HF_MODEL,
@@ -84,9 +84,10 @@ def initialize_llm():
     except Exception as e:
         logger.warning(f"Failed to initialize HF Inference API: {e}")
 
+    # Fallback to local Hugging Face model
     try:
         tokenizer = AutoTokenizer.from_pretrained(HF_MODEL, token=HF_API_TOKEN)
-        model = AutoModelForCausalLM.from_pretrained(HF_MODEL, token=HF_API_TOKEN, device_map="mps")
+        model = AutoModelForCausalLM.from_pretrained(HF_MODEL, token=HF_API_TOKEN, device_map="auto")
         logger.info(f"Initialized local Hugging Face model: {HF_MODEL}")
         return (model, tokenizer), "hf_local"
     except Exception as e:
@@ -155,13 +156,24 @@ async def parse_question(state: JARVISState) -> JARVISState:
             inputs = tokenizer.apply_chat_template(
                 [{"role": "system", "content": prompt[0].content}, {"role": "user", "content": prompt[1].content}],
                 return_tensors="pt"
-            ).to("mps")
+            ).to(model.device)
             outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7)
             response = tokenizer.decode(outputs[0], skip_special_tokens=True)
             tools_needed = json.loads(response.strip())
-        else:
+        elif llm_type == "together":
             response = llm_client.chat.completions.create(
-                model=llm_client.model if llm_type == "together" else HF_MODEL,
+                model=llm_client.model,
+                messages=[
+                    {"role": "system", "content": prompt[0].content},
+                    {"role": "user", "content": prompt[1].content}
+                ],
+                max_tokens=512,
+                temperature=0.7
+            )
+            tools_needed = json.loads(response.choices[0].message.content.strip())
+        else:  # hf_api
+            response = llm_client.chat.completions.create(
+                model=HF_MODEL,
                 messages=[
                     {"role": "system", "content": prompt[0].content},
                     {"role": "user", "content": prompt[1].content}
@@ -322,12 +334,20 @@ Document results: {document_results}""")
     try:
         if llm_type == "hf_local":
             model, tokenizer = llm_client
-            inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("mps")
+            inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
             outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7)
             answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
-        else:
+        elif llm_type == "together":
+            response = llm_client.chat.completions.create(
+                model=llm_client.model,
+                messages=messages,
+                max_tokens=512,
+                temperature=0.7
+            )
+            answer = response.choices[0].message.content.strip()
+        else:  # hf_api
             response = llm_client.chat.completions.create(
-                model=llm_client.model if llm_type == "together" else HF_MODEL,
+                model=HF_MODEL,
                 messages=messages,
                 max_tokens=512,
                 temperature=0.7
@@ -518,8 +538,8 @@ with gr.Blocks() as demo:
         """
     )
     with gr.Row():
-        gr.LoginButton()
-        gr.LogoutButton()
+        gr.LoginButton(value="Login to Hugging Face")
+        # Removed gr.LogoutButton due to deprecation
     run_button = gr.Button("Run Evaluation & Submit All Answers")
     status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
     results_table = gr.DataFrame(label="Questions and Answers", wrap=True, headers=["Task ID", "Question", "Answer"])
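In the updated `initialize_llm`, the Together branch returns the bare `together.Together()` client, while the `together` branches of `parse_question` and of the later answer-generation code pass `model=llm_client.model`. The Together SDK client appears not to record a model name (unlike `huggingface_hub.InferenceClient`, which keeps the `model` it was constructed with), so the name that passed the startup test call has to travel with the client. The sketch below bundles the two in a hypothetical `LLMHandle` dataclass and routes both API backends through one hypothetical `chat()` helper. It also adds a hypothetical `sampling_kwargs()` helper for the `hf_local` branch: in `transformers`, `generate()` only applies `temperature` when sampling is enabled (`do_sample=True`, set explicitly or via the model's generation config), so passing `do_sample` explicitly makes the intent unambiguous.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class LLMHandle:
    """Hypothetical bundle of an API client and the model it was initialized with."""

    client: Any   # e.g. together.Together() or huggingface_hub.InferenceClient(...)
    backend: str  # "together" or "hf_api" in this sketch
    model: str    # the model name that passed the startup test call


def chat(handle: LLMHandle, messages: List[Dict[str, str]],
         max_tokens: int = 512, temperature: float = 0.7) -> str:
    """Send one request through an OpenAI-style `chat.completions.create` client."""
    response = handle.client.chat.completions.create(
        model=handle.model,  # explicit name instead of a client attribute
        messages=messages,
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return response.choices[0].message.content.strip()


def sampling_kwargs(temperature: float) -> Dict[str, Any]:
    """Keyword arguments for a transformers `generate()` call.

    A positive temperature turns sampling on explicitly; zero means greedy decoding.
    """
    if temperature > 0:
        return {"do_sample": True, "temperature": temperature}
    return {"do_sample": False}
```

With this shape, the Together branch of `initialize_llm` would return `LLMHandle(client=client, backend="together", model=model)` instead of the bare client, and the `hf_local` branch would pass `**sampling_kwargs(0.7)` to `model.generate`. The v1 Together SDK constructor also accepts the key directly (`together.Together(api_key=TOGETHER_API_KEY)`), which avoids relying on a module-level attribute.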
requirements.txt CHANGED
@@ -20,4 +20,7 @@ transformers
 asyncio
 serpapi
 duckduckgo-search
-torch
+torch
+together
+google-search-results
+beautifulsoup4
test.py CHANGED
@@ -1,7 +1,10 @@
-import os
-import requests
+from serpapi import GoogleSearch
 
+params = {
+    "q": "drop shipping",
+    "api_key": "<redacted>"
+}
 
-headers = {"Authorization": f"Bearer {os.getenv('TOGETHER_API_KEY')}"}
-response = requests.get("https://api.together.ai/models", headers=headers)
-print(response.json())
+search = GoogleSearch(params)
+results = search.get_dict()
+ai_overview = results["ai_overview"]
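The new `test.py` hardcodes a SerpApi key and indexes `results["ai_overview"]` directly, which raises `KeyError` whenever a response carries no AI overview block. A sketch that reads the key from `SERPAPI_API_KEY` (the variable the README already asks for) and treats the overview as optional might look like this; `build_params`, `extract_ai_overview` and `run_search` are hypothetical names, and `GoogleSearch` is imported from the `serpapi` module that the `google-search-results` package installs.

```python
import os
from typing import Any, Dict, Optional


def build_params(query: str) -> Dict[str, str]:
    """Build SerpApi request parameters, taking the key from the environment."""
    api_key = os.getenv("SERPAPI_API_KEY")
    if not api_key:
        raise RuntimeError("SERPAPI_API_KEY is not set; add it to .env or the Space secrets")
    return {"q": query, "api_key": api_key}


def extract_ai_overview(results: Dict[str, Any]) -> Optional[Any]:
    """Return the AI overview block if the response has one, else None."""
    return results.get("ai_overview")


def run_search(query: str) -> Optional[Any]:
    """Perform the live request (needs network access and google-search-results)."""
    from serpapi import GoogleSearch  # module name installed by google-search-results

    results = GoogleSearch(build_params(query)).get_dict()
    return extract_ai_overview(results)
```

`run_search("drop shipping")` issues the same request as the committed `test.py`, but returns `None` instead of raising when no overview is present, and no key ever lands in the source tree.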
tools/search.py CHANGED
@@ -1,7 +1,9 @@
 import os
-from serpapi import GoogleSearch
-from langchain.tools import Tool
+import json
 import asyncio
+# from serpapi import GoogleSearch
+from google_search_results import GoogleSearch
+from langchain.tools import Tool
 from typing import List, Dict, Any
 from langchain_core.prompts import ChatPromptTemplate
 from langchain_core.messages import SystemMessage, HumanMessage
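In `tools/search.py` the active import is now `from google_search_results import GoogleSearch`, with the previous `from serpapi import GoogleSearch` commented out. The `google-search-results` distribution is installed under the import name `serpapi`, so the commented-out line is the one that matches the package added to `requirements.txt`; `google_search_results` is not an importable module for it. If the tool should fail softly when the package is missing or shadowed, a guarded import along these lines is one option (the function name `load_google_search` is hypothetical):

```python
import importlib
from typing import Any, Optional


def load_google_search() -> Optional[Any]:
    """Return serpapi.GoogleSearch from google-search-results, or None if unavailable."""
    try:
        module = importlib.import_module("serpapi")
    except ImportError:
        return None  # no distribution providing `serpapi` is installed
    # None here can also mean that a different distribution owns the
    # `serpapi` name and does not provide GoogleSearch.
    return getattr(module, "GoogleSearch", None)
```

The search tool can then report a clear configuration error when `load_google_search()` returns `None`, rather than failing at import time.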