Questions

In [1]:
import json

# Load metadata.jsonl (one JSON object per line)
with open('metadata.jsonl', 'r') as jsonl_file:
    json_list = list(jsonl_file)

json_QA = []
for json_str in json_list:
    json_data = json.loads(json_str)
    json_QA.append(json_data)
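The loading cell above assumes every line of `metadata.jsonl` is a non-empty JSON object. A minimal sketch of a slightly more defensive loader follows, assuming blank lines should be skipped and malformed lines should be reported with their line number. `load_jsonl` is a hypothetical helper name, not something the notebook defines, and the sample records are made up.

```python
import io
import json
from typing import Any, Dict, Iterable, List


def load_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON Lines input, skipping blank lines (hypothetical helper)."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank or trailing empty lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid JSON on line {line_number}: {exc}") from exc
    return records


# In-memory stand-in for the file; the records here are made up
sample = io.StringIO('{"task_id": "a", "Level": 1}\n\n{"task_id": "b", "Level": 2}\n')
demo_records = load_jsonl(sample)
print(len(demo_records))
```

With a real file, the same function could be called as `load_jsonl(open("metadata.jsonl"))` inside a `with` block.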

In [2]:
json_QA

[{'task_id': 'c61d22de-5f6c-4958-a7f6-5e9707bd3466',
  'Question': 'A paper about AI regulation that was originally submitted to arXiv.org in June 2022 shows a figure with three axes, where each axis has a label word at both ends. Which of these words is used to describe a type of society in a Physics and Society article submitted to arXiv.org on August 11, 2016?',
  'Level': 2,
  'Final answer': 'egalitarian',
  'file_name': '',
  'Annotator Metadata': {'Steps': '1. Go to arxiv.org and navigate to the Advanced Search page.\n2. Enter "AI regulation" in the search box and select "All fields" from the dropdown.\n3. Enter 2022-06-01 and 2022-07-01 into the date inputs, select "Submission date (original)", and submit the search.\n4. Go through the search results to find the article that has a figure with three axes and labels on each end of the axes, titled "Fairness in Agreement With European Values: An Interdisciplinary Perspective on AI Regulation".\n5. Note the six words used as labels

In [5]:
# Randomly select 1 sample (seeded for reproducibility); the first record is shown below for reference
# {"task_id": "c61d22de-5f6c-4958-a7f6-5e9707bd3466", "Question": "A paper about AI regulation that was originally submitted to arXiv.org in June 2022 shows a figure with three axes, where each axis has a label word at both ends. Which of these words is used to describe a type of society in a Physics and Society article submitted to arXiv.org on August 11, 2016?", "Level": 2, "Final answer": "egalitarian", "file_name": "", "Annotator Metadata": {"Steps": "1. Go to arxiv.org and navigate to the Advanced Search page.\n2. Enter \"AI regulation\" in the search box and select \"All fields\" from the dropdown.\n3. Enter 2022-06-01 and 2022-07-01 into the date inputs, select \"Submission date (original)\", and submit the search.\n4. Go through the search results to find the article that has a figure with three axes and labels on each end of the axes, titled \"Fairness in Agreement With European Values: An Interdisciplinary Perspective on AI Regulation\".\n5. Note the six words used as labels: deontological, egalitarian, localized, standardized, utilitarian, and consequential.\n6. Go back to arxiv.org\n7. Find \"Physics and Society\" and go to the page for the \"Physics and Society\" category.\n8. Note that the tag for this category is \"physics.soc-ph\".\n9. Go to the Advanced Search page.\n10. Enter \"physics.soc-ph\" in the search box and select \"All fields\" from the dropdown.\n11. Enter 2016-08-11 and 2016-08-12 into the date inputs, select \"Submission date (original)\", and submit the search.\n12. Search for instances of the six words in the results to find the paper titled \"Phase transition from egalitarian to hierarchical societies driven by competition between cognitive and social constraints\", indicating that \"egalitarian\" is the correct answer.", "Number of steps": "12", "How long did this take?": "8 minutes", "Tools": "1. Web browser\n2. Image recognition tools (to identify and parse a figure with three axes)", "Number of tools": "2"}}

import random
random.seed(42)
random_samples = random.sample(json_QA, 1)
for sample in random_samples:
    print("=" * 50)
    print(f"Task ID: {sample['task_id']}")
    print(f"Question: {sample['Question']}")
    print(f"Level: {sample['Level']}")
    print(f"Final Answer: {sample['Final answer']}")
    print(f"Annotator Metadata: ")
    print(f"  ├── Steps: ")
    for step in sample['Annotator Metadata']['Steps'].split('\n'):
        print(f"  │      ├── {step}")
    print(f"  ├── Number of steps: {sample['Annotator Metadata']['Number of steps']}")
    print(f"  ├── How long did this take?: {sample['Annotator Metadata']['How long did this take?']}")
    print(f"  ├── Tools:")
    for tool in sample['Annotator Metadata']['Tools'].split('\n'):
        print(f"  │      ├── {tool}")
    print(f"  └── Number of tools: {sample['Annotator Metadata']['Number of tools']}")
print("=" * 50)



Task ID: 853c8244-429e-46ca-89f2-addf40dfb2bd
Question: In the 2015 Metropolitan Museum of Art exhibition titled after the Chinese zodiac animal of 2015, how many of the "twelve animals of the Chinese zodiac" have a hand visible?
Level: 2
Final Answer: 11
Annotator Metadata: 
  ├── Steps: 
  │      ├── 1. Search "2015 Chinese zodiac animal" on Google search.
  │      ├── 2. Note the animal (ram).
  │      ├── 3. Search "Metropolitan Museum of Art" on Google search.
  │      ├── 4. Open the Metropolitan Museum of Art website.
  │      ├── 5. Click "Exhibitions" under "Exhibitions and Events" 
  │      ├── 6. Click "Past".
  │      ├── 7. Set the year to 2015.
  │      ├── 8. Scroll to find the exhibit mentioning rams and click "Celebration of the Year of the Ram".
  │      ├── 9. Click "View All Objects".
  │      ├── 10. Click "Twelve animals of the Chinese zodiac" to open the image.
  │      ├── 11. Count how many have a visible hand.
  ├── Number of steps: 11
  ├── How long did this 

Build a vector database from metadata.jsonl

In [14]:
import json
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.schema import TextNode
import chromadb

In [15]:
# 1. Load your JSONL data (assuming same structure as original)
with open("metadata.jsonl", "r") as f:
    json_QA = [json.loads(line) for line in f]

In [16]:
# 2. Initialize ChromaDB
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("qa_documents")

In [17]:
# 3. Set up embeddings (same model as original)
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-mpnet-base-v2")

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


In [20]:
# 4. Prepare nodes for indexing
nodes = []
for sample in json_QA:
    content = f"Question: {sample['Question']}\n\nFinal answer: {sample['Final answer']}"
    node = TextNode(
        text=content,
        metadata={
            "source": sample['task_id'],
            "level": sample['Level'],
            "final_answer": sample['Final answer'],
            "steps": sample['Annotator Metadata']['Steps'],
            "number_of_steps": sample['Annotator Metadata']['Number of steps'],
            "how_long_did_this_take": sample['Annotator Metadata']['How long did this take?'],
            "tools": sample['Annotator Metadata']['Tools'],
            "number_of_tools": sample['Annotator Metadata']['Number of tools'],
            # Add other metadata fields if needed
        },
        embedding=embed_model.get_text_embedding(content)  # Generate embedding
    )
    nodes.append(node)


In [23]:
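# 5. Create and populate vector store
from llama_index.core import StorageContext

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# The index receives the vector store through a StorageContext;
# a bare `vector_store=` keyword is not a VectorStoreIndex parameter.
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(
    nodes=nodes,
    embed_model=embed_model,
    storage_context=storage_context,
    store_nodes_override=True  # Important! Stores original text+metadata
)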

In [30]:
# 6. Query 
query = "What is the final answer to the question about AI regulation?"
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query(query)

# Extract and format the best match
best_match = response.source_nodes[0]

# Human-readable output
print("🔍 Query:")
print(f"{query}\n")
print("✅ Best Answer:")
print(f"{best_match.metadata.get('final_answer', 'N/A')}\n")
print("📝 Supporting Details:")
print(f"- Source ID: {best_match.metadata['source']}")
print(f"- Confidence Score: {best_match.score:.3f}")
print(f"- Steps Taken: {best_match.metadata.get('steps', 'N/A')}")
print(f"- Tools Used: {best_match.metadata.get('tools', 'N/A')}")

🔍 Query:
What is the final answer to the question about AI regulation?

✅ Best Answer:
Mapping Human Oriented Information to Software Agents for Online Systems Usage

📝 Supporting Details:
- Source ID: 46719c30-f4c3-4cad-be07-d5cb21eee6bb
- Confidence Score: 0.383
- Steps Taken: 1. Searched "Pie Menus or Linear Menus, Which Is Better?" on Google.
2. Opened "Pie Menus or Linear Menus, Which Is Better?" on https://oda.oslomet.no/oda-xmlui/handle/10642/3162.
3. Clicked each author's name.
4. Noted the name that had no other papers listed.
5. Searched "Murano, Pietro" on Google.
6. Opened http://www.pietromurano.org/.
7. Clicked "Publications".
8. Found the earliest paper he contributed to.
- Tools Used: 1. Web browser
2. Search engine
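The value printed as "Confidence Score" above is a similarity score between the query embedding and the stored embedding, not a calibrated probability. A score of 0.383 on an unrelated question suggests a weak match. The sketch below shows cosine similarity on toy vectors. It assumes a cosine-style metric, which depends on how the vector store is configured, and the vectors are made up for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings" (made up for illustration)
query_vec = [1.0, 0.0, 0.0]
close_vec = [2.0, 0.0, 0.0]   # same direction as the query
far_vec = [0.0, 1.0, 0.0]     # orthogonal to the query
print(cosine_similarity(query_vec, close_vec))
print(cosine_similarity(query_vec, far_vec))
```

In practice it may help to print every entry of `response.source_nodes` with its score rather than only the first, so a low-scoring top hit is easier to spot.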


In [27]:
# list of the tools used in all the samples
from collections import Counter, OrderedDict

tools = []
for sample in json_QA:
    for tool in sample['Annotator Metadata']['Tools'].split('\n'):
        tool = tool[2:].strip().lower()
        if tool.startswith("("):
            tool = tool[11:].strip()
        tools.append(tool)
tools_counter = OrderedDict(Counter(tools))
print("List of tools used in all samples:")
print("Total number of tools used:", len(tools_counter))
for tool, count in tools_counter.items():
    print(f"  ├── {tool}: {count}")



List of tools used in all samples:
Total number of tools used: 83
  ├── web browser: 107
  ├── image recognition tools (to identify and parse a figure with three axes): 1
  ├── search engine: 101
  ├── calculator: 34
  ├── unlambda compiler (optional): 1
  ├── a web browser.: 2
  ├── a search engine.: 2
  ├── a calculator.: 1
  ├── microsoft excel: 5
  ├── google search: 1
  ├── ne: 9
  ├── pdf access: 7
  ├── file handling: 2
  ├── python: 3
  ├── image recognition tools: 12
  ├── jsonld file access: 1
  ├── video parsing: 1
  ├── python compiler: 1
  ├── video recognition tools: 3
  ├── pdf viewer: 7
  ├── microsoft excel / google sheets: 3
  ├── word document access: 1
  ├── tool to extract text from images: 1
  ├── a word reversal tool / script: 1
  ├── counter: 1
  ├── excel: 3
  ├── image recognition: 5
  ├── color recognition: 3
  ├── excel file access: 3
  ├── xml file access: 1
  ├── access to the internet archive, web.archive.org: 1
  ├── text processing/diff tool: 1
  ├── gi
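The listing above counts distinct strings, so near-duplicates such as `web browser` and `a web browser.` are tallied separately. The `ne: 9` entry is consistent with `None` losing its first two characters to `tool[2:]`. A minimal normalization sketch follows, assuming list numbering, leading articles, and trailing periods should be ignored. `normalize_tool` and `count_tools` are hypothetical helpers, and the demo input is made up.

```python
import re
from collections import Counter
from typing import Iterable, List

# Leading list number such as "1." or "12." plus any following spaces
_NUMBER_PREFIX = re.compile(r"^\s*\d+\.\s*")
# Leading article: "a " or "an "
_ARTICLE = re.compile(r"^(?:a|an)\s+")


def normalize_tool(raw: str) -> str:
    """Lowercase a tool entry and strip numbering, articles, and trailing periods."""
    name = _NUMBER_PREFIX.sub("", raw).strip().lower()
    name = _ARTICLE.sub("", name)
    return name.rstrip(".").strip()


def count_tools(tool_fields: Iterable[str]) -> Counter:
    """Count normalized tool names across newline-separated 'Tools' fields."""
    names: List[str] = []
    for field in tool_fields:
        names.extend(normalize_tool(line) for line in field.split("\n") if line.strip())
    return Counter(names)


# Made-up 'Tools' fields in the same shape as the dataset entries
demo_counts = count_tools([
    "1. Web browser\n2. Search engine",
    "1. A web browser.\n2. A calculator.",
    "None",
])
for name, count in demo_counts.most_common():
    print(f"{name}: {count}")
```

Merging synonyms such as `excel` and `microsoft excel` would need an explicit mapping, which this sketch does not attempt.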

System prompt

In [34]:
system_prompt = """
You are a helpful assistant tasked with answering questions using a set of tools.
If the tool is not available, you can try to find the information online. You can also use your own knowledge to answer the question. 
You need to provide a step-by-step explanation of how you arrived at the answer.
==========================
Here are a few examples showing you how to answer the question step by step.
"""

# Extract examples from our nodes (using first 3 as samples)
for i, node in enumerate(nodes[:3]):  # Using the nodes we created earlier
    # Parse the question and metadata from node content and attributes
    question = node.text.split("Question: ")[1].split("\n\nFinal answer:")[0]
    final_answer = node.text.split("Final answer: ")[1]
    
    system_prompt += f"""
Question {i+1}: {question}
Steps:
{node.metadata.get('steps', 'N/A')}
Tools:
{node.metadata.get('tools', 'N/A')}
Final Answer: {final_answer}
"""

system_prompt += "\n==========================\n"
system_prompt += "Now, please answer the following question step by step.\n"

# Save to file
with open('system_prompt.txt', 'w') as f:
    f.write(system_prompt)

print("System prompt generated successfully!")

System prompt generated successfully!
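The cell above recovers each question and answer by splitting `node.text`, which breaks if a question itself contains the string "Final answer:". A sketch of an alternative that builds the few-shot examples directly from the loaded records is shown here. It assumes the record keys seen in `metadata.jsonl`. `build_few_shot_prompt` is a hypothetical helper and the demo record is made up.

```python
from typing import Any, Dict, List


def build_few_shot_prompt(header: str, records: List[Dict[str, Any]], n_examples: int = 3) -> str:
    """Append up to n_examples worked examples to a prompt header (hypothetical helper)."""
    parts = [header]
    for i, record in enumerate(records[:n_examples], start=1):
        meta = record.get("Annotator Metadata", {})
        parts.append(
            f"Question {i}: {record['Question']}\n"
            f"Steps:\n{meta.get('Steps', 'N/A')}\n"
            f"Tools:\n{meta.get('Tools', 'N/A')}\n"
            f"Final Answer: {record['Final answer']}\n"
        )
    parts.append("Now, please answer the following question step by step.\n")
    return "\n".join(parts)


# Made-up record with the same keys as the dataset entries
demo_records = [{
    "Question": "What is 2 + 2?",
    "Final answer": "4",
    "Annotator Metadata": {"Steps": "1. Add the numbers.", "Tools": "1. Calculator"},
}]
demo_prompt = build_few_shot_prompt("You are a helpful assistant.", demo_records)
print(demo_prompt)
```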


In [35]:
# load the system prompt from the file
with open('system_prompt.txt', 'r') as f:
    system_prompt = f.read()
print(system_prompt)


You are a helpful assistant tasked with answering questions using a set of tools.
If the tool is not available, you can try to find the information online. You can also use your own knowledge to answer the question. 
You need to provide a step-by-step explanation of how you arrived at the answer.
Here is a few examples showing you how to answer the question step by step.

Question 1: A paper about AI regulation that was originally submitted to arXiv.org in June 2022 shows a figure with three axes, where each axis has a label word at both ends. Which of these words is used to describe a type of society in a Physics and Society article submitted to arXiv.org on August 11, 2016?
Steps:
1. Go to arxiv.org and navigate to the Advanced Search page.
2. Enter "AI regulation" in the search box and select "All fields" from the dropdown.
3. Enter 2022-06-01 and 2022-07-01 into the date inputs, select "Submission date (original)", and submit the search.
4. Go through the search results to find th

In [None]:
import os
from dotenv import load_dotenv
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from typing import Annotated
from tavily import TavilyClient
import chromadb

In [49]:
# Load environment variables
load_dotenv()

True

In [59]:
# Initialize components
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("qa_documents")
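vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# Use the same embedding model as at indexing time so query vectors match the stored ones
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-mpnet-base-v2")
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)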

# Initialize LLM
#llm = OpenAI(model="gpt-4-turbo-preview")
llm = OpenAI(model="gpt-4o-mini")

In [51]:
# Load system prompt
with open('system_prompt.txt', 'r') as f:
    system_prompt = f.read()

In [56]:
from llama_index.core.tools import FunctionTool
from pydantic import BaseModel, Field

In [65]:
class MultiplyInput(BaseModel):
    a: int = Field(..., description="First number")
    b: int = Field(..., description="Second number")

# Define the function normally
def multiply(a: int, b: int) -> int:
    """Multiplies two numbers together."""
    return a * b

# Create tool using the new syntax
multiply_tool = FunctionTool.from_defaults(
    fn=multiply,
    name="multiply",
    description="Multiplies two numbers",
    fn_schema=MultiplyInput  # Adds input validation
)

tools = [multiply_tool]

In [66]:
agent = ReActAgent.from_tools(
    tools=tools,
    llm=llm,
    system_prompt=system_prompt,
    verbose=True
)

# Ask the agent a question that should trigger the multiply tool
response = agent.chat("What's 12 multiplied by 5?")
print(response)

> Running step d49e47ff-a91f-4bdd-8b9d-7436ae1577d9. Step input: What's 12 multiplied by 5?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: multiply
Action Input: {'a': 12, 'b': 5}
Observation: 60
> Running step dc2d6a7a-efa7-427f-894d-dd8ebf5d302b. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer.
Answer: 12 multiplied by 5 is 60.
12 multiplied by 5 is 60.


In [85]:
# Define tools correctly
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

def subtract(a: int, b: int) -> int:
    """Subtract two numbers."""
    return a - b

def divide(a: int, b: int) -> float:
    """Divide two numbers."""
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b

def modulus(a: int, b: int) -> int:
    """Get modulus of two numbers."""
    return a % b

def similar_question_search(question: str) -> str:
    """Search for similar questions in vector database."""
    query_engine = index.as_query_engine(similarity_top_k=3)
    response = query_engine.query(question)
    return "\n\n".join([
        f"Question: {node.text.split('Question: ')[1].split('Final answer:')[0].strip()}\n"
        f"Answer: {node.text.split('Final answer: ')[1]}\n"
        f"Source: {node.metadata['source']}"
        for node in response.source_nodes
    ])

def web_search(query: str) -> str:
    """Perform a web search using Tavily API.
    
    Args:
        query: Search query string
        
    Returns:
        Formatted string containing search results with titles and links
        or error message if search fails
    """
    try:
        client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))  # Get key from environment variables
        response = client.search(
            query=query,
            include_answer=True,       # Get direct answer if available
            search_depth="advanced",   # Better for academic/technical queries
            max_results=5              # Cap on the number of results returned
        )
        
        # Format results
        results = []
        if response.get("answer"):
            results.append(f"Direct Answer: {response['answer']}")
            
        for result in response.get("results", []):
            results.append(
                f"Title: {result.get('title', 'N/A')}\n"
                f"Link: {result.get('url', 'N/A')}\n"
                f"Snippet: {result.get('content', 'N/A')}"
            )
            
        return "\n\n".join(results) if results else "No results found"
        
    except Exception as e:
        return f"Search failed: {str(e)}"


# Create tool instances
#multiply_tool = FunctionTool.from_defaults(fn=multiply, name="multiply")
#add_tool = FunctionTool.from_defaults(fn=add, name="add")
#subtract_tool = FunctionTool.from_defaults(fn=subtract, name="subtract")
#divide_tool = FunctionTool.from_defaults(fn=divide, name="divide")
#modulus_tool = FunctionTool.from_defaults(fn=modulus, name="modulus")
#search_tool = FunctionTool.from_defaults(fn=similar_question_search, name="similar_question_search")
#web_search_tool = FunctionTool.from_defaults(fn=web_search, name="web_search")

#tools = [
#    multiply_tool,
#    add_tool,
#    subtract_tool,
#    divide_tool,
#    modulus_tool,
#    search_tool,
#    web_search_tool
#]


math_tools = [
    FunctionTool.from_defaults(fn=multiply, name="multiply"),
    FunctionTool.from_defaults(fn=add, name="add"),
    FunctionTool.from_defaults(fn=subtract, name="subtract"),
    FunctionTool.from_defaults(fn=divide, name="divide"),
    FunctionTool.from_defaults(fn=modulus, name="modulus")
]

search_tools = [
    FunctionTool.from_defaults(fn=similar_question_search, name="similar_question_search"),
    FunctionTool.from_defaults(fn=web_search, name="web_search")
]



agent = ReActAgent.from_tools(
    tools=math_tools + search_tools,
    llm=llm,
    system_prompt=system_prompt,
    verbose=True,
    max_iterations=10  # Prevent infinite loops
)
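The result formatting inside `web_search` can be checked without a network call or an API key if it is factored into a pure function. The sketch below mirrors the formatting logic of the cell above, assuming the response is a dict with the `answer` and `results` keys that cell reads. `format_search_response` is a hypothetical helper and the response is made up.

```python
from typing import Any, Dict, List


def format_search_response(response: Dict[str, Any]) -> str:
    """Render a search response dict as text (hypothetical helper)."""
    blocks: List[str] = []
    if response.get("answer"):
        blocks.append(f"Direct Answer: {response['answer']}")
    for result in response.get("results", []):
        blocks.append(
            f"Title: {result.get('title', 'N/A')}\n"
            f"Link: {result.get('url', 'N/A')}\n"
            f"Snippet: {result.get('content', 'N/A')}"
        )
    return "\n\n".join(blocks) if blocks else "No results found"


# Made-up response used only to exercise the formatter
fake_response = {
    "answer": "60",
    "results": [{"title": "Example", "url": "https://example.com", "content": "12 x 5 = 60"}],
}
formatted = format_search_response(fake_response)
empty_formatted = format_search_response({})
print(formatted)
```

Keeping the formatter separate from the client call also makes it easier to swap the search backend later without touching the text the agent sees.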

In [86]:
# Example usage
response = agent.chat("What's 12 multiplied by 5?")
print(str(response))

> Running step c1c72895-472b-40df-957e-66757cbf622f. Step input: What's 12 multiplied by 5?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: multiply
Action Input: {'a': 12, 'b': 5}
Observation: 60
> Running step 6961249e-70b6-400e-acbd-f1a9bc0db045. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer.
Answer: 12 multiplied by 5 is 60.
12 multiplied by 5 is 60.


In [87]:
# For question answering:
question = "On June 6, 2023, an article by Carolyn Collins Petersen was published in Universe Today. This article mentions a team that produced a paper about their observations, linked at the bottom of the article. Find this paper. Under what NASA award number was the work performed by R. G. Arendt supported by?"
response = agent.chat(question)
print(f"\nFinal Answer: {response}")
print(f"Sources: {response.sources}")

> Running step 54756943-8ce6-46d0-aaae-c8d8c8c450d6. Step input: On June 6, 2023, an article by Carolyn Collins Petersen was published in Universe Today. This article mentions a team that produced a paper about their observations, linked at the bottom of the article. Find this paper. Under what NASA award number was the work performed by R. G. Arendt supported by?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: web_search
Action Input: {'query': 'June 6 2023 Carolyn Collins Petersen Universe Today article NASA award number R. G. Arendt'}
Observation: Direct Answer: Carolyn Collins Petersen's June 6, 2023, article in Universe Today linked to a NASA award for R. G. Arendt's work. The exact award number is not provided here. Further details require the article itself.

Title: [PDF] INTERNET OF AGENTS: WEAVING A WEB OF HET - OpenReview
Link: https://openreview.net/pdf/1006483e763807a740f78d00968