{ "cells": [ { "cell_type": "markdown", "id": "1e417d15", "metadata": {}, "source": [ "# **Explore GAIA Questions Data**\n", "\n", "Explore the `metadata.jsonl` file in order to gain a deeper comprehension of the dataset." ] }, { "cell_type": "markdown", "id": "8a696d11", "metadata": {}, "source": [ "#### **Imports**" ] }, { "cell_type": "code", "execution_count": 183, "id": "d3e11d83", "metadata": {}, "outputs": [], "source": [ "import os\n", "import re\n", "import json\n", "import random\n", "import psycopg2\n", "import pandas as pd\n", "from collections import Counter, OrderedDict\n", "\n", "from dotenv import load_dotenv\n", "from huggingface_hub import login\n", "\n", "from langchain.schema import Document\n", "from langchain_community.retrievers import BM25Retriever\n", "from langchain.tools import Tool, StructuredTool\n", "from langchain_core.tools import tool\n", "from langchain_huggingface import HuggingFaceEmbeddings\n", "from langchain_community.vectorstores import SupabaseVectorStore\n", "\n", "from supabase import Client, create_client\n", "from supabase.client import ClientOptions" ] }, { "cell_type": "code", "execution_count": 194, "id": "17734566", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of QAs: 165\n" ] }, { "data": { "text/plain": [ "{'task_id': 'c61d22de-5f6c-4958-a7f6-5e9707bd3466',\n", " 'Question': 'A paper about AI regulation that was originally submitted to arXiv.org in June 2022 shows a figure with three axes, where each axis has a label word at both ends. Which of these words is used to describe a type of society in a Physics and Society article submitted to arXiv.org on August 11, 2016?',\n", " 'Level': 2,\n", " 'Final answer': 'egalitarian',\n", " 'file_name': '',\n", " 'Annotator Metadata': {'Steps': '1. Go to arxiv.org and navigate to the Advanced Search page.\\n2. Enter \"AI regulation\" in the search box and select \"All fields\" from the dropdown.\\n3. 
Enter 2022-06-01 and 2022-07-01 into the date inputs, select \"Submission date (original)\", and submit the search.\\n4. Go through the search results to find the article that has a figure with three axes and labels on each end of the axes, titled \"Fairness in Agreement With European Values: An Interdisciplinary Perspective on AI Regulation\".\\n5. Note the six words used as labels: deontological, egalitarian, localized, standardized, utilitarian, and consequential.\\n6. Go back to arxiv.org\\n7. Find \"Physics and Society\" and go to the page for the \"Physics and Society\" category.\\n8. Note that the tag for this category is \"physics.soc-ph\".\\n9. Go to the Advanced Search page.\\n10. Enter \"physics.soc-ph\" in the search box and select \"All fields\" from the dropdown.\\n11. Enter 2016-08-11 and 2016-08-12 into the date inputs, select \"Submission date (original)\", and submit the search.\\n12. Search for instances of the six words in the results to find the paper titled \"Phase transition from egalitarian to hierarchical societies driven by competition between cognitive and social constraints\", indicating that \"egalitarian\" is the correct answer.',\n", " 'Number of steps': '12',\n", " 'How long did this take?': '8 minutes',\n", " 'Tools': '1. Web browser\\n2. 
Image recognition tools (to identify and parse a figure with three axes)',\n", " 'Number of tools': '2'}}" ] }, "execution_count": 194, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open(\"metadata.jsonl\") as dataset_file:\n", " json_list = list(dataset_file)\n", "\n", "QAs = [json.loads(qa) for qa in json_list]\n", "print(f\"Number of QAs: {len(QAs)}\")\n", "QAs[0]" ] }, { "cell_type": "code", "execution_count": 89, "id": "40328df2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TaskId: 7a4a336d-dcfa-45a0-b014-824c7619e8de\n", "Level: 2\n", "Question: At the two-minute mark in the YouTube video uploaded by the channel “GameGrumps” on May 14, 2017 as part of their playthrough of the game Mario Kart 8 Deluxe, the shows’ hosts are competing on one of the game’s racetracks. What was the world record time for that track in the game’s 150cc mode as of June 7, 2023? Express your answer in minutes and seconds, rounding the seconds to the nearest hundredth, e.g. 1:01.001.\n", "Ground Truth: 1:41.614\n", "Additional file: \n", "Annotator Metadata:\n", " - Steps:\n", " 1. Search the web for “gamegrumps mario kart 8 deluxe may 14 2017”.\n", " 2. Click on the YouTube video result.\n", " 3. Navigate to two minutes into the video.\n", " 4. Scroll further back until I see the name of the racecourse, Yoshi Circuit.\n", " 5. Search the web for “mario kart 8 deluxe yoshi circuit world record 150cc”\n", " 6. Scroll down until I find a reliable world record listing site.\n", " 7. Navigate through the site until I find the record that meets the specified criteria.\n", " 8. Read the date the record was set to confirm that it applies to the question’s specified date.\n", " - Number of steps: 8\n", " - How long did this take: 5-10 minutes\n", " - Tools [4]:\n", " 1. Search engine\n", " 2. Web browser\n", " 3. YouTube\n", " 4. 
OCR\n", " - Number of tools: 4\n" ] } ], "source": [ "random_samples = random.sample(QAs, 1)\n", "for samp in random_samples:\n", " print(\n", " f\"TaskId: {samp['task_id']}\\nLevel: {samp['Level']}\\n\"\n", " f\"Question: {samp['Question']}\\nGround Truth: {samp['Final answer']}\\n\"\n", " f\"Additional file: {samp['file_name']}\"\n", " )\n", " print(\"Annotator Metadata:\")\n", " print(\" - Steps:\")\n", " metadata = samp['Annotator Metadata']\n", " steps = metadata['Steps'].split(\"\\n\")\n", " for step in steps:\n", " print(f\" {step}\")\n", " print(f\" - Number of steps: {metadata['Number of steps']}\")\n", " print(f\" - How long did this take: {metadata['How long did this take?']}\")\n", " tools = metadata['Tools'].split(\"\\n\")\n", " print(f\" - Tools [{len(tools)}]:\")\n", " for t in tools:\n", " print(f\" {t}\")\n", " print(f\" - Number of tools: {metadata['Number of tools']}\")\n" ] }, { "cell_type": "markdown", "id": "52ca1954", "metadata": {}, "source": [ "As the sample above shows, each entry of the `Dataset` contains:\n", "\n", "- **task_id** : The unique identifier of the task\n", "\n", "- **Level** : The difficulty level of the GAIA task\n", "\n", "- **Question** : The text of the GAIA task\n", "\n", "- **Final answer** : The ground-truth answer to the GAIA task\n", "\n", "- **file_name** : The name of the additional file attached to the task (an empty string if there is none)\n", "\n", "- **Annotator Metadata** : Details recorded by the annotator who solved the task\n", "\n", " - **Steps** : The **sequence** of steps followed to reach the correct answer\n", "\n", " - **Number of steps** : The total number of steps needed to reach the correct answer\n", "\n", " - **How long did this take?** : The time the annotator spent on the task\n", "\n", " - **Tools** : The list of `tools` used to answer the question/task\n", "\n", " - **Number of tools** : The total number of tools used" ] }, { "cell_type": "markdown", "id": "ccc5f181", "metadata": {}, "source": [ "The **GAIA Agent** must be an `Agentic RAG` system: besides calling its tools, the agent can query a retrieval system that gives it access to the QAs `dataset`."
] }, { "cell_type": "markdown", "id": "bfe371b8", "metadata": {}, "source": [ "#### **Explore Dataset Tools Types**\n", "\n", "Since the *`dataset`* provides, for each question, the list of `Tools` used to reach the final answer, it is useful to explore these tools in order to define an efficient and relevant set of tools for our agent to incorporate:" ] }, { "cell_type": "code", "execution_count": 169, "id": "f470a028", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of Tools used in entire set: 55\n", "Tools used in QAs:\n" ] }, { "data": { "text/html": [ "
\n", " | Tool | \n", "Count | \n", "
---|---|---|
0 | \n", "SEARCH ENGINE | \n", "35 | \n", "
1 | \n", "CALCULATOR | \n", "33 | \n", "
2 | \n", "WEB BROWSER | \n", "12 | \n", "
3 | \n", "NE | \n", "9 | \n", "
4 | \n", "IMAGE RECOGNITION TOOLS | \n", "8 | \n", "
5 | \n", "PDF VIEWER | \n", "6 | \n", "
6 | \n", "A CALCULATOR | \n", "5 | \n", "
7 | \n", "OCR | \n", "3 | \n", "
8 | \n", "VIDEO RECOGNITION TOOLS | \n", "3 | \n", "
9 | \n", "MICROSOFT EXCEL | \n", "2 | \n", "
10 | \n", "PDF ACCESS | \n", "2 | \n", "
11 | \n", "MICROSOFT EXCEL / GOOGLE SHEETS | \n", "2 | \n", "
12 | \n", "IMAGE RECOGNITION | \n", "2 | \n", "
13 | \n", "A SPEECH-TO-TEXT TOOL | \n", "2 | \n", "
14 | \n", "A SEARCH ENGINE | \n", "1 | \n", "
15 | \n", "IMAGE RECOGNITION/OCR | \n", "1 | \n", "
16 | \n", "GOOGLE MAPS | \n", "1 | \n", "
17 | \n", "SPREADSHEET EDITOR | \n", "1 | \n", "
18 | \n", "TOOLS REQUIRED | \n", "1 | \n", "
19 | \n", "B BROWSER | \n", "1 | \n", "