{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "is5mcO3am-g2" }, "outputs": [], "source": [ "# Installation\n", "! pip install smolagents\n", "# To install from source instead of the latest release, comment out the command above and uncomment the following one.\n", "# ! pip install git+https://github.com/huggingface/smolagents.git" ] }, { "cell_type": "markdown", "metadata": { "id": "QmOAmqNSm-g3" }, "source": [ "# Web Browser Automation with Agents 🤖🌐" ] }, { "cell_type": "markdown", "metadata": { "id": "8KNjj3xWm-g3" }, "source": [ "In this notebook, we'll create an **agent-powered web browser automation system**! This system can navigate websites, interact with elements, and extract information automatically.\n", "\n", "The agent will be able to:\n", "\n", "- [x] Navigate to web pages\n", "- [x] Click on elements\n", "- [x] Search within pages\n", "- [x] Handle popups and modals\n", "- [x] Extract information\n", "\n", "Let's set up this system step by step!\n", "\n", "First, run this command to install the required dependencies:\n", "\n", "```bash\n", "pip install smolagents selenium helium pillow python-dotenv -q\n", "```\n", "\n", "Let's import our required libraries and set up environment variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nAKrZrxWm-g4" }, "outputs": [], "source": [ "from io import BytesIO\n", "from time import sleep\n", "\n", "import helium\n", "from dotenv import load_dotenv\n", "from PIL import Image\n", "from selenium import webdriver\n", "from selenium.webdriver.common.by import By\n", "from selenium.webdriver.common.keys import Keys\n", "\n", "from smolagents import CodeAgent, tool\n", "from smolagents.agents import ActionStep\n", "\n", "# Load environment variables\n", "load_dotenv()" ] }, { "cell_type": "markdown", "metadata": { "id": "wyEvPcbWm-g4" }, "source": [ "Now let's create our core browser interaction tools that will allow our agent to navigate and interact with web 
pages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aFQ0HEwSm-g4" }, "outputs": [], "source": [ "@tool\n", "def search_item_ctrl_f(text: str, nth_result: int = 1) -> str:\n", "    \"\"\"\n", "    Searches for text on the current page via Ctrl + F and jumps to the nth occurrence.\n", "    Args:\n", "        text: The text to search for\n", "        nth_result: Which occurrence to jump to (default: 1)\n", "    \"\"\"\n", "    elements = driver.find_elements(By.XPATH, f\"//*[contains(text(), '{text}')]\")\n", "    if nth_result > len(elements):\n", "        raise Exception(f\"Match n°{nth_result} not found (only {len(elements)} matches found)\")\n", "    result = f\"Found {len(elements)} matches for '{text}'.\"\n", "    elem = elements[nth_result - 1]\n", "    driver.execute_script(\"arguments[0].scrollIntoView(true);\", elem)\n", "    result += f\" Focused on element {nth_result} of {len(elements)}\"\n", "    return result\n", "\n", "@tool\n", "def go_back() -> None:\n", "    \"\"\"Goes back to the previous page.\"\"\"\n", "    driver.back()\n", "\n", "@tool\n", "def close_popups() -> None:\n", " \"\"\"\n", " Closes any visible modal or pop-up on the page. 
Use this to dismiss pop-up windows!\n", " This does not work on cookie consent banners.\n", " \"\"\"\n", " webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()" ] }, { "cell_type": "markdown", "metadata": { "id": "YdEVE-Nlm-g5" }, "source": [ "Let's set up our browser with Chrome and configure screenshot capabilities:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5WBUdv5km-g5" }, "outputs": [], "source": [ "# Configure Chrome options\n", "chrome_options = webdriver.ChromeOptions()\n", "chrome_options.add_argument(\"--force-device-scale-factor=1\")\n", "chrome_options.add_argument(\"--window-size=1000,1350\")\n", "chrome_options.add_argument(\"--disable-pdf-viewer\")\n", "chrome_options.add_argument(\"--window-position=0,0\")\n", "\n", "# Initialize the browser\n", "driver = helium.start_chrome(headless=False, options=chrome_options)\n", "\n", "# Set up screenshot callback\n", "def save_screenshot(memory_step: ActionStep, agent: CodeAgent) -> None:\n", "    sleep(1.0)  # Let JavaScript animations happen before taking the screenshot\n", "    driver = helium.get_driver()\n", "    current_step = memory_step.step_number\n", "    if driver is not None:\n", "        for previous_memory_step in agent.memory.steps:  # Remove previous screenshots for lean processing\n", "            if isinstance(previous_memory_step, ActionStep) and previous_memory_step.step_number <= current_step - 2:\n", "                previous_memory_step.observations_images = None\n", "        png_bytes = driver.get_screenshot_as_png()\n", "        image = Image.open(BytesIO(png_bytes))\n", "        print(f\"Captured a browser screenshot: {image.size} pixels\")\n", "        memory_step.observations_images = [image.copy()]  # Create a copy to ensure it persists\n", "\n", "    # Update observations with current URL\n", "    url_info = f\"Current url: {driver.current_url}\"\n", "    memory_step.observations = (\n", "        url_info if memory_step.observations is None else memory_step.observations + \"\\n\" + url_info\n", "    )" ] }, { "cell_type": 
"markdown", "metadata": { "id": "kcFffSVJm-g5" }, "source": [ "Now let's create our web automation agent:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "XdQciC-tm-g5" }, "outputs": [], "source": [ "from smolagents import InferenceClientModel\n", "\n", "# Initialize the model\n", "model_id = \"meta-llama/Llama-3.3-70B-Instruct\" # You can change this to your preferred model\n", "model = InferenceClientModel(model_id=model_id)\n", "\n", "# Create the agent\n", "agent = CodeAgent(\n", " tools=[go_back, close_popups, search_item_ctrl_f],\n", " model=model,\n", " additional_authorized_imports=[\"helium\"],\n", " step_callbacks=[save_screenshot],\n", " max_steps=20,\n", " verbosity_level=2,\n", ")\n", "\n", "# Import helium for the agent\n", "agent.python_executor(\"from helium import *\", agent.state)" ] }, { "cell_type": "markdown", "metadata": { "id": "4IVaEozSm-g6" }, "source": [ "The agent needs instructions on how to use Helium for web automation. Here are the instructions we'll provide:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "VlvuP6rem-g6" }, "outputs": [], "source": [ "helium_instructions = \"\"\"\n", "You can use helium to access websites. 
Don't worry about the helium driver, it's already managed.\n", "We've already run \"from helium import *\"\n", "Then you can go to pages!\n", "Code:\n", "```py\n", "go_to('github.com/trending')\n", "```\n", "\n", "You can directly click clickable elements by inputting the text that appears on them.\n", "Code:\n", "```py\n", "click(\"Top products\")\n", "```\n", "\n", "If it's a link:\n", "Code:\n", "```py\n", "click(Link(\"Top products\"))\n", "```\n", "\n", "If you try to interact with an element and it's not found, you'll get a LookupError.\n", "In general, stop your action after each button click to see what happens on your screenshot.\n", "Never try to log in to a page.\n", "\n", "To scroll up or down, use scroll_down or scroll_up, passing the number of pixels to scroll as the argument.\n", "Code:\n", "```py\n", "scroll_down(num_pixels=1200) # This will scroll one viewport down\n", "```\n", "\n", "When you have pop-ups with a cross icon to close, don't try to click the close icon by finding its element or targeting an 'X' element (this most often fails).\n", "Just use your built-in tool `close_popups` to close them:\n", "Code:\n", "```py\n", "close_popups()\n", "```\n", "\n", "You can use .exists() to check for the existence of an element. For example:\n", "Code:\n", "```py\n", "if Text('Accept cookies?').exists():\n", "    click('I accept')\n", "```\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": { "id": "FBr7mYAAm-g6" }, "source": [ "Now we can run our agent with a task! 
Let's try finding information on Wikipedia:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OvRfT4oWm-g6" }, "outputs": [], "source": [ "search_request = \"\"\"\n", "Please navigate to https://en.wikipedia.org/wiki/Chicago and give me a sentence containing the word \"1992\" that mentions a construction accident.\n", "\"\"\"\n", "\n", "agent_output = agent.run(search_request + helium_instructions)\n", "print(\"Final output:\")\n", "print(agent_output)" ] }, { "cell_type": "markdown", "metadata": { "id": "Utp3NWsBm-g6" }, "source": [ "You can run different tasks by modifying the request. For example, here's one that tells me whether I should work harder:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "S7z5mWCQm-g6" }, "outputs": [], "source": [ "github_request = \"\"\"\n", "I'm trying to find how hard I have to work to get a repo in github.com/trending.\n", "Can you navigate to the profile for the top author of the top trending repo, and give me their total number of commits over the last year?\n", "\"\"\"\n", "\n", "agent_output = agent.run(github_request + helium_instructions)\n", "print(\"Final output:\")\n", "print(agent_output)" ] }, { "cell_type": "markdown", "metadata": { "id": "3E520frtm-g6" }, "source": [ "The system is particularly effective for tasks like:\n", "- Data extraction from websites\n", "- Web research automation\n", "- UI testing and verification\n", "- Content monitoring" ] } ], "metadata": { "colab": { "provenance": [] } }, "nbformat": 4, "nbformat_minor": 0 }