---
title: Web MCP
emoji: 🔎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search & fetch the web with per-tool analytics
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
---

# Web MCP Server

A Model Context Protocol (MCP) server that exposes two composable tools—`search` (Serper metadata) and `fetch` (single-page extraction)—alongside a live analytics dashboard that tracks daily usage for each tool. The UI runs on Gradio and can be reached directly or via MCP-compatible clients like Claude Desktop and Cursor.

## Highlights

- Dual MCP tools with shared rate limiting (`360 requests/hour`) and structured JSON responses.
- Daily analytics split by tool: the **Analytics** tab renders "Daily Search" (left) and "Daily Fetch" (right) bar charts covering the last 14 days.
- Persistent request counters keyed by UTC date and tool: `{"YYYY-MM-DD": {"search": n, "fetch": m}}`, with automatic migration from legacy totals.
- Pluggable storage: respects `ANALYTICS_DATA_DIR`, otherwise falls back to `/data` (if writable) or `./data` for local development.
- Ready-to-serve Gradio app with MCP endpoints exposed via `gr.api` for direct client consumption.

## Requirements

- Python 3.10 or newer (required by Gradio 5.x).
- Serper API key (`SERPER_API_KEY`) with access to the Search and News endpoints.
- Dependencies listed in `requirements.txt`, including `filelock` and `pandas` for analytics storage.

Install everything with:

```bash
pip install -r requirements.txt
```

## Configuration

1. Export your Serper API key:

   ```bash
   export SERPER_API_KEY="your-api-key"
   ```

2. (Optional) Override the analytics storage path:

   ```bash
   export ANALYTICS_DATA_DIR="/path/to/persistent/storage"
   ```

   If unset, the app prefers `/data` when it is writable, otherwise `./data`.

3. (Optional) Control the private/local address policy for `fetch`:

   - `FETCH_ALLOW_PRIVATE` — set to `1`/`true` to disable the SSRF guard entirely (not recommended except for trusted, local testing).
   - `FETCH_PRIVATE_ALLOWLIST` — comma- or space-separated host patterns allowed even if they resolve to private/local IPs, e.g.:

     ```bash
     export FETCH_PRIVATE_ALLOWLIST="*.corp.local, my-proxy.internal"
     ```

   If neither is set, the fetcher refuses URLs whose host resolves to a private, loopback, link-local, multicast, reserved, or unspecified address, and it re-checks the final redirect target.

The request counters live in `request_counts.json` inside the chosen data directory, guarded by a file lock so that concurrent MCP calls can update them safely.

## Running Locally

Launch the Gradio server (with MCP support enabled) via:

```bash
python app.py
```

This starts a local UI at `http://localhost:7860` and exposes the MCP SSE endpoint at `http://localhost:7860/gradio_api/mcp/sse`.

### Connecting From MCP Clients

- **Claude Desktop** – update `claude_desktop_config.json`:

  ```json
  {
    "mcpServers": {
      "web-search": {
        "command": "python",
        "args": ["/absolute/path/to/app.py"],
        "env": {
          "SERPER_API_KEY": "your-api-key"
        }
      }
    }
  }
  ```

- **URL-based MCP clients** – run `python app.py`, then point the client to `http://localhost:7860/gradio_api/mcp/sse`.

For a scripted alternative, see the appendix at the end of this README.

## Tool Reference

### `search`

- **Purpose**: Retrieve metadata-only results from Serper (general web or news).
- **Inputs**:
  - `query` *(str, required)* – search terms.
  - `search_type` *("search" | "news", default "search")* – switch to `news` for recency-focused results.
  - `num_results` *(int, default 4, range 1–20)* – number of hits to return.
- **Output**: JSON containing the query echo, result count, timing, and an array of entries with `position`, `title`, `link`, `domain`, and an optional `source`/`date` for news (see the sketch below).
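To make the field list concrete, here is an illustrative `search` response. The per-result keys (`position`, `title`, `link`, `domain`) follow the description above; the top-level key names and the timing field are assumptions for this sketch, not a guaranteed schema:

```json
{
  "query": "model context protocol",
  "search_type": "search",
  "result_count": 2,
  "duration_s": 0.42,
  "results": [
    {
      "position": 1,
      "title": "Model Context Protocol – Introduction",
      "link": "https://modelcontextprotocol.io/introduction",
      "domain": "modelcontextprotocol.io"
    },
    {
      "position": 2,
      "title": "What Is MCP?",
      "link": "https://example.com/what-is-mcp",
      "domain": "example.com"
    }
  ]
}
```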
### `fetch`

- **Purpose**: Download a single URL and extract the readable article text via Trafilatura.
- **Inputs**:
  - `url` *(str, required)* – must start with `http://` or `https://`.
  - `timeout` *(int, default 20 seconds)* – client timeout for the HTTP request.
- **Output**: JSON with the original and final URL, domain, HTTP status, title, ISO timestamp of the fetch, word count, cleaned `content`, and duration.

Both tools increment their respective analytics buckets on every invocation, including validation failures and rate-limit denials, so the dashboard mirrors real traffic.

## Analytics Dashboard

Open the **Analytics** tab in the Gradio UI to inspect daily activity:

- **Daily Search Count** (left column) – bar chart of `search` requests over the past 14 days.
- **Daily Fetch Count** (right column) – bar chart of `fetch` requests over the past 14 days.
- Tooltips reveal the display label (e.g., `Sep 17`), the raw count, and the ISO date key.

Counts are stored as plain JSON, so the file can be backed up or redirected to persistent storage (via `ANALYTICS_DATA_DIR`) for long-term tracking. Existing totals in the legacy integer-only format are migrated automatically on the first write.

## Rate Limiting & Error Handling

- Global moving-window limit of `360` requests per hour shared across both tools (powered by the `limits` library).
- Standardized error payloads for missing parameters, invalid URLs, Serper issues, HTTP failures, and rate-limit hits, each of which still increments the analytics counters.

## Troubleshooting

- **`SERPER_API_KEY is not set`** – export the key in the environment where the server runs.
- **`Rate limit exceeded`** – pause requests or reduce client concurrency.
- **Empty extraction** – some sites block bots; try another URL.
- **Storage permissions** – ensure the chosen data directory is writable; adjust `ANALYTICS_DATA_DIR` if necessary.

## Licensing & Contributions

Feel free to fork and adapt for your MCP workflows. Contributions are welcome—open a PR or issue with proposed analytics enhancements, additional tooling, or documentation tweaks.
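## Appendix: Calling the Tools From a Script

For quick programmatic checks, a minimal client sketch is shown below. It assumes the official `mcp` Python SDK (`pip install mcp`) and a server already running via `python app.py`. Gradio may register the tools under slightly different names, so the script prints the advertised tool list before calling `search`; treat the names and arguments here as assumptions to verify against that listing.

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

# SSE endpoint exposed by `python app.py` (see "Running Locally").
SSE_URL = "http://localhost:7860/gradio_api/mcp/sse"

async def main() -> None:
    # Open the SSE transport, then run an MCP session over it.
    async with sse_client(SSE_URL) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()

            # Discover the exact tool names this server registered.
            tools = await session.list_tools()
            print("Available tools:", [tool.name for tool in tools.tools])

            # Call `search`; substitute the name printed above if it differs.
            result = await session.call_tool(
                "search",
                {"query": "gradio mcp server", "num_results": 3},
            )
            for block in result.content:
                print(getattr(block, "text", block))

if __name__ == "__main__":
    asyncio.run(main())
```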