---
title: Web MCP
emoji: 🔎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search & fetch the web with per-tool analytics
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
---

# Web MCP Server

A Model Context Protocol (MCP) server that exposes two composable tools—search (Serper metadata) and fetch (single-page extraction)—alongside a live analytics dashboard that tracks daily usage for each tool. The UI runs on Gradio and can be reached directly or via MCP-compatible clients like Claude Desktop and Cursor.

## Highlights

- Dual MCP tools with shared rate limiting (360 requests/hour) and structured JSON responses.
- Daily analytics split by tool: the Analytics tab renders "Daily Search" (left) and "Daily Fetch" (right) bar charts covering the last 14 days.
- Persistent request counters keyed by UTC date and tool: `{"YYYY-MM-DD": {"search": n, "fetch": m}}`, with automatic migration from legacy totals.
- Pluggable storage: respects `ANALYTICS_DATA_DIR`, otherwise falls back to `/data` (if writable) or `./data` for local development.
- Ready-to-serve Gradio app with MCP endpoints exposed via `gr.api` for direct client consumption.

## Requirements

- Python 3.8 or newer.
- A Serper API key (`SERPER_API_KEY`) with access to the Search and News endpoints.
- Dependencies listed in `requirements.txt`, including `filelock` and `pandas` for analytics storage.

Install everything with:

```bash
pip install -r requirements.txt
```

## Configuration

1. Export your Serper API key:

   ```bash
   export SERPER_API_KEY="your-api-key"
   ```

2. (Optional) Override the analytics storage path:

   ```bash
   export ANALYTICS_DATA_DIR="/path/to/persistent/storage"
   ```

   If unset, the app automatically prefers `/data` when available, otherwise `./data`.

3. (Optional) Control the private/local address policy for fetch:

   - `FETCH_ALLOW_PRIVATE` – set to `1`/`true` to disable the SSRF guard entirely (not recommended outside trusted local testing).
   - `FETCH_PRIVATE_ALLOWLIST` – comma- or space-separated host patterns that are allowed even if they resolve to private/local IPs, e.g.:

     ```bash
     export FETCH_PRIVATE_ALLOWLIST="*.corp.local, my-proxy.internal"
     ```

   If neither is set, the fetcher refuses URLs whose host resolves to private, loopback, link-local, multicast, reserved, or unspecified addresses, and it re-checks the final redirect target.
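The guard described above can be sketched with the standard library alone (the function names are illustrative, not the app's actual API):

```python
import ipaddress
import socket

# Address classes the fetcher refuses, matching the list above.
BLOCKED_FLAGS = ("is_private", "is_loopback", "is_link_local",
                 "is_multicast", "is_reserved", "is_unspecified")

def ip_is_disallowed(ip: str) -> bool:
    """True if the resolved IP falls into any blocked address class."""
    addr = ipaddress.ip_address(ip)
    return any(getattr(addr, flag) for flag in BLOCKED_FLAGS)

def host_is_disallowed(host: str) -> bool:
    # Resolve every A/AAAA record; a single private result is enough to refuse.
    infos = socket.getaddrinfo(host, None)
    return any(ip_is_disallowed(info[4][0]) for info in infos)
```

Because a redirect can point somewhere new, the same check has to run again on the final URL, which is why the fetcher re-validates the redirect target.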

The request counters live in `<DATA_DIR>/request_counts.json`, guarded by a file lock to support concurrent MCP calls.
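A minimal sketch of that scheme, assuming the `filelock` package from `requirements.txt` (the helper name, signature, and return value are illustrative):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

from filelock import FileLock

def record_request(tool: str, data_dir: str = "./data") -> dict:
    """Increment today's counter for `tool` under a file lock."""
    counts_path = Path(data_dir) / "request_counts.json"
    counts_path.parent.mkdir(parents=True, exist_ok=True)
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    # The lock serializes concurrent MCP calls writing the same file.
    with FileLock(str(counts_path) + ".lock"):
        counts = json.loads(counts_path.read_text()) if counts_path.exists() else {}
        bucket = counts.setdefault(day, {"search": 0, "fetch": 0})
        bucket[tool] = bucket.get(tool, 0) + 1
        counts_path.write_text(json.dumps(counts))
    return counts
```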

## Running Locally

Launch the Gradio server (with MCP support enabled) via:

```bash
python app.py
```

This starts a local UI at `http://localhost:7860` and exposes the MCP SSE endpoint at `http://localhost:7860/gradio_api/mcp/sse`.

## Connecting From MCP Clients

- Claude Desktop – update `claude_desktop_config.json`:

  ```json
  {
    "mcpServers": {
      "web-search": {
        "command": "python",
        "args": ["/absolute/path/to/app.py"],
        "env": {
          "SERPER_API_KEY": "your-api-key"
        }
      }
    }
  }
  ```

- URL-based MCP clients – run `python app.py`, then point the client at `http://localhost:7860/gradio_api/mcp/sse`.

## Tool Reference

### search

- Purpose: retrieve metadata-only results from Serper (general web or news).
- Inputs:
  - `query` (str, required) – search terms.
  - `search_type` (`"search"` | `"news"`, default `"search"`) – switch to `news` for recency-focused results.
  - `num_results` (int, default 4, range 1–20) – number of hits to return.
- Output: JSON containing the query echo, result count, timing, and an array of entries with position, title, link, domain, and optional source/date for news items.
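Under the hood the tool talks to Serper's documented `/search` and `/news` endpoints. Building such a request looks roughly like this (standard library only; the helper name is illustrative and the real app may pass different options):

```python
import json
import os
import urllib.request

def build_serper_request(query: str, search_type: str = "search",
                         num_results: int = 4) -> urllib.request.Request:
    """Build (but do not send) a Serper API request."""
    if search_type not in ("search", "news"):
        raise ValueError("search_type must be 'search' or 'news'")
    payload = json.dumps({"q": query, "num": num_results}).encode()
    return urllib.request.Request(
        f"https://google.serper.dev/{search_type}",
        data=payload,
        headers={
            "X-API-KEY": os.environ.get("SERPER_API_KEY", ""),
            "Content-Type": "application/json",
        },
    )
```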

### fetch

- Purpose: download a single URL and extract the readable article text via Trafilatura.
- Inputs:
  - `url` (str, required) – must start with `http://` or `https://`.
  - `timeout` (int, default 20 seconds) – client timeout for the HTTP request.
- Output: JSON with the original and final URL, domain, HTTP status, title, ISO timestamp of the fetch, word count, cleaned content, and duration.
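The response assembly can be sketched as follows (standard library only; the real tool runs the body through Trafilatura, elided here, and these field names follow the description above rather than the app's exact keys):

```python
import urllib.request
from datetime import datetime, timezone
from urllib.parse import urlparse

def fetch_page(url: str, timeout: int = 20) -> dict:
    """Illustrative version of the fetch tool's validate/download/assemble flow."""
    if not url.startswith(("http://", "https://")):
        return {"error": "url must start with http:// or https://"}
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        final_url = resp.geturl()   # reflects any redirects
        status = resp.status
    content = body  # real tool: trafilatura.extract(body)
    return {
        "url": url,
        "final_url": final_url,
        "domain": urlparse(final_url).netloc,
        "status": status,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "word_count": len(content.split()),
        "content": content,
    }
```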

Both tools increment their respective analytics buckets on every invocation, including validation failures and rate-limit denials, ensuring the dashboard mirrors real traffic.

## Analytics Dashboard

Open the Analytics tab in the Gradio UI to inspect daily activity:

- Daily Search Count (left column) – bar chart of the past 14 days of search tool requests.
- Daily Fetch Count (right column) – bar chart of the past 14 days of fetch tool requests.
- Tooltips reveal the display label (e.g., Sep 17), the raw count, and the ISO date key.

Data is stored in JSON and can be safely externalized for long-term tracking. Existing totals in the legacy integer-only format are automatically migrated during the first write.
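The migration step can be pictured like this (how the legacy total is attributed per tool is an assumption here; check the app source for the actual policy):

```python
def migrate_day(value):
    """Fold a legacy integer daily total into the per-tool dict format."""
    if isinstance(value, int):
        # Assumption: attribute the old undifferentiated total to "search".
        return {"search": value, "fetch": 0}
    return value  # already in the new format

# A file mixing legacy and current entries normalizes in one pass:
counts = {"2024-01-01": 17, "2024-01-02": {"search": 3, "fetch": 1}}
counts = {day: migrate_day(v) for day, v in counts.items()}
```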

## Rate Limiting & Error Handling

- Global moving-window limit of 360 requests per hour, shared across both tools (powered by the `limits` package).
- Standardized error payloads for missing parameters, invalid URLs, Serper issues, HTTP failures, and rate-limit hits, each still incrementing the analytics counters.
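While the app relies on the `limits` package, the shared moving window it enforces can be approximated in a few lines of standard-library code:

```python
import time
from collections import deque

class MovingWindow:
    """Stdlib approximation of a 360-requests/hour moving window.

    The real app uses the `limits` package; this sketch only shows the idea.
    """
    def __init__(self, max_hits: int = 360, window_s: float = 3600.0):
        self.max_hits = max_hits
        self.window_s = window_s
        self.hits: deque[float] = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.hits and now - self.hits[0] > self.window_s:
            self.hits.popleft()
        if len(self.hits) >= self.max_hits:
            return False
        self.hits.append(now)
        return True
```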

## Troubleshooting

- `SERPER_API_KEY` is not set – export the key in the environment where the server runs.
- Rate limit exceeded – pause requests or reduce client concurrency.
- Empty extraction – some sites block bots; try another URL.
- Storage permissions – ensure the chosen data directory is writable; adjust `ANALYTICS_DATA_DIR` if necessary.

## Licensing & Contributions

Feel free to fork and adapt for your MCP workflows. Contributions are welcome—open a PR or issue with proposed analytics enhancements, additional tooling, or documentation tweaks.