|
--- |
|
title: web search MCP-server |
|
sdk: gradio |
|
colorFrom: green |
|
colorTo: green |
|
short_description: MCP server for general and custom web search |
|
sdk_version: 5.34.0 |
|
tags: |
|
- mcp-server-track |
|
app_file: app.py |
|
pinned: true |
|
|
|
--- |
|
# Search Tool |
|
|
|
## Overview |
|
|
|
**Search Tool** is a modular Python framework for performing advanced web searches, scraping content from search results, and analyzing the retrieved information using AI-powered models. The project is designed for extensibility, allowing easy integration of new search engines, scrapers, and analyzers. |
|
|
|
## Demo video |
|
Link: https://drive.google.com/file/d/11bHRCr0tdAkCEtwKOiuzzfAp7RgZk-si/view?usp=sharing |
|
 |
|
|
|
## Features |
|
|
|
- **Custom Site Search:** Search within a specified list of websites. |
|
- **Custom Domain Search:** Restrict searches to specific domains (e.g., `.edu`, `.gov`). |
|
- **General Web Search:** Perform open web searches. |
|
- **Content Scraping:** Extracts main textual content from URLs using [trafilatura](https://trafilatura.readthedocs.io/). |
|
- **AI Analysis:** Summarizes and analyzes scraped content using OpenAI models. |
|
- **Validation:** Ensures URLs are valid before processing. |
|
- **Extensible Architecture:** Easily add new searchers, scrapers, or analyzers. |
|
|
|
## Project Structure |
|
|
|
```
search_tool/
├── src/
│   ├── analyzer/        # AI-powered analyzers (e.g., OpenAI)
│   ├── core/
│   │   ├── factory/     # Factories for searcher, scraper, and analyzer
│   │   ├── interface/   # Abstract interfaces for extensibility
│   │   └── types.py     # Enums and constants
│   ├── mcp_servers/     # MCP server integration
│   ├── models/          # Pydantic models for data validation
│   ├── scraper/         # Web scrapers (e.g., Trafilatura)
│   ├── searcher/        # Search engine integrations
│   ├── tools/           # User-facing tool functions
│   └── utils/           # Utility functions (e.g., URL validation)
├── test.py              # Example/test script
├── requirements.txt     # Python dependencies
├── pyproject.toml       # Project metadata and dependencies
├── .env                 # Environment variables (e.g., API keys)
└── README.md            # Project documentation
```
|
|
|
## Installation |
|
|
|
1. **Clone the repository:** |
|
```sh |
|
git clone https://github.com/ola172/web-search-mcp-server.git |
|
cd web-search-mcp-server
|
``` |
|
|
|
2. **Set up a virtual environment (recommended):** |
|
```sh |
|
python3 -m venv .venv |
|
source .venv/bin/activate |
|
``` |
|
|
|
3. **Install dependencies:** |
|
```sh |
|
pip install -r requirements.txt |
|
``` |
|
|
|
4. **Configure environment variables:** |
|
- Copy `.env.example` to `.env` |
|
   - Add your secrets (e.g., API keys)
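
   A minimal `.env` might look like the following. `OPENAI_API_KEY` is the variable the `openai` client reads by default; `API_KEY` and `SEARCH_ENGINE_ID` are the Google Custom Search settings mentioned under Configuration. The exact names depend on how the modules load them, so check `.env.example` for the authoritative list:

   ```
   OPENAI_API_KEY=sk-...
   API_KEY=your-google-api-key
   SEARCH_ENGINE_ID=your-search-engine-id
   ```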
|
|
|
## Usage |
|
|
|
### Core Tools |
|
|
|
Each tool validates input, performs the search, scrapes the results, and analyzes the content. |
|
|
|
- **General Web Search:** `search_on_web` |
|
- **Custom Sites Search:** `search_custom_sites` |
|
- **Custom Domains Search:** `search_custom_domain` |
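
All three tools follow the same validate → search → scrape → analyze pipeline. The sketch below illustrates that flow with stub functions; the names and signatures here are illustrative stand-ins, not the project's actual internals:

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs, mirroring the validation step."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def search(query: str) -> list[str]:
    """Stub searcher: a real implementation queries a search engine."""
    return ["https://example.edu/article", "not-a-url"]


def scrape(url: str) -> str:
    """Stub scraper: a real implementation extracts text with trafilatura."""
    return f"main text of {url}"


def analyze(texts: list[str]) -> str:
    """Stub analyzer: a real implementation summarizes with an OpenAI model."""
    return f"summary of {len(texts)} page(s)"


def search_pipeline(query: str) -> str:
    """Validate -> search -> scrape -> analyze, as each tool does."""
    urls = [u for u in search(query) if is_valid_url(u)]
    texts = [scrape(u) for u in urls]
    return analyze(texts)


print(search_pipeline("quantum computing"))  # -> summary of 1 page(s)
```

The invalid URL is filtered out before scraping, so only one page reaches the analyzer.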
|
|
|
### MCP Server Integration |
|
|
|
The project includes an MCP server (`web_search_server.py`) that exposes the search tools as MCP tools.
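
For a Gradio-hosted Space, Gradio 5.x serves the MCP endpoint over SSE at `/gradio_api/mcp/sse`, so a client configuration might look like the following. The Space URL is a placeholder, and the exact schema depends on your MCP client:

```json
{
  "mcpServers": {
    "web-search": {
      "url": "https://<your-space>.hf.space/gradio_api/mcp/sse"
    }
  }
}
```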
|
|
|
## Extending the Framework |
|
|
|
- **Add a new searcher:** Implement the `SearchInterface` and register it in `SearcherFactory`. |
|
- **Add a new scraper:** Implement the `ScraperInterface` and register it in `ScraperFactory`. |
|
- **Add a new analyzer:** Implement the `AnalyzerInterface` and register it in `AnalyzerFactory`. |
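
As a shape sketch of the searcher case (the real interfaces live in `src/core/interface/` and may differ in signature, so treat the names and methods below as assumptions):

```python
from abc import ABC, abstractmethod


class SearchInterface(ABC):
    """Abstract searcher, modeled on the project's interface (signature assumed)."""

    @abstractmethod
    def search(self, query: str, max_results: int = 5) -> list[str]:
        """Return a list of result URLs for the query."""


class DuckDuckGoSearcher(SearchInterface):
    """Hypothetical new searcher implementing the interface."""

    def search(self, query: str, max_results: int = 5) -> list[str]:
        # A real implementation would call a search API here.
        return [f"https://duckduckgo.com/?q={query}"][:max_results]


class SearcherFactory:
    """Registry-style factory, one possible shape for the project's factory."""

    _registry: dict[str, type[SearchInterface]] = {}

    @classmethod
    def register(cls, name: str, searcher: type[SearchInterface]) -> None:
        cls._registry[name] = searcher

    @classmethod
    def create(cls, name: str) -> SearchInterface:
        return cls._registry[name]()


# Register the new searcher, then obtain it through the factory.
SearcherFactory.register("duckduckgo", DuckDuckGoSearcher)
searcher = SearcherFactory.create("duckduckgo")
print(searcher.search("gradio mcp"))
```

The same register-then-create pattern applies to new scrapers and analyzers via their respective factories.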
|
|
|
## Configuration |
|
|
|
- **API Keys:** Store sensitive keys (e.g., OpenAI) in the `.env` file. |
|
- **Search Engine IDs:** For Google Custom Search, configure `API_KEY` and `SEARCH_ENGINE_ID` in the relevant modules. |
|
|
|
## Dependencies |
|
|
|
- `openai` |
|
- `trafilatura` |
|
- `pydantic` |
|
- `googlesearch-python` |
|
- `python-dotenv` |
|
- `google-api-python-client` |
|
|
|
See `requirements.txt` for the full list. |
|
|
|
## License |
|
|
|
This project is for educational and research purposes. Please ensure compliance with the terms of service of any third-party APIs used. |
|
|
|
## Acknowledgements |
|
|
|
- OpenAI |
|
- Trafilatura |
|
- Google Custom Search |
|
|
|
For questions or contributions, please open an issue or pull request. |