|
--- |
|
title: web search MCP-server |
|
sdk: gradio |
|
colorFrom: green |
|
colorTo: green |
|
short_description: MCP server for general and custom web search |
|
sdk_version: 5.34.0 |
|
tags: |
|
- mcp-server-track |
|
app_file: app.py |
|
pinned: true |
|
|
|
--- |
|
# Search Tool |
|
|
|
## Overview |
|
|
|
**Search Tool** is a modular Python framework for performing advanced web searches, scraping content from search results, and analyzing the retrieved information using AI-powered models. The project is designed for extensibility, allowing easy integration of new search engines, scrapers, and analyzers. |
|
|
|
## Demo video |
|
Link: https://drive.google.com/file/d/11bHRCr0tdAkCEtwKOiuzzfAp7RgZk-si/view?usp=sharing |
|
 |
|
|
|
## Features |
|
|
|
- **Custom Site Search:** Search within a specified list of websites. |
|
- **Custom Domain Search:** Restrict searches to specific domains (e.g., `.edu`, `.gov`). |
|
- **General Web Search:** Perform open web searches. |
|
- **Content Scraping:** Extracts main textual content from URLs using [trafilatura](https://trafilatura.readthedocs.io/). |
|
- **AI Analysis:** Summarizes and analyzes scraped content using OpenAI models. |
|
- **Validation:** Ensures URLs are valid before processing. |
|
- **Extensible Architecture:** Easily add new searchers, scrapers, or analyzers. |
|
|
|
## Project Structure |
|
|
|
```
search_tool/
├── src/
│   ├── analyzer/        # AI-powered analyzers (e.g., OpenAI)
│   ├── core/
│   │   ├── factory/     # Factories for searcher, scraper, and analyzer
│   │   ├── interface/   # Abstract interfaces for extensibility
│   │   └── types.py     # Enums and constants
│   ├── mcp_servers/     # MCP server integration
│   ├── models/          # Pydantic models for data validation
│   ├── scraper/         # Web scrapers (e.g., Trafilatura)
│   ├── searcher/        # Search engine integrations
│   ├── tools/           # User-facing tool functions
│   └── utils/           # Utility functions (e.g., URL validation)
├── test.py              # Example/test script
├── requirements.txt     # Python dependencies
├── pyproject.toml       # Project metadata and dependencies
├── .env                 # Environment variables (e.g., API keys)
└── README.md            # Project documentation
```
|
|
|
## Installation |
|
|
|
1. **Clone the repository:** |
|
```sh |
|
git clone https://github.com/ola172/web-search-mcp-server.git |
|
cd web-search-mcp-server
|
``` |
|
|
|
2. **Set up a virtual environment (recommended):** |
|
```sh |
|
python3 -m venv .venv |
|
source .venv/bin/activate |
|
``` |
|
|
|
3. **Install dependencies:** |
|
```sh |
|
pip install -r requirements.txt |
|
``` |
|
|
|
4. **Configure environment variables:** |
|
- Copy `.env.example` to `.env` |
|
   - Add your secrets (e.g., API keys)
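
   A minimal `.env` might look like the following. `OPENAI_API_KEY` is the variable the `openai` client reads by default; `API_KEY` and `SEARCH_ENGINE_ID` are the Google Custom Search settings mentioned under Configuration. The exact names depend on how the modules load them, so check `.env.example` for the authoritative list:

   ```
   OPENAI_API_KEY=sk-...
   API_KEY=your-google-api-key
   SEARCH_ENGINE_ID=your-search-engine-id
   ```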
|
|
|
## Usage |
|
|
|
### Core Tools |
|
|
|
Each tool validates input, performs the search, scrapes the results, and analyzes the content. |
|
|
|
- **General Web Search:** `search_on_web` |
|
- **Custom Sites Search:** `search_custom_sites` |
|
- **Custom Domains Search:** `search_custom_domain` |
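
All three tools follow the same validate → search → scrape → analyze pipeline. The sketch below illustrates that flow with stub functions; the names and signatures here are illustrative stand-ins, not the project's actual internals:

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs, mirroring the validation step."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def search(query: str) -> list[str]:
    """Stub searcher: a real implementation queries a search engine."""
    return ["https://example.edu/article", "not-a-url"]


def scrape(url: str) -> str:
    """Stub scraper: a real implementation extracts text with trafilatura."""
    return f"main text of {url}"


def analyze(texts: list[str]) -> str:
    """Stub analyzer: a real implementation summarizes with an OpenAI model."""
    return f"summary of {len(texts)} page(s)"


def search_pipeline(query: str) -> str:
    """Validate -> search -> scrape -> analyze, as each tool does."""
    urls = [u for u in search(query) if is_valid_url(u)]
    texts = [scrape(u) for u in urls]
    return analyze(texts)


print(search_pipeline("quantum computing"))  # -> summary of 1 page(s)
```

The invalid URL is filtered out before scraping, so only one page reaches the analyzer.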
|
|
|
### MCP Server Integration |
|
|
|
The project includes an MCP server (`web_search_server.py`) that exposes the search tools as MCP tools.
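
For a Gradio-hosted Space, Gradio 5.x serves the MCP endpoint over SSE at `/gradio_api/mcp/sse`, so a client configuration might look like the following. The Space URL is a placeholder, and the exact schema depends on your MCP client:

```json
{
  "mcpServers": {
    "web-search": {
      "url": "https://<your-space>.hf.space/gradio_api/mcp/sse"
    }
  }
}
```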
|
|
|
## Extending the Framework |
|
|
|
- **Add a new searcher:** Implement the `SearchInterface` and register it in `SearcherFactory`. |
|
- **Add a new scraper:** Implement the `ScraperInterface` and register it in `ScraperFactory`. |
|
- **Add a new analyzer:** Implement the `AnalyzerInterface` and register it in `AnalyzerFactory`. |
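
As a shape sketch of the searcher case (the real interfaces live in `src/core/interface/` and may differ in signature, so treat the names and methods below as assumptions):

```python
from abc import ABC, abstractmethod


class SearchInterface(ABC):
    """Abstract searcher, modeled on the project's interface (signature assumed)."""

    @abstractmethod
    def search(self, query: str, max_results: int = 5) -> list[str]:
        """Return a list of result URLs for the query."""


class DuckDuckGoSearcher(SearchInterface):
    """Hypothetical new searcher implementing the interface."""

    def search(self, query: str, max_results: int = 5) -> list[str]:
        # A real implementation would call a search API here.
        return [f"https://duckduckgo.com/?q={query}"][:max_results]


class SearcherFactory:
    """Registry-style factory, one possible shape for the project's factory."""

    _registry: dict[str, type[SearchInterface]] = {}

    @classmethod
    def register(cls, name: str, searcher: type[SearchInterface]) -> None:
        cls._registry[name] = searcher

    @classmethod
    def create(cls, name: str) -> SearchInterface:
        return cls._registry[name]()


# Register the new searcher, then obtain it through the factory.
SearcherFactory.register("duckduckgo", DuckDuckGoSearcher)
searcher = SearcherFactory.create("duckduckgo")
print(searcher.search("gradio mcp"))
```

The same register-then-create pattern applies to new scrapers and analyzers via their respective factories.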
|
|
|
## Configuration |
|
|
|
- **API Keys:** Store sensitive keys (e.g., OpenAI) in the `.env` file. |
|
- **Search Engine IDs:** For Google Custom Search, configure `API_KEY` and `SEARCH_ENGINE_ID` in the relevant modules. |
|
|
|
## Dependencies |
|
|
|
- `openai` |
|
- `trafilatura` |
|
- `pydantic` |
|
- `googlesearch-python` |
|
- `python-dotenv` |
|
- `google-api-python-client` |
|
|
|
See `requirements.txt` for the full list. |
|
|
|
## License |
|
|
|
This project is for educational and research purposes. Please ensure compliance with the terms of service of any third-party APIs used. |
|
|
|
## Acknowledgements |
|
|
|
- OpenAI |
|
- Trafilatura |
|
- Google Custom Search |
|
|
|
For questions or contributions, please open an issue or pull request. |