---
title: Web Search MCP Server
sdk: gradio
colorFrom: green
colorTo: green
short_description: MCP server for general and custom web search
sdk_version: 5.34.0
tags:
- mcp-server-track
app_file: app.py
pinned: true
---
# Search Tool
## Overview
**Search Tool** is a modular Python framework for performing advanced web searches, scraping content from search results, and analyzing the retrieved information using AI-powered models. The project is designed for extensibility, allowing easy integration of new search engines, scrapers, and analyzers.
## Demo video
Link: https://drive.google.com/file/d/11bHRCr0tdAkCEtwKOiuzzfAp7RgZk-si/view?usp=sharing
![Demo](./demo.webm)
## Features
- **Custom Site Search:** Search within a specified list of websites.
- **Custom Domain Search:** Restrict searches to specific domains (e.g., `.edu`, `.gov`).
- **General Web Search:** Perform open web searches.
- **Content Scraping:** Extracts main textual content from URLs using [trafilatura](https://trafilatura.readthedocs.io/).
- **AI Analysis:** Summarizes and analyzes scraped content using OpenAI models.
- **Validation:** Ensures URLs are valid before processing.
- **Extensible Architecture:** Easily add new searchers, scrapers, or analyzers.
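The validation step can be illustrated with a minimal standard-library sketch. The function name `is_valid_url` is an assumption for illustration, not necessarily the name used in `src/utils/`:

```python
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    """Return True if the URL has an http(s) scheme and a network location.

    A minimal stand-in for the URL validation in ``src/utils/``;
    the real helper may apply stricter checks.
    """
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

print(is_valid_url("https://example.edu/page"))  # True
print(is_valid_url("not-a-url"))                 # False
```

Checking URLs up front keeps the scraping and analysis stages from wasting requests (and API tokens) on malformed input.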
## Project Structure
```
search_tool/
├── src/
│   ├── analyzer/        # AI-powered analyzers (e.g., OpenAI)
│   ├── core/
│   │   ├── factory/     # Factories for searchers, scrapers, and analyzers
│   │   ├── interface/   # Abstract interfaces for extensibility
│   │   └── types.py     # Enums and constants
│   ├── mcp_servers/     # MCP server integration
│   ├── models/          # Pydantic models for data validation
│   ├── scraper/         # Web scrapers (e.g., Trafilatura)
│   ├── searcher/        # Search engine integrations
│   ├── tools/           # User-facing tool functions
│   └── utils/           # Utility functions (e.g., URL validation)
├── test.py              # Example/test script
├── requirements.txt     # Python dependencies
├── pyproject.toml       # Project metadata and dependencies
├── .env                 # Environment variables (e.g., API keys)
└── README.md            # Project documentation
```
## Installation
1. **Clone the repository:**
```sh
git clone https://github.com/ola172/web-search-mcp-server.git
cd web-search-mcp-server
```
2. **Set up a virtual environment (recommended):**
```sh
python3 -m venv .venv
source .venv/bin/activate
```
3. **Install dependencies:**
```sh
pip install -r requirements.txt
```
4. **Configure environment variables:**
- Copy `.env.example` to `.env`
- Add your API keys and other secrets (see the Configuration section below)
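A `.env` file might look like the following. The variable names here are illustrative guesses based on the Configuration section, so match them to what the modules actually read:

```sh
# OpenAI credentials for the analyzer
OPENAI_API_KEY=sk-...
# Google Custom Search credentials for the searcher
API_KEY=your-google-api-key
SEARCH_ENGINE_ID=your-search-engine-id
```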
## Usage
### Core Tools
Each tool validates input, performs the search, scrapes the results, and analyzes the content.
- **General Web Search:** `search_on_web`
- **Custom Sites Search:** `search_custom_sites`
- **Custom Domains Search:** `search_custom_domain`
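Conceptually, each tool runs the same four-stage pipeline: validate, search, scrape, analyze. The sketch below uses stand-in stub functions (all names hypothetical) purely to show the flow; the real implementations live in `src/`:

```python
def run_search_pipeline(query: str, search, scrape, analyze, validate) -> list[dict]:
    """Illustrative pipeline: validate -> search -> scrape -> analyze."""
    results = []
    for url in search(query):
        if not validate(url):   # skip malformed URLs before any network work
            continue
        text = scrape(url)      # extract the main textual content
        results.append({"url": url, "summary": analyze(text)})
    return results

# Toy stand-ins to exercise the flow (not the project's real functions):
out = run_search_pipeline(
    "test query",
    search=lambda q: ["https://example.com/a", "bad-url"],
    scrape=lambda u: f"content of {u}",
    analyze=lambda t: t.upper(),
    validate=lambda u: u.startswith("http"),
)
print(out)  # one result; "bad-url" is filtered out by validation
```

Passing the stages in as callables mirrors the factory-based design: swapping a searcher or analyzer changes one argument, not the pipeline.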
### MCP Server Integration
The project includes an MCP server (`web_search_server.py`) that exposes the search tools as MCP tools.
## Extending the Framework
- **Add a new searcher:** Implement the `SearchInterface` and register it in `SearcherFactory`.
- **Add a new scraper:** Implement the `ScraperInterface` and register it in `ScraperFactory`.
- **Add a new analyzer:** Implement the `AnalyzerInterface` and register it in `AnalyzerFactory`.
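The interface-plus-factory registration pattern can be sketched as follows. Class and method names are assumptions modeled on the structure above, so check `src/core/` for the real signatures:

```python
from abc import ABC, abstractmethod

class SearchInterface(ABC):
    """Abstract base every searcher implements (name assumed from the docs)."""

    @abstractmethod
    def search(self, query: str) -> list[str]:
        """Return a list of result URLs for the query."""

class SearcherFactory:
    """Simple registry mapping engine names to searcher classes."""
    _registry: dict[str, type[SearchInterface]] = {}

    @classmethod
    def register(cls, name: str, searcher_cls: type[SearchInterface]) -> None:
        cls._registry[name] = searcher_cls

    @classmethod
    def create(cls, name: str) -> SearchInterface:
        return cls._registry[name]()

# Adding a new searcher means implementing the interface and registering it:
class DummySearcher(SearchInterface):
    def search(self, query: str) -> list[str]:
        return [f"https://example.com/?q={query}"]

SearcherFactory.register("dummy", DummySearcher)
searcher = SearcherFactory.create("dummy")
print(searcher.search("mcp"))  # ['https://example.com/?q=mcp']
```

Scrapers and analyzers would follow the same shape with `ScraperInterface`/`ScraperFactory` and `AnalyzerInterface`/`AnalyzerFactory`.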
## Configuration
- **API Keys:** Store sensitive keys (e.g., OpenAI) in the `.env` file.
- **Search Engine IDs:** For Google Custom Search, configure `API_KEY` and `SEARCH_ENGINE_ID` in the relevant modules.
## Dependencies
- `openai`
- `trafilatura`
- `pydantic`
- `googlesearch-python`
- `python-dotenv`
- `google-api-python-client`
See `requirements.txt` for the full list.
## License
This project is for educational and research purposes. Please ensure compliance with the terms of service of any third-party APIs used.
## Acknowledgements
- OpenAI
- Trafilatura
- Google Custom Search
For questions or contributions, please open an issue or pull request.