---
title: web search MCP-server
sdk: gradio
colorFrom: green
colorTo: green
short_description: MCP server for general and custom search on web
sdk_version: 5.34.0
tags:
- mcp-server-track
app_file: app.py
pinned: true
---
# Search Tool
## Overview
**Search Tool** is a modular Python framework for performing advanced web searches, scraping content from search results, and analyzing the retrieved information using AI-powered models. The project is designed for extensibility, allowing easy integration of new search engines, scrapers, and analyzers.
## Demo video
Link: https://drive.google.com/file/d/11bHRCr0tdAkCEtwKOiuzzfAp7RgZk-si/view?usp=sharing

## Features
- **Custom Site Search:** Search within a specified list of websites.
- **Custom Domain Search:** Restrict searches to specific domains (e.g., `.edu`, `.gov`).
- **General Web Search:** Perform open web searches.
- **Content Scraping:** Extracts main textual content from URLs using [trafilatura](https://trafilatura.readthedocs.io/).
- **AI Analysis:** Summarizes and analyzes scraped content using OpenAI models.
- **Validation:** Ensures URLs are valid before processing.
- **Extensible Architecture:** Easily add new searchers, scrapers, or analyzers.
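The validation step mentioned above can be as simple as checking the scheme and host of each URL before it is scraped. A minimal sketch using only the standard library (the actual helper lives in `src/utils/` and may differ in name and behavior):

```python
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    """Return True when the URL has an http(s) scheme and a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Invalid inputs such as bare strings or non-HTTP schemes are rejected before any network request is made.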
## Project Structure
```
search_tool/
├── src/
│   ├── analyzer/          # AI-powered analyzers (e.g., OpenAI)
│   ├── core/
│   │   ├── factory/       # Factories for searcher, scraper, analyzer
│   │   ├── interface/     # Abstract interfaces for extensibility
│   │   └── types.py       # Enums and constants
│   ├── mcp_servers/       # MCP server integration
│   ├── models/            # Pydantic models for data validation
│   ├── scraper/           # Web scrapers (e.g., Trafilatura)
│   ├── searcher/          # Search engine integrations
│   ├── tools/             # User-facing tool functions
│   └── utils/             # Utility functions (e.g., URL validation)
├── test.py                # Example/test script
├── requirements.txt       # Python dependencies
├── pyproject.toml         # Project metadata and dependencies
├── .env                   # Environment variables (e.g., API keys)
└── README.md              # Project documentation
```
## Installation
1. **Clone the repository:**
```sh
git clone https://github.com/ola172/web-search-mcp-server.git
cd search_tool
```
2. **Set up a virtual environment (recommended):**
```sh
python3 -m venv .venv
source .venv/bin/activate
```
3. **Install dependencies:**
```sh
pip install -r requirements.txt
```
4. **Configure environment variables:**
- Copy `.env.example` to `.env`
- Add your secrets:
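A minimal `.env` might look like the following. `API_KEY` and `SEARCH_ENGINE_ID` are the Google Custom Search settings named in the Configuration section; the OpenAI variable name is an assumption and may differ in this project:

```shell
# OpenAI credentials for the analyzer (variable name is an assumption)
OPENAI_API_KEY=sk-...

# Google Custom Search credentials (see Configuration section)
API_KEY=your-google-api-key
SEARCH_ENGINE_ID=your-search-engine-id
```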
## Usage
### Core Tools
Each tool validates input, performs the search, scrapes the results, and analyzes the content.
- **General Web Search:** `search_on_web`
- **Custom Sites Search:** `search_custom_sites`
- **Custom Domains Search:** `search_custom_domain`
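The validate → search → scrape → analyze flow shared by these tools can be sketched as follows. This is an illustrative stand-in, not the project's actual code: the stub `search`, `scrape`, and `analyze` functions are hypothetical placeholders for the real searcher, scraper, and analyzer components in `src/`:

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for the real searcher, scraper, and analyzer.
def search(query: str) -> list[str]:
    return ["https://example.com/article"]

def scrape(url: str) -> str:
    return f"Main text extracted from {url}"

def analyze(text: str) -> str:
    return f"Summary: {text}"

def search_on_web(query: str) -> list[dict]:
    """Validate the query, search, then scrape and analyze each result."""
    if not query.strip():
        raise ValueError("query must be non-empty")
    results = []
    for url in search(query):
        # Skip anything that is not a well-formed http(s) URL.
        if urlparse(url).scheme in ("http", "https"):
            results.append({"url": url, "analysis": analyze(scrape(url))})
    return results
```

`search_custom_sites` and `search_custom_domain` follow the same pipeline, differing only in how the search step restricts its sources.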
### MCP Server Integration
The project includes an MCP server (`web_search_server.py`) that exposes the search tools above as MCP tools.
## Extending the Framework
- **Add a new searcher:** Implement the `SearchInterface` and register it in `SearcherFactory`.
- **Add a new scraper:** Implement the `ScraperInterface` and register it in `ScraperFactory`.
- **Add a new analyzer:** Implement the `AnalyzerInterface` and register it in `AnalyzerFactory`.
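The interface/factory pattern these steps describe can be sketched as below. The method names and registration mechanism are assumptions (the real definitions live in `src/core/interface/` and `src/core/factory/`), and `DuckDuckGoSearcher` is a hypothetical example searcher:

```python
from abc import ABC, abstractmethod

# Simplified stand-in for the project's SearchInterface.
class SearchInterface(ABC):
    @abstractmethod
    def search(self, query: str) -> list[str]:
        """Return a list of result URLs for the query."""

# Simplified stand-in for SearcherFactory: a name -> class registry.
class SearcherFactory:
    _registry: dict[str, type[SearchInterface]] = {}

    @classmethod
    def register(cls, name: str, searcher: type[SearchInterface]) -> None:
        cls._registry[name] = searcher

    @classmethod
    def create(cls, name: str) -> SearchInterface:
        return cls._registry[name]()

# A new searcher: implement the interface, then register it.
class DuckDuckGoSearcher(SearchInterface):
    def search(self, query: str) -> list[str]:
        return [f"https://duckduckgo.com/?q={query}"]

SearcherFactory.register("duckduckgo", DuckDuckGoSearcher)
```

Scrapers and analyzers extend the framework the same way via `ScraperInterface`/`ScraperFactory` and `AnalyzerInterface`/`AnalyzerFactory`.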
## Configuration
- **API Keys:** Store sensitive keys (e.g., OpenAI) in the `.env` file.
- **Search Engine IDs:** For Google Custom Search, configure `API_KEY` and `SEARCH_ENGINE_ID` in the relevant modules.
## Dependencies
- `openai`
- `trafilatura`
- `pydantic`
- `googlesearch-python`
- `python-dotenv`
- `google-api-python-client`
See `requirements.txt` for the full list.
## License
This project is for educational and research purposes. Please ensure compliance with the terms of service of any third-party APIs used.
## Acknowledgements
- OpenAI
- Trafilatura
- Google Custom Search
For questions or contributions, please open an issue or pull request.