---
title: Web Search MCP Server
sdk: gradio
colorFrom: green
colorTo: green
short_description: MCP server for general and custom web search
sdk_version: 5.34.0
tags:
 - mcp-server-track
app_file: app.py
pinned: true
---
# Search Tool

## Overview

**Search Tool** is a modular Python framework for performing advanced web searches, scraping content from search results, and analyzing the retrieved information using AI-powered models. The project is designed for extensibility, allowing easy integration of new search engines, scrapers, and analyzers.

## Demo Video

Watch the demo: https://drive.google.com/file/d/11bHRCr0tdAkCEtwKOiuzzfAp7RgZk-si/view?usp=sharing

A local copy is also included in this repository: [demo.webm](./demo.webm)

## Features

- **Custom Site Search:** Search within a specified list of websites.  
- **Custom Domain Search:** Restrict searches to specific domains (e.g., `.edu`, `.gov`).  
- **General Web Search:** Perform open web searches.  
- **Content Scraping:** Extracts the main textual content from result URLs using [trafilatura](https://trafilatura.readthedocs.io/).  
- **AI Analysis:** Summarizes and analyzes scraped content using OpenAI models (both steps are sketched after this list).  
- **Validation:** Ensures URLs are valid before processing.  
- **Extensible Architecture:** Easily add new searchers, scrapers, or analyzers.
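
To make the scraping and analysis steps concrete, here is a minimal sketch of that pipeline. The `trafilatura` calls and the OpenAI client are real library APIs, but the model name, prompt, and overall wiring are illustrative assumptions rather than the project's confirmed implementation.

```python
# Minimal sketch of the scrape-then-analyze pipeline described above.
# trafilatura.fetch_url/extract and the OpenAI client are real APIs;
# the model name and prompt are illustrative assumptions.
import trafilatura
from openai import OpenAI

url = "https://example.com/article"
downloaded = trafilatura.fetch_url(url)   # fetch the raw page
text = trafilatura.extract(downloaded)    # keep only the main content

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; the project may use another
    messages=[{"role": "user", "content": f"Summarize the key points:\n\n{text}"}],
)
print(response.choices[0].message.content)
```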

## Project Structure

```
search_tool/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ analyzer/         # AI-powered analyzers (e.g., OpenAI)
β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”œβ”€β”€ factory/      # Factories for searchers, scrapers, and analyzers
β”‚   β”‚   β”œβ”€β”€ interface/    # Abstract interfaces for extensibility
β”‚   β”‚   └── types.py      # Enums and constants
β”‚   β”œβ”€β”€ mcp_servers/      # MCP server integration
β”‚   β”œβ”€β”€ models/           # Pydantic models for data validation
β”‚   β”œβ”€β”€ scraper/          # Web scrapers (e.g., Trafilatura)
β”‚   β”œβ”€β”€ searcher/         # Search engine integrations
β”‚   β”œβ”€β”€ tools/            # User-facing tool functions
β”‚   └── utils/            # Utility functions (e.g., URL validation)
β”œβ”€β”€ test.py               # Example/test script
β”œβ”€β”€ requirements.txt      # Python dependencies
β”œβ”€β”€ pyproject.toml        # Project metadata and dependencies
β”œβ”€β”€ .env                  # Environment variables (e.g., API keys)
└── README.md             # Project documentation
```

## Installation

1. **Clone the repository:**
   ```sh
   git clone https://github.com/ola172/web-search-mcp-server.git
   cd web-search-mcp-server
   ```

2. **Set up a virtual environment (recommended):**
   ```sh
   python3 -m venv .venv
   source .venv/bin/activate
   ```

3. **Install dependencies:**
   ```sh
   pip install -r requirements.txt
   ```

4. **Configure environment variables:**
   - Copy `.env.example` to `.env`
   - Add your secrets (e.g., the OpenAI API key and Google Custom Search credentials); see the Configuration section below.

## Usage

### Core Tools

Each tool validates its input, performs the search, scrapes the results, and analyzes the content. Illustrative calls are shown after the list.

- **General Web Search:** `search_on_web`
- **Custom Sites Search:** `search_custom_sites`
- **Custom Domains Search:** `search_custom_domain`
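
The function names above come from this README, but the import path and parameter names in the following sketch are assumptions; check `src/tools/` for the actual signatures.

```python
# Illustrative calls only: the function names are documented above, but the
# import path and keyword arguments are assumptions, not a verified API.
from src.tools import search_on_web, search_custom_sites, search_custom_domain

# General web search
print(search_on_web("retrieval-augmented generation overview"))

# Search only within a hand-picked list of sites (parameter name assumed)
print(search_custom_sites("transformer architectures", sites=["arxiv.org"]))

# Restrict results to a top-level domain (parameter name assumed)
print(search_custom_domain("renewable energy statistics", domain=".gov"))
```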

### MCP Server Integration

The project includes an MCP server (`web_search_server.py`) that exposes the search tools as MCP tools, so any MCP-compatible client (such as an LLM agent) can call them.
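
Because this Space uses the Gradio SDK (see the frontmatter), one common pattern is to expose a function as an MCP tool via `demo.launch(mcp_server=True)`, which Gradio 5 supports. The sketch below illustrates the pattern with a placeholder function; it is not the project's confirmed setup.

```python
# Sketch of exposing a function as an MCP tool using Gradio 5's built-in
# MCP support. The placeholder function body is illustrative only.
import gradio as gr

def search_on_web(query: str) -> str:
    """Run a general web search and return analyzed results."""
    return f"(placeholder) results for: {query}"

demo = gr.Interface(fn=search_on_web, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch(mcp_server=True)  # serves the web UI plus an MCP endpoint
```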

## Extending the Framework

- **Add a new searcher:** Implement the `SearchInterface` and register it in `SearcherFactory`.  
- **Add a new scraper:** Implement the `ScraperInterface` and register it in `ScraperFactory` (sketched after this list).  
- **Add a new analyzer:** Implement the `AnalyzerInterface` and register it in `AnalyzerFactory`.
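
For example, a new scraper might look like the following. The interface and factory names come from this README, but the module paths, method name, and registration call are assumptions; adapt them to the actual code under `src/core/`.

```python
# Hypothetical custom scraper: ScraperInterface and ScraperFactory are named
# in this README, but the import paths, method signature, and register() call
# are assumptions about the project's conventions.
import requests
from bs4 import BeautifulSoup

from src.core.interface import ScraperInterface  # path assumed
from src.core.factory import ScraperFactory      # path assumed

class BeautifulSoupScraper(ScraperInterface):
    """Extracts visible text with BeautifulSoup instead of trafilatura."""

    def scrape(self, url: str) -> str:  # method name assumed
        html = requests.get(url, timeout=10).text
        return BeautifulSoup(html, "html.parser").get_text(separator="\n")

# Make the new scraper available through the factory (registration API assumed)
ScraperFactory.register("beautifulsoup", BeautifulSoupScraper)
```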

## Configuration

- **API Keys:** Store sensitive keys (e.g., OpenAI) in the `.env` file.  
- **Search Engine IDs:** For Google Custom Search, configure `API_KEY` and `SEARCH_ENGINE_ID` in the relevant modules.
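
A minimal sketch of loading these values with `python-dotenv` (a listed dependency). `OPENAI_API_KEY` is an assumed variable name; `API_KEY` and `SEARCH_ENGINE_ID` are named above.

```python
# Load secrets from .env with python-dotenv (a listed dependency).
# OPENAI_API_KEY is an assumed variable name; API_KEY and SEARCH_ENGINE_ID
# are named in this section.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

openai_key = os.environ["OPENAI_API_KEY"]
google_api_key = os.environ["API_KEY"]
search_engine_id = os.environ["SEARCH_ENGINE_ID"]
```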

## Dependencies

- `openai`  
- `trafilatura`  
- `pydantic`  
- `googlesearch-python`  
- `python-dotenv`  
- `google-api-python-client`  

See `requirements.txt` for the full list.

## License

This project is for educational and research purposes. Please ensure compliance with the terms of service of any third-party APIs used.

## Acknowledgements

- OpenAI  
- Trafilatura  
- Google Custom Search  

For questions or contributions, please open an issue or pull request.