<!-- Use this file to provide workspace-specific custom instructions to Copilot. For more details, visit https://code.visualstudio.com/docs/copilot/copilot-customization#_use-a-githubcopilotinstructionsmd-file -->

# Web Scraper Project Instructions

This is a Python Gradio application for web scraping that:

- Scrapes text content from websites
- Formats content as Markdown
- Generates sitemaps from page links
- Provides MCP (Model Context Protocol) server functionality

## Key Libraries

- `gradio[mcp]`: For the web interface and MCP server capabilities
- `requests`: For HTTP requests
- `beautifulsoup4`: For HTML parsing
- `markdownify`: For converting HTML to Markdown
- `urllib.parse`: For URL handling
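
As an illustration of the URL handling role of `urllib.parse`, the sketch below resolves the links found on a page against the page URL and keeps only those on the same host, which is the kind of step a sitemap generator needs. The names `collect_internal_links` and `_LinkParser` are hypothetical, and the standard-library `html.parser` is swapped in for `beautifulsoup4` only so the snippet runs without third-party packages; the project's actual code may differ.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collect raw href values from <a> tags (stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_internal_links(html: str, base_url: str) -> List[str]:
    """Return sorted, de-duplicated absolute links on the same host as base_url.

    Args:
        html: Raw HTML of the page.
        base_url: URL the page was fetched from, used to resolve relative links.
    """
    parser = _LinkParser()
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    links: Set[str] = set()
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        # Skip mailto:, javascript:, and links to other hosts.
        if parsed.scheme in ("http", "https") and parsed.netloc == base_host:
            links.add(absolute)
    return sorted(links)
```

Returning a sorted list rather than a set keeps the generated sitemap stable between runs.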

## Project Structure

- `app.py`: Main web interface application
- `mcp_server.py`: MCP server that exposes tools for AI integration

## MCP Tools

The MCP server exposes three main tools:

- `scrape_content`: Extract website content as Markdown
- `generate_sitemap`: Create sitemap from page links
- `analyze_website`: Complete analysis with content and sitemap
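
The following is a rough sketch of how three such functions could be exposed through Gradio. The function bodies are stubs, the single `url` parameter and the helper `build_demo` are assumptions made for illustration, and the real signatures in `mcp_server.py` may differ.

```python
def scrape_content(url: str) -> str:
    """Extract the content of a web page as Markdown.

    Args:
        url: Address of the page to scrape.
    """
    return f"# Content of {url}\n\n(stub)"  # placeholder, not the real scraper


def generate_sitemap(url: str) -> str:
    """Create a sitemap from the links found on a page.

    Args:
        url: Address of the page whose links are collected.
    """
    return f"# Sitemap for {url}\n\n(stub)"  # placeholder, not the real generator


def analyze_website(url: str) -> str:
    """Run a complete analysis: page content followed by the sitemap.

    Args:
        url: Address of the page to analyze.
    """
    return f"{scrape_content(url)}\n\n{generate_sitemap(url)}"


def build_demo():
    """Wrap the three functions in a tabbed Gradio app (requires gradio[mcp])."""
    import gradio as gr  # imported lazily so the stubs above work without Gradio

    tools = [scrape_content, generate_sitemap, analyze_website]
    interfaces = [
        gr.Interface(fn=tool, inputs=gr.Textbox(label="URL"), outputs=gr.Markdown())
        for tool in tools
    ]
    return gr.TabbedInterface(interfaces, [tool.__name__ for tool in tools])
```

With `gradio[mcp]` installed, calling `build_demo().launch(mcp_server=True)` would start the web interface and serve the same functions as MCP tools.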

## Code Style

- Use type hints for function parameters and return values
- Handle errors from web requests (for example timeouts, connection failures, and HTTP error responses)
- Follow PEP 8 style guidelines
- Add docstrings for functions with clear parameter descriptions
- Give MCP functions descriptive docstrings, since the docstrings become the tool descriptions
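
A minimal sketch of a function written to these guidelines is shown below. The name `fetch_page`, the `User-Agent` value, and the convention of returning an `"Error: ..."` string are assumptions for illustration. The standard-library `urllib.request` is swapped in for `requests` so the snippet has no third-party dependency; with `requests`, the equivalent would call `response.raise_for_status()` and catch `requests.RequestException`.

```python
from urllib.request import Request, urlopen


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Fetch a web page and return its body as text.

    Args:
        url: Absolute http(s) URL of the page to fetch.
        timeout: Seconds to wait before giving up.

    Returns:
        The decoded page body, or a message starting with "Error:" if the
        URL is unsupported or the request fails.
    """
    if not url.startswith(("http://", "https://")):
        return f"Error: unsupported URL '{url}'"
    try:
        request = Request(url, headers={"User-Agent": "web-scraper-example"})
        with urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, ValueError) as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return f"Error: could not fetch '{url}': {exc}"
```

Returning an error message instead of raising keeps a tool call from failing outright, at the cost of the caller having to check the result; raising a specific exception is an equally reasonable choice.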