<!-- Use this file to provide workspace-specific custom instructions to Copilot. For more details, visit https://code.visualstudio.com/docs/copilot/copilot-customization#_use-a-githubcopilotinstructionsmd-file -->
# Web Scraper Project Instructions
This is a Python Gradio application for web scraping that:
- Scrapes text content from websites
- Formats content as markdown
- Generates sitemaps from page links
- Provides MCP (Model Context Protocol) server functionality
## Key Libraries
- gradio[mcp]: For the web interface and MCP server capabilities
- requests: For HTTP requests
- beautifulsoup4: For HTML parsing
- markdownify: For converting HTML to markdown
- urllib.parse (standard library): For parsing and joining URLs
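
As a rough illustration of the URL handling behind sitemap generation, the sketch below resolves every link on a page against the page URL and splits the results into internal and external links. The names `LinkCollector` and `extract_links` are hypothetical and are not taken from this project. To stay dependency-free, the sketch swaps in the standard library `html.parser` where the project itself would likely use beautifulsoup4.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html: str, base_url: str) -> dict[str, list[str]]:
    """Return de-duplicated absolute links grouped as internal or external.

    Args:
        html: HTML source of the page.
        base_url: URL the page was fetched from, used to resolve relative links.
    """
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    internal: list[str] = []
    external: list[str] = []
    for href in collector.hrefs:
        # Resolve relative links, then drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        bucket = internal if parsed.netloc == base_host else external
        if absolute not in bucket:
            bucket.append(absolute)
    return {"internal": internal, "external": external}
```

Comparing `netloc` values is one simple way to decide what counts as internal. A real sitemap tool might also want to treat subdomains or `www.` prefixes as the same site.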
## Project Structure
- `app.py`: Main web interface application
- `mcp_server.py`: MCP server that exposes tools for AI integration
## MCP Tools
The MCP server exposes three main tools:
- `scrape_content`: Extract website content as markdown
- `generate_sitemap`: Create sitemap from page links
- `analyze_website`: Complete analysis with content and sitemap
## Code Style
- Use type hints where appropriate
- Handle errors from web requests (timeouts, connection failures, HTTP error statuses)
- Follow PEP 8 style guidelines
- Add docstrings for functions with clear parameter descriptions
- Give MCP functions descriptive docstrings, since the docstrings become the tool descriptions
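
The style points above might look like this in practice. This is a minimal sketch, not code from the project: `fetch_html` is a hypothetical helper, and it uses the standard library `urllib.request` in place of requests so that it runs without extra dependencies. It shows type hints, a docstring with parameter descriptions, and error handling that returns a readable message instead of raising.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a web page and return its HTML, or an error message.

    Args:
        url: Absolute http(s) URL of the page to fetch.
        timeout: Seconds to wait for the server before giving up.

    Returns:
        The decoded response body, or a string starting with "Error:".
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"Error: invalid URL: {url!r}"
    try:
        request = Request(url, headers={"User-Agent": "web-scraper-sketch"})
        with urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return f"Error: could not fetch {url}: {exc}"
```

Returning an error string rather than raising is one possible choice for functions exposed as tools, since the caller then always receives text. Raising a specific exception would be equally reasonable for internal helpers.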