A Model Context Protocol (MCP) server for scraping documentation. It converts web-based documentation into markdown format using jina.ai's conversion service.
- Scrapes documentation from any web URL
- Converts HTML documentation to markdown format
- Saves the converted documentation to a specified output path
- Integrates with the Model Context Protocol (MCP)
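The scrape-convert-save flow described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code. It assumes the conversion goes through the public Jina Reader endpoint (`https://r.jina.ai/` prefixed to the target URL), and it uses the standard library's `urllib` in place of aiohttp so the sketch has no third-party dependencies. The function names `build_reader_url` and `scrape_to_markdown` are hypothetical.

```python
from pathlib import Path
from urllib.request import Request, urlopen

# Assumption: the Jina Reader endpoint, which returns a markdown
# rendering of whatever URL is appended to it.
JINA_READER_PREFIX = "https://r.jina.ai/"


def build_reader_url(url: str) -> str:
    """Prefix the documentation URL with the reader endpoint."""
    return JINA_READER_PREFIX + url


def scrape_to_markdown(url: str, output_path: str, timeout: float = 30.0) -> Path:
    """Fetch the markdown rendering of `url` and write it to `output_path`."""
    request = Request(build_reader_url(url), headers={"User-Agent": "doc-scraper-sketch"})
    with urlopen(request, timeout=timeout) as response:
        markdown = response.read().decode("utf-8")

    # Create missing parent directories so any output path is usable.
    destination = Path(output_path)
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_text(markdown, encoding="utf-8")
    return destination
```

Calling `scrape_to_markdown("https://example.com/docs", "docs/example.md")` would perform a network request, so it is best tried against a page you are allowed to fetch.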
To install Doc Scraper for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @askjohngeorge/mcp-doc-scraper --client claude
- Clone the repository:
git clone https://github.com/askjohngeorge/mcp-doc-scraper.git
cd mcp-doc-scraper
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
- Install the dependencies:
pip install -e .
The server can be run using Python:
python -m mcp_doc_scraper
The server provides a single tool:
- Name: scrape_docs
- Description: Scrape documentation from a URL and save as markdown
- Input Parameters:
  - url: The URL of the documentation to scrape
  - output_path: The path where the markdown file should be saved
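For illustration, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request. The sketch below builds such a request for scrape_docs with the two parameters listed above. The helper name `build_scrape_request` and the example URL and path are hypothetical, and in practice an MCP client library would normally construct and send this message for you.

```python
import json


def build_scrape_request(url: str, output_path: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the scrape_docs tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "scrape_docs",
            "arguments": {"url": url, "output_path": output_path},
        },
    }


if __name__ == "__main__":
    # Example values only; substitute a real documentation URL and path.
    request = build_scrape_request("https://example.com/docs", "docs/example.md")
    print(json.dumps(request, indent=2))
```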
doc_scraper/
├── __init__.py
├── __main__.py
└── server.py
- aiohttp
- mcp
- pydantic
To set up the development environment:
- Install development dependencies:
pip install -r requirements.txt
- The server uses the Model Context Protocol. Familiarize yourself with the MCP documentation before making changes.
MIT License