PDF Reader MCP Server (@shtse8/pdf-reader-mcp)
Empower your AI agents (like Cline/Claude) with the ability to read and extract information from PDF files within your project, using a single, flexible tool.
This Node.js server implements the Model Context Protocol (MCP) to provide a consolidated `read_pdf` tool for interacting with PDF documents (local or URL) located within a defined project root directory.
⭐ Why Use This Server?
- 🛡️ Secure Project Root Focus:
  - All local file operations are strictly confined to the project root directory (determined by the server's launch context), preventing unauthorized access.
  - Uses relative paths for local files. Important: The server determines its project root from its own Current Working Directory (`cwd`) at launch. The process starting the server (e.g., your MCP host) must set the `cwd` to your intended project directory.
- 🌐 URL Support: Can directly process PDFs from public URLs.
- ⚡ Efficient PDF Processing:
  - Leverages the `pdf-parse` library for extracting text, metadata, and page information.
- 🔧 Flexible & Consolidated Tool:
  - A single `read_pdf` tool handles various extraction needs via parameters, simplifying agent interaction.
- 🚀 Easy Integration: Get started quickly using `npx` with minimal configuration.
- 🐳 Containerized Option: Also available as a Docker image for consistent deployment environments.
- ✅ Robust Validation: Uses Zod schemas to validate all incoming tool arguments.
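The project-root confinement described above can be illustrated with a minimal sketch. This is not the server's actual code: `resolveInsideRoot` is a hypothetical helper, and it assumes the check is done purely on path strings (it does not follow symlinks).

```typescript
import * as path from "node:path";

// Hypothetical helper (not part of @shtse8/pdf-reader-mcp): resolves a
// user-supplied relative path against the project root and rejects anything
// that would land outside it.
function resolveInsideRoot(projectRoot: string, userPath: string): string {
  if (path.isAbsolute(userPath)) {
    throw new Error(`Absolute paths are not allowed: ${userPath}`);
  }
  const root = path.resolve(projectRoot);
  const resolved = path.resolve(root, userPath);
  const relative = path.relative(root, resolved);
  // A relative result that is ".." or starts with "../" (or is absolute,
  // e.g. on another Windows drive) points outside the root.
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Path escapes the project root: ${userPath}`);
  }
  return resolved;
}
```

In a server like this one, the root passed in would presumably be `process.cwd()`, which is why the launch `cwd` matters.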
🚀 Quick Start: Usage with MCP Host (Recommended: `npx`)
The simplest way is via `npx`, configured in your MCP host (e.g., `mcp_settings.json`).
{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "npx",
      "args": [
        "@shtse8/pdf-reader-mcp"
      ],
      "name": "PDF Reader (npx)"
    }
  }
}
(Alternative) Using `bunx`:
{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "bunx",
      "args": [
        "@shtse8/pdf-reader-mcp"
      ],
      "name": "PDF Reader (bunx)"
    }
  }
}
Important: Ensure your MCP Host launches the command with the `cwd` set to your project's root directory for local file access.
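If you are writing your own host or test harness rather than editing a settings file, the `cwd` requirement might look like the following sketch. `buildLaunchSpec` and `launchServer` are hypothetical names, and the sketch assumes the server talks over stdin/stdout; on Windows, spawning `npx` may additionally need `shell: true`.

```typescript
import { spawn, ChildProcess } from "node:child_process";
import * as path from "node:path";

interface LaunchSpec {
  command: string;
  args: string[];
  cwd: string;
}

// Hypothetical helper: mirrors the npx configuration shown above and pins
// the working directory to the intended project root.
function buildLaunchSpec(projectRoot: string): LaunchSpec {
  return {
    command: "npx",
    args: ["@shtse8/pdf-reader-mcp"],
    cwd: path.resolve(projectRoot),
  };
}

// Hypothetical helper: starts the server as a child process. The `cwd`
// option is what makes relative PDF paths resolve inside your project.
function launchServer(projectRoot: string): ChildProcess {
  const spec = buildLaunchSpec(projectRoot);
  return spawn(spec.command, spec.args, {
    cwd: spec.cwd,
    stdio: ["pipe", "pipe", "inherit"], // assumption: stdio transport
  });
}
```

Calling `launchServer("/path/to/your/project")` would then start the server with that directory as its project root.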
✨ The `read_pdf` Tool
This server provides a single, powerful tool: `read_pdf`.
- Description: Reads content, metadata, or page count from a PDF file (local or URL), controlled by parameters.
- Input: An object containing:
  - `sources` (array): Required. An array of source objects. Each object must contain either `path` (string, relative path to local PDF) or `url` (string, URL of PDF). Each source object can optionally include:
    - `pages` (string | number[], optional): Extract text only from specific pages (1-based) or ranges (e.g., `[1, 3, 5]` or `'1,3-5,7'`) for this specific source. If provided, the global `include_full_text` flag is ignored for this source.
  - `include_full_text` (boolean, optional, default `false`): Include the full text content for each PDF. Ignored if `pages` is provided.
  - `include_metadata` (boolean, optional, default `true`): Include metadata (`info` and `metadata` objects) for each PDF.
  - `include_page_count` (boolean, optional, default `true`): Include the total number of pages (`num_pages`) for each PDF.
- Output: An object containing a `results` array. Each element corresponds to a source in the input `sources` array. Processing continues even if some sources fail. Each result object has the following structure:
  - `source` (string): The original path or URL provided for identification.
  - `success` (boolean): Indicates if processing this specific source was successful.
  - `error` (string, optional): Provides an error message if `success` is false for this source.
  - `data` (object, optional): Contains the extracted data if `success` is true for this source:
    - `full_text` (string, optional)
    - `page_texts` (array, optional): Array of `{ page: number, text: string }`.
    - `missing_pages` (array, optional)
    - `info` (object, optional)
    - `metadata` (object, optional)
    - `num_pages` (number, optional)
    - `warnings` (array, optional): Non-critical warnings for this source (e.g., requested page out of bounds).
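For client code, the input and output shapes listed above could be written down as TypeScript types. This is a sketch derived only from the field list in this README, not from the server's source: the element types of `missing_pages` (assumed `number[]`) and `warnings` (assumed `string[]`) are guesses, and `partitionResults` is a hypothetical helper.

```typescript
// Each source has either `path` or `url`, plus an optional `pages` selector.
type PdfSource = ({ path: string } | { url: string }) & {
  pages?: string | number[];
};

interface ReadPdfInput {
  sources: PdfSource[];
  include_full_text?: boolean; // default false
  include_metadata?: boolean; // default true
  include_page_count?: boolean; // default true
}

interface ReadPdfData {
  full_text?: string;
  page_texts?: { page: number; text: string }[];
  missing_pages?: number[]; // assumption: page numbers
  info?: Record<string, unknown>;
  metadata?: Record<string, unknown>;
  num_pages?: number;
  warnings?: string[]; // assumption: plain messages
}

interface ReadPdfResult {
  source: string;
  success: boolean;
  error?: string;
  data?: ReadPdfData;
}

interface ReadPdfOutput {
  results: ReadPdfResult[];
}

// Hypothetical helper: because processing continues when some sources fail,
// callers usually want to split the results before using them.
function partitionResults(output: ReadPdfOutput): {
  succeeded: ReadPdfResult[];
  failed: ReadPdfResult[];
} {
  return {
    succeeded: output.results.filter((r) => r.success),
    failed: output.results.filter((r) => !r.success),
  };
}
```

Checking `success` per result, rather than treating the whole call as pass or fail, matches the per-source error reporting described in the Output section.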
- Get metadata and page count for multiple files:
  { "sources": [ { "path": "report.pdf" }, { "url": "http://example.com/another.pdf" }, { "path": "nonexistent.pdf" } ] }
  (Example Output: { "results": [ { "source": "report.pdf", "success": true, "data": { "info": {...}, "metadata": {...}, "num_pages": 10 } }, { "source": "http://example.com/another.pdf", "success": true, "data": { "info": {...}, "metadata": {...}, "num_pages": 5 } }, { "source": "nonexistent.pdf", "success": false, "error": "File not found..." } ] })
- Get full text for one file:
  { "sources": [{ "url": "http://example.com/document.pdf" }], "include_full_text": true, "include_metadata": false, "include_page_count": false }
  (Example Output: { "results": [ { "source": "http://example.com/document.pdf", "success": true, "data": { "full_text": "..." } } ] })
- Get text from different pages for different files (`include_metadata` and `include_page_count` default to `true`, so both are explicitly set to `false` here):
  { "sources": [ { "path": "manual.pdf", "pages": "1-2" }, { "url": "http://example.com/report.pdf", "pages": [5] } ], "include_metadata": false, "include_page_count": false }
  (Example Output: { "results": [ { "source": "manual.pdf", "success": true, "data": { "page_texts": [...] } }, { "source": "http://example.com/report.pdf", "success": true, "data": { "page_texts": [...] } } ] })
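To make the two accepted `pages` forms concrete, here is a sketch of how a client might normalize either form into a sorted list of 1-based page numbers before comparing it with `page_texts`. `parsePageSpec` is a hypothetical helper; the server's own parsing may accept or reject inputs differently (for example, this sketch does not handle open-ended ranges).

```typescript
// Hypothetical helper: turns '1,3-5,7' or [1, 3, 5] into a sorted,
// de-duplicated array of 1-based page numbers.
function parsePageSpec(spec: string | number[]): number[] {
  const pages = new Set<number>();
  const add = (n: number): void => {
    if (!Number.isInteger(n) || n < 1) {
      throw new Error(`Invalid page number: ${n}`);
    }
    pages.add(n);
  };

  if (Array.isArray(spec)) {
    spec.forEach(add);
  } else {
    for (const part of spec.split(",")) {
      const token = part.trim();
      // Either a single page ("7") or a closed range ("3-5").
      const match = /^(\d+)(?:-(\d+))?$/.exec(token);
      if (!match) {
        throw new Error(`Invalid page token: "${token}"`);
      }
      const start = Number(match[1]);
      const end = match[2] !== undefined ? Number(match[2]) : start;
      if (end < start) {
        throw new Error(`Descending range: "${token}"`);
      }
      for (let p = start; p <= end; p++) add(p);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

With this helper, `parsePageSpec("1,3-5,7")` yields `[1, 3, 4, 5, 7]`, and `parsePageSpec([5, 1, 5])` yields `[1, 5]`.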
🐳 Alternative Usage: Docker
Configure your MCP Host to run the Docker container, mounting your project directory to `/app`.
{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-v",
        "/path/to/your/project:/app",
        "shtse8/pdf-reader-mcp:latest"
      ],
      "name": "PDF Reader (Docker)"
    }
  }
}
Note on Volume Mount Path: Instead of hardcoding `/path/to/your/project`, you can often use shell variables to automatically use the current working directory:
- Linux/macOS: `-v "$PWD:/app"`
- Windows Cmd: `-v "%CD%:/app"`
- Windows PowerShell: `-v "${PWD}:/app"`
- VS Code Tasks/Launch: You might be able to use `${workspaceFolder}` if supported by your MCP host integration.
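When the host configuration is generated from code rather than typed into a shell, the same volume mount can be built from the current working directory. The sketch below is illustrative only; `buildDockerArgs` is a hypothetical helper that reproduces the argument list from the Docker configuration above.

```typescript
import * as path from "node:path";

// Hypothetical helper: builds the `docker` argument list shown above,
// mounting the given directory (default: the current working directory)
// at /app inside the container.
function buildDockerArgs(projectRoot: string = process.cwd()): string[] {
  const hostPath = path.resolve(projectRoot);
  return [
    "run",
    "-i",
    "--rm",
    "-v",
    `${hostPath}:/app`,
    "shtse8/pdf-reader-mcp:latest",
  ];
}
```

Passing each argument as a separate array element sidesteps the shell-specific quoting differences listed above.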
🛠️ Other Usage Options
Local Build (For Development)
- Clone: `git clone https://github.com/shtse8/pdf-reader-mcp.git`
- Install: `cd pdf-reader-mcp && npm install`
- Build: `npm run build`
- Configure MCP Host:
  {
    "mcpServers": {
      "pdf-reader-mcp": {
        "command": "node",
        "args": ["/path/to/cloned/repo/pdf-reader-mcp/build/index.js"],
        "name": "PDF Reader (Local Build)"
      }
    }
  }
💻 Development
- Clone, `npm install`, `npm run build`.
- `npm run watch` for auto-recompile.
🚢 Publishing (via GitHub Actions)
Uses GitHub Actions (`.github/workflows/publish.yml`) to publish to npm and Docker Hub on pushes to `main`. Requires `NPM_TOKEN`, `DOCKERHUB_USERNAME`, and `DOCKERHUB_TOKEN` secrets.
🙌 Contributing
Contributions welcome! Open an issue or PR.