Commit 4ccde7a by Deep Chavda
feat: initial release — PDF to Markdown MCP server

- FastMCP SSE server with two tools: pdf_to_markdown and pdf_to_structured_markdown
- Mistral OCR integration with lifespan-managed client
- Pydantic Settings: secrets in .env, non-secret config in development.yml
- Loguru structured logging across all layers
- CORS middleware + /health liveness probe
- Railway deployment config (railway.json, proxy_headers, PORT injection)
- .gitignore, sample.env, and uv lockfile included

Files changed:

- .gitignore +29 -0
- .python-version +1 -0
- README.md +281 -0
- app/core/config.py +69 -0
- app/core/exceptions.py +10 -0
- app/core/lifespan.py +27 -0
- app/core/logger.py +20 -0
- app/server.py +52 -0
- app/services/ocr_service.py +88 -0
- app/tools/__init__.py +3 -0
- app/tools/markdown_tools.py +60 -0
- app/utils/response.py +30 -0
- app/utils/validators.py +30 -0
- development.yml +18 -0
- main.py +18 -0
- pyproject.toml +26 -0
- railway.json +13 -0
- sample.env +1 -0
- uv.lock +0 -0
.gitignore
ADDED · @@ -0,0 +1,29 @@

```gitignore
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv

# Secrets: never commit real credentials
.env

# Local dev output
output/
docs/
temp/
logs/

# Test files
test.py
tests/

# OS / editor noise
.DS_Store
.idea/
.vscode/
```

.python-version
ADDED · @@ -0,0 +1 @@

```text
3.12
```

README.md
ADDED · @@ -0,0 +1,281 @@

````markdown
<p>
  <div align="center">
    <h1>
      PDF to Markdown MCP
      <br /> <br />
      <a href="">
        <img
          src="https://img.shields.io/badge/python%20%7C%203.12-blue"
          alt="Python 3.12"
        />
      </a>
      <a href="https://github.com/astral-sh/uv">
        <img
          src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json"
          alt="uv"
        />
      </a>
      <a href="https://modelcontextprotocol.io/">
        <img
          src="https://img.shields.io/badge/MCP-FastMCP-6C47FF"
          alt="FastMCP"
        />
      </a>
      <a href="https://mistral.ai/">
        <img
          src="https://img.shields.io/badge/Mistral%20AI-FF7000?logoColor=white"
          alt="Mistral AI"
        />
      </a>
      <a href="https://www.starlette.io/">
        <img
          src="https://img.shields.io/badge/Starlette-ASGI-009688"
          alt="Starlette"
        />
      </a>
      <a href="https://www.uvicorn.org/">
        <img
          src="https://img.shields.io/badge/Uvicorn-server-4051B5"
          alt="Uvicorn"
        />
      </a>
      <a href="https://loguru.readthedocs.io/">
        <img
          src="https://img.shields.io/badge/Loguru-logging-FF6B6B"
          alt="Loguru"
        />
      </a>
    </h1>
  </div>
</p>

An MCP (Model Context Protocol) server that converts PDFs and documents into Markdown using **Mistral OCR**.

## Features

- **`pdf_to_markdown`**: Convert any publicly accessible PDF/document URL to merged Markdown.
- **`pdf_to_structured_markdown`**: Convert a document and get per-page structured output (page index, per-page markdown, merged result).
- CORS-enabled SSE transport: connect from any MCP client or inspector.
- `/health` endpoint for liveness probing.
- Structured, colorized logging via Loguru.

## Project Structure

```
pdf_to_md_mcp/
├── main.py                    # Entry point: uvicorn runner
├── pyproject.toml
├── sample.env                 # Secrets reference (copy to .env)
├── development.yml            # Non-secret config (server, CORS, OCR model)
└── app/
    ├── server.py              # ASGI app factory (MCP + CORS + health)
    ├── core/
    │   ├── config.py          # Pydantic settings (loads .env + development.yml)
    │   ├── logger.py          # Loguru logger
    │   ├── lifespan.py        # AppContext + Mistral client lifecycle
    │   └── exceptions.py      # Domain exceptions
    ├── services/
    │   └── ocr_service.py     # Mistral OCR business logic
    ├── tools/
    │   └── markdown_tools.py  # @mcp.tool() definitions
    └── utils/
        ├── response.py        # create_response() helper
        └── validators.py      # URL validation
```

## Setup

```bash
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Configure secrets
cp sample.env .env
# Edit .env and set MISTRAL_API_KEY
# Non-secret config (server, CORS, OCR model) lives in development.yml
```

## Run

```bash
uv run main.py
```

The server starts at `http://127.0.0.1:8000` by default.

| Endpoint | Description |
| --- | --- |
| `GET /health` | Liveness probe |
| `GET /sse` | MCP SSE transport |
| `POST /messages/` | MCP message handler |

## MCP Tools

### `pdf_to_markdown`

Convert a document URL to merged Markdown (all pages concatenated).

**Input**

| Parameter | Type | Description |
| --- | --- | --- |
| `document_url` | `string` | Publicly accessible URL of a PDF or image document |

**Returns**: `string`

```
# Introduction

This paper presents...

## Section 2

...
```

If the URL is invalid or OCR fails, the tool returns a string starting with `Error: ` instead of raising.

---

### `pdf_to_structured_markdown`

Convert a document URL and get per-page structured output alongside the merged result.

**Input**

| Parameter | Type | Description |
| --- | --- | --- |
| `document_url` | `string` | Publicly accessible URL of a PDF or image document |

**Returns**: `object`

```json
{
  "page_count": 3,
  "pages": [
    { "index": 0, "markdown": "# Page 1\n..." },
    { "index": 1, "markdown": "## Page 2\n..." },
    { "index": 2, "markdown": "### Page 3\n..." }
  ],
  "markdown": "# Page 1\n...\n\n## Page 2\n...\n\n### Page 3\n..."
}
```

On failure, the object carries an `error` message with `page_count` set to `0`, an empty `pages` list, and an empty `markdown` string.

## Debugging with MCP Inspector

```bash
npx -y @modelcontextprotocol/inspector
```

Connect with the SSE transport to `http://127.0.0.1:8000/sse` locally, or to your Railway URL in production.

## Deploy to Railway

### 1. Push to GitHub

```bash
git init
git add .
git commit -m "initial commit"
gh repo create pdf-to-md-mcp --public --source=. --push
```

### 2. Create a Railway project

Go to [railway.app](https://railway.app) → **New Project** → **Deploy from GitHub repo** → select your repo.

Railway detects `railway.json` and automatically uses `uv run main.py` as the start command.

### 3. Set environment variables

In Railway → your service → **Variables**, add:

| Variable | Value |
| --- | --- |
| `MISTRAL_API_KEY` | your Mistral API key |
| `HOST` | `0.0.0.0` |

> `PORT` is injected automatically by Railway. Do **not** set it manually.
> All other config (`MISTRAL_OCR_MODEL`, `LOG_LEVEL`, etc.) is read from `development.yml`.

### 4. Deploy

Railway triggers a deploy on every push to your default branch. Once live, your public SSE URL will be:

```
https://<your-service>.up.railway.app/sse
```

Use that URL in any MCP client, or pass it to the inspector:

```bash
npx -y @modelcontextprotocol/inspector
# connect to: https://<your-service>.up.railway.app/sse
```

### Why it works

- Railway injects `PORT` as an environment variable. pydantic-settings gives environment variables priority over `development.yml`, so Railway's value overrides the local default of `8000`.
- `HOST=0.0.0.0` (set via Railway Variables) overrides the local `127.0.0.1` default, so the container accepts external connections.
- `proxy_headers=True` together with `forwarded_allow_ips="*"` in `main.py` makes uvicorn trust the `X-Forwarded-*` headers set by Railway's edge proxy.
- `/health` is set as Railway's healthcheck path in `railway.json`.

## Configuration

Configuration is split across two files to keep secrets separate from non-sensitive settings.

### `.env`: Secrets only

```dotenv
MISTRAL_API_KEY=your_mistral_api_key_here
```

### `development.yml`: Non-secret config

```yaml
# Mistral
MISTRAL_OCR_MODEL: mistral-ocr-latest
MISTRAL_TABLE_FORMAT: markdown

# Server
APP_NAME: "Markdown & Layout Extractor"
HOST: "127.0.0.1"
PORT: 8000
LOG_LEVEL: INFO

# CORS
CORS_ALLOW_ORIGINS:
  - "*"
CORS_ALLOW_METHODS:
  - "*"
CORS_ALLOW_HEADERS:
  - "*"
```

**Priority (highest → lowest):** environment variables → `.env` → `development.yml` → defaults in `app/core/config.py`

### All settings

| Variable | File | Default | Description |
| --- | --- | --- | --- |
| `MISTRAL_API_KEY` | `.env` | **required** | Mistral AI API key |
| `MISTRAL_OCR_MODEL` | `development.yml` | `mistral-ocr-latest` | OCR model identifier |
| `MISTRAL_TABLE_FORMAT` | `development.yml` | `markdown` | Table output format |
| `APP_NAME` | `development.yml` | `Markdown & Layout Extractor` | MCP server name |
| `HOST` | `development.yml` | `127.0.0.1` | Bind address |
| `PORT` | `development.yml` | `8000` | Bind port |
| `LOG_LEVEL` | `development.yml` | `INFO` | Log level (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |
| `CORS_ALLOW_ORIGINS` | `development.yml` | `["*"]` | Allowed CORS origins |
| `CORS_ALLOW_METHODS` | `development.yml` | `["*"]` | Allowed HTTP methods |
| `CORS_ALLOW_HEADERS` | `development.yml` | `["*"]` | Allowed HTTP headers |

## Design Notes

- **Single Starlette app**: `sse_app()` is the sole ASGI application. The health route is added to it directly and CORS wraps it once, which prevents double-middleware stacking (the cause of the `http.response.start` crash).
- **Separation of concerns**: Tools are thin wrappers around `OCRService`, so the business logic can be tested independently.
- **Lifespan-managed client**: The Mistral client is initialized once at startup and shared across all tool calls.
- **Loguru logging**: Structured, colorized logs across all layers.
- **Pydantic Settings**: Type-safe configuration from environment variables, `.env`, and `development.yml`, exposed as an LRU-cached singleton.
````

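The `/health` route returns the `create_response` envelope, `{"status": true, "message": "Service is healthy", "data": {"app": ...}}`. Below is a minimal stdlib-only probe for checking a local run or the Railway URL from a script. `check_health` and `is_healthy` are hypothetical helpers; only the JSON shape comes from this commit.

```python
import json
import urllib.request
from typing import Any


def is_healthy(body: bytes) -> bool:
    """True if a /health response body reports "status": true."""
    try:
        payload: Any = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") is True


def check_health(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> bool:
    """GET {base_url}/health and report whether the server says it is healthy."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:  # URLError/HTTPError, refused connections and timeouts are all OSErrors
        return False
```

For example, `check_health("https://<your-service>.up.railway.app")` would probe a deployed instance. Parsing is kept separate from the network call so the parsing can be checked offline.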
app/core/config.py
ADDED · @@ -0,0 +1,69 @@

```python
from typing import List, Tuple, Type
from functools import lru_cache

from pydantic import Field
from pydantic_settings import (
    BaseSettings,
    PydanticBaseSettingsSource,
    SettingsConfigDict,
    YamlConfigSettingsSource,
)


class Settings(BaseSettings):
    """Centralized settings.

    Priority (highest → lowest):
        1. Environment variables
        2. .env file          ← secrets only (MISTRAL_API_KEY)
        3. development.yml    ← non-secret config (model, server, CORS)
    """

    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=True,
        extra="ignore",
    )

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: Type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> Tuple[PydanticBaseSettingsSource, ...]:
        # Sources are listed highest priority first; the first source that
        # provides a value for a field wins.
        return (
            init_settings,
            env_settings,
            dotenv_settings,
            YamlConfigSettingsSource(settings_cls, yaml_file="development.yml"),
            file_secret_settings,
        )

    # ── Mistral (secret in .env, rest in development.yml) ─────────────────────
    MISTRAL_API_KEY: str
    MISTRAL_OCR_MODEL: str = "mistral-ocr-latest"
    MISTRAL_TABLE_FORMAT: str = "markdown"

    # ── Server (development.yml) ───────────────────────────────────────────────
    APP_NAME: str = "Markdown & Layout Extractor"
    HOST: str = "127.0.0.1"
    PORT: int = 8000
    LOG_LEVEL: str = "INFO"

    # ── CORS (development.yml) ─────────────────────────────────────────────────
    CORS_ALLOW_ORIGINS: List[str] = Field(default_factory=lambda: ["*"])
    CORS_ALLOW_METHODS: List[str] = Field(default_factory=lambda: ["*"])
    CORS_ALLOW_HEADERS: List[str] = Field(default_factory=lambda: ["*"])


@lru_cache
def get_settings() -> Settings:
    """Cached settings instance; call this everywhere instead of instantiating."""
    return Settings()


settings = get_settings()
```

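`settings_customise_sources` returns its sources highest priority first, so the first source that defines a field wins. The stdlib-only sketch below models that lookup with plain dicts to show why a `PORT` injected by Railway beats `PORT: 8000` in `development.yml`. It is a simplification for illustration, not how pydantic-settings is implemented: `resolve_settings` is hypothetical, and it skips type coercion and the init/secrets sources.

```python
from collections.abc import Iterable, Mapping, Sequence
from typing import Any


def resolve_settings(
    sources: Sequence[Mapping[str, Any]],
    fields: Iterable[str],
    defaults: Mapping[str, Any],
) -> dict[str, Any]:
    """Pick each declared field from the first source that defines it.

    `sources` is ordered highest priority first. Keys that are not declared
    fields are ignored (similar in spirit to extra="ignore"). A field with no
    value anywhere and no default is an error, like a required MISTRAL_API_KEY.
    """
    resolved: dict[str, Any] = {}
    for name in fields:
        for source in sources:
            if name in source:
                resolved[name] = source[name]
                break
        else:  # no source defined the field
            if name not in defaults:
                raise KeyError(f"missing required setting: {name}")
            resolved[name] = defaults[name]
    return resolved


# Illustrative inputs only.
FIELDS = ("MISTRAL_API_KEY", "HOST", "PORT", "LOG_LEVEL")
DEFAULTS = {"HOST": "127.0.0.1", "PORT": 8000, "LOG_LEVEL": "INFO"}
env_vars = {"PORT": "8080", "HOST": "0.0.0.0", "PATH": "/usr/bin"}  # e.g. set on Railway
dotenv_file = {"MISTRAL_API_KEY": "sk-example"}
development_yml = {"HOST": "127.0.0.1", "PORT": 8000, "LOG_LEVEL": "INFO"}

resolved = resolve_settings([env_vars, dotenv_file, development_yml], FIELDS, DEFAULTS)
```

Here `PORT` and `HOST` come from the environment, `MISTRAL_API_KEY` from `.env`, and `LOG_LEVEL` from `development.yml`. The real `Settings` class would also coerce the string `"8080"` into the `int` 8080.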
app/core/exceptions.py
ADDED · @@ -0,0 +1,10 @@

```python
class MCPExtractorError(Exception):
    """Base exception for this application."""


class OCRProcessingError(MCPExtractorError):
    """Raised when OCR / document conversion fails."""


class InvalidDocumentURLError(MCPExtractorError):
    """Raised when the provided document URL is invalid or unreachable."""
```

app/core/lifespan.py
ADDED · @@ -0,0 +1,27 @@

```python
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from dataclasses import dataclass

from mcp.server.fastmcp import FastMCP
from mistralai.client import Mistral

from app.core.config import settings
from app.core.logger import logger


@dataclass
class AppContext:
    """Shared resources available to all tools via ctx.request_context.lifespan_context."""

    mistral: Mistral


@asynccontextmanager
async def app_lifespan(server: FastMCP) -> AsyncIterator[AppContext]:
    """Initialize and cleanly tear down shared clients."""
    logger.info("Initializing Mistral client...")
    client = Mistral(api_key=settings.MISTRAL_API_KEY)
    try:
        yield AppContext(mistral=client)
    finally:
        logger.info("Shutting down lifespan resources.")
```

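`app_lifespan` creates one Mistral client at startup and gives the same `AppContext` to every tool call. The stdlib-only sketch below reproduces that shape with a hypothetical `FakeClient`, so the lifecycle can be observed without an API key. Unlike the original, it calls a `close()` method in `finally` to show where teardown would go. Whether the real Mistral client needs explicit closing is not shown in this commit.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from dataclasses import dataclass


@dataclass
class FakeClient:
    """Hypothetical stand-in for the Mistral client; records usage for inspection."""

    calls: int = 0
    closed: bool = False

    async def ocr(self, document_url: str) -> str:
        self.calls += 1
        return f"# markdown for {document_url}"

    def close(self) -> None:
        self.closed = True


@dataclass
class AppContext:
    """Shared resources handed to every tool call (mirrors app/core/lifespan.py)."""

    client: FakeClient


@asynccontextmanager
async def app_lifespan() -> AsyncIterator[AppContext]:
    client = FakeClient()  # created once, at startup
    try:
        yield AppContext(client=client)
    finally:
        client.close()  # runs at shutdown, even if a tool call raised


async def serve_two_calls() -> FakeClient:
    async with app_lifespan() as ctx:
        # Both "tool calls" reuse the same client instance from the context.
        await ctx.client.ocr("https://example.com/a.pdf")
        await ctx.client.ocr("https://example.com/b.pdf")
        return ctx.client


client = asyncio.run(serve_two_calls())
print(client.calls, client.closed)  # 2 True
```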
app/core/logger.py
ADDED · @@ -0,0 +1,20 @@

```python
import sys

from loguru import logger

from app.core.config import settings


def _configure_logger() -> None:
    logger.remove()
    logger.add(
        sys.stdout,
        format="{time:YYYY-MM-DD HH:mm:ss} | {level:<8} | {name} | {message}",
        level=settings.LOG_LEVEL,
        colorize=True,
    )


_configure_logger()

__all__ = ["logger"]
```

app/server.py
ADDED · @@ -0,0 +1,52 @@

```python
from mcp.server.fastmcp import FastMCP
from starlette.middleware.cors import CORSMiddleware
from starlette.requests import Request

from app.core.config import settings
from app.core.lifespan import app_lifespan
from app.core.logger import logger
from app.tools import register_markdown_tools
from app.utils.response import create_response


def _build_mcp_server() -> FastMCP:
    """Build the FastMCP instance and register every tool."""
    mcp = FastMCP(settings.APP_NAME, lifespan=app_lifespan)
    register_markdown_tools(mcp)
    return mcp


async def _health(_: Request):
    """Simple liveness probe."""
    return create_response(
        status_value=True,
        message="Service is healthy",
        data={"app": settings.APP_NAME},
    )


def create_app():
    """Build the ASGI application.

    Adds the health route directly to the MCP Starlette app and wraps that app
    in CORS once, to avoid nesting two Starlette instances (which produces a
    double http.response.start crash with uvicorn).
    """
    logger.info("Building application: {}", settings.APP_NAME)
    mcp = _build_mcp_server()

    # sse_app() returns a Starlette instance; use it as the single ASGI app.
    app = mcp.sse_app()

    # Register the health route. add_route() appends it after the /sse and
    # /messages/ routes; the paths do not overlap, so the order is harmless.
    app.add_route("/health", _health, methods=["GET"])

    # Add CORS once at the outermost middleware layer.
    app = CORSMiddleware(
        app,
        allow_origins=settings.CORS_ALLOW_ORIGINS,
        allow_methods=settings.CORS_ALLOW_METHODS,
        allow_headers=settings.CORS_ALLOW_HEADERS,
    )

    return app
```

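`create_app` wraps the MCP Starlette app in `CORSMiddleware` exactly once, so every response passes through a single outer layer. The sketch below shows what such a layer does, using a toy pure-ASGI middleware (a hypothetical `AddHeaderMiddleware`, not Starlette's CORS implementation). It intercepts the `http.response.start` message and appends a header before forwarding it, leaving the body untouched. A tiny driver then calls the wrapped app once with fake `receive`/`send` callables.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

Scope = dict[str, Any]
Message = dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


async def inner_app(scope: Scope, receive: Receive, send: Send) -> None:
    """Stand-in for the MCP Starlette app: sends exactly one response."""
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"ok"})


class AddHeaderMiddleware:
    """Toy ASGI middleware that appends one header to every HTTP response."""

    def __init__(self, app: ASGIApp, name: bytes, value: bytes) -> None:
        self.app = app
        self.name = name
        self.value = value

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            # Lifespan and websocket traffic pass straight through.
            await self.app(scope, receive, send)
            return

        async def send_with_header(message: Message) -> None:
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((self.name, self.value))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_header)


async def call_once(app: ASGIApp) -> list[Message]:
    """Drive a single GET request through `app` and collect what it sends."""
    scope: Scope = {"type": "http", "method": "GET", "path": "/health", "headers": []}
    sent: list[Message] = []

    async def receive() -> Message:
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message: Message) -> None:
        sent.append(message)

    await app(scope, receive, send)
    return sent


# Wrap once, at the outermost layer, as create_app() does with CORSMiddleware.
app = AddHeaderMiddleware(inner_app, b"access-control-allow-origin", b"*")
messages = asyncio.run(call_once(app))
```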
app/services/ocr_service.py
ADDED · @@ -0,0 +1,88 @@

```python
from typing import Any, Dict, List

from mistralai.client import Mistral

from app.core.logger import logger
from app.core.config import settings
from app.core.exceptions import OCRProcessingError


class OCRService:
    """Encapsulates document-to-markdown conversion via Mistral OCR."""

    def __init__(self, client: Mistral) -> None:
        self._client = client
        self._model = settings.MISTRAL_OCR_MODEL
        self._table_format = settings.MISTRAL_TABLE_FORMAT

    async def document_to_markdown(self, document_url: str) -> str:
        """Convert a remote document (PDF / image) to markdown.

        Args:
            document_url: Public URL of the document.

        Returns:
            Concatenated markdown content (pages joined by blank lines).

        Raises:
            OCRProcessingError: If the OCR call fails or returns no pages.
        """
        logger.info("Starting OCR for document: {}", document_url)
        try:
            response = await self._client.ocr.process_async(
                model=self._model,
                document={
                    "type": "document_url",
                    "document_url": document_url,
                },
                table_format=self._table_format,
            )
        except Exception as exc:
            logger.exception("Mistral OCR call failed")
            raise OCRProcessingError(f"OCR processing failed: {exc}") from exc

        pages = getattr(response, "pages", None) or []
        if not pages:
            raise OCRProcessingError("OCR returned no pages for the given document.")

        markdown = "\n\n".join(
            page.markdown for page in pages if getattr(page, "markdown", None)
        )
        logger.info("OCR succeeded: {} pages, {} chars", len(pages), len(markdown))
        return markdown

    async def document_to_structured(self, document_url: str) -> Dict[str, Any]:
        """Convert a document and return per-page structure alongside merged markdown.

        Useful when callers need page-level metadata (page index, individual markdown).
        """
        logger.info("Starting structured OCR for document: {}", document_url)
        try:
            response = await self._client.ocr.process_async(
                model=self._model,
                document={
                    "type": "document_url",
                    "document_url": document_url,
                },
                table_format=self._table_format,
            )
        except Exception as exc:
            logger.exception("Mistral OCR call failed")
            raise OCRProcessingError(f"OCR processing failed: {exc}") from exc

        pages: List[Dict[str, Any]] = []
        merged_parts: List[str] = []
        for idx, page in enumerate(getattr(response, "pages", []) or []):
            md = getattr(page, "markdown", "") or ""
            pages.append({"index": idx, "markdown": md})
            if md:
                merged_parts.append(md)

        if not pages:
            raise OCRProcessingError("OCR returned no pages for the given document.")

        return {
            "page_count": len(pages),
            "pages": pages,
            "markdown": "\n\n".join(merged_parts),
        }
```

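Both `OCRService` methods read only `response.pages[*].markdown`, which makes the page handling easy to exercise with fake objects. The sketch below restates the merge step of `document_to_structured` as a standalone function, fed by `types.SimpleNamespace` objects that stand in for the SDK response. The attribute names mirror what the service reads; the SDK's real response classes are not shown here. `build_structured` is hypothetical and raises `ValueError` where the service raises `OCRProcessingError`.

```python
from types import SimpleNamespace
from typing import Any


def build_structured(response: Any) -> dict[str, Any]:
    """Standalone mirror of the merge step in OCRService.document_to_structured."""
    pages: list[dict[str, Any]] = []
    merged_parts: list[str] = []
    for idx, page in enumerate(getattr(response, "pages", None) or []):
        md = getattr(page, "markdown", "") or ""
        pages.append({"index": idx, "markdown": md})
        if md:  # empty pages keep their slot in "pages" but are not merged
            merged_parts.append(md)
    if not pages:
        raise ValueError("OCR returned no pages for the given document.")
    return {"page_count": len(pages), "pages": pages, "markdown": "\n\n".join(merged_parts)}


# Fake response: three pages, the middle one blank.
fake_response = SimpleNamespace(
    pages=[
        SimpleNamespace(markdown="# Title"),
        SimpleNamespace(markdown=None),
        SimpleNamespace(markdown="Body text"),
    ]
)
structured = build_structured(fake_response)
```

In this sketch, `structured["page_count"]` is 3, the blank page appears as `{"index": 1, "markdown": ""}`, and the merged `markdown` is `"# Title\n\nBody text"`.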
app/tools/__init__.py
ADDED · @@ -0,0 +1,3 @@

```python
from app.tools.markdown_tools import register_markdown_tools

__all__ = ["register_markdown_tools"]
```

app/tools/markdown_tools.py
ADDED · @@ -0,0 +1,60 @@

```python
from typing import Any, Dict

from mcp.server.fastmcp import Context, FastMCP

from app.core.logger import logger
from app.services.ocr_service import OCRService
from app.utils.validators import validate_document_url
from app.core.exceptions import InvalidDocumentURLError, OCRProcessingError


def register_markdown_tools(mcp: FastMCP) -> None:
    """Attach markdown-extraction tools to the given FastMCP server."""

    @mcp.tool()
    async def pdf_to_markdown(document_url: str, ctx: Context) -> str:
        """Convert a PDF or document from a URL to Markdown using Mistral OCR.

        Args:
            document_url: Publicly accessible URL of the PDF / document.

        Returns:
            Markdown string (all pages concatenated).
        """
        try:
            url = validate_document_url(document_url)
        except InvalidDocumentURLError as exc:
            logger.warning("Invalid URL rejected: {}", exc)
            return f"Error: {exc}"

        service = OCRService(client=ctx.request_context.lifespan_context.mistral)

        try:
            return await service.document_to_markdown(url)
        except OCRProcessingError as exc:
            logger.error("OCR failed for {}: {}", url, exc)
            return f"Error: {exc}"

    @mcp.tool()
    async def pdf_to_structured_markdown(
        document_url: str, ctx: Context
    ) -> Dict[str, Any]:
        """Convert a document to per-page structured markdown.

        Returns:
            Dict with keys: page_count (int), pages (list of {index, markdown}),
            markdown (str, merged).
        """
        try:
            url = validate_document_url(document_url)
        except InvalidDocumentURLError as exc:
            logger.warning("Invalid URL rejected: {}", exc)
            return {"error": str(exc), "page_count": 0, "pages": [], "markdown": ""}

        service = OCRService(client=ctx.request_context.lifespan_context.mistral)

        try:
            return await service.document_to_structured(url)
        except OCRProcessingError as exc:
            logger.error("Structured OCR failed for {}: {}", url, exc)
            return {"error": str(exc), "page_count": 0, "pages": [], "markdown": ""}
```

app/utils/response.py
ADDED · @@ -0,0 +1,30 @@

```python
from typing import Any, Dict, List, Union
from fastapi import status as http_status
from fastapi.responses import JSONResponse


def create_response(
    status_value: bool,
    message: str,
    data: Union[Dict[str, Any], List[Any], None] = None,
    code: int = http_status.HTTP_200_OK,
) -> JSONResponse:
    """Create standardized JSON response.

    Args:
        status_value: Success/failure boolean.
        message: Human-readable message.
        data: Response payload (dict, list, or None).
        code: HTTP status code (default: 200).

    Returns:
        JSONResponse with structure {status, message, data}.
    """
    return JSONResponse(
        status_code=code,
        content={
            "status": status_value,
            "message": message,
            "data": data if data is not None else {},
        },
    )
```

app/utils/validators.py
ADDED · @@ -0,0 +1,30 @@

```python
from urllib.parse import urlparse
from app.core.exceptions import InvalidDocumentURLError


def validate_document_url(url: str) -> str:
    """Validate that the given string is a well-formed http(s) URL.

    Args:
        url: The URL to validate.

    Returns:
        The trimmed URL.

    Raises:
        InvalidDocumentURLError: If the URL is malformed.
    """
    if not url or not isinstance(url, str):
        raise InvalidDocumentURLError("Document URL must be a non-empty string.")

    url = url.strip()
    parsed = urlparse(url)

    if parsed.scheme not in {"http", "https"}:
        raise InvalidDocumentURLError(
            f"Unsupported URL scheme: '{parsed.scheme}'. Use http or https."
        )
    if not parsed.netloc:
        raise InvalidDocumentURLError("Document URL is missing a valid host.")

    return url
```

development.yml
ADDED · @@ -0,0 +1,18 @@

```yaml
# ─── Mistral ───────────────────────────────────────────────────────────────────
MISTRAL_OCR_MODEL: mistral-ocr-latest
MISTRAL_TABLE_FORMAT: markdown

# ─── Server ────────────────────────────────────────────────────────────────────
# Local defaults. On Railway, HOST is overridden via the Variables tab and PORT is injected automatically.
APP_NAME: "Markdown & Layout Extractor"
HOST: "127.0.0.1"
PORT: 8000
LOG_LEVEL: INFO

# ─── CORS ──────────────────────────────────────────────────────────────────────
CORS_ALLOW_ORIGINS:
  - "*"
CORS_ALLOW_METHODS:
  - "*"
CORS_ALLOW_HEADERS:
  - "*"
```

main.py
ADDED · @@ -0,0 +1,18 @@

```python
import uvicorn

from app.server import create_app
from app.core.config import settings

app = create_app()


if __name__ == "__main__":
    uvicorn.run(
        app,
        host=settings.HOST,
        port=settings.PORT,
        log_level=settings.LOG_LEVEL.lower(),
        # Trust X-Forwarded-* headers from Railway's edge proxy.
        proxy_headers=True,
        forwarded_allow_ips="*",
    )
```

pyproject.toml
ADDED · @@ -0,0 +1,26 @@

```toml
[project]
name = "pdf-to-md-mcp"
version = "0.1.0"
description = "MCP server that converts PDFs and documents into Markdown using Mistral OCR"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "fastapi==0.135.3",
    "loguru==0.7.3",
    "mcp[cli]==1.27.0",
    "mistralai==2.3.2",
    "pydantic-settings>=2.13.1",
    "python-dotenv==1.2.2",
    "pyyaml>=6.0.2",
    "starlette==1.0.0",
]

[dependency-groups]
dev = [
    "black==26.3.1",
]

[tool.black]
target-version = ["py312"]
```

railway.json
ADDED · @@ -0,0 +1,13 @@

```json
{
  "$schema": "https://railway.com/railway.schema.json",
  "build": {
    "builder": "NIXPACKS"
  },
  "deploy": {
    "startCommand": "uv run main.py",
    "healthcheckPath": "/health",
    "healthcheckTimeout": 30,
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 3
  }
}
```

sample.env
ADDED · @@ -0,0 +1 @@

```dotenv
MISTRAL_API_KEY=<YOUR-MISTRAL-API-KEY>
```

uv.lock
ADDED

The diff for this file is too large to render.