| title: Multi-LLM API Gateway | |
| emoji: 🛡️ | |
| colorFrom: indigo | |
| colorTo: red | |
| sdk: docker | |
| pinned: true | |
| license: apache-2.0 | |
| short_description: LLM API Gateway with MCP interface | |
| # Multi-LLM API Gateway with MCP interface | |
| or Universal MCP Hub (Sandboxed) | |
| or universal AI Wrapper over SSE + Quart with some tools on a solid fundament | |
| aka: a clean, secure starting point for your own projects. | |
| Pick the description that fits your use case. They're all correct. | |
| > A production-grade MCP server that actually thinks about security. | |
| > Built on [PyFundaments](PyFundaments.md), running on **simpleCity** and **paranoidMode**. | |
| ``` | |
| No key → no tool → no crash → no exposed secrets | |
| ``` | |
| Most MCP servers are prompts dressed up as servers. This one has a real architecture. | |
| --- | |
| ## Why this exists | |
| While building this, we kept stumbling over the same problem: the MCP | |
| ecosystem is full of servers with hardcoded keys, `os.environ` scattered | |
| everywhere, and zero sandboxing. One misconfigured fork and your API keys are gone. | |
| This is exactly the kind of negligence (and worse, outright fraud) that | |
| [Wall of Shames](https://github.com/Wall-of-Shames) documents: a | |
| community project exposing fake "AI tools" that exploit non-technical users | |
| with API wrappers dressed up as custom models, Telegram payment funnels, and | |
| bought stars. If you build on open source, you should know this exists. | |
| This hub was built as the antidote: | |
| - **Structural sandboxing**: `app/*` can never touch `fundaments/` or `.env`. Not by convention. By design. | |
| - **Guardian pattern**: `main.py` is the only module that reads secrets. It injects validated services as a dict. `app/*` never sees the raw environment. | |
| - **Graceful degradation**: No key? The tool doesn't register. The server still starts. No crash, no error, no empty `None` floating around. | |
| - **Single source of truth**: All tool/provider/model config lives in `app/.pyfun`. Enabling or tuning a provider means editing one file; only a brand-new provider class needs one extra registration line in code. | |
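The Guardian pattern and graceful degradation described in the list above can be sketched roughly like this. It is a minimal illustration, not the project's `main.py`: `Service`, `build_fundaments`, `available_services`, and the `MASTER_ENCRYPTION_KEY` variable name are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class Service:
    """Stand-in for a real fundaments service (hypothetical)."""
    name: str


def build_fundaments(env: Mapping[str, str]) -> Dict[str, Optional[Service]]:
    """Guardian side: inspect the environment once, hand out services only."""

    def optional(name: str, *required: str) -> Optional[Service]:
        # A service exists only if every secret it needs is set and non-empty.
        return Service(name) if all(env.get(key) for key in required) else None

    return {
        "config": Service("config"),  # always available
        "db": optional("db", "DATABASE_URL"),
        # MASTER_ENCRYPTION_KEY is an assumed variable name for illustration.
        "encryption": optional("encryption", "MASTER_ENCRYPTION_KEY"),
    }


def available_services(fundaments: Mapping[str, Optional[Service]]) -> List[str]:
    """App side: sees injected services, never the environment itself."""
    return sorted(name for name, svc in fundaments.items() if svc is not None)
```

The app layer receives only the dict. A missing secret shows up as `None` for that service, and the raw secret value never appears in what gets injected.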
| --- | |
| ## Architecture | |
| ``` | |
| main.py (Guardian) | |
| │ | |
| │  reads .env / HF Secrets | |
| │  initializes fundaments/* conditionally | |
| │  injects validated services as dict | |
| │ | |
| └──► app/app.py (Orchestrator, sandboxed) | |
|      │ | |
|      │  unpacks fundaments ONCE, at startup, never stores globally | |
|      │  starts hypercorn (async ASGI) | |
|      │  routes: GET / | POST /api | GET+POST /mcp | |
|      │ | |
|      ├── app/mcp.py       ← FastMCP + SSE handler | |
|      ├── app/tools.py     ← Tool registry (key-gated) | |
|      ├── app/provider.py  ← LLM + Search execution + fallback chain | |
|      ├── app/models.py    ← Model limits, costs, capabilities | |
|      ├── app/config.py    ← .pyfun parser (single source of truth) | |
|      └── app/db_sync.py   ← Internal SQLite IPC (app/* state only) | |
|                             separate from fundaments/postgresql.py (Guardian-only) | |
| ``` | |
| **The sandbox is structural:** | |
| ```python | |
| # app/app.py: fundaments are unpacked ONCE, NEVER stored globally | |
| from typing import Any, Dict | |
| async def start_application(fundaments: Dict[str, Any]) -> None: | |
|     config_service = fundaments["config"] | |
|     db_service = fundaments["db"]  # None if not configured | |
|     encryption_service = fundaments["encryption"]  # None if keys missing | |
|     access_control_service = fundaments["access_control"] | |
|     ... | |
|     # From here: app/* reads its own config from app/.pyfun only. | |
|     # fundaments are never passed into other app/* modules. | |
| ``` | |
| `app/app.py` never calls `os.environ`. Never imports from `fundaments/`. Never reads `.env`. | |
| This isn't documentation. It's enforced by the import structure. | |
| ### Why Quart + hypercorn? | |
| MCP over SSE needs a proper async HTTP stack. The choice here is deliberate: | |
| **Quart** is async Flask: essentially the same API and routing, but fully `async/await` native. This matters because FastMCP's SSE handler is async, and mixing sync Flask with async MCP would require thread hacks or `asyncio.run()` gymnastics. With Quart, the `/mcp` route hands off directly to `mcp.handle_sse(request)`, with no bridging and no blocking. | |
| **hypercorn** is an ASGI server (vs. waitress/gunicorn, which are WSGI servers). A typical WSGI setup ties up one worker thread or process per in-flight request, which is fine for traditional web apps but wrong for SSE, where a connection stays open for minutes. hypercorn handles SSE connections as long-lived async streams without tying up threads. It also runs natively on HuggingFace Spaces without extra config. | |
| The `/mcp` route in `app.py` is also the natural interception point: auth checks, rate limiting, and payload logging can all be added there before the request ever reaches FastMCP. That's not possible when FastMCP runs standalone. | |
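As an illustration of what such an interception step could look like, here is a framework-independent sketch of a bearer-token check plus a sliding-window rate limiter. Nothing here is taken from the project: `SlidingWindowLimiter` and `precheck` are hypothetical names, and a real route would call something like this before handing the request to FastMCP.

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Mapping, Optional, Tuple


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_sec seconds."""

    def __init__(self, max_requests: int, window_sec: float) -> None:
        self.max_requests = max_requests
        self.window_sec = window_sec
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_sec:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def precheck(
    headers: Mapping[str, str],
    client_id: str,
    expected_token: str,
    limiter: SlidingWindowLimiter,
    now: Optional[float] = None,
) -> Tuple[int, str]:
    """Return (status, reason); 200 means the request may be passed on."""
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {expected_token}"
    # Constant-time comparison avoids leaking token prefixes via timing.
    if not hmac.compare_digest(supplied.encode(), expected.encode()):
        return 401, "unauthorized"
    if not limiter.allow(client_id, now):
        return 429, "rate limited"
    return 200, "ok"
```

Keeping the check as a plain function makes it testable without starting a server. The `now` parameter exists only so tests can control the clock.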
| --- | |
| ## Two Databases, One Architecture | |
| This hub runs **two completely separate databases** with distinct responsibilities. This is not redundancy; it's a deliberate performance and security decision. | |
| ``` | |
| ┌─ Guardian Layer (fundaments/*) ───────────────────────────── | |
| │ | |
| │  postgresql.py   → Cloud DB (e.g. Neon, Supabase) | |
| │                    asyncpg pool, SSL enforced | |
| │                    Neon-specific quirks handled | |
| │                    (statement_timeout stripped, keepalives) | |
| │ | |
| │  user_handler.py → SQLite (users + sessions tables) | |
| │                    PBKDF2-SHA256 password hashing | |
| │                    Session validation incl. IP + UserAgent | |
| │                    Account lockout after 5 failed attempts | |
| │                    Path: SQLITE_PATH env var or app/ | |
| │ | |
| └──────────────────────┬────────────────────────────────────── | |
|                        │ inject as fundaments dict | |
|                        ▼ | |
| ┌─ App Layer (app/*) ───────────────────────────────────────── | |
| │ | |
| │  db_sync.py      → SQLite (hub_state + tool_cache tables) | |
| │                    aiosqlite (async, non-blocking) | |
| │                    NEVER touches users/sessions tables | |
| │                    Automatically relocated to /tmp/ on HF Spaces | |
| │ | |
| └───────────────────────────────────────────────────────────── | |
| ``` | |
| **Why two sets of SQLite tables?** | |
| `user_handler.py` (Guardian) owns `users` and `sessions`, the authentication state that must be isolated from the app layer. `db_sync.py` (app/*) owns `hub_state` and `tool_cache`, which provide fast, async IPC between tools that doesn't need to leave the process, let alone hit a cloud endpoint. | |
| A tool caching a previous LLM response or storing intermediate state between pipeline steps should never wait on a round-trip to Neon. A local SQLite read typically takes microseconds; a cloud PostgreSQL query typically takes 50-200 ms. For tool-to-tool communication, that difference matters. | |
| **Table ownership (hard rule):** | |
| | Table | Owner | Access | | |
| | :--- | :--- | :--- | | |
| | `users` | `fundaments/user_handler.py` | Guardian only | | |
| | `sessions` | `fundaments/user_handler.py` | Guardian only | | |
| | `hub_state` | `app/db_sync.py` | app/* only | | |
| | `tool_cache` | `app/db_sync.py` | app/* only | | |
| `db_sync.py` uses the same SQLite path (`SQLITE_PATH`) as `user_handler.py`: same file, different tables, zero overlap. The `db_query` MCP tool exposes SELECT-only access to `hub_state` and `tool_cache`. It cannot reach `users` or `sessions`. | |
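One way to enforce that kind of table ownership on a shared SQLite file is an authorizer callback. The sketch below uses the standard-library `sqlite3` module (the hub itself is described as using `aiosqlite`), and `restrict_to_app_tables` is a hypothetical helper, not the project's actual enforcement code.

```python
import sqlite3

# Tables the app layer is allowed to read, per the ownership table above.
ALLOWED_TABLES = {"hub_state", "tool_cache"}


def _authorizer(action, arg1, arg2, db_name, source):
    """Permit plain SELECTs that read only allow-listed tables."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    # Writes, schema changes, transactions, SQL functions, and reads of
    # any other table (users, sessions, sqlite_master) are all refused.
    return sqlite3.SQLITE_DENY


def restrict_to_app_tables(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Install the authorizer; every later statement on conn is checked."""
    conn.set_authorizer(_authorizer)
    return conn
```

With the authorizer installed, a statement that touches `users` or tries to write fails while it is being prepared and raises a `sqlite3.Error` subclass, so the restriction does not depend on string-matching the SQL. This strict version also refuses SQL functions such as `count()`; a real tool would probably allow a vetted subset.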
| **Cloud DB (postgresql.py):** | |
| Handles the heavy cases: persistent storage, workflow tool results that need to survive restarts, anything that benefits from a real relational DB. Neon-specific quirks are handled automatically: `statement_timeout` is stripped from the DSN (Neon doesn't support it), SSL is enforced at `require` minimum, keepalives are set, and terminated connections trigger an automatic pool restart. | |
| If no `DATABASE_URL` is set, the entire cloud DB layer is skipped cleanly. The app runs without it. | |
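The DSN handling described above could look roughly like the following. This is a sketch under assumptions: `normalize_dsn` is a hypothetical name, and it assumes `statement_timeout` appears as a plain query parameter in the URL. The real `postgresql.py` may handle other forms.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# sslmode values that are at least as strict as "require".
_STRONG_SSL = {"require", "verify-ca", "verify-full"}


def normalize_dsn(dsn: str) -> str:
    """Drop statement_timeout and raise sslmode to at least 'require'."""
    parts = urlsplit(dsn)
    params = {
        key: value
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "statement_timeout"
    }
    if params.get("sslmode") not in _STRONG_SSL:
        params["sslmode"] = "require"
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Stricter modes such as `verify-full` are left alone; only missing or weaker settings are upgraded.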
| --- | |
| ## Tools | |
| Tools register themselves at startup, but only if the required API key exists in the environment. No key, no tool. The server always starts. | |
| | ENV Secret | Tool | Notes | | |
| | :--- | :--- | :--- | | |
| | `ANTHROPIC_API_KEY` | `llm_complete` | Claude Haiku / Sonnet / Opus | | |
| | `GEMINI_API_KEY` | `llm_complete` | Gemini 2.0 / 2.5 / 3.x Flash & Pro | | |
| | `OPENROUTER_API_KEY` | `llm_complete` | 100+ models via OpenRouter | | |
| | `HF_TOKEN` | `llm_complete` | HuggingFace Inference API | | |
| | `BRAVE_API_KEY` | `web_search` | Independent web index | | |
| | `TAVILY_API_KEY` | `web_search` | AI-optimized search with synthesized answers | | |
| | `DATABASE_URL` | `db_query` | Read-only SELECT, enforced at app level | | |
| | *(always)* | `list_active_tools` | Shows key names only, never values | | |
| | *(always)* | `health_check` | Status + uptime | | |
| | *(always)* | `get_model_info` | Limits, costs, capabilities per model | | |
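The key-gating in the table above can be pictured as a small registry. The mapping mirrors the table, but `register_tools` and its return shape are assumptions made for this sketch, not the interface of `app/tools.py`.

```python
from typing import Dict, List, Mapping

# Mirrors the table above: environment secret -> tool it unlocks.
KEY_TO_TOOL: Dict[str, str] = {
    "ANTHROPIC_API_KEY": "llm_complete",
    "GEMINI_API_KEY": "llm_complete",
    "OPENROUTER_API_KEY": "llm_complete",
    "HF_TOKEN": "llm_complete",
    "BRAVE_API_KEY": "web_search",
    "TAVILY_API_KEY": "web_search",
    "DATABASE_URL": "db_query",
}
ALWAYS_ON = ("list_active_tools", "health_check", "get_model_info")


def register_tools(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Return {tool name: names of the keys that enabled it}.

    Only key names are recorded; secret values are never stored.
    """
    registry: Dict[str, List[str]] = {tool: [] for tool in ALWAYS_ON}
    for key_name, tool in KEY_TO_TOOL.items():
        if env.get(key_name):  # missing or empty secret: skip quietly
            registry.setdefault(tool, []).append(key_name)
    return registry
```

Because the registry holds names only, a tool like `list_active_tools` can return it directly without risking a leaked secret.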
| **Configured in `.pyfun`, not hardcoded:** | |
| ```ini | |
| [TOOL.code_review] | |
| active = "true" | |
| description = "Review code for bugs, security issues and improvements" | |
| provider_type = "llm" | |
| default_provider = "anthropic" | |
| timeout_sec = "60" | |
| system_prompt = "You are an expert code reviewer. Analyze the given code for bugs, security issues, and improvements. Be specific and concise." | |
| [TOOL.code_review_END] | |
| ``` | |
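To make the block format above concrete, here is a minimal parser sketch for `[NAME]` ... `[NAME_END]` sections holding `key = "value"` pairs. It is inferred only from the examples in this README. The real parser in `app/config.py` may accept more syntax, and `parse_pyfun` is a hypothetical name.

```python
import re
from typing import Dict, Optional

_HEADER = re.compile(r"^\[(.+)\]$")
_PAIR = re.compile(r'^(\w+)\s*=\s*"(.*)"$')


def parse_pyfun(text: str) -> Dict[str, Dict[str, str]]:
    """Parse sections into {section name: {key: value}}; all values stay strings."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        header = _HEADER.match(line)
        if header:
            name = header.group(1)
            if name.endswith("_END"):
                # A closing marker must match the section that is open.
                if current != name[: -len("_END")]:
                    raise ValueError(f"unexpected closing marker [{name}]")
                current = None
            else:
                current = name
                sections[name] = {}
            continue
        pair = _PAIR.match(line)
        if pair and current is not None:
            sections[current][pair.group(1)] = pair.group(2)
    return sections
```

Values are kept as strings, as they appear quoted in the file, so a caller would convert fields like `timeout_sec` or `active` itself.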
| Current built-in tools: `llm_complete`, `code_review`, `summarize`, `translate`, `web_search`, `db_query` | |
| Future hooks (commented, ready): `image_gen`, `code_exec`, `shellmaster`, Discord, GitHub webhooks | |
| --- | |
| ## LLM Fallback Chain | |
| All LLM providers share one `llm_complete` tool. If a provider fails, the hub automatically walks the fallback chain defined in `.pyfun`: | |
| ``` | |
| anthropic → gemini → openrouter → huggingface | |
| ``` | |
| Fallbacks are configured per-provider, not hardcoded: | |
| ```ini | |
| [LLM_PROVIDER.anthropic] | |
| fallback_to = "gemini" | |
| [LLM_PROVIDER.anthropic_END] | |
| [LLM_PROVIDER.gemini] | |
| fallback_to = "openrouter" | |
| [LLM_PROVIDER.gemini_END] | |
| ``` | |
| Same pattern applies to search providers (`brave → tavily`). | |
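Walking a chain like this takes only a few lines. The sketch below assumes each provider is a plain callable and that `fallback_to` has already been read from `.pyfun` into a mapping. `fallback_order` and `complete_with_fallback` are hypothetical names, and the cycle guard is an addition of this sketch rather than documented hub behavior.

```python
from typing import Callable, List, Mapping


def fallback_order(start: str, fallback_to: Mapping[str, str]) -> List[str]:
    """Follow fallback_to links from start, stopping at '' or on a cycle."""
    order: List[str] = []
    seen = set()
    current = start
    while current and current not in seen:
        order.append(current)
        seen.add(current)
        current = fallback_to.get(current, "")
    return order


def complete_with_fallback(
    prompt: str,
    start: str,
    fallback_to: Mapping[str, str],
    providers: Mapping[str, Callable[[str], str]],
) -> str:
    """Try each provider in chain order; return the first successful answer."""
    errors: List[str] = []
    for name in fallback_order(start, fallback_to):
        call = providers.get(name)
        if call is None:  # no key configured: provider was never registered
            continue
        try:
            return call(prompt)
        except Exception as exc:  # any provider failure moves to the next link
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Unregistered providers are skipped rather than treated as errors, which matches the "no key, no tool" behavior described earlier.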
| --- | |
| ## Quick Start | |
| ### HuggingFace Spaces (recommended) | |
| 1. Fork / duplicate this Space | |
| 2. Go to **Settings → Variables and secrets** | |
| 3. Add the API keys you have (any subset works) | |
| 4. The Space starts automatically; only tools with valid keys register | |
| That's it. No config editing. No code changes. | |
| [→ Live Demo Space](https://huggingface.co/spaces/codey-lab/Multi-LLM-API-Gateway) (no LLM keys set!) | |
| ### Local / Docker | |
| ```bash | |
| git clone https://github.com/VolkanSah/Multi-LLM-API-Gateway | |
| cd Multi-LLM-API-Gateway | |
| cp example-mcp___.env .env | |
| # fill in your keys | |
| pip install -r requirements.txt | |
| python main.py | |
| ``` | |
| Minimum required ENV vars (everything else is optional): | |
| ```env | |
| PYFUNDAMENTS_DEBUG="" | |
| LOG_LEVEL="INFO" | |
| LOG_TO_TMP="" | |
| ENABLE_PUBLIC_LOGS="true" | |
| HF_TOKEN="" | |
| HUB_SPACE_URL="" | |
| MCP_TRANSPORT="sse" | |
| ``` | |
| --- | |
| ## Connect an MCP Client | |
| ### Claude Desktop / any SSE-compatible client | |
| ```json | |
| { | |
| "mcpServers": { | |
| "universal-mcp-hub": { | |
| "url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/sse" | |
| } | |
| } | |
| } | |
| ``` | |
| ### Private Space (with HF token) | |
| ```json | |
| { | |
| "mcpServers": { | |
| "universal-mcp-hub": { | |
| "url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/sse", | |
| "headers": { | |
| "Authorization": "Bearer hf_..." | |
| } | |
| } | |
| } | |
| } | |
| ``` | |
| --- | |
| ## Desktop Client | |
| A full PySide6 desktop client is included in `DESKTOP_CLIENT/hub.py`. It is ideal for private or non-public Spaces where you don't want to expose the SSE endpoint. | |
| ```bash | |
| pip install PySide6 httpx | |
| # optional file handling: | |
| pip install Pillow PyPDF2 pandas openpyxl | |
| python DESKTOP_CLIENT/hub.py | |
| ``` | |
| **Features:** | |
| - Multi-chat with persistent history (`~/.mcp_desktop.json`) | |
| - Tool/Provider/Model selector loaded live from your Hub | |
| - File attachments: images, PDF, CSV, Excel, ZIP, source code | |
| - Connect tab with health check + auto-load | |
| - Settings: HF Token + Hub URL saved locally, never sent anywhere except your own Hub | |
| - Full request/response log with timestamps | |
| - Runs on Windows, Linux, macOS | |
| [→ Desktop Client docs](DESKTOP_CLIENT/README.md) | |
| --- | |
| ## Configuration (.pyfun) | |
| `app/.pyfun` is the single source of truth for all app behavior. There are three tiers; use what you need: | |
| ``` | |
| LAZY:       [HUB] + one [LLM_PROVIDER.*]         → works | |
| NORMAL:     + [SEARCH_PROVIDER.*] + [MODELS.*]   → works better | |
| PRODUCTIVE: + [TOOLS] + [HUB_LIMITS] + [DB_SYNC] → full power | |
| ``` | |
| Adding a new LLM provider takes two steps, a block in `.pyfun` plus one line in `providers.py`: | |
| ```ini | |
| # 1. app/.pyfun: add provider block | |
| [LLM_PROVIDER.mistral] | |
| active = "true" | |
| base_url = "https://api.mistral.ai/v1" | |
| env_key = "MISTRAL_API_KEY" | |
| default_model = "mistral-large-latest" | |
| models = "mistral-large-latest, mistral-small-latest, codestral-latest" | |
| fallback_to = "" | |
| [LLM_PROVIDER.mistral_END] | |
| ``` | |
| ```python | |
| # 2. app/providers.py: uncomment the dummy class and register it | |
| _PROVIDER_CLASSES = { | |
|     # ... existing providers stay as they are ... | |
|     "mistral": MistralProvider,  # <- uncomment to activate | |
| } | |
| ``` | |
| `providers.py` ships with ready-to-use commented dummy classes for OpenAI, Mistral, and xAI/Grok, each with the matching `.pyfun` block right above it. Most OpenAI-compatible APIs need zero changes to the class itself, just a different `base_url` and `env_key`. Search providers (Brave, Tavily) follow the same pattern and are next on the roadmap. | |
| Model limits, costs, and capabilities are also configured here; `get_model_info` reads directly from `.pyfun`: | |
| ```ini | |
| [MODEL.claude-sonnet-4-6] | |
| provider = "anthropic" | |
| context_tokens = "200000" | |
| max_output_tokens = "16000" | |
| requests_per_min = "50" | |
| cost_input_per_1k = "0.003" | |
| cost_output_per_1k = "0.015" | |
| capabilities = "text, code, analysis, vision" | |
| [MODEL.claude-sonnet-4-6_END] | |
| ``` | |
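As a worked example of how the per-1k cost fields could be used, the sketch below estimates the cost of one request from a parsed `[MODEL.*]` block. `estimate_cost` is a hypothetical helper, and the result is in whatever currency unit the config values use, which this README does not state.

```python
from decimal import Decimal
from typing import Mapping


def estimate_cost(model_cfg: Mapping[str, str], input_tokens: int, output_tokens: int) -> Decimal:
    """cost = (in_tokens * in_rate + out_tokens * out_rate) / 1000."""
    # Decimal avoids binary floating-point drift on small per-token prices.
    rate_in = Decimal(model_cfg["cost_input_per_1k"])
    rate_out = Decimal(model_cfg["cost_output_per_1k"])
    return (rate_in * input_tokens + rate_out * output_tokens) / 1000
```

With the values from the block above, 2,000 input tokens and 500 output tokens come to (2000 × 0.003 + 500 × 0.015) / 1000 = 0.0135.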
| --- | |
| ## Dependencies | |
| ``` | |
| # PyFundaments Core (always required) | |
| asyncpg           - async PostgreSQL pool (Guardian/cloud DB) | |
| python-dotenv     - .env loading | |
| passlib           - PBKDF2 password hashing in user_handler.py | |
| cryptography      - encryption layer in fundaments/ | |
| # MCP Hub | |
| fastmcp           - MCP protocol + tool registration | |
| httpx             - async HTTP for all provider API calls | |
| quart             - async Flask (ASGI), needed for SSE + hypercorn | |
| hypercorn         - ASGI server for long-lived SSE connections, HF Spaces native | |
| requests          - sync HTTP for tool workers | |
| # Optional (uncomment in requirements.txt as needed) | |
| # aiofiles        - async file ops (ML pipelines, file uploads) | |
| # discord.py      - Discord bot integration (app/discord_api.py, planned) | |
| # PyNaCl          - Discord signature verification | |
| # psycopg2-binary - alternative PostgreSQL driver | |
| ``` | |
| The core stack is intentionally lean. `asyncpg` + `quart` + `hypercorn` + `fastmcp` + `httpx` covers the full MCP server. Everything else is opt-in. | |
| --- | |
| ## Security Design | |
| - API keys live in HF Secrets / `.env`, never in `.pyfun`, never in code | |
| - `list_active_tools` returns key **names** only, never values | |
| - `db_query` is SELECT-only, enforced at application level (not just docs) | |
| - `app/*` has zero import access to `fundaments/` internals | |
| - Direct execution of `app/app.py` is guarded by design: it prints a warning and falls back to a null-fundaments dict | |
| - `fundaments/` is initialized conditionally; missing services degrade gracefully instead of crashing | |
| > PyFundaments is not perfect. But it's more secure than most of what runs in production today. | |
| [→ Full Security Policy](SECURITY.md) | |
| --- | |
| ## Foundation | |
| This hub is built on [PyFundaments](PyFundaments.md), a security-first Python boilerplate providing: | |
| - `config_handler.py`: env loading with validation | |
| - `postgresql.py`: async DB pool (Guardian-only) | |
| - `encryption.py`: key-based encryption layer | |
| - `access_control.py`: role/permission management | |
| - `user_handler.py`: user lifecycle management | |
| - `security.py`: unified security manager composing the above | |
| None of these are accessible from `app/*`. They are injected as a validated dict by `main.py`. | |
| [→ PyFundaments Function Overview](PyFundaments%20β%20Function%20Overview.md) | |
| [→ Module Docs](docs/app/) | |
| [→ Source of this REPO](https://github.com/VolkanSah/Multi-LLM-API-Gateway) | |
| --- | |
| ## History | |
| [ShellMaster](https://github.com/VolkanSah/ChatGPT-ShellMaster) (2023, MIT) was the precursor: a browser-accessible shell for ChatGPT with session memory via `/tmp/shellmaster_brain.log`, built before MCP was even a concept. Universal MCP Hub is its natural evolution. | |
| --- | |
| ## License | |
| Dual-licensed: | |
| - [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0) | |
| - [Ethical Security Operations License v1.1 (ESOL)](ESOL) (mandatory, non-severable) | |
| By using this software you agree to all ethical constraints defined in ESOL v1.1. | |
| --- | |
| *Architecture, security decisions, and PyFundaments by Volkan Kücükbudak.* | |
| *Built with Claude (Anthropic) as a typing assistant for docs & the occasional bug.* | |
| > Crafted with passion: just wanted to understand how it works, don't actually need it, have a CLI |