---
title: ML Starter MCP Server
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
license: apache-2.0
pinned: true
short_description: MCP server exposing problem-specific ML code examples
tags:
  - building-mcp-track-enterprise
  - gradio
  - mcp
  - retrieval
  - embeddings
  - python
  - knowledge-base
  - semantic-search
  - sentence-transformers
  - huggingface
---
# ML Starter MCP Server

A Gradio-powered, remote-only MCP server that exposes a curated ML knowledge base through deterministic, read-only tooling. Ideal for editors such as Claude Desktop, VS Code (Kilo Code), or Cursor that want a trustworthy retrieval endpoint with no side effects.
## 🧩 Overview

The ML Starter MCP Server indexes the entire `knowledge_base/` tree (audio, vision, NLP, RL, etc.) and makes it searchable through three tools:

- `list_items`: enumerate every tutorial/script with metadata.
- `semantic_search`: vector search over docstrings and lead context to find the single best code example for a natural-language brief.
- `get_code`: return the full Python source for a safe, validated path.
The server is deterministic (seeded numpy/torch), write-protected, and designed to run as a Gradio MCP SSE endpoint suitable for Hugging Face Spaces or on-prem deployments.
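The three tools can be sketched as plain Python functions. Everything below is a toy stand-in, not the project's implementation: the in-memory `KB` dict replaces the real corpus, and a keyword-overlap score replaces MiniLM cosine similarity, so only the read-only, deterministic contract is illustrated.

```python
from __future__ import annotations

# Toy in-memory knowledge base: path -> (summary, source). A stand-in
# for the real knowledge_base/ tree scanned at startup.
KB = {
    "knowledge_base/nlp/text_classification_with_transformer.py": (
        "Fine-tune a Transformer for sentiment classification.",
        "# full Python source would live here\n",
    ),
}

def list_items() -> list[dict]:
    """Enumerate every KB entry with metadata (sorted for determinism)."""
    return [
        {"path": path, "category": path.split("/")[1], "summary": summary}
        for path, (summary, _src) in sorted(KB.items())
    ]

def semantic_search(problem_markdown: str) -> dict:
    """Return the single best match plus a similarity score.
    Keyword overlap stands in for cosine similarity over embeddings."""
    query = set(problem_markdown.lower().split())
    def score(path: str) -> float:
        words = set(KB[path][0].lower().split())
        return len(query & words) / max(len(query | words), 1)
    best = max(sorted(KB), key=score)
    return {"best_match": best, "score": round(score(best), 2)}

def get_code(path: str) -> dict:
    """Return the full source for a known KB path (read-only, no writes)."""
    if path not in KB:
        raise ValueError(f"unknown KB path: {path}")
    return {"path": path, "source": KB[path][1]}
```

In the real server these callables are registered with Gradio's MCP adapter, which derives the tool descriptions from the docstrings.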
## 📚 ML Starter Knowledge Base

- Root: `knowledge_base/`
- Domains: `audio/`, `generative/`, `graph/`, `nlp/`, `rl/`, `structured_data/`, `timeseries/`, `vision/`
- Each file stores a complete, runnable ML example with docstring summaries leveraged during indexing.
### Features exposed via MCP

- ✅ Vector search via `sentence-transformers/all-MiniLM-L6-v2` with cosine similarity.
- ⚙️ Safe path resolution ensures only in-repo `.py` files can be fetched.
- 🧮 Metadata-first outputs (category, filename, semantic score) for quick triage.
- 🛡️ Read-only contract; zero KB mutations, uploads, or side effects.
- 🌐 Spaces-ready networking with automatic `0.0.0.0` binding when environment variables are provided by the platform.
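The safe path resolution feature can be sketched with `pathlib`. The `resolve_kb_path` helper and the repo layout are assumptions for illustration, not the project's actual `loader.py` code:

```python
from pathlib import Path

REPO_ROOT = Path(".").resolve()           # assumed repo root
KB_ROOT = REPO_ROOT / "knowledge_base"    # only files under here are served

def resolve_kb_path(requested: str) -> Path:
    """Resolve a KB-relative or absolute path, rejecting anything that
    escapes the knowledge base or is not a Python source file."""
    # Joining with an absolute `requested` simply yields that absolute path.
    resolved = (REPO_ROOT / requested).resolve()
    if resolved.suffix != ".py":
        raise ValueError(f"only .py files are served: {requested}")
    # The suffix check alone won't stop `../` traversal, so also require
    # the resolved path to sit strictly inside the KB root.
    if KB_ROOT not in resolved.parents:
        raise ValueError(f"path escapes the knowledge base: {requested}")
    return resolved
```

Resolving before checking is the important design choice: `resolve()` collapses `..` segments and symlinks, so the containment test runs on the path the OS would actually open.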
## 🎬 Demo
## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```
### MCP Settings

```json
{
  "mcpServers": {
    "ML-Starter": {
      "url": "https://mcp-1st-birthday-ml-starter.hf.space/gradio_api/mcp/"
    }
  }
}
```
### Environment Variables

```bash
export TOKENIZERS_PARALLELISM=false
export PYTORCH_ENABLE_MPS_FALLBACK=1  # optional, improves macOS stability
```
## 🔧 MCP Usage
Any MCP-capable client can connect to the SSE endpoint to:
- Browse the full inventory of ML tutorials.
- Submit a markdown problem statement and receive the best-matching file path plus relevance score.
- Fetch the code immediately and render it inline (clients typically syntax-highlight the response).
The Gradio UI mirrors these capabilities via three tabs (List Items, Semantic Search, Get Code) for manual exploration.
## 🤖 Supported Embeddings

- `sentence-transformers/all-MiniLM-L6-v2`

### Configuration Example

```yaml
embedding_model: sentence-transformers/all-MiniLM-L6-v2
batch_size: 32
similarity: cosine
```
## 🔍 Retrieval Strategy
| Component | Description |
|---|---|
| Index Type | In-memory cosine index backed by numpy vectors |
| Chunking | File-level (docstring + prefix) |
| Similarity Function | Dot product on L2-normalized vectors |
| Results Returned | Top-1 match (deterministic) |
### Configuration Example

```yaml
retriever: cosine
max_results: 1
```
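Because the vectors are L2-normalized, a plain dot product equals cosine similarity. A minimal top-1 retriever under that scheme, using toy 3-d vectors as stand-ins for the 384-d MiniLM embeddings (the helper names are illustrative, not the project's API):

```python
import math

def normalize(v: list[float]) -> list[float]:
    """L2-normalize so a plain dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def top1(query: list[float], index: dict[str, list[float]]) -> tuple[str, float]:
    """Deterministic top-1 match: highest dot product, ties broken by
    iterating paths in sorted order."""
    q = normalize(query)
    best_path, best_score = "", -2.0  # cosine similarity is always >= -1
    for path in sorted(index):
        score = sum(a * b for a, b in zip(q, normalize(index[path])))
        if score > best_score:
            best_path, best_score = path, score
    return best_path, round(best_score, 4)

# Toy "embeddings" standing in for MiniLM vectors.
index = {
    "knowledge_base/nlp/text_classification_with_transformer.py": [0.9, 0.1, 0.0],
    "knowledge_base/vision/grad_cam.py": [0.0, 0.2, 0.9],
}
```

Iterating in sorted order is what makes ties deterministic: two equal scores always resolve to the same path, so the top-1 contract in the table above holds across runs.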
## 🧩 Folder Structure

```
ml-starter/
├── app.py                     # Optional Gradio hook
├── mcp_server/
│   ├── server.py              # Remote MCP entrypoint & UI builder
│   ├── loader.py              # KB scanning + safe path resolution
│   ├── embeddings.py          # MiniLM wrapper + cosine index
│   └── tools/
│       ├── list_items.py      # list_items()
│       ├── semantic_search.py # semantic_search()
│       └── get_code.py        # get_code()
├── knowledge_base/            # ML examples grouped by domain
├── requirements.txt
└── README.md
```
## 🔧 MCP Tools (`mcp_server/server.py`)

| MCP Tool | Python Function | Description |
|---|---|---|
| `list_items` | `list_items()` | Enumerates every KB entry with category, filename, absolute path, and summary metadata. |
| `semantic_search` | `semantic_search(problem_markdown: str)` | Embeds the prompt and returns the single best match plus cosine score. |
| `get_code` | `get_code(path: str)` | Streams back the full Python source for a validated KB path. |

`server.py` registers these functions with Gradio's MCP adapter, wires docstrings into tool descriptions, and ensures the SSE endpoint stays read-only.
## 📥 Inputs

### 1. `list_items`

No input parameters; returns the entire catalog.

### 2. `semantic_search`

**Input Model**

| Field | Type | Description | Example |
|---|---|---|---|
| `problem_markdown` | `str` | Natural-language description of the ML task or need. | "I need a transformer example for multilingual NER." |

### 3. `get_code`

**Input Model**

| Field | Type | Description | Example |
|---|---|---|---|
| `path` | `str` | KB-relative or absolute path to a `.py` file. | "knowledge_base/nlp/text_classification_from_scratch.py" |
## 📤 Outputs

### 1. `list_items`

**Response Example**

```json
[
  {
    "id": "nlp/text_classification_with_transformer.py",
    "category": "nlp",
    "filename": "text_classification_with_transformer.py",
    "path": "knowledge_base/nlp/text_classification_with_transformer.py",
    "summary": "Fine-tune a Transformer for sentiment classification."
  }
]
```

### 2. `semantic_search`

**Response Example**

```json
{
  "best_match": "knowledge_base/nlp/text_classification_with_transformer.py",
  "score": 0.89
}
```

### 3. `get_code`

**Response Example**

```json
{
  "path": "knowledge_base/vision/grad_cam.py",
  "source": "<full Python source>"
}
```
Each response is deterministic for the same corpus and embeddings, allowing MCP clients to trust caching and diffing workflows.
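The determinism contract rests on seeding every RNG at startup. A minimal sketch (the helper name and exact seed are assumptions; the numpy/torch calls are shown as comments since they mirror the stdlib call):

```python
import random

def seed_everything(seed: int = 42) -> None:
    """Seed every RNG the server touches so an identical corpus yields
    identical embeddings and scores (sketch; seed value is an assumption)."""
    random.seed(seed)
    # The README notes numpy and torch are seeded as well, e.g.:
    #   numpy.random.seed(seed)
    #   torch.manual_seed(seed)

# Two seeded runs produce byte-identical draws:
seed_everything()
first = [random.random() for _ in range(3)]
seed_everything()
second = [random.random() for _ in range(3)]
```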
## 👥 Team

**Team Name:** Hepheon

**Team Members:**

- Tutkum Akyildiz - @Tutkum - Product
- Emre Atilgan - @emreatilgan - Tech
## 📣 Social Media Post
## 🛠️ Next Steps
Today the knowledge base focuses on curated Keras walkthroughs. Upcoming updates will expand coverage to include:
- TensorFlow
- PyTorch
- scikit-learn
- ...
These additions will land in the same deterministic retrieval flow, making mixed-framework discovery as seamless as the current experience.
## 📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for full terms.

---

Built with ❤️ for the ML Starter knowledge base • Apache 2.0