---
title: ML Starter MCP Server
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
license: apache-2.0
pinned: true
short_description: MCP server exposing problem-specific ML code examples
tags:
  - building-mcp-track-enterprise
  - gradio
  - mcp
  - retrieval
  - embeddings
  - python
  - knowledge-base
  - semantic-search
  - sentence-transformers
  - huggingface
---
# ML Starter MCP Server

Gradio-powered, remote-only MCP server that exposes a curated ML knowledge base through deterministic, read-only tooling. Ideal for MCP clients such as Claude Desktop, VS Code (Kilo Code), or Cursor that want a trustworthy retrieval endpoint with no side effects.



## 🧩 Overview

The ML Starter MCP Server indexes the entire `knowledge_base/` tree (audio, vision, NLP, RL, etc.) and makes it searchable through:

- `list_items` – enumerate every tutorial/script with metadata.
- `semantic_search` – vector search over docstrings and lead context to find the single best code example for a natural-language brief.
- `get_code` – return the full Python source for a safe, validated path.

The server is deterministic (seeded numpy/torch), write-protected, and designed to run as a Gradio MCP SSE endpoint suitable for Hugging Face Spaces or on-prem deployments.
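The determinism guarantee above amounts to seeding every PRNG before indexing. A minimal sketch (`seed_everything` is a hypothetical helper name, not necessarily the server's actual code):

```python
import random

import numpy as np


def seed_everything(seed: int = 42) -> None:
    """Seed every PRNG the server touches so repeated queries
    embed and rank identically (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch  # optional dependency; skip if not installed

        torch.manual_seed(seed)
    except ImportError:
        pass
```

Calling this once at startup is enough for numpy-backed indexing to produce byte-identical rankings across restarts.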


## 📚 ML Starter Knowledge Base

- Root: `knowledge_base/`
- Domains:
  - `audio/`
  - `generative/`
  - `graph/`
  - `nlp/`
  - `rl/`
  - `structured_data/`
  - `timeseries/`
  - `vision/`
- Each file stores a complete, runnable ML example with docstring summaries leveraged during indexing.
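Because docstring summaries drive the index, extraction can be sketched with the standard library's `ast` module (`extract_summary` is an illustrative name, not the server's actual API):

```python
import ast


def extract_summary(source: str) -> str:
    """Return the first line of a module's docstring, or "" if absent."""
    module = ast.parse(source)
    doc = ast.get_docstring(module)
    return doc.splitlines()[0].strip() if doc else ""
```

Parsing with `ast` instead of regexes means the summary is found even when the docstring is preceded by comments or `from __future__` imports.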

### Features exposed via MCP

- ✅ Vector search via `sentence-transformers/all-MiniLM-L6-v2` with cosine similarity.
- ⚙️ Safe path resolution ensures only in-repo `.py` files can be fetched.
- 🧮 Metadata-first outputs (category, filename, semantic score) for quick triage.
- 🛡️ Read-only contract; zero KB mutations, uploads, or side effects.
- 🌐 Spaces-ready networking with automatic 0.0.0.0 binding when the platform provides the relevant environment variables.

## 🎬 Demo

Watch the video


## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### MCP Settings

```json
{
  "mcpServers": {
    "ML-Starter": {
      "url": "https://mcp-1st-birthday-ml-starter.hf.space/gradio_api/mcp/"
    }
  }
}
```

### Environment Variables

```bash
export TOKENIZERS_PARALLELISM=false
export PYTORCH_ENABLE_MPS_FALLBACK=1  # optional, improves macOS stability
```

## 🧠 MCP Usage

Any MCP-capable client can connect to the SSE endpoint to:

- Browse the full inventory of ML tutorials.
- Submit a markdown problem statement and receive the best-matching file path plus relevance score.
- Fetch the code immediately and render it inline (clients typically syntax-highlight the response).

The Gradio UI mirrors these capabilities via three tabs (List Items, Semantic Search, Get Code) for manual exploration.


## 🔤 Supported Embeddings

- `sentence-transformers/all-MiniLM-L6-v2`

### Configuration Example

```yaml
embedding_model: sentence-transformers/all-MiniLM-L6-v2
batch_size: 32
similarity: cosine
```

๐Ÿ” Retrieval Strategy

Component Description
Index Type In-memory cosine index backed by numpy vectors
Chunking File-level (docstring + prefix)
Similarity Function Dot product on L2-normalized vectors
Results Returned Top-1 match (deterministic)
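The dot-product-on-normalized-vectors trick from the table can be sketched in a few lines of numpy (function names here are illustrative assumptions):

```python
import numpy as np


def normalize_rows(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize each row so cosine similarity reduces to a dot product."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def top1(index: np.ndarray, query: np.ndarray) -> tuple[int, float]:
    """Return (row, cosine score) of the single best match."""
    q = query / max(float(np.linalg.norm(query)), 1e-12)
    scores = index @ q
    best = int(np.argmax(scores))
    return best, float(scores[best])
```

Normalizing once at index-build time means each query costs a single matrix-vector product, and `argmax` over the scores keeps the top-1 result deterministic.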

### Configuration Example

```yaml
retriever: cosine
max_results: 1
```

## 🧩 Folder Structure

```
ml-starter/
├── app.py                  # Optional Gradio hook
├── mcp_server/
│   ├── server.py           # Remote MCP entrypoint & UI builder
│   ├── loader.py           # KB scanning + safe path resolution
│   ├── embeddings.py       # MiniLM wrapper + cosine index
│   └── tools/
│       ├── list_items.py       # list_items()
│       ├── semantic_search.py  # semantic_search()
│       └── get_code.py         # get_code()
├── knowledge_base/         # ML examples grouped by domain
├── requirements.txt
└── README.md
```

## 🔧 MCP Tools (mcp_server/server.py)

| MCP Tool | Python Function | Description |
| --- | --- | --- |
| `list_items` | `list_items()` | Enumerates every KB entry with category, filename, absolute path, and summary metadata. |
| `semantic_search` | `semantic_search(problem_markdown: str)` | Embeds the prompt and returns the single best match plus its cosine score. |
| `get_code` | `get_code(path: str)` | Streams back the full Python source for a validated KB path. |

server.py registers these functions with Gradio's MCP adapter, wires docstrings into tool descriptions, and ensures the SSE endpoint stays read-only.


## 📥 Inputs

### 1. list_items

No input parameters; returns the entire catalog.

### 2. semantic_search

Input Model:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| `problem_markdown` | `str` | Natural-language description of the ML task or need. | "I need a transformer example for multilingual NER." |

### 3. get_code

Input Model:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| `path` | `str` | KB-relative or absolute path to a `.py` file. | "knowledge_base/nlp/text_classification_from_scratch.py" |

## 📤 Outputs

### 1. list_items

Response Example:

```json
[
  {
    "id": "nlp/text_classification_with_transformer.py",
    "category": "nlp",
    "filename": "text_classification_with_transformer.py",
    "path": "knowledge_base/nlp/text_classification_with_transformer.py",
    "summary": "Fine-tune a Transformer for sentiment classification."
  }
]
```

### 2. semantic_search

Response Example:

```json
{
  "best_match": "knowledge_base/nlp/text_classification_with_transformer.py",
  "score": 0.89
}
```

### 3. get_code

Response Example:

```json
{
  "path": "knowledge_base/vision/grad_cam.py",
  "source": "<full Python source>"
}
```

Each response is deterministic for the same corpus and embeddings, allowing MCP clients to trust caching and diffing workflows.
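Since identical inputs yield identical outputs, a client can key its cache on the canonicalized request alone. A minimal sketch (`cache_key` is a hypothetical client-side helper, not part of this server):

```python
import hashlib
import json


def cache_key(tool: str, payload: dict) -> str:
    """Stable SHA-256 key over a (tool, sorted-payload) JSON document."""
    canonical = json.dumps({"tool": tool, "payload": payload}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

`sort_keys=True` makes the key independent of dict insertion order, so two clients issuing the same request always hit the same cache entry.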


## 👥 Team

Team Name: Hepheon

Team Members:


## 📣 Social Media Post


๐Ÿ› ๏ธ Next Steps

Today the knowledge base focuses on curated Keras walkthroughs. Upcoming updates will expand coverage to include:

  • TensorFlow
  • PyTorch
  • scikit-learn
  • ...

These additions will land in the same deterministic retrieval flow, making mixed-framework discovery as seamless as the current experience.


## 📘 License

This project is licensed under the Apache License 2.0. See the LICENSE file for full terms.


Built with ❤️ for the ML Starter knowledge base • Apache 2.0