title: DataPass - Query Private HF Datasets via MCP
emoji: 🎫
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.6.0
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
- email
tags:
- building-mcp-track-xx
- mcp
- data-monetization
- subscription
- duckdb
license: mit
DataPass: Your Pass to Private Data
Track 1: Building MCP - Give AI assistants instant query access to private datasets, with built-in monetization
Author: waroca
Try It Now (Judges)
Test DataPass instantly with a 10-day access pass to the Nano-Banana Pro Prompt Collection - a private dataset of 80 curated AI image generation prompts across 27 categories.
Claude Desktop
Add to claude_desktop_config.json (Claude Desktop requires mcp-remote for SSE servers):
| Platform | Config Location |
|---|---|
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
{
"mcpServers": {
"nano-pro-prompts-dataset": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"https://waroca-datapass-server.hf.space/sse",
"--header",
"Authorization:Bearer hfdata_amcoJbyjPQW82MX5Ryd0bB7Nt35Ti0eCyBxsLRq3Pjw"
]
}
}
}
Cursor
Add to .cursor/mcp.json (Cursor has native SSE support):
{
"mcpServers": {
"nano-pro-prompts-dataset": {
"url": "https://waroca-datapass-server.hf.space/sse",
"headers": {
"Authorization": "Bearer hfdata_amcoJbyjPQW82MX5Ryd0bB7Nt35Ti0eCyBxsLRq3Pjw"
}
}
}
}
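If you script your client setup, both config shapes above can be generated from the same URL and token. The following is a minimal sketch, not part of DataPass itself: the function names are illustrative, and `YOUR_PASS_TOKEN` is a placeholder for the token attached to your pass.

```python
import json

SERVER_URL = "https://waroca-datapass-server.hf.space/sse"


def claude_desktop_config(name: str, token: str, url: str = SERVER_URL) -> dict:
    """Build the Claude Desktop entry, which wraps the SSE endpoint with mcp-remote."""
    return {
        "mcpServers": {
            name: {
                "command": "npx",
                "args": [
                    "-y",
                    "mcp-remote",
                    url,
                    "--header",
                    f"Authorization:Bearer {token}",
                ],
            }
        }
    }


def cursor_config(name: str, token: str, url: str = SERVER_URL) -> dict:
    """Build the Cursor entry, which connects to the SSE endpoint directly."""
    return {
        "mcpServers": {
            name: {
                "url": url,
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }


if __name__ == "__main__":
    # Print the Claude Desktop variant; paste it into claude_desktop_config.json.
    config = claude_desktop_config("nano-pro-prompts-dataset", "YOUR_PASS_TOKEN")
    print(json.dumps(config, indent=2))
```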
Try These Queries
Once connected, ask your AI assistant:
- "What datasets do I have access to?" - See your subscription details
- "Show me the schema of the nano prompts dataset" - Explore the structure
- "What categories of prompts are available?" - Query the data
- "Give me 3 prompts for product visualization" - Get specific results
- "How many prompts require reference images?" - Run analytics
The dataset stays private on Hugging Face - you only see query results.
The Problem
AI assistants are powerful, but they can't access private data. Dataset owners face an impossible choice:
- Share raw files? Lose control forever. No way to revoke, no way to track usage.
- Keep it locked? Miss out on revenue and users who need your data.
The Solution: Query Access, Not Download Access
DataPass is an MCP server that lets AI assistants query private Hugging Face datasets - without ever downloading them.
- Users ask questions in plain English or SQL
- DataPass validates their subscription, runs the query, returns only the results
- The raw data never leaves Hugging Face
User: "Show me the top 5 sales categories"
             │
             ▼
┌─────────────────────────┐
│   DataPass MCP Server   │
│                         │
│  ✓ Valid subscription   │
│  ✓ Convert to SQL       │
│  ✓ Query via DuckDB ─────▶ Private HF Dataset
│  ✓ Return results only  │    (never exposed)
└─────────────────────────┘
             │
             ▼
[{category: "Electronics", sales: 50000}, ...]
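The flow in the diagram (check the pass, run the query, return only rows) can be sketched in a few lines. This is an illustration under stated assumptions, not the DataPass implementation: it uses the standard-library `sqlite3` module as a stand-in for DuckDB, an in-memory dict as a stand-in for the subscription store, and an assumed cap of 100 rows. The natural-language-to-SQL step is left out.

```python
import sqlite3
from datetime import datetime, timezone

MAX_ROWS = 100  # assumed cap; the real limit is not stated in this README

# Hypothetical pass store: token -> (dataset_id, expiry timestamp)
PASSES = {"demo-token": ("sales", datetime(2999, 1, 1, tzinfo=timezone.utc))}


def handle_query(conn, token, dataset_id, sql, now=None):
    """Validate the pass, run the query, and return at most MAX_ROWS rows as dicts."""
    now = now or datetime.now(timezone.utc)
    grant = PASSES.get(token)
    if grant is None or grant[0] != dataset_id or grant[1] <= now:
        raise PermissionError("no valid pass for this dataset")
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    # Only the query result leaves this function; the table itself is never returned.
    return [dict(zip(columns, row)) for row in cursor.fetchmany(MAX_ROWS)]


def demo_connection():
    """Create a tiny in-memory table that plays the role of the private dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (category TEXT, sales INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("Electronics", 50000), ("Apparel", 30000)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    sql = "SELECT category, sales FROM sales ORDER BY sales DESC LIMIT 1"
    print(handle_query(conn, "demo-token", "sales", sql))
```

A production server would also restrict which SQL statements are accepted before executing them; this sketch runs the query as given.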
Why DataPass?
For Dataset Owners: Monetize Without Losing Control
| Benefit | How |
|---|---|
| Keep data private | Raw files stay on HF, users only see query results |
| Monetize instantly | Set up Stripe pricing (free trials, monthly, one-time) |
| Time-limited access | Subscriptions auto-expire, no credential management |
| Revoke anytime | Cancel a subscription, access stops immediately |
| Track usage | See who's querying, what they're asking |
For AI Users: Instant Access to Premium Data
| Benefit | How |
|---|---|
| No downloads | Query terabytes without filling your disk |
| Plain English | Ask questions naturally, DataPass handles the SQL |
| Works everywhere | Claude Desktop, Cursor, any MCP-compatible client |
| Pay for what you need | Subscribe monthly, or just get a day pass |
Quick Start
For Dataset Owners (3 minutes)
- Add your dataset to the DataPass catalog via the admin dashboard
- Set pricing - free trial, monthly subscription, or one-time access
- Share the link - users subscribe and start querying immediately
For AI Users (1 minute)
- Browse the DataPass catalog
- Subscribe (free or paid) to datasets you need
- Connect your AI assistant and start querying
Client config (native SSE form, as used by Cursor; Claude Desktop needs the mcp-remote wrapper shown under Try It Now):
{
"mcpServers": {
"datapass": {
"url": "https://waroca-datapass-server.hf.space/sse"
}
}
}
MCP Tools
| Tool | What it does |
|---|---|
| get_dataset_catalog() | Browse available datasets and pricing |
| get_dataset_sample(dataset_id) | Preview sample rows before subscribing |
| query_dataset(dataset_id, sql) | Run SQL queries on private data |
| query_dataset_natural_language(dataset_id, question) | Ask in plain English |
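MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request, so a call to one of the tools above looks roughly like the payload built below. Your MCP client normally constructs this for you; the sketch only shows the shape. The tool and argument names come from the table, while the `dataset_id` value and the table name in the SQL are hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = tool_call(
        "query_dataset",
        {
            "dataset_id": "example-dataset",  # hypothetical id
            "sql": "SELECT category, COUNT(*) FROM data GROUP BY category",  # hypothetical table
        },
    )
    print(json.dumps(request, indent=2))
```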
Example Conversation
You: "What datasets can I access?"
Claude: You have access to "Sales Analytics" (expires Jan 15, 2025)
- 2.3M rows, 12 columns
- Updated daily
You: "What were the top 5 product categories last month?"
Claude: Here are the top categories by revenue:
| Category | Revenue | Orders |
|-------------|-----------|--------|
| Electronics | $1.2M | 8,420 |
| Apparel | $890K | 12,100 |
| Home | $650K | 5,200 |
...
No downloads. No credentials to manage. Just answers.
Monetization Use Cases
| Use Case | How DataPass Helps |
|---|---|
| Premium datasets | Sell access to curated, high-quality data |
| Research data | Let researchers query without sharing raw data |
| Analytics as a service | Monetize proprietary business intelligence |
| Gated previews | Free samples, paid full access |
| Time-limited access | Day passes, monthly subs, or perpetual |
Architecture
┌─────────────────────┐     ┌─────────────────────┐
│    Subscriber UI    │     │   Owner Dashboard   │
│   (Gradio Space)    │     │   (Gradio Space)    │
│ - Browse & Subscribe│     │ - Add Datasets      │
│ - View My Passes    │     │ - Set Pricing       │
└──────────┬──────────┘     └──────────┬──────────┘
           │                           │
           ▼                           ▼
┌─────────────────────────────────────────────────┐
│          DataPass MCP Server (FastMCP)          │
│  ┌─────────────┐ ┌─────────────┐ ┌──────────┐   │
│  │ Subscription│ │  NL-to-SQL  │ │  DuckDB  │   │
│  │ Validation  │ │ (Hyperbolic)│ │ Queries  │   │
│  └─────────────┘ └─────────────┘ └──────────┘   │
│                        │                        │
│                        ▼                        │
│     Private HF Datasets (never downloaded)      │
└─────────────────────────────────────────────────┘
Tech Stack
| Component | Technology |
|---|---|
| MCP Server | FastMCP + FastAPI |
| Query Engine | DuckDB (queries parquet directly from HF) |
| NL-to-SQL | Qwen/Qwen2.5-Coder-32B-Instruct via Hyperbolic |
| Payments | Stripe Checkout + Webhooks |
| Pass Storage | HF Dataset (append-only JSONL ledger) |
| Frontends | Gradio 6.0 + HF OAuth |
Note: The entire platform runs on Hugging Face (Spaces, Datasets, OAuth) except for the natural language to SQL translation, which uses Qwen2.5-Coder-32B-Instruct hosted on Hyperbolic for inference.
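To illustrate the append-only ledger idea from the Pass Storage row: every change is a new JSONL line, and the current state of a pass is derived by replaying its events in order. The sketch below is an assumption about how such a ledger could work, not the actual DataPass schema; the event fields (`event`, `pass_id`, `dataset_id`, `expires_at`) are invented for the example, and a plain list of strings stands in for the file stored in an HF Dataset.

```python
import json
from datetime import datetime, timezone


def append_event(ledger_lines: list, event: dict) -> None:
    """Append one event as a JSON line; existing lines are never edited."""
    ledger_lines.append(json.dumps(event, sort_keys=True))


def is_pass_valid(ledger_lines: list, pass_id: str, dataset_id: str, now: datetime) -> bool:
    """Replay the ledger in order; later events for the same pass override earlier ones."""
    valid = False
    for line in ledger_lines:
        event = json.loads(line)
        if event["pass_id"] != pass_id:
            continue
        if event["event"] == "grant" and event["dataset_id"] == dataset_id:
            # Expiry is checked on the server at request time.
            valid = datetime.fromisoformat(event["expires_at"]) > now
        elif event["event"] == "revoke":
            valid = False
    return valid


if __name__ == "__main__":
    ledger = []
    append_event(ledger, {
        "event": "grant",
        "pass_id": "p1",
        "dataset_id": "sales",
        "expires_at": "2030-01-01T00:00:00+00:00",
    })
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(is_pass_valid(ledger, "p1", "sales", now))
    append_event(ledger, {"event": "revoke", "pass_id": "p1"})
    print(is_pass_valid(ledger, "p1", "sales", now))
```

Because nothing is overwritten, the ledger doubles as an audit trail: revocation is one more line, not a deletion.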
Security Model
| What | How |
|---|---|
| Dataset files | Never leave HF - queried in place via DuckDB |
| Access control | Pass validated on every request |
| Time limits | Subscription expiry enforced server-side |
| Revocation | Owner can invalidate passes anytime |
| Query limits | Results capped, no SELECT * dumps |
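One way to read the Query limits row is a guard that accepts a single SELECT statement and wraps it in an outer LIMIT, so the row cap holds regardless of what the inner query asks for. The sketch below is a coarse illustration of that reading, not the DataPass code: the function name and the cap of 100 are assumptions, and keyword checks like these are no substitute for a real SQL parser or a read-only connection.

```python
import re

MAX_ROWS = 100  # assumed cap; the real limit is not stated in this README


def guard_sql(sql: str, max_rows: int = MAX_ROWS) -> str:
    """Reject anything that is not a single SELECT, then cap the result size."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        # Coarse check: this also rejects semicolons inside string literals.
        raise ValueError("only a single statement is allowed")
    if not re.match(r"(?i)select\b", statement):
        raise ValueError("only SELECT queries are allowed")
    # The outer LIMIT applies even if the inner query has a larger limit or none.
    return f"SELECT * FROM ({statement}) AS capped LIMIT {max_rows}"


if __name__ == "__main__":
    print(guard_sql("SELECT category FROM prompts;"))
```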
Local Development
# Install dependencies
uv sync
# Run MCP server
cd mcp-server && uv run uvicorn server:app --host 0.0.0.0 --port 8000
# Run frontend (in a second terminal, from the repo root)
cd frontend && uv run python app.py
Deployment
export HF_TOKEN=hf_...
uv run scripts/deploy.py --deploy-all --username YOUR_USERNAME --create
Project Structure
datapass/
├── mcp-server/
│   ├── server.py                # MCP tools + query execution
│   ├── datasets_registry.py     # Dataset catalog management
│   ├── subscriptions_ledger.py  # Pass validation
│   ├── stripe_webhook.py        # Payment handling
│   └── auth.py                  # HF OAuth
├── frontend/                    # Subscriber Gradio app
├── admin-frontend/              # Owner dashboard
└── scripts/                     # Deployment
DataPass: Your pass to private datasets.
