datapass / README.md
waroca's picture
Upload folder using huggingface_hub
2ef934d verified

A newer version of the Gradio SDK is available: 6.13.0

Upgrade
metadata
title: DataPass - Query Private HF Datasets via MCP
emoji: 🎫
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.6.0
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
  - email
tags:
  - building-mcp-track-xx
  - mcp
  - data-monetization
  - subscription
  - duckdb
license: mit

DataPass: Your Pass to Private Data

Track 1: Building MCP - Give AI assistants instant query access to private datasets, with built-in monetization

Author: waroca

Demo Video Social Post

Try It Now (Judges)

Test DataPass instantly with a 10-day access pass to the Nano-Banana Pro Prompt Collection - a private dataset of 80 curated AI image generation prompts across 27 categories.

Claude Desktop

Add to claude_desktop_config.json (Claude Desktop requires mcp-remote for SSE servers):

Platform Config Location
macOS ~/Library/Application Support/Claude/claude_desktop_config.json
Windows %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "nano-pro-prompts-dataset": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://waroca-datapass-server.hf.space/sse",
        "--header",
        "Authorization:Bearer hfdata_amcoJbyjPQW82MX5Ryd0bB7Nt35Ti0eCyBxsLRq3Pjw"
      ]
    }
  }
}

Cursor

Add to .cursor/mcp.json (Cursor has native SSE support):

{
  "mcpServers": {
    "nano-pro-prompts-dataset": {
      "url": "https://waroca-datapass-server.hf.space/sse",
      "headers": {
        "Authorization": "Bearer hfdata_amcoJbyjPQW82MX5Ryd0bB7Nt35Ti0eCyBxsLRq3Pjw"
      }
    }
  }
}

Try These Queries

Once connected, ask your AI assistant:

  1. "What datasets do I have access to?" - See your subscription details
  2. "Show me the schema of the nano prompts dataset" - Explore the structure
  3. "What categories of prompts are available?" - Query the data
  4. "Give me 3 prompts for product visualization" - Get specific results
  5. "How many prompts require reference images?" - Run analytics

The dataset stays private on Hugging Face - you only see query results.


The Problem

AI assistants are powerful, but they can't access private data. Dataset owners face an impossible choice:

  • Share raw files? Lose control forever. No way to revoke, no way to track usage.
  • Keep it locked? Miss out on revenue and users who need your data.

DataPass Story

The Solution: Query Access, Not Download Access

DataPass is an MCP server that lets AI assistants query private Hugging Face datasets - without ever downloading them.

  • Users ask questions in plain English or SQL
  • DataPass validates their subscription, runs the query, returns only the results
  • The raw data never leaves Hugging Face
User: "Show me the top 5 sales categories"
         β”‚
         β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚   DataPass MCP Server   β”‚
   β”‚                         β”‚
   β”‚  βœ“ Valid subscription   β”‚
   β”‚  βœ“ Convert to SQL       β”‚
   β”‚  βœ“ Query via DuckDB     │────▢ Private HF Dataset
   β”‚  βœ“ Return results only  β”‚      (never exposed)
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
   [{category: "Electronics", sales: 50000}, ...]

Why DataPass?

For Dataset Owners: Monetize Without Losing Control

Benefit How
Keep data private Raw files stay on HF, users only see query results
Monetize instantly Set up Stripe pricing (free trials, monthly, one-time)
Time-limited access Subscriptions auto-expire, no credential management
Revoke anytime Cancel a subscription, access stops immediately
Track usage See who's querying, what they're asking

For AI Users: Instant Access to Premium Data

Benefit How
No downloads Query terabytes without filling your disk
Plain English Ask questions naturally, DataPass handles the SQL
Works everywhere Claude Desktop, Cursor, any MCP-compatible client
Pay for what you need Subscribe monthly, or just get a day pass

Quick Start

For Dataset Owners (3 minutes)

  1. Add your dataset to the DataPass catalog via the admin dashboard
  2. Set pricing - free trial, monthly subscription, or one-time access
  3. Share the link - users subscribe and start querying immediately

For AI Users (1 minute)

  1. Browse the DataPass catalog
  2. Subscribe (free or paid) to datasets you need
  3. Connect your AI assistant and start querying
// Claude Desktop config
{
  "mcpServers": {
    "datapass": {
      "url": "https://waroca-datapass-server.hf.space/sse"
    }
  }
}

MCP Tools

Tool What it does
get_dataset_catalog() Browse available datasets and pricing
get_dataset_sample(dataset_id) Preview sample rows before subscribing
query_dataset(dataset_id, sql) Run SQL queries on private data
query_dataset_natural_language(dataset_id, question) Ask in plain English

Example Conversation

You: "What datasets can I access?"

Claude: You have access to "Sales Analytics" (expires Jan 15, 2025)
        - 2.3M rows, 12 columns
        - Updated daily

You: "What were the top 5 product categories last month?"

Claude: Here are the top categories by revenue:
        | Category    | Revenue   | Orders |
        |-------------|-----------|--------|
        | Electronics | $1.2M     | 8,420  |
        | Apparel     | $890K     | 12,100 |
        | Home        | $650K     | 5,200  |
        ...

No downloads. No credentials to manage. Just answers.

Monetization Use Cases

Use Case How DataPass Helps
Premium datasets Sell access to curated, high-quality data
Research data Let researchers query without sharing raw data
Analytics as a service Monetize proprietary business intelligence
Gated previews Free samples, paid full access
Time-limited access Day passes, monthly subs, or perpetual

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Subscriber UI      β”‚     β”‚  Owner Dashboard    β”‚
β”‚  (Gradio Space)     β”‚     β”‚  (Gradio Space)     β”‚
β”‚  - Browse & Subscribeβ”‚    β”‚  - Add Datasets     β”‚
β”‚  - View My Passes   β”‚     β”‚  - Set Pricing      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚                           β”‚
          β–Ό                           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            DataPass MCP Server (FastMCP)         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚ Subscriptionβ”‚  β”‚ NL-to-SQL   β”‚  β”‚ DuckDB   β”‚ β”‚
β”‚  β”‚ Validation  β”‚  β”‚ (Hyperbolic)β”‚  β”‚ Queries  β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                         β”‚                        β”‚
β”‚                         β–Ό                        β”‚
β”‚         Private HF Datasets (never downloaded)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Tech Stack

Component Technology
MCP Server FastMCP + FastAPI
Query Engine DuckDB (queries parquet directly from HF)
NL-to-SQL Qwen/Qwen2.5-Coder-32B-Instruct via Hyperbolic
Payments Stripe Checkout + Webhooks
Pass Storage HF Dataset (append-only JSONL ledger)
Frontends Gradio 6.0 + HF OAuth

Note: The entire platform runs on Hugging Face (Spaces, Datasets, OAuth) except for the natural language to SQL translation, which uses Qwen2.5-Coder-32B-Instruct hosted on Hyperbolic for inference.

Security Model

What How
Dataset files Never leave HF - queried in place via DuckDB
Access control Pass validated on every request
Time limits Subscription expiry enforced server-side
Revocation Owner can invalidate passes anytime
Query limits Results capped, no SELECT * dumps

Local Development

# Install dependencies
uv sync

# Run MCP server
cd mcp-server && uv run uvicorn server:app --host 0.0.0.0 --port 8000

# Run frontend
cd frontend && uv run python app.py

Deployment

export HF_TOKEN=hf_...
uv run scripts/deploy.py --deploy-all --username YOUR_USERNAME --create

Project Structure

datapass/
β”œβ”€β”€ mcp-server/
β”‚   β”œβ”€β”€ server.py              # MCP tools + query execution
β”‚   β”œβ”€β”€ datasets_registry.py   # Dataset catalog management
β”‚   β”œβ”€β”€ subscriptions_ledger.py # Pass validation
β”‚   β”œβ”€β”€ stripe_webhook.py      # Payment handling
β”‚   └── auth.py                # HF OAuth
β”œβ”€β”€ frontend/                  # Subscriber Gradio app
β”œβ”€β”€ admin-frontend/            # Owner dashboard
└── scripts/                   # Deployment

Links


DataPass: Your pass to private datasets.