---
title: DataView MCP
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.32.0
app_file: app.py
pinned: false
license: mit
tags:
  - mcp
  - datasets
  - huggingface
  - exploration
  - gradio
---

DataView MCP 🔍

A Model Context Protocol (MCP) server for exploring Hugging Face datasets. It lets your AI assistant search, sample, analyze, and compare datasets on the Hub.

Features

Tool                 Description
search_datasets      Find datasets by keyword, task, or domain
search_by_columns    Find datasets with specific column names
get_dataset_info     Get detailed metadata and README
get_schema           Get column names and data types
sample_rows          Get actual data samples
get_statistics       Compute column statistics
profile_quality      Assess data quality issues
find_similar         Find similar datasets
suggest_tasks        Suggest ML tasks for a dataset
compare_datasets     Compare two datasets side-by-side
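To illustrate what a side-by-side comparison might start from, here is a minimal Python sketch of a schema diff. It assumes each schema is a dict mapping column names to type strings, similar to what get_schema reports. compare_schemas is a hypothetical helper, not code taken from tools/discovery.py.

```python
from typing import Dict, List, Tuple


def compare_schemas(left: Dict[str, str], right: Dict[str, str]) -> Dict[str, list]:
    """Diff two column -> type mappings (hypothetical helper, not the Space's code)."""
    shared = sorted(set(left) & set(right))
    # Columns that exist in both schemas but are declared with different types.
    type_mismatches: List[Tuple[str, str, str]] = [
        (col, left[col], right[col]) for col in shared if left[col] != right[col]
    ]
    return {
        "shared": shared,
        "only_left": sorted(set(left) - set(right)),
        "only_right": sorted(set(right) - set(left)),
        "type_mismatches": type_mismatches,
    }


# Illustrative schemas, not the real schemas of any Hub dataset.
a = {"text": "string", "label": "int64"}
b = {"text": "string", "label": "string", "source": "string"}
diff = compare_schemas(a, b)
```

Here diff reports label as a type mismatch and source as present only in the second schema.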

Quick Start

Use with Claude Desktop

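Add the following to your claude_desktop_config.json. If your Claude Desktop version does not accept a remote url entry in this file, use the npx mcp-remote form shown under Claude Code instead: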

{
  "mcpServers": {
    "dataview": {
      "url": "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"
    }
  }
}

Use with Claude Code

Add to your MCP settings:

{
  "mcpServers": {
    "dataview": {
      "command": "npx",
      "args": ["mcp-remote", "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"]
    }
  }
}
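The two configurations above differ only in how the client reaches the same SSE endpoint: directly by url, or through the mcp-remote bridge started with npx. If you maintain several clients, a small Python sketch like this can generate either shape. The helper names are hypothetical, and the script only prints JSON; it does not edit any client's settings file.

```python
import json
from typing import Any, Dict

SPACE_SSE_URL = "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"


def direct_sse_config(url: str = SPACE_SSE_URL) -> Dict[str, Any]:
    """Config that points the client straight at the SSE endpoint."""
    return {"mcpServers": {"dataview": {"url": url}}}


def mcp_remote_config(url: str = SPACE_SSE_URL) -> Dict[str, Any]:
    """Config that bridges the SSE endpoint through the mcp-remote npm package."""
    return {"mcpServers": {"dataview": {"command": "npx", "args": ["mcp-remote", url]}}}


if __name__ == "__main__":
    # Example: print a bridged config for a server running locally on port 7860.
    local_url = "http://localhost:7860/gradio_api/mcp/sse"
    print(json.dumps(mcp_remote_config(local_url), indent=2))
```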

Run Locally

# Clone the repository
git clone https://huggingface.co/spaces/efecelik/dataview-mcp
cd dataview-mcp

# Install dependencies
pip install -r requirements.txt

# Optional: Set HF token for higher rate limits
export HF_TOKEN=your_token_here

# Run the server
python app.py

Then connect to http://localhost:7860/gradio_api/mcp/sse

Example Usage

Once connected, ask your AI assistant:

  • "Search for sentiment analysis datasets"
  • "Show me 5 sample rows from the IMDB dataset"
  • "What's the schema of the SQuAD dataset?"
  • "Find datasets similar to IMDB"
  • "What ML tasks could I do with the IMDB dataset?"
  • "Compare IMDB and Rotten Tomatoes datasets"
  • "Check the data quality of this dataset"

Tool Details

search_datasets

Find datasets matching your criteria.

Query: "sentiment analysis"
Filter: text-classification
Limit: 10
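One plausible way these inputs map onto the Hub API is through the search, filter, and limit arguments of huggingface_hub's HfApi.list_datasets. The sketch below only builds those keyword arguments; it assumes the task filter corresponds to the Hub's "task_categories:<task>" dataset tag, and build_search_kwargs is a hypothetical name, not the Space's implementation.

```python
from typing import Any, Dict, Optional


def build_search_kwargs(query: str, task: Optional[str] = None, limit: int = 10) -> Dict[str, Any]:
    """Translate search inputs into HfApi.list_datasets keyword arguments.

    Assumption: a task filter maps onto the "task_categories:<task>" Hub tag.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    kwargs: Dict[str, Any] = {"search": query, "limit": limit}
    if task:
        kwargs["filter"] = f"task_categories:{task}"
    return kwargs


kwargs = build_search_kwargs("sentiment analysis", task="text-classification", limit=10)
```

With huggingface_hub installed, the result could be passed as HfApi().list_datasets(**kwargs), which yields DatasetInfo objects.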

sample_rows

Fetch actual rows from a dataset split.

Dataset: imdb
Rows: 5
Split: train
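Rows like these can be fetched without downloading the dataset through the Hugging Face dataset viewer's /rows endpoint. Whether sample_rows uses this endpoint is an assumption. The sketch below only builds the request URL; the config name plain_text for IMDB and the 100-row cap per request are assumptions you can check against the viewer's /splits endpoint and documentation.

```python
from urllib.parse import urlencode

ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows"


def rows_url(dataset: str, config: str, split: str = "train", n_rows: int = 5, offset: int = 0) -> str:
    """Build a dataset viewer /rows URL for n_rows rows starting at offset."""
    # Assumed limit: the endpoint serves at most 100 rows per request.
    if not 1 <= n_rows <= 100:
        raise ValueError("n_rows must be between 1 and 100")
    params = {"dataset": dataset, "config": config, "split": split, "offset": offset, "length": n_rows}
    return f"{ROWS_ENDPOINT}?{urlencode(params)}"


url = rows_url("imdb", "plain_text")
```

An HTTP GET on that URL returns JSON whose rows field holds the sampled records.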

get_statistics

Get a statistical overview of each column, computed over a sample of rows.

Dataset: imdb
Sample Size: 1000

profile_quality

Check for data quality issues.

Dataset: imdb
Sample Size: 500

Returns a quality score and reports on missing values, duplicate rows, and class imbalance.
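The README does not say how the quality score is computed. The sketch below shows one simple way to combine the three signals it mentions. The 50/50 weighting of missing-cell rate and duplicate rate is an arbitrary assumption, the imbalance ratio (largest class count divided by smallest) is reported separately, and quality_report is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def quality_report(rows: List[Dict[str, Any]], label_column: Optional[str] = None) -> Dict[str, Any]:
    """Illustrative quality check; the score weights are assumptions, not the Space's."""
    n = len(rows)
    if n == 0:
        return {"score": 0.0, "missing_cells": 0, "duplicate_rows": 0, "imbalance_ratio": None}
    columns = sorted({key for row in rows for key in row})
    # Count None and empty strings as missing cells.
    missing = sum(1 for row in rows for col in columns if row.get(col) in (None, ""))
    # Represent each row as a tuple of (column, repr(value)) so it can be hashed.
    keys = [tuple((col, repr(row.get(col))) for col in columns) for row in rows]
    duplicates = n - len(set(keys))
    imbalance = None
    if label_column is not None:
        counts = Counter(row.get(label_column) for row in rows)
        imbalance = max(counts.values()) / min(counts.values())
    missing_rate = missing / (n * len(columns)) if columns else 0.0
    duplicate_rate = duplicates / n
    score = 100 * (1 - 0.5 * missing_rate - 0.5 * duplicate_rate)
    return {
        "score": round(score, 2),
        "missing_cells": missing,
        "duplicate_rows": duplicates,
        "imbalance_ratio": imbalance,
    }


report = quality_report(
    [{"text": "a", "label": 0}, {"text": "a", "label": 0}, {"text": "", "label": 1}, {"text": "b", "label": 0}],
    label_column="label",
)
```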

suggest_tasks

AI-powered task suggestions based on dataset structure.

Dataset: imdb

Returns suggested ML tasks with confidence levels.
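The Space describes its suggestions as AI-powered. For contrast, a much simpler rule-based heuristic can already map common column layouts to Hub task names with a coarse confidence label. The sketch below is that simpler technique, not the Space's method; the rules and confidence labels are illustrative assumptions.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical rules: (required column names, task, confidence).
RULES: List[Tuple[FrozenSet[str], str, str]] = [
    (frozenset({"question", "context", "answers"}), "question-answering", "high"),
    (frozenset({"image", "label"}), "image-classification", "high"),
    (frozenset({"text", "label"}), "text-classification", "high"),
    (frozenset({"document", "summary"}), "summarization", "high"),
    (frozenset({"text"}), "text-generation", "low"),
]


def suggest_tasks_from_columns(column_names: List[str]) -> List[Tuple[str, str]]:
    """Return (task, confidence) pairs whose required columns are all present."""
    present = {name.lower() for name in column_names}
    return [(task, conf) for required, task, conf in RULES if required <= present]


suggestions = suggest_tasks_from_columns(["text", "label"])
```

For a text/label schema this yields text-classification with high confidence and text-generation with low confidence, in rule order.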

Development

# Install dependencies (same requirements.txt as above)
pip install -r requirements.txt

# Run in reload mode: the gradio CLI restarts the app when files change
gradio app.py

Architecture

dataview-mcp/
├── app.py              # Main Gradio MCP server
├── tools/
│   ├── search.py       # search_datasets, search_by_columns
│   ├── metadata.py     # get_dataset_info, get_schema
│   ├── sampling.py     # sample_rows
│   ├── profiling.py    # get_statistics, profile_quality
│   └── discovery.py    # find_similar, suggest_tasks, compare_datasets
├── utils/
│   ├── hf_client.py    # HF API wrapper
│   └── formatting.py   # Output formatters
└── requirements.txt

License

MIT

Contributing

Contributions welcome! Please open an issue or PR.


Built with Gradio and Hugging Face Hub