title: DataView MCP
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.32.0
app_file: app.py
pinned: false
license: mit
tags:
- mcp
- datasets
- huggingface
- exploration
- gradio
DataView MCP
A comprehensive Model Context Protocol (MCP) server for exploring Hugging Face datasets. Give your AI assistant the power to search, sample, analyze, and discover datasets on the Hub.
Features
| Tool | Description |
|---|---|
| `search_datasets` | Find datasets by keyword, task, or domain |
| `search_by_columns` | Find datasets with specific column names |
| `get_dataset_info` | Get detailed metadata and README |
| `get_schema` | Get column names and data types |
| `sample_rows` | Get actual data samples |
| `get_statistics` | Compute column statistics |
| `profile_quality` | Assess data quality issues |
| `find_similar` | Find similar datasets |
| `suggest_tasks` | Suggest ML tasks for a dataset |
| `compare_datasets` | Compare two datasets side-by-side |
Quick Start
Use with Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"dataview": {
"url": "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"
}
}
}
Use with Claude Code
Add to your MCP settings:
{
"mcpServers": {
"dataview": {
"command": "npx",
"args": ["mcp-remote", "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"]
}
}
}
Run Locally
# Clone the repository
git clone https://huggingface.co/spaces/efecelik/dataview-mcp
cd dataview-mcp
# Install dependencies
pip install -r requirements.txt
# Optional: Set HF token for higher rate limits
export HF_TOKEN=your_token_here
# Run the server
python app.py
Then connect to http://localhost:7860/gradio_api/mcp/sse
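The hosted and local setups differ only in the base URL, and the two client configurations differ only in whether the URL is given directly or passed through `npx mcp-remote`. A minimal sketch that generates either shape is shown here; the helper names `mcp_url` and `client_config` are hypothetical and not part of this project, while the `/gradio_api/mcp/sse` path is taken from the URLs above.

```python
import json

# Endpoint path as it appears in the URLs in this README.
MCP_PATH = "/gradio_api/mcp/sse"


def mcp_url(base_url: str) -> str:
    """Join a Space or localhost base URL with the MCP SSE path."""
    return base_url.rstrip("/") + MCP_PATH


def client_config(base_url: str, use_mcp_remote: bool = False, name: str = "dataview") -> dict:
    """Build an mcpServers entry in one of the two shapes shown in this README.

    use_mcp_remote=False gives the direct "url" form,
    use_mcp_remote=True gives the "npx mcp-remote" form.
    """
    url = mcp_url(base_url)
    if use_mcp_remote:
        server = {"command": "npx", "args": ["mcp-remote", url]}
    else:
        server = {"url": url}
    return {"mcpServers": {name: server}}


if __name__ == "__main__":
    # Print a config that points at a locally running server.
    print(json.dumps(client_config("http://localhost:7860", use_mcp_remote=True), indent=2))
```

Paste the printed JSON into the relevant client settings file. Which of the two forms a given client version accepts should be checked against that client's documentation.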
Example Usage
Once connected, ask your AI assistant:
- "Search for sentiment analysis datasets"
- "Show me 5 sample rows from the IMDB dataset"
- "What's the schema of the SQuAD dataset?"
- "Find datasets similar to IMDB"
- "What ML tasks could I do with the IMDB dataset?"
- "Compare IMDB and Rotten Tomatoes datasets"
- "Check the data quality of this dataset"
Tool Details
search_datasets
Find datasets matching your criteria.
Query: "sentiment analysis"
Filter: text-classification
Limit: 10
sample_rows
See actual data from a dataset.
Dataset: imdb
Rows: 5
Split: train
get_statistics
Get a statistical overview of columns.
Dataset: imdb
Sample Size: 1000
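To give a sense of what a per-column overview over a sample might contain, here is a rough sketch in plain Python. It is not the code in `tools/profiling.py`; the function name `column_statistics` and the choice of reported fields are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def column_statistics(rows: list[dict]) -> dict:
    """Summarize each column of a list of row dicts (a sample, not the full dataset)."""
    stats = {}
    columns = {key for row in rows for key in row}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary = {"count": len(present), "missing": len(values) - len(present)}
        # Treat a column as numeric only if every present value is an int or float.
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if present and len(numeric) == len(present):
            summary.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        elif present and all(isinstance(v, str) for v in present):
            summary.update(
                unique=len(set(present)),
                avg_length=mean(len(v) for v in present),
                top=Counter(present).most_common(1)[0][0],
            )
        stats[col] = summary
    return stats


if __name__ == "__main__":
    sample = [
        {"text": "good", "label": 1},
        {"text": "bad", "label": 0},
        {"text": "good", "label": None},
    ]
    print(column_statistics(sample))
```

Because the statistics come from a sample (1000 rows in the example above), they approximate the full dataset rather than describe it exactly.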
profile_quality
Check for data quality issues.
Dataset: imdb
Sample Size: 500
Returns a quality score along with missing-value, duplicate, and class-imbalance findings.
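The checks named above (missing values, duplicates, class imbalance) can be sketched over a sample of rows as follows. This is an illustrative stand-in, not the project's implementation: the function signature, the `label_column` parameter, and especially the score formula are assumptions made for this example.

```python
import json
from collections import Counter
from typing import Optional


def profile_quality(rows: list[dict], label_column: Optional[str] = None) -> dict:
    """Compute simple quality indicators for a sample of rows."""
    total = len(rows)
    if total == 0:
        return {"rows": 0, "missing_ratio": 0.0, "duplicate_ratio": 0.0,
                "imbalance_ratio": None, "score": 0.0}

    columns = sorted({key for row in rows for key in row})
    cells = total * len(columns)

    # A cell counts as missing if it is absent, None, or an empty string.
    missing = sum(1 for row in rows for col in columns if row.get(col) in (None, ""))
    missing_ratio = missing / cells if cells else 0.0

    # Rows are duplicates if their JSON serializations (with sorted keys) match.
    unique_rows = {json.dumps(row, sort_keys=True, default=str) for row in rows}
    duplicate_ratio = (total - len(unique_rows)) / total

    # Imbalance is the ratio of the most common to the least common label.
    imbalance_ratio = None
    if label_column is not None:
        counts = Counter(row[label_column] for row in rows
                         if row.get(label_column) is not None)
        if counts:
            imbalance_ratio = max(counts.values()) / min(counts.values())

    # Hypothetical score: 100 scaled down by the missing and duplicate ratios.
    score = 100.0 * (1 - missing_ratio) * (1 - duplicate_ratio)
    return {"rows": total, "missing_ratio": missing_ratio,
            "duplicate_ratio": duplicate_ratio,
            "imbalance_ratio": imbalance_ratio, "score": score}


if __name__ == "__main__":
    sample = [
        {"text": "a", "label": 1},
        {"text": "a", "label": 1},
        {"text": "", "label": 0},
        {"text": "b", "label": 1},
    ]
    print(profile_quality(sample, label_column="label"))
```

The real tool may weight these factors differently or check additional issues; the sketch only shows how each named indicator can be derived from a sample.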
suggest_tasks
AI-powered task suggestions based on dataset structure.
Dataset: imdb
Returns suggested ML tasks with confidence levels.
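As an illustration of how dataset structure can map to task suggestions, the sketch below applies a few hand-written rules to a schema. It is a rule-based stand-in and should not be read as how `suggest_tasks` actually works; the rules, the `schema` format (column name to dtype string), and the confidence labels are all assumptions.

```python
def suggest_tasks(schema: dict[str, str]) -> list[dict]:
    """Suggest ML tasks from a {column_name: dtype} mapping using simple rules."""
    names = {name.lower() for name in schema}
    text_cols = [name for name, dtype in schema.items() if dtype == "string"]
    suggestions = []

    # Extractive QA datasets usually carry question, context, and answers columns.
    if {"question", "context", "answers"} <= names:
        suggestions.append({"task": "question-answering", "confidence": "high"})

    # A text column next to a label column points to classification.
    if text_cols and names & {"label", "labels"}:
        suggestions.append({"task": "text-classification", "confidence": "high"})

    # Text with no other signal: fall back to a generic, low-confidence guess.
    if text_cols and not suggestions:
        suggestions.append({"task": "text-generation", "confidence": "low"})

    return suggestions


if __name__ == "__main__":
    print(suggest_tasks({"text": "string", "label": "int64"}))
```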
Development
# Install dev dependencies
pip install -r requirements.txt
# Run in development mode
gradio app.py --reload
Architecture
dataview-mcp/
├── app.py                 # Main Gradio MCP server
├── tools/
│   ├── search.py          # search_datasets, search_by_columns
│   ├── metadata.py        # get_dataset_info, get_schema
│   ├── sampling.py        # sample_rows
│   ├── profiling.py       # get_statistics, profile_quality
│   └── discovery.py       # find_similar, suggest_tasks, compare_datasets
├── utils/
│   ├── hf_client.py       # HF API wrapper
│   └── formatting.py      # Output formatters
└── requirements.txt
License
MIT
Contributing
Contributions welcome! Please open an issue or PR.
Built with Gradio and Hugging Face Hub