---
title: DataView MCP
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.32.0
app_file: app.py
pinned: false
license: mit
tags:
- mcp
- datasets
- huggingface
- exploration
- gradio
---
# DataView MCP 🔍
A comprehensive **Model Context Protocol (MCP) server** for exploring Hugging Face datasets. Give your AI assistant the power to search, sample, analyze, and discover datasets on the Hub.
## Features
| Tool | Description |
|------|-------------|
| `search_datasets` | Find datasets by keyword, task, or domain |
| `search_by_columns` | Find datasets with specific column names |
| `get_dataset_info` | Get detailed metadata and README |
| `get_schema` | Get column names and data types |
| `sample_rows` | Get actual data samples |
| `get_statistics` | Compute column statistics |
| `profile_quality` | Assess data quality issues |
| `find_similar` | Find similar datasets |
| `suggest_tasks` | Suggest ML tasks for a dataset |
| `compare_datasets` | Compare two datasets side-by-side |
## Quick Start
### Use with Claude Desktop
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "dataview": {
      "url": "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"
    }
  }
}
```
### Use with Claude Code
Add to your MCP settings:
```json
{
  "mcpServers": {
    "dataview": {
      "command": "npx",
      "args": ["mcp-remote", "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"]
    }
  }
}
}
```
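If you switch between the hosted Space and a local server, the two config shapes above can be generated rather than hand-edited. The following is a minimal sketch; the helper names (`sse_url`, `server_entry`, `merge_into_config`) are hypothetical and not part of this project.

```python
import json

# Path suffix of the MCP SSE endpoint, as it appears in the URLs above.
MCP_SSE_PATH = "/gradio_api/mcp/sse"


def sse_url(base_url: str) -> str:
    """Join a server base URL with the MCP SSE path."""
    return base_url.rstrip("/") + MCP_SSE_PATH


def server_entry(base_url: str, via_mcp_remote: bool = False) -> dict:
    """Build the value stored under mcpServers["dataview"]."""
    url = sse_url(base_url)
    if via_mcp_remote:
        # Bridge form: the client launches `npx mcp-remote <url>`.
        return {"command": "npx", "args": ["mcp-remote", url]}
    # Direct form: the client connects to the SSE URL itself.
    return {"url": url}


def merge_into_config(config: dict, entry: dict, name: str = "dataview") -> dict:
    """Return a copy of config with the entry added, keeping other servers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    hosted = server_entry("https://efecelik-dataview-mcp.hf.space", via_mcp_remote=True)
    print(json.dumps(merge_into_config({}, hosted), indent=2))
```

Passing `http://localhost:7860` as the base URL yields the entry for a locally running server.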
### Run Locally
```bash
# Clone the repository
git clone https://huggingface.co/spaces/efecelik/dataview-mcp
cd dataview-mcp
# Install dependencies
pip install -r requirements.txt
# Optional: Set HF token for higher rate limits
export HF_TOKEN=your_token_here
# Run the server
python app.py
```
Then point your MCP client at `http://localhost:7860/gradio_api/mcp/sse`.
## Example Usage
Once connected, ask your AI assistant:
- *"Search for sentiment analysis datasets"*
- *"Show me 5 sample rows from the IMDB dataset"*
- *"What's the schema of the SQuAD dataset?"*
- *"Find datasets similar to IMDB"*
- *"What ML tasks could I do with the IMDB dataset?"*
- *"Compare IMDB and Rotten Tomatoes datasets"*
- *"Check the data quality of this dataset"*
## Tool Details
### search_datasets
Find datasets matching your criteria.
```
Query: "sentiment analysis"
Filter: text-classification
Limit: 10
```
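For orientation, the parameters above map naturally onto the public Hub endpoint `https://huggingface.co/api/datasets`. This is a sketch of an equivalent request, not this server's actual implementation; `build_search_url` and `fetch_json` are hypothetical helpers. Hub dataset tags are often namespaced (for example `task_categories:text-classification`), so how the tool translates a bare `text-classification` filter is an assumption left open here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

HUB_DATASETS_API = "https://huggingface.co/api/datasets"


def build_search_url(query, task_filter=None, limit=10):
    """Build a Hub dataset-search URL from the three tool parameters."""
    params = {"search": query, "limit": limit}
    if task_filter:
        # Passed through unchanged; any mapping to namespaced tags is up to the caller.
        params["filter"] = task_filter
    return f"{HUB_DATASETS_API}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(build_search_url("sentiment analysis", "text-classification"))` would return a list of dataset records when the Hub is reachable.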
### sample_rows
See actual data from a dataset.
```
Dataset: imdb
Rows: 5
Split: train
```
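One way to fetch such a sample without downloading the dataset is the Hugging Face dataset viewer API. The sketch below builds a `/rows` request and unpacks the response; it assumes the viewer is available for the dataset, and the helper names, the `config` default, and the per-request cap of 100 rows are assumptions rather than details taken from this project.

```python
from urllib.parse import urlencode

ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows"
MAX_LENGTH = 100  # assumed per-request cap of the viewer API


def build_rows_url(dataset, split="train", rows=5, config="default", offset=0):
    """Build a /rows request URL for one slice of a dataset split."""
    params = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": offset,
        # Clamp the requested row count into the range the API accepts.
        "length": max(1, min(rows, MAX_LENGTH)),
    }
    return f"{ROWS_ENDPOINT}?{urlencode(params)}"


def extract_rows(payload):
    """Pull the plain row dicts out of a /rows JSON response."""
    return [item["row"] for item in payload.get("rows", [])]
```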
### get_statistics
Get a statistical overview of each column, computed over a sample of rows.
```
Dataset: imdb
Sample Size: 1000
```
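To make "statistical overview" concrete, here is a minimal sketch of per-column statistics over sampled rows. The function name and the exact set of statistics are illustrative assumptions; the real tool may report different fields.

```python
from collections import Counter
from statistics import mean, median


def column_statistics(rows):
    """Summarize each column across a list of row dicts."""
    # Collect column names in first-seen order.
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)

    stats = {}
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        summary = {"count": len(values), "missing": len(values) - len(present)}
        is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
        if present and all(is_number(v) for v in present):
            summary.update(kind="numeric", min=min(present), max=max(present),
                           mean=mean(present), median=median(present))
        elif present and all(isinstance(v, str) for v in present):
            lengths = [len(v) for v in present]
            summary.update(kind="text", unique=len(set(present)),
                           mean_length=mean(lengths),
                           most_common=Counter(present).most_common(1)[0][0])
        else:
            summary["kind"] = "other"
        stats[name] = summary
    return stats
```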
### profile_quality
Check for data quality issues.
```
Dataset: imdb
Sample Size: 500
```
Returns a quality score along with findings on missing values, duplicate rows, and class imbalance.
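The checks named above can be sketched in a few lines. Everything here is an assumption for illustration: the function name `profile_rows`, treating empty strings as missing, and especially the scoring formula, which is a hypothetical stand-in for however the tool actually computes its score.

```python
import json
from collections import Counter


def profile_rows(rows, label_column=None):
    """Report missing cells, exact duplicate rows, and label imbalance."""
    total = len(rows)
    cells = sum(len(row) for row in rows)
    # Count None and empty strings as missing values.
    missing = sum(1 for row in rows for v in row.values() if v is None or v == "")

    # Serialize each row so identical rows collapse to the same key.
    seen = Counter(json.dumps(row, sort_keys=True, default=str) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())

    imbalance_ratio = None
    if label_column is not None:
        labels = [row.get(label_column) for row in rows]
        counts = Counter(v for v in labels if v is not None)
        if counts:
            # Ratio of the largest class to the smallest class.
            imbalance_ratio = max(counts.values()) / min(counts.values())

    missing_rate = missing / cells if cells else 0.0
    duplicate_rate = duplicates / total if total else 0.0
    # Hypothetical score: 100 when nothing is missing or duplicated.
    score = 100 * (1 - missing_rate) * (1 - duplicate_rate)
    return {
        "rows": total,
        "missing_cells": missing,
        "duplicate_rows": duplicates,
        "imbalance_ratio": imbalance_ratio,
        "quality_score": score,
    }
```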
### suggest_tasks
AI-powered task suggestions based on dataset structure.
```
Dataset: imdb
```
Returns suggested ML tasks with confidence levels.
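As an illustration of structure-based suggestions, the sketch below applies a few hand-written rules to a schema. The rules, the confidence labels, and the function name are hypothetical; the source does not describe how the tool derives its suggestions.

```python
def suggest_tasks_from_schema(schema):
    """Map a {column name: dtype} schema to (task, confidence) pairs."""
    dtypes = {name.lower(): str(dtype).lower() for name, dtype in schema.items()}
    names = set(dtypes)
    suggestions = []

    has_text = any(dtype == "string" for dtype in dtypes.values())
    has_label = any(
        name in {"label", "labels"} or dtype == "classlabel"
        for name, dtype in dtypes.items()
    )

    # Text plus a label column suggests a classification setup.
    if has_text and has_label:
        suggestions.append(("text-classification", "high"))
    # SQuAD-style columns suggest extractive question answering.
    if {"question", "context", "answers"} <= names:
        suggestions.append(("question-answering", "high"))
    # Paired long and short text columns suggest summarization.
    if {"document", "summary"} <= names or {"article", "highlights"} <= names:
        suggestions.append(("summarization", "medium"))
    # Fall back to a generic task when only free text is present.
    if has_text and not suggestions:
        suggestions.append(("text-generation", "low"))
    return suggestions
```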
## Development
```bash
# Install dependencies (the same requirements.txt used to run locally)
pip install -r requirements.txt
# Run in development mode
gradio app.py --reload
```
## Architecture
```
dataview-mcp/
├── app.py                  # Main Gradio MCP server
├── tools/
│   ├── search.py           # search_datasets, search_by_columns
│   ├── metadata.py         # get_dataset_info, get_schema
│   ├── sampling.py         # sample_rows
│   ├── profiling.py        # get_statistics, profile_quality
│   └── discovery.py        # find_similar, suggest_tasks, compare_datasets
├── utils/
│   ├── hf_client.py        # HF API wrapper
│   └── formatting.py       # Output formatters
└── requirements.txt
```
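To show how a function from `tools/` might be exposed by `app.py`, here is a minimal sketch. It assumes Gradio is installed with MCP support and relies on `launch(mcp_server=True)`; the placeholder `sample_rows` body and the `build_demo` helper are hypothetical, and the real app's wiring may differ.

```python
def sample_rows(dataset: str, rows: int = 5, split: str = "train") -> str:
    """Return a few rows from a dataset split.

    Args:
        dataset: Hub dataset ID, for example "imdb".
        rows: Number of rows to return.
        split: Name of the split to read from.
    """
    # Placeholder body; the real tool would fetch and format actual rows.
    return f"Would sample {rows} rows from {dataset} ({split})"


def build_demo():
    """Wrap the tool function in a Gradio interface."""
    import gradio as gr  # imported lazily so the stub above runs without Gradio

    return gr.Interface(
        fn=sample_rows,
        inputs=[
            gr.Textbox(label="Dataset"),
            gr.Number(label="Rows", value=5, precision=0),
            gr.Textbox(label="Split", value="train"),
        ],
        outputs=gr.Textbox(label="Result"),
    )


# To serve the interface and its MCP endpoint:
#   build_demo().launch(mcp_server=True)
```

Type hints and the docstring matter in this pattern because they are what an MCP client sees as the tool's description and parameters.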
## License
MIT
## Contributing
Contributions welcome! Please open an issue or PR.
---
Built with Gradio and Hugging Face Hub