---
title: DataView MCP
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.32.0
app_file: app.py
pinned: false
license: mit
tags:
  - mcp
  - datasets
  - huggingface
  - exploration
  - gradio
---

# DataView MCP 🔍

A comprehensive **Model Context Protocol (MCP) server** for exploring Hugging Face datasets. Give your AI assistant the power to search, sample, analyze, and discover datasets on the Hub.

## Features

| Tool | Description |
|------|-------------|
| `search_datasets` | Find datasets by keyword, task, or domain |
| `search_by_columns` | Find datasets with specific column names |
| `get_dataset_info` | Get detailed metadata and README |
| `get_schema` | Get column names and data types |
| `sample_rows` | Get actual data samples |
| `get_statistics` | Compute column statistics |
| `profile_quality` | Assess data quality issues |
| `find_similar` | Find similar datasets |
| `suggest_tasks` | Suggest ML tasks for a dataset |
| `compare_datasets` | Compare two datasets side-by-side |

## Quick Start

### Use with Claude Desktop

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "dataview": {
      "url": "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"
    }
  }
}
```

### Use with Claude Code

Add to your MCP settings:

```json
{
  "mcpServers": {
    "dataview": {
      "command": "npx",
      "args": ["mcp-remote", "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"]
    }
  }
}
```
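Both configs above point at the same SSE endpoint. As a minimal sketch, the URL can be derived from a Space id. The helper names `space_mcp_url` and `claude_code_entry` are hypothetical, and the subdomain rule (owner and name joined with a hyphen, lowercased, other characters replaced by hyphens) is an assumption inferred from the URL shown here.

```python
import json
import re


def space_mcp_url(space_id: str) -> str:
    """Build the Gradio MCP SSE endpoint for a Space id such as 'owner/name'."""
    owner, name = space_id.split("/", 1)
    # Assumption: the Space subdomain is "<owner>-<name>", lowercased, with
    # anything other than letters and digits collapsed into hyphens.
    subdomain = re.sub(r"[^a-z0-9]+", "-", f"{owner}-{name}".lower()).strip("-")
    return f"https://{subdomain}.hf.space/gradio_api/mcp/sse"


def claude_code_entry(space_id: str, server_name: str = "dataview") -> dict:
    """Return an mcpServers entry that proxies the endpoint through mcp-remote."""
    return {
        "mcpServers": {
            server_name: {
                "command": "npx",
                "args": ["mcp-remote", space_mcp_url(space_id)],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(claude_code_entry("efecelik/dataview-mcp"), indent=2))
```

Running the script prints a JSON object with the same shape as the Claude Code config above.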

### Run Locally

```bash
# Clone the repository
git clone https://huggingface.co/spaces/efecelik/dataview-mcp
cd dataview-mcp

# Install dependencies
pip install -r requirements.txt

# Optional: Set HF token for higher rate limits
export HF_TOKEN=your_token_here

# Run the server
python app.py
```

Then point your MCP client at `http://localhost:7860/gradio_api/mcp/sse`.

## Example Usage

Once connected, ask your AI assistant:

- *"Search for sentiment analysis datasets"*
- *"Show me 5 sample rows from the IMDB dataset"*
- *"What's the schema of the SQuAD dataset?"*
- *"Find datasets similar to IMDB"*
- *"What ML tasks could I do with the IMDB dataset?"*
- *"Compare IMDB and Rotten Tomatoes datasets"*
- *"Check the data quality of this dataset"*

## Tool Details

### search_datasets

Find datasets matching your criteria.

```
Query: "sentiment analysis"
Filter: text-classification
Limit: 10
```
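For orientation, here is a rough sketch of how such a search could map onto the public Hub REST endpoint `https://huggingface.co/api/datasets`. This is not necessarily how `tools/search.py` is implemented. The function names are hypothetical, and the filter value is passed through unchanged; the Hub may expect a prefixed tag such as `task_categories:text-classification`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

HUB_DATASETS_API = "https://huggingface.co/api/datasets"


def build_search_url(query, task_filter=None, limit=10):
    """Turn the Query / Filter / Limit inputs into a Hub API request URL."""
    params = {"search": query, "limit": limit}
    if task_filter:
        # Assumption: the filter is forwarded as a Hub tag without rewriting.
        params["filter"] = task_filter
    return f"{HUB_DATASETS_API}?{urlencode(params)}"


def search_datasets(query, task_filter=None, limit=10):
    """Fetch matching dataset ids (requires network access)."""
    url = build_search_url(query, task_filter, limit)
    with urlopen(url, timeout=30) as response:
        return [item["id"] for item in json.load(response)]


if __name__ == "__main__":
    print(build_search_url("sentiment analysis", "text-classification", 10))
```

Splitting URL construction from the network call keeps the parameter mapping easy to check offline.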

### sample_rows

See actual data from a dataset.

```
Dataset: imdb
Rows: 5
Split: train
```
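One way to fetch rows without downloading a dataset is the dataset viewer's `/rows` endpoint. The sketch below assumes that endpoint and its `dataset`, `config`, `split`, `offset`, and `length` parameters. The helper names and the `config` default are hypothetical, and the tool's actual implementation may differ.

```python
from urllib.parse import urlencode

ROWS_API = "https://datasets-server.huggingface.co/rows"


def build_rows_url(dataset, split="train", rows=5, config="default", offset=0):
    """Map the Dataset / Rows / Split inputs onto a /rows request URL."""
    params = {
        "dataset": dataset,
        "config": config,  # assumption: many datasets use another config name
        "split": split,
        "offset": offset,
        "length": rows,
    }
    return f"{ROWS_API}?{urlencode(params)}"


def extract_rows(payload):
    """Pull the plain row dicts out of a /rows JSON response."""
    return [item["row"] for item in payload.get("rows", [])]


if __name__ == "__main__":
    print(build_rows_url("imdb", split="train", rows=5, config="plain_text"))
    # Hand-written payload in the response shape, for illustration only.
    fake = {"rows": [{"row_idx": 0, "row": {"text": "great film", "label": 1}}]}
    print(extract_rows(fake))
```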

### get_statistics

Get a statistical overview of the columns.

```
Dataset: imdb
Sample Size: 1000
```
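To make "column statistics" concrete, here is a small sketch of the kind of summary that can be computed over a sample of rows. The function name `column_statistics` and the reported fields are illustrative assumptions, not the tool's documented output.

```python
import statistics


def column_statistics(rows):
    """Summarize each column of a list of row dicts."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    report = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # bool is a subclass of int, so exclude it from the numeric check.
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if present and len(numeric) == len(present):
            report[name] = {
                "type": "numeric",
                "count": len(present),
                "min": min(numeric),
                "max": max(numeric),
                "mean": statistics.fmean(numeric),
            }
        else:
            lengths = [len(str(v)) for v in present]
            report[name] = {
                "type": "text",
                "count": len(present),
                "unique": len({str(v) for v in present}),
                "mean_length": statistics.fmean(lengths) if lengths else 0.0,
            }
    return report


if __name__ == "__main__":
    sample = [
        {"text": "great film", "label": 1},
        {"text": "dull", "label": 0},
        {"text": "great film", "label": 1},
    ]
    print(column_statistics(sample))
```

Because the statistics come from a sample (1000 rows in the example above), they approximate the full dataset rather than describe it exactly.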

### profile_quality

Check for data quality issues.

```
Dataset: imdb
Sample Size: 500
```

Returns a quality score along with missing-value counts, duplicate rows, and class imbalance.
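The checks listed here can be sketched over a sample of rows as follows. The scoring formula, the `label_column` parameter, and the output keys are assumptions made for illustration; the tool's real scoring may be different.

```python
import json
from collections import Counter


def profile_quality(rows, label_column=None):
    """Report missing values, duplicate rows, and class imbalance for a sample."""
    total = len(rows)
    if total == 0:
        return {"rows": 0, "missing": {}, "duplicates": 0,
                "imbalance_ratio": None, "score": 0.0}

    columns = sorted({name for row in rows for name in row})
    # A cell counts as missing when it is absent, None, or an empty string.
    missing = {
        c: sum(1 for r in rows if r.get(c) is None or r.get(c) == "")
        for c in columns
    }

    # Rows are duplicates when their serialized content is identical.
    distinct = Counter(json.dumps(r, sort_keys=True, default=str) for r in rows)
    duplicates = total - len(distinct)

    imbalance_ratio = None
    if label_column is not None:
        counts = Counter(
            r[label_column] for r in rows if r.get(label_column) is not None
        )
        if counts:
            # Ratio of the largest class to the smallest; 1.0 means balanced.
            imbalance_ratio = max(counts.values()) / min(counts.values())

    cells = total * len(columns)
    missing_rate = sum(missing.values()) / cells if cells else 0.0
    duplicate_rate = duplicates / total
    # Hypothetical score: 100 scaled down by the missing and duplicate rates.
    score = 100 * (1 - missing_rate) * (1 - duplicate_rate)

    return {"rows": total, "missing": missing, "duplicates": duplicates,
            "imbalance_ratio": imbalance_ratio, "score": score}


if __name__ == "__main__":
    sample = [
        {"text": "great film", "label": 1},
        {"text": "great film", "label": 1},
        {"text": "", "label": 0},
        {"text": "dull", "label": 1},
    ]
    print(profile_quality(sample, label_column="label"))
```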

### suggest_tasks

AI-powered task suggestions based on dataset structure.

```
Dataset: imdb
```

Returns suggested ML tasks with confidence levels.
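As a simplified stand-in for the idea of inferring tasks from dataset structure, the sketch below uses plain column-name rules. It is a rule-based illustration only and not the tool's actual method. The column-name sets and confidence labels are assumptions.

```python
def suggest_tasks(column_names):
    """Guess plausible ML tasks from column names, with a rough confidence."""
    names = {name.lower() for name in column_names}
    suggestions = []

    has_text = bool(names & {"text", "sentence", "review", "content"})
    has_label = bool(names & {"label", "labels", "sentiment"})

    # Free text plus a label column usually means classification.
    if has_text and has_label:
        suggestions.append(("text-classification", "high"))

    # Question, context, and answer columns point to extractive QA.
    if {"question", "context"} <= names and names & {"answer", "answers"}:
        suggestions.append(("question-answering", "high"))

    # A summary column next to a longer source text suggests summarization.
    if names & {"summary", "highlights"} and names & {"document", "article", "text"}:
        suggestions.append(("summarization", "medium"))

    # Text with no other signal: fall back to a low-confidence guess.
    if has_text and not suggestions:
        suggestions.append(("language-modeling", "low"))

    return suggestions


if __name__ == "__main__":
    print(suggest_tasks(["text", "label"]))
    print(suggest_tasks(["id", "title", "context", "question", "answers"]))
```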

## Development

```bash
# Install dependencies (same requirements.txt as for running locally)
pip install -r requirements.txt

# Run in development mode (the gradio CLI reloads the app on file changes)
gradio app.py
```

## Architecture

```
dataview-mcp/
├── app.py              # Main Gradio MCP server
├── tools/
│   ├── search.py       # search_datasets, search_by_columns
│   ├── metadata.py     # get_dataset_info, get_schema
│   ├── sampling.py     # sample_rows
│   ├── profiling.py    # get_statistics, profile_quality
│   └── discovery.py    # find_similar, suggest_tasks, compare_datasets
├── utils/
│   ├── hf_client.py    # HF API wrapper
│   └── formatting.py   # Output formatters
└── requirements.txt
```

## License

MIT

## Contributing

Contributions welcome! Please open an issue or PR.

---

Built with Gradio and Hugging Face Hub