---
title: DataView MCP
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.32.0
app_file: app.py
pinned: false
license: mit
tags:
- mcp
- datasets
- huggingface
- exploration
- gradio
---
# DataView MCP
A comprehensive **Model Context Protocol (MCP) server** for exploring Hugging Face datasets. Give your AI assistant the power to search, sample, analyze, and discover datasets on the Hub.
## Features
| Tool | Description |
|------|-------------|
| `search_datasets` | Find datasets by keyword, task, or domain |
| `search_by_columns` | Find datasets with specific column names |
| `get_dataset_info` | Get detailed metadata and README |
| `get_schema` | Get column names and data types |
| `sample_rows` | Get actual data samples |
| `get_statistics` | Compute column statistics |
| `profile_quality` | Assess data quality issues |
| `find_similar` | Find similar datasets |
| `suggest_tasks` | Suggest ML tasks for a dataset |
| `compare_datasets` | Compare two datasets side-by-side |
## Quick Start
### Use with Claude Desktop
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "dataview": {
      "url": "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"
    }
  }
}
```
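If `claude_desktop_config.json` already lists other servers, the `dataview` entry has to be merged in rather than pasted over the whole file. A minimal Python sketch of that merge follows; `add_mcp_server` and `update_config_file` are hypothetical helpers written for illustration, not part of DataView MCP, and the location of the config file is left to the caller.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, url: str) -> dict:
    """Return a copy of `config` with one MCP server entry added or replaced.

    Hypothetical helper: other servers and unrelated keys are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"url": url}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, url: str) -> None:
    """Read the JSON config at `path` (if present), merge the entry, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_mcp_server(config, name, url)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Calling `update_config_file(path, "dataview", "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse")` with the path to your own config file should leave any existing servers untouched.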
### Use with Claude Code
Add to your MCP settings:
```json
{
  "mcpServers": {
    "dataview": {
      "command": "npx",
      "args": ["mcp-remote", "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"]
    }
  }
}
```
### Run Locally
```bash
# Clone the repository
git clone https://huggingface.co/spaces/efecelik/dataview-mcp
cd dataview-mcp
# Install dependencies
pip install -r requirements.txt
# Optional: Set HF token for higher rate limits
export HF_TOKEN=your_token_here
# Run the server
python app.py
```
Then point your MCP client at `http://localhost:7860/gradio_api/mcp/sse`.
## Example Usage
Once connected, ask your AI assistant:
- *"Search for sentiment analysis datasets"*
- *"Show me 5 sample rows from the IMDB dataset"*
- *"What's the schema of the SQuAD dataset?"*
- *"Find datasets similar to IMDB"*
- *"What ML tasks could I do with the IMDB dataset?"*
- *"Compare IMDB and Rotten Tomatoes datasets"*
- *"Check the data quality of this dataset"*
## Tool Details
### search_datasets
Find datasets matching your criteria.
```
Query: "sentiment analysis"
Filter: text-classification
Limit: 10
```
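To make the three parameters concrete, here is a rough sketch of how they could map onto a query against the Hub's public dataset listing endpoint. This is an assumption about the underlying request, not the Space's actual code (which may use the `huggingface_hub` client instead); `build_search_url` and `search_hub` are hypothetical names, and the filter value is passed through as given.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Hub endpoint for listing datasets.
HUB_DATASETS_API = "https://huggingface.co/api/datasets"


def build_search_url(query: str, task_filter: Optional[str] = None, limit: int = 10) -> str:
    """Build the listing URL for a keyword search with an optional tag filter."""
    params = {"search": query, "limit": limit}
    if task_filter:
        params["filter"] = task_filter
    return f"{HUB_DATASETS_API}?{urlencode(params)}"


def search_hub(query: str, task_filter: Optional[str] = None, limit: int = 10) -> list:
    """Fetch matching datasets as parsed JSON (performs a network request)."""
    with urlopen(build_search_url(query, task_filter, limit), timeout=30) as response:
        return json.load(response)
```

With the values above, `build_search_url("sentiment analysis", "text-classification", 10)` yields a URL whose query string is `search=sentiment+analysis&limit=10&filter=text-classification`.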
### sample_rows
See actual data from a dataset.
```
Dataset: imdb
Rows: 5
Split: train
```
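One plausible way to return a handful of rows without downloading a whole split is to stream it and stop early. The sketch below assumes that approach; `take_rows` and `stream_split` are hypothetical helpers, and only `load_dataset(..., streaming=True)` from the `datasets` library is a real call.

```python
from itertools import islice
from typing import Iterable


def take_rows(dataset: Iterable[dict], n: int = 5) -> list:
    """Return the first `n` rows of any row iterable without consuming the rest."""
    return list(islice(dataset, n))


def stream_split(name: str, split: str = "train"):
    """Open a split in streaming mode (requires the `datasets` package and network access)."""
    # Imported lazily so take_rows stays usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(name, split=split, streaming=True)
```

For the example above, `take_rows(stream_split("imdb", "train"), 5)` would pull only five rows from the stream.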
### get_statistics
Get statistical overview of columns.
```
Dataset: imdb
Sample Size: 1000
```
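As an illustration of what a per-column overview on a sample might involve, the following sketch computes counts, missing values, and either numeric or categorical summaries. The function name, the choice of metrics, and the output layout are assumptions; the tool's real output may differ.

```python
import statistics
from collections import Counter


def column_statistics(rows: list, sample_size: int = 1000) -> dict:
    """Summarize each column over the first `sample_size` rows (list of dicts)."""
    sample = rows[:sample_size]
    columns = sorted({key for row in sample for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in sample]
        present = [v for v in values if v is not None]
        entry = {"count": len(present), "missing": len(values) - len(present)}
        # Booleans are excluded so that flags are not averaged like numbers.
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if present and len(numeric) == len(present):
            entry.update(min=min(numeric), max=max(numeric), mean=statistics.fmean(numeric))
        else:
            texts = [str(v) for v in present]
            entry["unique"] = len(set(texts))
            if texts:
                entry["most_common"] = Counter(texts).most_common(1)[0][0]
        report[col] = entry
    return report
```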
### profile_quality
Check for data quality issues.
```
Dataset: imdb
Sample Size: 500
```
Returns a quality score along with findings on missing values, duplicate rows, and class imbalance.
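The sketch below shows one way such checks could be computed over sampled rows. It is illustrative only: `profile_rows` is a hypothetical function, and the scoring formula (penalizing missing cells and duplicate rows multiplicatively) is an assumption, not the formula the tool uses.

```python
import json
from collections import Counter
from typing import Optional


def profile_rows(rows: list, label_column: Optional[str] = None) -> dict:
    """Compute missing-value, duplicate, and class-imbalance ratios for a list of dict rows."""
    total = len(rows)
    if total == 0:
        return {"rows": 0, "missing_ratio": 0.0, "duplicate_ratio": 0.0,
                "imbalance_ratio": None, "quality_score": 0.0}

    columns = sorted({key for row in rows for key in row})
    # A cell counts as missing when it is absent, None, or an empty string.
    missing = sum(1 for row in rows for col in columns if row.get(col) in (None, ""))
    missing_ratio = missing / (total * len(columns))

    # Rows are duplicates when their JSON serializations are identical.
    unique_rows = {json.dumps(row, sort_keys=True, default=str) for row in rows}
    duplicate_ratio = (total - len(unique_rows)) / total

    imbalance_ratio = None
    if label_column is not None:
        counts = Counter(row[label_column] for row in rows if row.get(label_column) is not None)
        if counts:
            imbalance_ratio = max(counts.values()) / min(counts.values())

    # Assumed scoring formula: 100 for a sample with no missing cells and no duplicates.
    quality_score = 100.0 * (1 - missing_ratio) * (1 - duplicate_ratio)
    return {"rows": total, "missing_ratio": missing_ratio, "duplicate_ratio": duplicate_ratio,
            "imbalance_ratio": imbalance_ratio, "quality_score": quality_score}
```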
### suggest_tasks
AI-powered task suggestions based on dataset structure.
```
Dataset: imdb
```
Returns suggested ML tasks with confidence levels.
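How the suggestions are produced is not described here. As a rough mental model only, the following rule-based stand-in maps column names and types to candidate tasks; the rules, column-name lists, and confidence labels are all assumptions and should not be read as the tool's actual logic.

```python
def suggest_tasks_heuristic(schema: dict) -> list:
    """Suggest ML tasks from a schema mapping column name -> type name.

    Hypothetical rule-based sketch, e.g. {"text": "string", "label": "ClassLabel"}.
    """
    names = {name.lower() for name in schema}
    types = {str(type_name).lower() for type_name in schema.values()}

    has_text = bool(names & {"text", "sentence", "review", "content"})
    has_label = bool(names & {"label", "labels"}) or "classlabel" in types

    suggestions = []
    if has_text and has_label:
        suggestions.append({"task": "text-classification", "confidence": "high"})
    if {"question", "context"} <= names:
        confidence = "high" if "answers" in names else "medium"
        suggestions.append({"task": "question-answering", "confidence": confidence})
    if has_text and not has_label:
        suggestions.append({"task": "language-modeling", "confidence": "low"})
    return suggestions
```

Under these rules, a schema with `text` and `label` columns maps to text classification, and one with `question`, `context`, and `answers` maps to question answering.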
## Development
```bash
# Install dev dependencies
pip install -r requirements.txt
# Run in development mode (the gradio CLI auto-reloads on file changes)
gradio app.py
```
## Architecture
```
dataview-mcp/
├── app.py                # Main Gradio MCP server
├── tools/
│   ├── search.py         # search_datasets, search_by_columns
│   ├── metadata.py       # get_dataset_info, get_schema
│   ├── sampling.py       # sample_rows
│   ├── profiling.py      # get_statistics, profile_quality
│   └── discovery.py      # find_similar, suggest_tasks, compare_datasets
├── utils/
│   ├── hf_client.py      # HF API wrapper
│   └── formatting.py     # Output formatters
└── requirements.txt
```
## License
MIT
## Contributing
Contributions welcome! Please open an issue or PR.
---
Built with Gradio and Hugging Face Hub