| title: DataView MCP | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 5.32.0 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| tags: | |
| - mcp | |
| - datasets | |
| - huggingface | |
| - exploration | |
| - gradio | |
| # DataView MCP | |
| A comprehensive **Model Context Protocol (MCP) server** for exploring Hugging Face datasets. Give your AI assistant the power to search, sample, analyze, and discover datasets on the Hub. | |
| ## Features | |
| | Tool | Description | | |
| |------|-------------| | |
| | `search_datasets` | Find datasets by keyword, task, or domain | | |
| | `search_by_columns` | Find datasets with specific column names | | |
| | `get_dataset_info` | Get detailed metadata and README | | |
| | `get_schema` | Get column names and data types | | |
| | `sample_rows` | Get actual data samples | | |
| | `get_statistics` | Compute column statistics | | |
| | `profile_quality` | Assess data quality issues | | |
| | `find_similar` | Find similar datasets | | |
| | `suggest_tasks` | Suggest ML tasks for a dataset | | |
| | `compare_datasets` | Compare two datasets side-by-side | | |
| ## Quick Start | |
| ### Use with Claude Desktop | |
| Add to your `claude_desktop_config.json`: | |
| ```json | |
| { | |
| "mcpServers": { | |
| "dataview": { | |
| "url": "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse" | |
| } | |
| } | |
| } | |
| ``` | |
| ### Use with Claude Code | |
| Add to your MCP settings: | |
| ```json | |
| { | |
| "mcpServers": { | |
| "dataview": { | |
| "command": "npx", | |
| "args": ["mcp-remote", "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"] | |
| } | |
| } | |
| } | |
| ``` | |
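Both configurations above differ only in how the `dataview` entry reaches the same SSE endpoint. As a minimal sketch, the two shapes can be generated from one URL; `build_mcp_config` is a hypothetical helper for illustration and is not part of DataView MCP.

```python
import json

# URL taken from the configuration examples above.
SSE_URL = "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"


def build_mcp_config(url: str, use_mcp_remote: bool = False) -> dict:
    """Return an `mcpServers` config dict for the `dataview` server.

    Hypothetical helper: it only mirrors the two JSON snippets above.
    """
    if use_mcp_remote:
        # Claude Code variant: bridge to the SSE endpoint through `npx mcp-remote`.
        server = {"command": "npx", "args": ["mcp-remote", url]}
    else:
        # Claude Desktop variant: reference the SSE endpoint directly.
        server = {"url": url}
    return {"mcpServers": {"dataview": server}}


desktop_config = build_mcp_config(SSE_URL)
code_config = build_mcp_config(SSE_URL, use_mcp_remote=True)
print(json.dumps(code_config, indent=2))
```

Passing `http://localhost:7860/gradio_api/mcp/sse` as `url` yields the equivalent entry for a server running locally.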
| ### Run Locally | |
| ```bash | |
| # Clone the repository | |
| git clone https://huggingface.co/spaces/efecelik/dataview-mcp | |
| cd dataview-mcp | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Optional: Set HF token for higher rate limits | |
| export HF_TOKEN=your_token_here | |
| # Run the server | |
| python app.py | |
| ``` | |
| Then connect your MCP client to `http://localhost:7860/gradio_api/mcp/sse`. | |
| ## Example Usage | |
| Once connected, ask your AI assistant: | |
| - *"Search for sentiment analysis datasets"* | |
| - *"Show me 5 sample rows from the IMDB dataset"* | |
| - *"What's the schema of the SQuAD dataset?"* | |
| - *"Find datasets similar to IMDB"* | |
| - *"What ML tasks could I do with the IMDB dataset?"* | |
| - *"Compare IMDB and Rotten Tomatoes datasets"* | |
| - *"Check the data quality of this dataset"* | |
| ## Tool Details | |
| ### search_datasets | |
| Find datasets matching your criteria. | |
| ``` | |
| Query: "sentiment analysis" | |
| Filter: text-classification | |
| Limit: 10 | |
| ``` | |
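Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the parameters above. The argument names (`query`, `filter`, `limit`) are assumptions inferred from the labels in this README; the names the server actually exposes may differ and can be checked in its `tools/list` response.

```python
import json
from itertools import count

# Simple counter for JSON-RPC request ids.
_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument names are assumptions, not confirmed against the server's schema.
request = tool_call_request(
    "search_datasets",
    {"query": "sentiment analysis", "filter": "text-classification", "limit": 10},
)
print(json.dumps(request, indent=2))
```

In practice the AI assistant's MCP client builds and sends this message for you; the sketch only shows what the parameters turn into on the wire.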
| ### sample_rows | |
| See actual data from a dataset. | |
| ``` | |
| Dataset: imdb | |
| Rows: 5 | |
| Split: train | |
| ``` | |
| ### get_statistics | |
| Get a statistical overview of columns. | |
| ``` | |
| Dataset: imdb | |
| Sample Size: 1000 | |
| ``` | |
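To illustrate what a per-column overview of sampled rows can look like, here is a minimal sketch in plain Python. It is not the project's implementation: `column_statistics` and the fields it reports are assumptions chosen for illustration, and the rows are made-up stand-ins for sampled data.

```python
from collections import Counter
from statistics import mean


def column_statistics(rows: list[dict]) -> dict:
    """Summarize each column of a list of row dicts (hypothetical helper)."""
    columns = {key for row in rows for key in row}
    stats = {}
    for col in sorted(columns):
        values = [row[col] for row in rows if row.get(col) is not None]
        summary = {"count": len(values), "missing": len(rows) - len(values)}
        numeric = [
            v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if values and len(numeric) == len(values):
            # Numeric column: report range and mean.
            summary.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        else:
            # Non-numeric column: report distinct values and the most common one.
            counts = Counter(map(str, values))
            summary["unique"] = len(counts)
            summary["top"] = counts.most_common(1)[0][0] if counts else None
        stats[col] = summary
    return stats


# Made-up rows standing in for a sample drawn from a dataset.
sample = [
    {"text": "great film", "label": 1},
    {"text": "dull plot", "label": 0},
    {"text": "great film", "label": 1},
    {"text": None, "label": 0},
]
stats = column_statistics(sample)
print(stats)
```

Computing statistics on a sample, as the `Sample Size` parameter suggests, keeps the cost bounded for large datasets at the price of approximate results.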
| ### profile_quality | |
| Check for data quality issues. | |
| ``` | |
| Dataset: imdb | |
| Sample Size: 500 | |
| ``` | |
| Returns a quality score plus findings on missing values, duplicates, and class imbalance. | |
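The checks listed above can be sketched on a small sample as follows. This is an illustration under stated assumptions, not the tool's actual logic: `profile_rows`, the imbalance ratio, and the scoring formula (100 minus the missing and duplicate percentages) are all invented for the example.

```python
from collections import Counter


def profile_rows(rows: list[dict], label_column: str) -> dict:
    """Report missing values, duplicate rows, and label imbalance (hypothetical)."""
    total_cells = sum(len(row) for row in rows)
    # Treat None and empty strings as missing.
    missing = sum(1 for row in rows for v in row.values() if v is None or v == "")
    # Rows count as duplicates when all (column, value) pairs match.
    seen = Counter(
        tuple(sorted((k, repr(v)) for k, v in row.items())) for row in rows
    )
    duplicates = sum(n - 1 for n in seen.values())
    labels = Counter(
        row[label_column] for row in rows if row.get(label_column) is not None
    )
    # Imbalance ratio: largest class count over smallest (1.0 means balanced).
    imbalance = max(labels.values()) / min(labels.values()) if labels else None
    # Toy score: start at 100, subtract missing and duplicate percentages.
    missing_pct = 100 * missing / total_cells if total_cells else 0.0
    duplicate_pct = 100 * duplicates / len(rows) if rows else 0.0
    return {
        "missing_values": missing,
        "duplicates": duplicates,
        "imbalance_ratio": imbalance,
        "quality_score": max(0.0, 100.0 - missing_pct - duplicate_pct),
    }


# Made-up rows standing in for a sample drawn from a dataset.
rows = [
    {"text": "great film", "label": 1},
    {"text": "dull plot", "label": 0},
    {"text": "great film", "label": 1},
    {"text": "", "label": 1},
]
report = profile_rows(rows, label_column="label")
print(report)
```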
| ### suggest_tasks | |
| AI-powered task suggestions based on dataset structure. | |
| ``` | |
| Dataset: imdb | |
| ``` | |
| Returns suggested ML tasks with confidence levels. | |
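One simple way to derive suggestions from dataset structure is to match column names against known patterns. The sketch below is a rule-based illustration only; the README does not say how `suggest_tasks` works internally, and `suggest_tasks_from_schema`, its rules, and its confidence labels are all hypothetical.

```python
def suggest_tasks_from_schema(columns: dict[str, str]) -> list[tuple[str, str]]:
    """Map a {column_name: dtype} schema to (task, confidence) pairs.

    Hypothetical rules for illustration; not the tool's real heuristics.
    """
    names = {name.lower() for name in columns}
    suggestions = []
    if "text" in names and "label" in names:
        suggestions.append(("text-classification", "high"))
    if {"question", "context", "answers"} <= names:
        suggestions.append(("question-answering", "high"))
    if "text" in names and "label" not in names:
        suggestions.append(("text-generation", "medium"))
    if "image" in names:
        confidence = "high" if "label" in names else "low"
        suggestions.append(("image-classification", confidence))
    return suggestions


# Schemas modeled on the column names of IMDB and SQuAD.
imdb_tasks = suggest_tasks_from_schema({"text": "string", "label": "int64"})
squad_tasks = suggest_tasks_from_schema(
    {"id": "string", "title": "string", "context": "string",
     "question": "string", "answers": "dict"}
)
print(imdb_tasks, squad_tasks)
```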
| ## Development | |
| ```bash | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Run in development mode (the gradio CLI reloads on file changes) | |
| gradio app.py | |
| ``` | |
| ## Architecture | |
| ``` | |
| dataview-mcp/ | |
| ├── app.py              # Main Gradio MCP server | |
| ├── tools/ | |
| │   ├── search.py       # search_datasets, search_by_columns | |
| │   ├── metadata.py     # get_dataset_info, get_schema | |
| │   ├── sampling.py     # sample_rows | |
| │   ├── profiling.py    # get_statistics, profile_quality | |
| │   └── discovery.py    # find_similar, suggest_tasks, compare_datasets | |
| ├── utils/ | |
| │   ├── hf_client.py    # HF API wrapper | |
| │   └── formatting.py   # Output formatters | |
| └── requirements.txt | |
| ``` | |
| ## License | |
| MIT | |
| ## Contributing | |
| Contributions welcome! Please open an issue or PR. | |
| --- | |
| Built with Gradio and Hugging Face Hub | |