---
title: DataView MCP
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.32.0
app_file: app.py
pinned: false
license: mit
tags:
  - mcp
  - datasets
  - huggingface
  - exploration
  - gradio
---

# DataView MCP 🔍

A **Model Context Protocol (MCP) server** for exploring Hugging Face datasets. It lets your AI assistant search, sample, analyze, and discover datasets on the Hub.

## Features

| Tool | Description |
|------|-------------|
| `search_datasets` | Find datasets by keyword, task, or domain |
| `search_by_columns` | Find datasets that contain specific column names |
| `get_dataset_info` | Get detailed metadata and the README |
| `get_schema` | Get column names and data types |
| `sample_rows` | Get actual rows from a dataset split |
| `get_statistics` | Compute column statistics |
| `profile_quality` | Assess data quality (missing values, duplicates, class imbalance) |
| `find_similar` | Find similar datasets |
| `suggest_tasks` | Suggest ML tasks for a dataset |
| `compare_datasets` | Compare two datasets side by side |

## Quick Start

### Use with Claude Desktop

Add the following to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "dataview": {
      "url": "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"
    }
  }
}
```

### Use with Claude Code

Add the following to your MCP settings:

```json
{
  "mcpServers": {
    "dataview": {
      "command": "npx",
      "args": ["mcp-remote", "https://efecelik-dataview-mcp.hf.space/gradio_api/mcp/sse"]
    }
  }
}
```

### Run Locally

```bash
# Clone the repository
git clone https://huggingface.co/spaces/efecelik/dataview-mcp
cd dataview-mcp

# Install dependencies
pip install -r requirements.txt

# Optional: set an HF token for higher rate limits
export HF_TOKEN=your_token_here

# Run the server
python app.py
```

Then point your MCP client at `http://localhost:7860/gradio_api/mcp/sse`.

## Example Usage

Once connected, ask your AI assistant:

- *"Search for sentiment analysis datasets"*
- *"Show me 5 sample rows from the IMDB dataset"*
- *"What's the schema of the SQuAD dataset?"*
- *"Find datasets similar to IMDB"*
- *"What ML tasks could I do with the IMDB dataset?"*
- *"Compare IMDB and Rotten Tomatoes datasets"*
- *"Check the data quality of this dataset"*

## Tool Details

### search_datasets

Find datasets matching your criteria.

```
Query: "sentiment analysis"
Filter: text-classification
Limit: 10
```

### sample_rows

See actual data from a dataset.

```
Dataset: imdb
Rows: 5
Split: train
```

### get_statistics

Get a statistical overview of the columns.

```
Dataset: imdb
Sample Size: 1000
```

### profile_quality

Check for data quality issues.

```
Dataset: imdb
Sample Size: 500
```

Returns a quality score and reports missing values, duplicate rows, and class imbalance.

### suggest_tasks

AI-powered task suggestions based on the dataset's structure.

```
Dataset: imdb
```

Returns suggested ML tasks with confidence levels.

## Development

```bash
# Install dev dependencies
pip install -r requirements.txt

# Run in development mode
gradio app.py --reload
```

## Architecture

```
dataview-mcp/
├── app.py              # Main Gradio MCP server
├── tools/
│   ├── search.py       # search_datasets, search_by_columns
│   ├── metadata.py     # get_dataset_info, get_schema
│   ├── sampling.py     # sample_rows
│   ├── profiling.py    # get_statistics, profile_quality
│   └── discovery.py    # find_similar, suggest_tasks, compare_datasets
├── utils/
│   ├── hf_client.py    # HF API wrapper
│   └── formatting.py   # Output formatters
└── requirements.txt
```

## License

MIT

## Contributing

Contributions are welcome! Please open an issue or a PR.

---

Built with Gradio and the Hugging Face Hub.
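The `profile_quality` tool reports missing values, duplicates, and class imbalance. To illustrate what those checks mean, here is a minimal sketch that computes them over a list of sampled rows. It is not the code in `tools/profiling.py`: the function name `profile_rows`, the treatment of `None` and empty strings as "missing", and the majority/minority imbalance ratio are all assumptions made for illustration, and the sketch does not reproduce the server's quality score.

```python
from collections import Counter
from typing import Any, Optional


def profile_rows(rows: list[dict[str, Any]], label_column: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical quality profile over sampled rows (not the server's implementation)."""
    # Union of all column names seen across the sampled rows, in a stable order.
    columns = sorted({key for row in rows for key in row})

    # Missing values per column: absent keys, None, and empty strings count as missing.
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }

    # Exact duplicate rows: build a hashable signature per row, then count repeats.
    signatures = Counter(
        tuple((col, repr(row.get(col))) for col in columns) for row in rows
    )
    duplicates = sum(count - 1 for count in signatures.values())

    report: dict[str, Any] = {"rows": len(rows), "missing": missing, "duplicates": duplicates}

    if label_column is not None:
        labels = Counter(row.get(label_column) for row in rows)
        # Assumed imbalance metric: most frequent class count / least frequent class count.
        report["imbalance_ratio"] = (
            max(labels.values()) / min(labels.values()) if labels else None
        )
    return report


sample = [
    {"text": "great film", "label": 1},
    {"text": "great film", "label": 1},
    {"text": "", "label": 0},
    {"text": "dull plot", "label": 1},
]
print(profile_rows(sample, label_column="label"))
# → {'rows': 4, 'missing': {'label': 0, 'text': 1}, 'duplicates': 1, 'imbalance_ratio': 3.0}
```

Exact-match duplicate detection via `repr` signatures keeps the sketch dependency-free; a real profiler working on large samples would more likely use pandas (`DataFrame.isna`, `DataFrame.duplicated`).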