---
title: SCoDA
emoji: 🎨
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
---

# CoDA: Collaborative Data Visualization Agents

A production-grade multi-agent system for automated data visualization from natural language queries.

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview

CoDA reframes data visualization as a collaborative multi-agent problem. Instead of treating it as a monolithic task, CoDA employs specialized LLM agents that work together:

- **Query Analyzer** - Interprets natural language and extracts visualization intent
- **Data Processor** - Extracts metadata without token-heavy data loading
- **VizMapping Agent** - Maps semantics to visualization primitives
- **Search Agent** - Retrieves relevant code patterns
- **Design Explorer** - Generates aesthetic specifications
- **Code Generator** - Synthesizes executable Python code
- **Debug Agent** - Executes code and fixes errors
- **Visual Evaluator** - Assesses quality and triggers refinement

## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/CoDA.git
cd CoDA

# Install dependencies
pip install -r requirements.txt

# Configure API key
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
```

### Usage

#### Web Interface (Gradio)

```bash
python app.py
```

Open http://localhost:7860 in your browser.
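
Both interfaces depend on the `GROQ_API_KEY` configured during installation. The following is a minimal sketch of a preflight check you could run before launching, assuming `.env` contains plain `KEY=VALUE` lines. The helper names `read_env_file` and `has_groq_key` are hypothetical and not part of CoDA.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_groq_key(env_path: str = ".env") -> bool:
    """True if GROQ_API_KEY is set in the process environment or in the .env file."""
    key = os.environ.get("GROQ_API_KEY") or read_env_file(env_path).get("GROQ_API_KEY", "")
    return bool(key.strip())


if __name__ == "__main__":
    print("GROQ_API_KEY found" if has_groq_key() else "GROQ_API_KEY missing: edit .env")
```

The check only confirms that a non-empty value is present; it does not verify that the key is valid.
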

#### Command Line

```bash
python main.py --query "Create a bar chart of sales by category" --data sales.csv
```

Options:

- `-q, --query`: Visualization query (required)
- `-d, --data`: Data file path(s) (required)
- `-o, --output`: Output directory (default: outputs)
- `--max-iterations`: Refinement iterations (default: 3)
- `--min-score`: Quality threshold (default: 7.0)

### Python API

```python
from coda.orchestrator import CodaOrchestrator

orchestrator = CodaOrchestrator()
result = orchestrator.run(
    query="Show sales trends over time",
    data_paths=["sales_data.csv"]
)

if result.success:
    print(f"Visualization saved to: {result.output_file}")
    print(f"Quality Score: {result.scores['overall']}/10")
```

## Hugging Face Spaces Deployment

1. Create a new Space on [Hugging Face](https://huggingface.co/new-space)
2. Select "Gradio" as the SDK
3. Upload all files from this repository
4. Add `GROQ_API_KEY` as a Secret in Space Settings
5. The Space will automatically build and deploy

## Architecture

```
Natural Language Query + Data Files
          │
          ▼
┌──────────────────┐
│ Query Analyzer   │ ─── Extracts intent, TODO list
└──────────────────┘
          │
          ▼
┌──────────────────┐
│ Data Processor   │ ─── Metadata extraction (no full load)
└──────────────────┘
          │
          ▼
┌──────────────────┐
│ VizMapping       │ ─── Chart type, encodings
└──────────────────┘
          │
          ▼
┌──────────────────┐
│ Search Agent     │ ─── Code examples
└──────────────────┘
          │
          ▼
┌──────────────────┐
│ Design Explorer  │ ─── Colors, layout, styling
└──────────────────┘
          │
          ▼
┌──────────────────┐
│ Code Generator   │ ─── Python visualization code
└──────────────────┘
          │
          ▼
┌──────────────────┐
│ Debug Agent      │ ─── Execute & fix errors
└──────────────────┘
          │
          ▼
┌──────────────────┐
│ Visual Evaluator │ ─── Quality assessment
└──────────────────┘
          │
   ───────┴───────
   ↓ Feedback Loop ↓
   (if quality < threshold)
```

## Configuration

Environment variables (in `.env`):

| Variable | Default | Description |
|----------|---------|-------------|
| `GROQ_API_KEY` | Required | Your Groq API key |
| `CODA_DEFAULT_MODEL` | llama-3.3-70b-versatile | Text model |
| `CODA_VISION_MODEL` | llama-3.2-90b-vision-preview | Vision model |
| `CODA_MIN_OVERALL_SCORE` | 7.0 | Quality threshold |
| `CODA_MAX_ITERATIONS` | 3 | Max refinement loops |

## Supported Data Formats

- CSV (`.csv`)
- JSON (`.json`)
- Excel (`.xlsx`, `.xls`)
- Parquet (`.parquet`)

## Requirements

- Python 3.10+
- Groq API key ([Get one free](https://console.groq.com))

## License

MIT License - See LICENSE for details.

## Citation

If you use CoDA in your research, please cite:

```bibtex
@article{chen2025coda,
  title={CoDA: Agentic Systems for Collaborative Data Visualization},
  author={Chen, Zichen and Chen, Jiefeng and Arik, Sercan {\"O}. and Sra, Misha and Pfister, Tomas and Yoon, Jinsung},
  journal={arXiv preprint arXiv:2510.03194},
  year={2025},
  url={https://arxiv.org/abs/2510.03194},
  doi={10.48550/arXiv.2510.03194}
}
```

**Paper**: [arXiv:2510.03194](https://arxiv.org/abs/2510.03194)