---
title: SCoDA
emoji: 🎨
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
---
# CoDA: Collaborative Data Visualization Agents
A production-grade multi-agent system for automated data visualization from natural language queries.
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
## Overview
CoDA reframes data visualization as a collaborative multi-agent problem. Instead of treating it as a monolithic task, CoDA employs specialized LLM agents that work together:
- **Query Analyzer** - Interprets natural language and extracts visualization intent
- **Data Processor** - Extracts dataset metadata without loading the full data into the LLM context
- **VizMapping Agent** - Maps semantics to visualization primitives
- **Search Agent** - Retrieves relevant code patterns
- **Design Explorer** - Generates aesthetic specifications
- **Code Generator** - Synthesizes executable Python code
- **Debug Agent** - Executes code and fixes errors
- **Visual Evaluator** - Assesses quality and triggers refinement
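
As a rough illustration of the hand-off between these agents, the sketch below chains eight stub functions over a shared context object. Everything here is hypothetical: `Context`, `make_stub_agent`, `STAGES`, and `run_pipeline` are illustrative names, not CoDA's actual classes, and each stub stands in for what would be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Context:
    """Hypothetical shared state handed from one agent to the next."""
    query: str
    notes: Dict[str, str] = field(default_factory=dict)


# An "agent" in this sketch is just a function from Context to Context.
Agent = Callable[[Context], Context]


def make_stub_agent(name: str) -> Agent:
    """Build a placeholder agent that only records that it ran."""
    def agent(ctx: Context) -> Context:
        # A real agent would call an LLM here and store its own output.
        ctx.notes[name] = f"{name} output for: {ctx.query}"
        return ctx
    return agent


# Stage order follows the agent list above (names are illustrative).
STAGES: List[str] = [
    "query_analyzer", "data_processor", "viz_mapping", "search_agent",
    "design_explorer", "code_generator", "debug_agent", "visual_evaluator",
]


def run_pipeline(query: str) -> Context:
    """Run every stage once, in order, over a single shared context."""
    ctx = Context(query=query)
    for name in STAGES:
        ctx = make_stub_agent(name)(ctx)
    return ctx


ctx = run_pipeline("Create a bar chart of sales by category")
print(list(ctx.notes))  # stage names in execution order
```

Keeping every stage behind the same function signature is one way to make agents easy to swap or test in isolation; the real system may structure this differently.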
## Quick Start
### Installation
```bash
# Clone the repository
git clone https://github.com/yourusername/CoDA.git
cd CoDA
# Install dependencies
pip install -r requirements.txt
# Configure API key
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
```
### Usage
#### Web Interface (Gradio)
```bash
python app.py
```
Open http://localhost:7860 in your browser.
#### Command Line
```bash
python main.py --query "Create a bar chart of sales by category" --data sales.csv
```
Options:
- `-q, --query`: Visualization query (required)
- `-d, --data`: Data file path(s) (required)
- `-o, --output`: Output directory (default: outputs)
- `--max-iterations`: Refinement iterations (default: 3)
- `--min-score`: Quality threshold (default: 7.0)
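
For reference, the documented options could be declared with `argparse` roughly as follows. This is a sketch of the interface described above, not the contents of `main.py`; in particular, accepting several paths via `nargs="+"` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags listed above (a sketch, not the real main.py)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-q", "--query", required=True,
                        help="Visualization query")
    # Assumption: "path(s)" means one or more space-separated paths.
    parser.add_argument("-d", "--data", required=True, nargs="+",
                        help="Data file path(s)")
    parser.add_argument("-o", "--output", default="outputs",
                        help="Output directory")
    parser.add_argument("--max-iterations", type=int, default=3,
                        help="Refinement iterations")
    parser.add_argument("--min-score", type=float, default=7.0,
                        help="Quality threshold")
    return parser


args = build_parser().parse_args(
    ["-q", "Create a bar chart of sales by category", "-d", "sales.csv"]
)
print(args.output, args.max_iterations, args.min_score)  # outputs 3 7.0
```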
#### Python API
```python
from coda.orchestrator import CodaOrchestrator

orchestrator = CodaOrchestrator()

# Run the full agent pipeline on one query and one or more data files
result = orchestrator.run(
    query="Show sales trends over time",
    data_paths=["sales_data.csv"],
)

if result.success:
    print(f"Visualization saved to: {result.output_file}")
    print(f"Quality Score: {result.scores['overall']}/10")
```
## Hugging Face Spaces Deployment
1. Create a new Space on [Hugging Face](https://huggingface.co/new-space)
2. Select "Gradio" as the SDK
3. Upload all files from this repository
4. Add `GROQ_API_KEY` as a Secret in Space Settings
5. The Space will automatically build and deploy
## Architecture
```
Natural Language Query + Data Files
                 │
                 ▼
       ┌──────────────────┐
       │  Query Analyzer  │ ─── Extracts intent, TODO list
       └──────────────────┘
                 │
                 ▼
       ┌──────────────────┐
       │  Data Processor  │ ─── Metadata extraction (no full load)
       └──────────────────┘
                 │
                 ▼
       ┌──────────────────┐
       │    VizMapping    │ ─── Chart type, encodings
       └──────────────────┘
                 │
                 ▼
       ┌──────────────────┐
       │   Search Agent   │ ─── Code examples
       └──────────────────┘
                 │
                 ▼
       ┌──────────────────┐
       │ Design Explorer  │ ─── Colors, layout, styling
       └──────────────────┘
                 │
                 ▼
       ┌──────────────────┐
       │  Code Generator  │ ─── Python visualization code
       └──────────────────┘
                 │
                 ▼
       ┌──────────────────┐
       │   Debug Agent    │ ─── Execute & fix errors
       └──────────────────┘
                 │
                 ▼
       ┌──────────────────┐
       │ Visual Evaluator │ ─── Quality assessment
       └──────────────────┘
                 │
          ───────┴───────
         ↓ Feedback Loop ↓
      (if quality < threshold)
```
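
The feedback loop at the bottom of the diagram can be sketched as a bounded retry: regenerate with the evaluator's feedback until the score reaches the threshold or the iteration budget runs out. The function name `refine_until_good` and the `generate`/`evaluate` callables are hypothetical stand-ins for the real agents, and treating the iteration limit as the total number of attempts is an assumption.

```python
from typing import Callable, Tuple


def refine_until_good(
    generate: Callable[[str], str],
    evaluate: Callable[[str], Tuple[float, str]],
    min_score: float = 7.0,
    max_iterations: int = 3,
) -> Tuple[str, float, int]:
    """Return (best_code, best_score, attempts_used)."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

    feedback = ""
    best_code, best_score = "", float("-inf")
    for attempt in range(1, max_iterations + 1):
        code = generate(feedback)          # stand-in for code generation
        score, feedback = evaluate(code)   # stand-in for visual evaluation
        if score > best_score:
            best_code, best_score = code, score
        if score >= min_score:
            break                          # quality threshold reached
    return best_code, best_score, attempt


# Stub agents: scores improve on each attempt.
scores = iter([5.0, 6.5, 8.0])


def fake_generate(feedback: str) -> str:
    return f"plot code (feedback: {feedback or 'none'})"


def fake_evaluate(code: str) -> Tuple[float, str]:
    score = next(scores)
    return score, f"score was {score}"


code, score, attempts = refine_until_good(fake_generate, fake_evaluate)
print(score, attempts)  # 8.0 3
```

Tracking the best result seen so far means a later, worse attempt does not overwrite a better earlier one when the threshold is never reached.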
## Configuration
Environment variables (in `.env`):
| Variable | Default | Description |
|----------|---------|-------------|
| `GROQ_API_KEY` | Required | Your Groq API key |
| `CODA_DEFAULT_MODEL` | llama-3.3-70b-versatile | Text model |
| `CODA_VISION_MODEL` | llama-3.2-90b-vision-preview | Vision model |
| `CODA_MIN_OVERALL_SCORE` | 7.0 | Quality threshold |
| `CODA_MAX_ITERATIONS` | 3 | Max refinement loops |
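
One possible way to read these variables with the defaults from the table is shown below. `Settings` and `load_settings` are hypothetical names, not CoDA's actual configuration code, and the sketch assumes the variables are already present in the process environment (for example, loaded from `.env` by a dotenv-style loader).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values in the table above."""
    groq_api_key: str
    default_model: str
    vision_model: str
    min_overall_score: float
    max_iterations: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is required but not set")
    return Settings(
        groq_api_key=api_key,
        default_model=env.get("CODA_DEFAULT_MODEL", "llama-3.3-70b-versatile"),
        vision_model=env.get("CODA_VISION_MODEL", "llama-3.2-90b-vision-preview"),
        min_overall_score=float(env.get("CODA_MIN_OVERALL_SCORE", "7.0")),
        max_iterations=int(env.get("CODA_MAX_ITERATIONS", "3")),
    )


# Demo with an explicit mapping so no real key is needed.
settings = load_settings({"GROQ_API_KEY": "placeholder-key",
                          "CODA_MAX_ITERATIONS": "5"})
print(settings.default_model, settings.min_overall_score, settings.max_iterations)
```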
## Supported Data Formats
- CSV (`.csv`)
- JSON (`.json`)
- Excel (`.xlsx`, `.xls`)
- Parquet (`.parquet`)
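
A minimal sketch of how a file's extension could be mapped to one of these formats before loading. `SUPPORTED_FORMATS` and `detect_format` are illustrative names only; how CoDA actually dispatches on file type is not documented here.

```python
from pathlib import Path

# Extension-to-format mapping taken from the list above.
SUPPORTED_FORMATS = {
    ".csv": "csv",
    ".json": "json",
    ".xlsx": "excel",
    ".xls": "excel",
    ".parquet": "parquet",
}


def detect_format(path: str) -> str:
    """Return the format name for a path, or raise ValueError if unsupported."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        shown = suffix or "(no extension)"
        raise ValueError(f"Unsupported data format: {shown}") from None


print(detect_format("sales.csv"))     # csv
print(detect_format("Report.XLSX"))   # excel
```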
## Requirements
- Python 3.10+
- Groq API key ([Get one free](https://console.groq.com))
## License
MIT License - See LICENSE for details.
## Citation
If you use CoDA in your research, please cite:
```bibtex
@article{chen2025coda,
title={CoDA: Agentic Systems for Collaborative Data Visualization},
author={Chen, Zichen and Chen, Jiefeng and Arik, Sercan {\"O}. and Sra, Misha and Pfister, Tomas and Yoon, Jinsung},
journal={arXiv preprint arXiv:2510.03194},
year={2025},
url={https://arxiv.org/abs/2510.03194},
doi={10.48550/arXiv.2510.03194}
}
```
**Paper**: [arXiv:2510.03194](https://arxiv.org/abs/2510.03194)