---
title: SCoDA
emoji: 🎨
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
---

# CoDA: Collaborative Data Visualization Agents

A multi-agent system that generates data visualizations from natural language queries.

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview

CoDA reframes data visualization as a collaborative multi-agent problem. Instead of treating it as a monolithic task, CoDA employs specialized LLM agents that work together:

- **Query Analyzer** - Interprets natural language and extracts visualization intent
- **Data Processor** - Extracts dataset metadata instead of passing full data to the LLM, which keeps token usage low
- **VizMapping Agent** - Maps semantics to visualization primitives
- **Search Agent** - Retrieves relevant code patterns
- **Design Explorer** - Generates aesthetic specifications
- **Code Generator** - Synthesizes executable Python code
- **Debug Agent** - Executes code and fixes errors
- **Visual Evaluator** - Assesses quality and triggers refinement
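
The Data Processor's approach of summarizing a file rather than loading it can be illustrated with the standard library. This is a minimal sketch, not CoDA's implementation: `extract_metadata`, `infer_type`, the 50-row sample size, and the type-inference rule are all assumptions made for the example.

```python
import csv
import io
from itertools import islice


def infer_type(values):
    """Guess a column type from sample strings (hypothetical rule)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def extract_metadata(handle, sample_rows=50):
    """Read the header and at most `sample_rows` rows, never the whole file."""
    reader = csv.reader(handle)
    header = next(reader)
    sample = list(islice(reader, sample_rows))
    columns = {}
    for i, name in enumerate(header):
        values = [row[i] for row in sample if i < len(row)]
        columns[name] = infer_type(values)
    return {"columns": columns, "sampled_rows": len(sample)}


# In-memory stand-in for a CSV file on disk.
demo = io.StringIO("category,sales,share\nA,10,0.5\nB,20,0.25\n")
metadata = extract_metadata(demo)
print(metadata)
```

A compact summary like this is what would be placed in a prompt, so the prompt size stays roughly constant regardless of how many rows the file has.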

## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/CoDA.git
cd CoDA

# Install dependencies
pip install -r requirements.txt

# Configure API key
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
```

### Usage

#### Web Interface (Gradio)

```bash
python app.py
```

Open http://localhost:7860 in your browser.

#### Command Line

```bash
python main.py --query "Create a bar chart of sales by category" --data sales.csv
```

Options:
- `-q, --query`: Visualization query (required)
- `-d, --data`: Data file path(s) (required)
- `-o, --output`: Output directory (default: outputs)
- `--max-iterations`: Refinement iterations (default: 3)
- `--min-score`: Quality threshold (default: 7.0)
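
For reference, the options above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented flags and defaults; the actual parser in `main.py` may differ, and accepting several paths through `nargs="+"` is an assumption based on the "path(s)" wording.

```python
import argparse


def build_parser():
    """Build a parser matching the documented CLI options (illustrative)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-q", "--query", required=True, help="Visualization query")
    # nargs="+" is an assumption: it lets --data take one or more paths.
    parser.add_argument("-d", "--data", required=True, nargs="+", help="Data file path(s)")
    parser.add_argument("-o", "--output", default="outputs", help="Output directory")
    parser.add_argument("--max-iterations", type=int, default=3, help="Refinement iterations")
    parser.add_argument("--min-score", type=float, default=7.0, help="Quality threshold")
    return parser


args = build_parser().parse_args(
    ["-q", "Create a bar chart of sales by category", "-d", "sales.csv"]
)
print(args.query, args.data, args.output, args.max_iterations, args.min_score)
```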

#### Python API

```python
from coda.orchestrator import CodaOrchestrator

orchestrator = CodaOrchestrator()
result = orchestrator.run(
    query="Show sales trends over time",
    data_paths=["sales_data.csv"]
)

if result.success:
    print(f"Visualization saved to: {result.output_file}")
    print(f"Quality Score: {result.scores['overall']}/10")
```

## Hugging Face Spaces Deployment

1. Create a new Space on [Hugging Face](https://huggingface.co/new-space)
2. Select "Gradio" as the SDK
3. Upload all files from this repository
4. Add `GROQ_API_KEY` as a Secret in Space Settings
5. The Space will automatically build and deploy

## Architecture

```
Natural Language Query + Data Files
              │
              ▼
    ┌───────────────────┐
    │ Query Analyzer    │ ─── Extracts intent, TODO list
    └───────────────────┘
              │
              ▼
    ┌───────────────────┐
    │ Data Processor    │ ─── Metadata extraction (no full load)
    └───────────────────┘
              │
              ▼
    ┌───────────────────┐
    │ VizMapping        │ ─── Chart type, encodings
    └───────────────────┘
              │
              ▼
    ┌───────────────────┐
    │ Search Agent      │ ─── Code examples
    └───────────────────┘
              │
              ▼
    ┌───────────────────┐
    │ Design Explorer   │ ─── Colors, layout, styling
    └───────────────────┘
              │
              ▼
    ┌───────────────────┐
    │ Code Generator    │ ─── Python visualization code
    └───────────────────┘
              │
              ▼
    ┌───────────────────┐
    │ Debug Agent       │ ─── Execute & fix errors
    └───────────────────┘
              │
              ▼
    ┌───────────────────┐
    │ Visual Evaluator  │ ─── Quality assessment
    └───────────────────┘
              │
       ───────┴───────
      ↓ Feedback Loop ↓
   (if quality < threshold)
```
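
The control flow in the diagram can be sketched as a loop that runs the stages in order and repeats while the evaluator's score stays below the threshold. This is a simplified illustration, not the orchestrator's actual code: `run_pipeline`, `RunState`, and the placeholder stages are hypothetical, and re-running every stage on each pass is an assumption, since the diagram does not say which stage the feedback loop returns to.

```python
from dataclasses import dataclass, field

# Stage names follow the diagram; the stages themselves are placeholders.
STAGES = [
    "query_analyzer", "data_processor", "viz_mapping", "search_agent",
    "design_explorer", "code_generator", "debug_agent",
]


@dataclass
class RunState:
    query: str
    trace: list = field(default_factory=list)
    score: float = 0.0
    iterations: int = 0


def run_pipeline(query, evaluate, min_score=7.0, max_iterations=3):
    """Run all stages, then evaluate; repeat until the score passes or passes run out."""
    state = RunState(query=query)
    while state.iterations < max_iterations:
        state.iterations += 1
        for stage in STAGES:
            # A real stage would call an LLM agent; here we only record the visit.
            state.trace.append((state.iterations, stage))
        state.score = evaluate(state)
        if state.score >= min_score:
            break
    return state


def fake_evaluator(state):
    """Stand-in for the Visual Evaluator: quality rises 2 points per pass."""
    return 4.0 + 2.0 * state.iterations


result = run_pipeline("Show sales trends over time", fake_evaluator)
print(result.iterations, result.score)
```

With the stand-in evaluator, the first pass scores 6.0 (below the 7.0 threshold) and the second scores 8.0, so the loop stops after two iterations.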

## Configuration

Environment variables (in `.env`):

| Variable | Default | Description |
|----------|---------|-------------|
| `GROQ_API_KEY` | (none) | Your Groq API key (required) |
| `CODA_DEFAULT_MODEL` | llama-3.3-70b-versatile | Text model |
| `CODA_VISION_MODEL` | llama-3.2-90b-vision-preview | Vision model |
| `CODA_MIN_OVERALL_SCORE` | 7.0 | Quality threshold |
| `CODA_MAX_ITERATIONS` | 3 | Max refinement loops |
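
One way to read these variables with the documented defaults is shown below. This is a sketch under assumptions: `Settings` and `load_settings` are hypothetical names, and the project's own configuration module may be structured differently (for example, it may also parse the `.env` file, which this example does not).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    default_model: str
    vision_model: str
    min_overall_score: float
    max_iterations: int


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is required")
    return Settings(
        api_key=api_key,
        default_model=env.get("CODA_DEFAULT_MODEL", "llama-3.3-70b-versatile"),
        vision_model=env.get("CODA_VISION_MODEL", "llama-3.2-90b-vision-preview"),
        # Environment values are strings, so numeric settings are converted.
        min_overall_score=float(env.get("CODA_MIN_OVERALL_SCORE", "7.0")),
        max_iterations=int(env.get("CODA_MAX_ITERATIONS", "3")),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"GROQ_API_KEY": "example-key", "CODA_MAX_ITERATIONS": "5"})
print(settings.default_model, settings.min_overall_score, settings.max_iterations)
```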

## Supported Data Formats

- CSV (`.csv`)
- JSON (`.json`)
- Excel (`.xlsx`, `.xls`)
- Parquet (`.parquet`)
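
Each of these formats has a corresponding pandas reader, so loading can be dispatched on the file extension. The sketch below assumes pandas is the loading library, which the README does not state; `reader_name_for` and `load_table` are hypothetical helpers.

```python
from pathlib import Path

# Extension -> name of the pandas reader function.
READERS = {
    ".csv": "read_csv",
    ".json": "read_json",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".parquet": "read_parquet",
}


def reader_name_for(path):
    """Return the pandas reader name for a path, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported data format: {suffix or path!r}") from None


def load_table(path):
    """Load a file into a DataFrame using the matching pandas reader."""
    import pandas as pd  # imported lazily so the lookup works without pandas

    return getattr(pd, reader_name_for(path))(path)


print(reader_name_for("sales.csv"))
print(reader_name_for("Report.XLSX"))
```

Note that `pandas.read_excel` and `pandas.read_parquet` rely on optional engines (such as `openpyxl` and `pyarrow`), so those must be installed for the corresponding formats to load.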

## Requirements

- Python 3.10+
- Groq API key ([Get one free](https://console.groq.com))

## License

MIT License - See LICENSE for details.

## Citation

If you use CoDA in your research, please cite:

```bibtex
@article{chen2025coda,
  title={CoDA: Agentic Systems for Collaborative Data Visualization},
  author={Chen, Zichen and Chen, Jiefeng and Arik, Sercan {\"O}. and Sra, Misha and Pfister, Tomas and Yoon, Jinsung},
  journal={arXiv preprint arXiv:2510.03194},
  year={2025},
  url={https://arxiv.org/abs/2510.03194},
  doi={10.48550/arXiv.2510.03194}
}
```

**Paper**: [arXiv:2510.03194](https://arxiv.org/abs/2510.03194)