---
title: SCoDA
emoji: 🎨
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
---

# CoDA: Collaborative Data Visualization Agents

A production-grade multi-agent system for automated data visualization from natural language queries.

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview

CoDA reframes data visualization as a collaborative multi-agent problem. Instead of treating it as a monolithic task, CoDA employs specialized LLM agents that work together:

- **Query Analyzer** - Interprets natural language and extracts visualization intent
- **Data Processor** - Extracts metadata without token-heavy data loading
- **VizMapping Agent** - Maps semantics to visualization primitives
- **Search Agent** - Retrieves relevant code patterns
- **Design Explorer** - Generates aesthetic specifications
- **Code Generator** - Synthesizes executable Python code
- **Debug Agent** - Executes code and fixes errors
- **Visual Evaluator** - Assesses quality and triggers refinement
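
As an illustration of what "metadata without token-heavy data loading" can mean in practice, here is a minimal sketch that reads only the header and a few sample rows of a CSV file. The function name `extract_csv_metadata` and the `sample_rows` parameter are hypothetical and are not part of CoDA's API; the actual Data Processor may work differently.

```python
import csv
import io
from itertools import islice


def extract_csv_metadata(stream, sample_rows=5):
    """Summarize a CSV from its header and a few sample rows (hypothetical helper)."""
    reader = csv.reader(stream)
    header = next(reader, [])
    # Read at most `sample_rows` rows; the rest of the file is never consumed.
    sample = list(islice(reader, sample_rows))

    def infer_type(values):
        # A column counts as numeric only if every non-empty sampled value parses as a float.
        non_empty = [v for v in values if v != ""]
        if not non_empty:
            return "unknown"
        try:
            for v in non_empty:
                float(v)
        except ValueError:
            return "text"
        return "numeric"

    columns = {
        name: infer_type([row[i] for row in sample if i < len(row)])
        for i, name in enumerate(header)
    }
    return {"columns": columns, "sampled_rows": len(sample)}


# Only this compact summary, not the full table, would be placed in an LLM prompt.
demo = io.StringIO("category,sales\nA,10.5\nB,7\nC,3\n")
metadata = extract_csv_metadata(demo, sample_rows=2)
print(metadata)
```

Because the type guess is based on a small sample, it can be wrong for columns whose later rows differ; a real implementation would likely need a more careful strategy.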

## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/CoDA.git
cd CoDA

# Install dependencies
pip install -r requirements.txt

# Configure API key
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
```

### Usage

#### Web Interface (Gradio)

```bash
python app.py
```

Open http://localhost:7860 in your browser.

#### Command Line

```bash
python main.py --query "Create a bar chart of sales by category" --data sales.csv
```

Options:
- `-q, --query`: Visualization query (required)
- `-d, --data`: Data file path(s) (required)
- `-o, --output`: Output directory (default: outputs)
- `--max-iterations`: Refinement iterations (default: 3)
- `--min-score`: Quality threshold (default: 7.0)
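
The options above map naturally onto `argparse`. The following sketch shows how such a parser could be declared from the flags and defaults listed here. It is not the actual contents of `main.py`; `build_parser` is a hypothetical name, and accepting several paths via `nargs="+"` is an assumption based on the "path(s)" wording.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Generate a visualization from a query.")
    parser.add_argument("-q", "--query", required=True, help="Visualization query")
    # Assumption: one or more data files may be passed after -d/--data.
    parser.add_argument("-d", "--data", required=True, nargs="+", help="Data file path(s)")
    parser.add_argument("-o", "--output", default="outputs", help="Output directory")
    parser.add_argument("--max-iterations", type=int, default=3, help="Refinement iterations")
    parser.add_argument("--min-score", type=float, default=7.0, help="Quality threshold")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["-q", "Create a bar chart of sales by category", "-d", "sales.csv"]
)
print(args.output, args.max_iterations, args.min_score)
```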

#### Python API

```python
from coda.orchestrator import CodaOrchestrator

orchestrator = CodaOrchestrator()
result = orchestrator.run(
    query="Show sales trends over time",
    data_paths=["sales_data.csv"]
)

if result.success:
    print(f"Visualization saved to: {result.output_file}")
    print(f"Quality Score: {result.scores['overall']}/10")
```

## Hugging Face Spaces Deployment

1. Create a new Space on [Hugging Face](https://huggingface.co/new-space)
2. Select "Gradio" as the SDK
3. Upload all files from this repository
4. Add `GROQ_API_KEY` as a Secret in Space Settings
5. The Space will automatically build and deploy

## Architecture

```
Natural Language Query + Data Files
             │
             ▼
    ┌──────────────────┐
    │ Query Analyzer   │ ─── Extracts intent, TODO list
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │ Data Processor   │ ─── Metadata extraction (no full load)
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │ VizMapping       │ ─── Chart type, encodings
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │ Search Agent     │ ─── Code examples
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │ Design Explorer  │ ─── Colors, layout, styling
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │ Code Generator   │ ─── Python visualization code
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │ Debug Agent      │ ─── Execute & fix errors
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │ Visual Evaluator │ ─── Quality assessment
    └──────────────────┘
             │
      ───────┴───────
    ↓ Feedback Loop ↓
    (if quality < threshold)
```
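
The feedback loop at the bottom of the diagram can be sketched as follows. This is an illustration only: `refine`, `generate`, and `evaluate` are hypothetical names standing in for the generation and evaluation stages, and the real orchestrator may route feedback differently.

```python
from typing import Callable, Optional, Tuple


def refine(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Tuple[float, str]],
    min_score: float = 7.0,
    max_iterations: int = 3,
) -> Tuple[str, float, int]:
    """Regenerate until the score reaches min_score or the iterations run out."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback: Optional[str] = None
    best_artifact, best_score = "", float("-inf")
    for iteration in range(1, max_iterations + 1):
        artifact = generate(feedback)           # produce a candidate, using prior feedback
        score, feedback = evaluate(artifact)    # score it and collect new feedback
        if score > best_score:
            best_artifact, best_score = artifact, score
        if score >= min_score:                  # quality threshold met: stop early
            break
    return best_artifact, best_score, iteration


# Toy stand-ins: every round of feedback raises the score by 2 points.
def fake_generate(feedback):
    return "v1" if feedback is None else feedback + "+"


def fake_evaluate(artifact):
    return 4.0 + 2.0 * artifact.count("+"), artifact


result = refine(fake_generate, fake_evaluate)
print(result)
```

Keeping the best-scoring candidate means the loop still returns something usable when the threshold is never reached within `max_iterations`.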

## Configuration

Environment variables (in `.env`):

| Variable | Default | Description |
|----------|---------|-------------|
| `GROQ_API_KEY` | Required | Your Groq API key |
| `CODA_DEFAULT_MODEL` | llama-3.3-70b-versatile | Text model |
| `CODA_VISION_MODEL` | llama-3.2-90b-vision-preview | Vision model |
| `CODA_MIN_OVERALL_SCORE` | 7.0 | Quality threshold |
| `CODA_MAX_ITERATIONS` | 3 | Max refinement loops |
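
As a rough sketch of how these variables and defaults could be read in Python, the example below assumes the `.env` values have already been loaded into the process environment. `Settings` and `load_settings` are hypothetical names, not CoDA's actual configuration code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    default_model: str
    vision_model: str
    min_overall_score: float
    max_iterations: int


def load_settings(env=None):
    """Read the documented variables, applying the defaults from the table."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY")
    if not api_key:
        # The table marks this variable as required, so fail early.
        raise RuntimeError("GROQ_API_KEY is required")
    return Settings(
        groq_api_key=api_key,
        default_model=env.get("CODA_DEFAULT_MODEL", "llama-3.3-70b-versatile"),
        vision_model=env.get("CODA_VISION_MODEL", "llama-3.2-90b-vision-preview"),
        min_overall_score=float(env.get("CODA_MIN_OVERALL_SCORE", "7.0")),
        max_iterations=int(env.get("CODA_MAX_ITERATIONS", "3")),
    )


# A plain dict stands in for the environment so the example is self-contained.
settings = load_settings({"GROQ_API_KEY": "example-key", "CODA_MAX_ITERATIONS": "5"})
print(settings.default_model, settings.min_overall_score, settings.max_iterations)
```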

## Supported Data Formats

- CSV (`.csv`)
- JSON (`.json`)
- Excel (`.xlsx`, `.xls`)
- Parquet (`.parquet`)
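
A simple way to route input files by type is an extension lookup. The sketch below builds one from the list above; `SUPPORTED_FORMATS` and `detect_format` are hypothetical names, and CoDA itself may detect formats differently.

```python
from pathlib import Path

# Extension-to-format table taken from the list above.
SUPPORTED_FORMATS = {
    ".csv": "csv",
    ".json": "json",
    ".xlsx": "excel",
    ".xls": "excel",
    ".parquet": "parquet",
}


def detect_format(path):
    """Return the format label for a path, or raise for unsupported extensions."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported data format: {suffix or '(no extension)'}")
    return SUPPORTED_FORMATS[suffix]


print(detect_format("sales.csv"))
print(detect_format("reports/Q1.XLSX"))
```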

## Requirements

- Python 3.10+
- Groq API key ([Get one free](https://console.groq.com))

## License

MIT License - See LICENSE for details.

## Citation

If you use CoDA in your research, please cite:

```bibtex
@article{chen2025coda,
  title={CoDA: Agentic Systems for Collaborative Data Visualization},
  author={Chen, Zichen and Chen, Jiefeng and Arik, Sercan {\"O}. and Sra, Misha and Pfister, Tomas and Yoon, Jinsung},
  journal={arXiv preprint arXiv:2510.03194},
  year={2025},
  url={https://arxiv.org/abs/2510.03194},
  doi={10.48550/arXiv.2510.03194}
}
```

**Paper**: [arXiv:2510.03194](https://arxiv.org/abs/2510.03194)