title: SCoDA
emoji: 🎨
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit

CoDA: Collaborative Data Visualization Agents

A production-grade multi-agent system for automated data visualization from natural language queries.

Hugging Face Spaces · Python 3.10+ · License: MIT

Overview

CoDA reframes data visualization as a collaborative multi-agent problem. Instead of treating it as a monolithic task, CoDA employs specialized LLM agents that work together:

  • Query Analyzer - Interprets natural language and extracts visualization intent
  • Data Processor - Extracts metadata without token-heavy data loading
  • VizMapping Agent - Maps semantics to visualization primitives
  • Search Agent - Retrieves relevant code patterns
  • Design Explorer - Generates aesthetic specifications
  • Code Generator - Synthesizes executable Python code
  • Debug Agent - Executes code and fixes errors
  • Visual Evaluator - Assesses quality and triggers refinement
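The hand-off between these agents can be pictured as a sequential pipeline over a shared state object. The sketch below is only an illustration under assumed names: `PipelineState`, `make_stub_agent`, and the agent identifiers are hypothetical and not taken from the CoDA codebase, and each stub records a placeholder instead of calling an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Shared context that every agent reads from and writes to (hypothetical)."""
    query: str
    notes: dict[str, str] = field(default_factory=dict)


# In this sketch an agent is a function that takes the state and returns it updated.
Agent = Callable[[PipelineState], PipelineState]


def make_stub_agent(name: str, summary: str) -> Agent:
    """Build a placeholder agent that records what a real agent would contribute."""
    def run(state: PipelineState) -> PipelineState:
        state.notes[name] = summary
        return state
    return run


# Order mirrors the agent list above; the summaries are placeholders, not real outputs.
AGENTS = [
    ("query_analyzer", "visualization intent"),
    ("data_processor", "dataset metadata"),
    ("viz_mapping", "chart type and encodings"),
    ("search_agent", "relevant code patterns"),
    ("design_explorer", "aesthetic specification"),
    ("code_generator", "Python plotting code"),
    ("debug_agent", "executed and repaired code"),
    ("visual_evaluator", "quality assessment"),
]


def run_pipeline(query: str) -> PipelineState:
    """Pass one state object through every agent in order."""
    state = PipelineState(query=query)
    for name, summary in AGENTS:
        state = make_stub_agent(name, summary)(state)
    return state


state = run_pipeline("Show sales trends over time")
print(list(state.notes))
```

Keeping all intermediate results on one state object means later agents can see everything earlier agents produced; whether CoDA shares context this way is an assumption of the sketch.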

Quick Start

Installation

# Clone the repository
git clone https://github.com/yourusername/CoDA.git
cd CoDA

# Install dependencies
pip install -r requirements.txt

# Configure API key
cp .env.example .env
# Edit .env and add your GROQ_API_KEY

Usage

Web Interface (Gradio)

python app.py

Open http://localhost:7860 in your browser.

Command Line

python main.py --query "Create a bar chart of sales by category" --data sales.csv

Options:

  • -q, --query: Visualization query (required)
  • -d, --data: Data file path(s) (required)
  • -o, --output: Output directory (default: outputs)
  • --max-iterations: Refinement iterations (default: 3)
  • --min-score: Quality threshold (default: 7.0)
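For readers who want to see how these options fit together, here is a minimal `argparse` sketch that mirrors the flags and defaults listed above. It is not the actual contents of `main.py`; in particular, accepting several paths through `nargs="+"` is an assumption based on the "path(s)" wording.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented CLI options."""
    parser = argparse.ArgumentParser(
        description="Generate a visualization from a natural language query."
    )
    parser.add_argument("-q", "--query", required=True, help="Visualization query")
    # nargs="+" is an assumption: the README only says "Data file path(s)".
    parser.add_argument("-d", "--data", required=True, nargs="+", help="Data file path(s)")
    parser.add_argument("-o", "--output", default="outputs", help="Output directory")
    parser.add_argument("--max-iterations", type=int, default=3, help="Refinement iterations")
    parser.add_argument("--min-score", type=float, default=7.0, help="Quality threshold")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--query", "Create a bar chart of sales by category", "--data", "sales.csv"]
)
print(args.query, args.data, args.output, args.max_iterations, args.min_score)
```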

Python API

from coda.orchestrator import CodaOrchestrator

orchestrator = CodaOrchestrator()
result = orchestrator.run(
    query="Show sales trends over time",
    data_paths=["sales_data.csv"]
)

if result.success:
    print(f"Visualization saved to: {result.output_file}")
    print(f"Quality Score: {result.scores['overall']}/10")

Hugging Face Spaces Deployment

  1. Create a new Space on Hugging Face
  2. Select "Gradio" as the SDK
  3. Upload all files from this repository
  4. Add GROQ_API_KEY as a Secret in Space Settings
  5. The Space will automatically build and deploy

Architecture

Natural Language Query + Data Files
             │
             ▼
    ┌──────────────────┐
    │  Query Analyzer  │ ─── Extracts intent, TODO list
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │  Data Processor  │ ─── Metadata extraction (no full load)
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │    VizMapping    │ ─── Chart type, encodings
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │   Search Agent   │ ─── Code examples
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │ Design Explorer  │ ─── Colors, layout, styling
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │  Code Generator  │ ─── Python visualization code
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │   Debug Agent    │ ─── Execute & fix errors
    └──────────────────┘
             │
             ▼
    ┌──────────────────┐
    │ Visual Evaluator │ ─── Quality assessment
    └──────────────────┘
             │
      ───────┴───────
     ↓ Feedback Loop ↓
  (if quality < threshold)
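The feedback loop at the bottom of the diagram can be sketched as a bounded retry: generate, score, and stop once the score reaches the threshold or the iteration budget is spent. The function below is a hypothetical illustration, not CoDA's orchestrator; `refine_until_good`, its callbacks, and the feedback string are invented for the example, and counting the first generation as iteration 1 is an assumption.

```python
from typing import Callable


def refine_until_good(
    generate: Callable[[str], str],
    evaluate: Callable[[str], float],
    min_score: float = 7.0,
    max_iterations: int = 3,
) -> tuple[str, float, int]:
    """Regenerate with feedback until the score meets the threshold or the budget runs out."""
    feedback = ""
    best_code, best_score = "", float("-inf")
    iterations_used = 0
    for iterations_used in range(1, max_iterations + 1):
        code = generate(feedback)
        score = evaluate(code)
        # Keep the best attempt seen so far in case later attempts regress.
        if score > best_score:
            best_code, best_score = code, score
        if score >= min_score:
            break
        feedback = f"Previous attempt scored {score:.1f}; address the evaluator's issues."
    return best_code, best_score, iterations_used


# Stub callbacks standing in for the Code Generator and Visual Evaluator.
scores = iter([5.0, 6.5, 8.0])
code, score, used = refine_until_good(
    generate=lambda feedback: f"# plot code (feedback: {feedback or 'none'})",
    evaluate=lambda _code: next(scores),
)
print(score, used)
```

Returning the best attempt instead of the last one is a design choice of this sketch: it guards against a refinement pass that lowers the score.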

Configuration

Environment variables (in .env):

| Variable | Default | Description |
|---|---|---|
| GROQ_API_KEY | Required | Your Groq API key |
| CODA_DEFAULT_MODEL | llama-3.3-70b-versatile | Text model |
| CODA_VISION_MODEL | llama-3.2-90b-vision-preview | Vision model |
| CODA_MIN_OVERALL_SCORE | 7.0 | Quality threshold |
| CODA_MAX_ITERATIONS | 3 | Max refinement loops |
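As a rough illustration of how these variables and defaults might be consumed, the sketch below reads them from a mapping (the process environment by default) into a typed settings object. `CodaSettings` and `load_settings` are hypothetical names, not CoDA's configuration module, and loading the `.env` file into the environment is assumed to happen elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CodaSettings:
    """Typed view of the environment variables listed above (hypothetical)."""
    groq_api_key: str
    default_model: str
    vision_model: str
    min_overall_score: float
    max_iterations: int


def load_settings(env: Mapping[str, str] = os.environ) -> CodaSettings:
    """Read settings from a mapping, applying the documented defaults."""
    api_key = env.get("GROQ_API_KEY", "")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is required but not set")
    return CodaSettings(
        groq_api_key=api_key,
        default_model=env.get("CODA_DEFAULT_MODEL", "llama-3.3-70b-versatile"),
        vision_model=env.get("CODA_VISION_MODEL", "llama-3.2-90b-vision-preview"),
        min_overall_score=float(env.get("CODA_MIN_OVERALL_SCORE", "7.0")),
        max_iterations=int(env.get("CODA_MAX_ITERATIONS", "3")),
    )


# Pass a plain dict so the example does not depend on the real environment.
settings = load_settings({"GROQ_API_KEY": "example-key", "CODA_MAX_ITERATIONS": "5"})
print(settings.max_iterations, settings.min_overall_score)
```

Failing fast on a missing key surfaces misconfiguration at startup instead of at the first API call.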

Supported Data Formats

  • CSV (.csv)
  • JSON (.json)
  • Excel (.xlsx, .xls)
  • Parquet (.parquet)
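One way to route files by type, and to collect metadata without reading a whole file, is sketched below using only the standard library. This is an assumption-laden illustration: `detect_format` and `sample_csv_metadata` are hypothetical helpers, only the CSV case is shown, and CoDA's own Data Processor may work differently.

```python
import csv
import io
from itertools import islice
from pathlib import Path

# Extension-to-format map taken from the list above.
SUPPORTED_FORMATS = {
    ".csv": "csv",
    ".json": "json",
    ".xlsx": "excel",
    ".xls": "excel",
    ".parquet": "parquet",
}


def detect_format(path: str) -> str:
    """Return the format name for a path, or raise if the extension is unsupported."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported data format: {path!r}")
    return SUPPORTED_FORMATS[suffix]


def sample_csv_metadata(stream, sample_rows: int = 5) -> dict:
    """Read only the header and the first few rows from a CSV text stream."""
    reader = csv.reader(stream)
    header = next(reader, [])
    sample = list(islice(reader, sample_rows))
    return {"columns": header, "sample_rows": sample}


# In-memory CSV so the example needs no file on disk.
demo = io.StringIO("category,sales\nA,10\nB,20\nC,30\n")
meta = sample_csv_metadata(demo, sample_rows=2)
print(detect_format("sales.CSV"), meta["columns"], len(meta["sample_rows"]))
```

Sampling a handful of rows keeps the amount of data passed to an LLM small, which is the motivation the agent list gives for metadata extraction.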

Requirements

  • Python 3.10+
  • Dependencies listed in requirements.txt

License

MIT License - See LICENSE for details.

Citation

If you use CoDA in your research, please cite:

@article{chen2025coda,
  title={CoDA: Agentic Systems for Collaborative Data Visualization},
  author={Chen, Zichen and Chen, Jiefeng and Arik, Sercan {\"O}. and Sra, Misha and Pfister, Tomas and Yoon, Jinsung},
  journal={arXiv preprint arXiv:2510.03194},
  year={2025},
  url={https://arxiv.org/abs/2510.03194},
  doi={10.48550/arXiv.2510.03194}
}

Paper: arXiv:2510.03194