🚀 SyGra V2.0.0
UI-first workflows · Multimodal generation
Broader model coverage · Enterprise-grade pipelines
👋 New to SyGra?
SyGra is a low-code / no-code framework for building synthetic dataset generation pipelines for model training and evaluation workflows.
SyGra v1.0.0 blog: https://huggingface.co/blog/ServiceNow-AI/sygra-data-gen-framework
SyGra 2.0.0 is a major milestone release designed to make synthetic data generation and evaluation workflows easier to build, richer to run, and simpler to observe. This release introduces a UI-first Studio, full multimodal pipelines, enterprise-ready data integration, first-class tool calling, semantic deduplication, a self-refinement recipe, expanded provider support, and improved observability and evaluation capabilities.
🎨 SyGra Studio — Visual Workflow Builder & Run Monitoring
SyGra Studio introduces a UI-first experience that replaces manual YAML editing with an intuitive drag-and-drop graph builder. Users can visually design workflows, execute tasks, monitor node-level progress, and inspect outputs and metadata such as latency, token usage, and estimated cost. Studio dramatically improves iteration speed, debuggability, and collaboration across teams.
🎧🗣️🖼️ Multimodal Pipelines — Audio, Speech, and Images
SyGra expands beyond text-only workflows with first-class multimodal support, enabling audio transcription, text-to-speech, image generation, and bidirectional audio conversations.
🔊 Audio Transcription (Audio → Text)
SyGra supports dedicated transcription models such as Whisper and gpt-4o-transcribe. Audio inputs are routed to these models using input_type: audio, enabling speech-based dataset creation, enrichment, and evaluation pipelines.
🗣️ Text-to-Speech (Text → Audio)
SyGra supports text-to-speech models for scalable audio generation. Workflows can emit audio artifacts using output_type: audio, enabling voice dataset publishing and conversational speech synthesis at scale.
🖼️ Image Generation & Editing
SyGra integrates image generation and editing endpoints. Generated images are stored as managed artifacts and returned as file paths for downstream use in multimodal datasets and evaluation workflows.
🎙️ GPT-4o Audio (Audio ↔ Audio)
SyGra supports GPT-4o audio preview models for unified audio-in and audio-out workflows. This enables conversational voice datasets and audio-to-audio generation using a chat-based API.
🏢 ServiceNow Instance Integration
SyGra can read from and write to ServiceNow tables as both data sources and sinks. This enables end-to-end enterprise pipelines for enrichment, analysis, and synthetic data generation.
🔗 Multi-Dataset Joins & Aliasing
SyGra supports multiple datasets as sources and sinks within a single workflow. Join strategies include primary, cross, random, sequential, column-based joins, and vertical stacking.
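To make the join strategies more concrete, here is a minimal plain-Python sketch of what several of them could mean for two small record lists. The function names, the sample records, and the exact semantics (for example, that a sequential join cycles through the shorter dataset) are assumptions for illustration, not SyGra's implementation or configuration syntax.

```python
import itertools
import random

def cross_join(primary, secondary):
    """Pair every primary record with every secondary record."""
    return [{**p, **s} for p, s in itertools.product(primary, secondary)]

def sequential_join(primary, secondary):
    """Pair the i-th primary record with the i-th secondary record,
    cycling through the secondary dataset if it is shorter (assumed)."""
    return [{**p, **s} for p, s in zip(primary, itertools.cycle(secondary))]

def random_join(primary, secondary, seed=0):
    """Attach one randomly chosen secondary record to each primary record."""
    rng = random.Random(seed)
    return [{**p, **rng.choice(secondary)} for p in primary]

def column_join(primary, secondary, primary_key, secondary_key):
    """Inner join on matching column values."""
    index = {}
    for s in secondary:
        index.setdefault(s[secondary_key], []).append(s)
    return [{**p, **s} for p in primary for s in index.get(p[primary_key], [])]

def vertical_stack(*datasets):
    """Concatenate the records of several datasets into one list."""
    return [dict(record) for dataset in datasets for record in dataset]

# Hypothetical sample data.
users = [{"user_id": 1, "name": "Ada"}, {"user_id": 2, "name": "Lin"}]
tickets = [
    {"owner_id": 2, "issue": "VPN"},
    {"owner_id": 1, "issue": "Email"},
    {"owner_id": 2, "issue": "Badge"},
]
joined = column_join(users, tickets, "user_id", "owner_id")
```

In this sketch a cross join grows multiplicatively with dataset size, while the sequential and random joins keep one output row per primary record, which is usually the more practical choice for large sources.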
🛠️ Tool Support in LLM Nodes (First-Class Tool Calling)
SyGra 2.0.0 adds first-class tool calling directly within LLM nodes. This enables workflows to generate structured tool calls without agent nodes and unlocks tool-call traces suitable for supervised fine-tuning. Evaluation workflows can now validate whether the correct tool and parameters were invoked.
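As a rough illustration of the evaluation step, the sketch below checks whether a generated tool call matches an expected one. The record shape (a dict with "name" and "arguments") and the function name are hypothetical; SyGra's actual tool-call trace format may differ.

```python
import json

def tool_call_matches(expected, actual):
    """Return True when both the tool name and its arguments match.

    Arguments may arrive as a JSON string or as a dict, so both sides
    are normalised to dicts before comparison (key order is ignored).
    """
    def normalise(args):
        return json.loads(args) if isinstance(args, str) else dict(args)

    return (
        expected["name"] == actual["name"]
        and normalise(expected["arguments"]) == normalise(actual["arguments"])
    )

# Hypothetical expected and generated calls.
expected_call = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
generated_call = {"name": "get_weather", "arguments": '{"unit": "celsius", "city": "Paris"}'}
wrong_call = {"name": "get_weather", "arguments": {"city": "Rome", "unit": "celsius"}}
```

An exact-match check like this is the strictest option; a real evaluation might relax it, for instance by ignoring optional parameters.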
🧹 Semantic Deduplication
SyGra includes embedding-based semantic deduplication for near-duplicate removal. By default, it uses the performance-optimized similarity search provided by the LangGraph vector store. For smaller datasets, it can instead compute all-pairs cosine similarity, which helps ensure dataset diversity.
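The cosine-similarity idea can be sketched in a few lines of plain Python. This is a greedy variant that compares each record against the records already kept, not SyGra's implementation; the 0.9 threshold and the toy two-dimensional vectors are assumptions, and real embeddings would come from an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def deduplicate(embeddings, threshold=0.9):
    """Return the indices of records to keep.

    A record is kept only if its similarity to every previously kept
    record is below the threshold. Worst case is O(n^2) comparisons,
    which is why this approach suits smaller datasets.
    """
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Toy vectors: the second is a near-duplicate of the first.
vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept_indices = deduplicate(vectors)
```

Lowering the threshold removes more records and increases diversity, at the cost of discarding items that are merely related rather than duplicated.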
🔁 Self-Refinement Recipe
SyGra includes a reusable self-refinement subgraph recipe combining generation, judging, and iterative refinement. Reflection trajectories are captured for training, evaluation, and analysis use cases.
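The generate, judge, refine loop can be pictured roughly as follows. This is a sketch under stated assumptions: the three callables stand in for LLM nodes, and the parameter names, score scale, and trajectory fields are hypothetical rather than taken from the SyGra recipe.

```python
def self_refine(prompt, generate, judge, refine, max_rounds=3, pass_score=0.8):
    """Run generation, then judge and refine until the score passes
    or max_rounds refinements have been made.

    Returns the final draft and the full reflection trajectory.
    """
    draft = generate(prompt)
    trajectory = []
    for round_idx in range(max_rounds + 1):
        score, feedback = judge(prompt, draft)
        trajectory.append(
            {"round": round_idx, "draft": draft, "score": score, "feedback": feedback}
        )
        # Stop on success, or when the refinement budget is spent.
        if score >= pass_score or round_idx == max_rounds:
            break
        draft = refine(prompt, draft, feedback)
    return draft, trajectory

# Stub callables standing in for model calls (illustrative only).
def stub_generate(prompt):
    return "draft"

def stub_judge(prompt, draft):
    score = min(1.0, 0.5 + 0.2 * draft.count("+"))
    return score, "add more detail"

def stub_refine(prompt, draft, feedback):
    return draft + "+"

final_draft, steps = self_refine("Explain joins", stub_generate, stub_judge, stub_refine)
```

Because every draft, score, and piece of feedback is recorded, the trajectory can later be exported as training or analysis data.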
📊 Metadata & Observability
SyGra automatically captures rich execution metadata across runs. Metrics include latency percentiles, token usage, node-level costs, and structured artifacts for downstream analysis and optimization.
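For downstream analysis, metrics like these can be derived from per-node execution records. The sketch below uses only the Python standard library; the record fields (node, latency_s, tokens, cost_usd) and the sample values are hypothetical and do not reflect SyGra's actual metadata schema.

```python
import statistics
from collections import defaultdict

def summarize(node_records):
    """Aggregate latency percentiles, token usage, and cost per node."""
    latencies = [r["latency_s"] for r in node_records]
    # n=100 yields 99 cut points: index 49 is p50 and index 94 is p95.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    cost_by_node = defaultdict(float)
    for r in node_records:
        cost_by_node[r["node"]] += r["cost_usd"]
    return {
        "latency_p50_s": cuts[49],
        "latency_p95_s": cuts[94],
        "total_tokens": sum(r["tokens"] for r in node_records),
        "cost_by_node": dict(cost_by_node),
    }

# Hypothetical records from one run.
records = [
    {"node": "generate", "latency_s": 1.0, "tokens": 100, "cost_usd": 0.25},
    {"node": "generate", "latency_s": 2.0, "tokens": 120, "cost_usd": 0.25},
    {"node": "judge", "latency_s": 3.0, "tokens": 80, "cost_usd": 0.5},
    {"node": "judge", "latency_s": 4.0, "tokens": 90, "cost_usd": 0.5},
    {"node": "refine", "latency_s": 5.0, "tokens": 110, "cost_usd": 0.75},
]
summary = summarize(records)
```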
🌐 Expanded Provider & Model Integrations
SyGra now defaults to LiteLLM-backed model routing, simplifying expansion across providers. Explicit integrations include Google Vertex AI and AWS Bedrock, with support for text, image, and audio modalities.
🎯 Summary
SyGra 2.0.0 delivers UI-first workflow design, multimodal generation, broad provider support, and enterprise-grade observability and evaluation, making it easier to build, run, and scale high-quality synthetic data pipelines.