🚀 SyGra V2.0.0

Community Article Published February 5, 2026

UI-first workflows · Multimodal generation
Broader model coverage · Enterprise-grade pipelines


👋 New to SyGra?

SyGra is a low-code / no-code framework for building synthetic data generation pipelines for model training and evaluation workflows.

SyGra v1.0.0 blog: https://huggingface.co/blog/ServiceNow-AI/sygra-data-gen-framework


SyGra 2.0.0 is a major milestone release designed to make synthetic data generation and evaluation workflows easier to build, richer to run, and simpler to observe. It introduces a UI-first Studio, full multimodal pipelines, enterprise data integration, first-class tool calling, semantic deduplication, self-refinement, expanded provider support, and new observability and evaluation capabilities.

🎨 SyGra Studio — Visual Workflow Builder & Run Monitoring

SyGra Studio introduces a UI-first experience that replaces manual YAML editing with an intuitive drag-and-drop graph builder. Users can visually design workflows, execute tasks, monitor node-level progress, and inspect outputs and metadata such as latency, token usage, and estimated cost. Studio dramatically improves iteration speed, debuggability, and collaboration across teams.

🎧🗣️🖼️ Multimodal Pipelines — Audio, Speech, and Images

SyGra expands beyond text-only workflows with first-class multimodal support, enabling audio transcription, text-to-speech, image generation, and bidirectional audio conversations.

🔊 Audio Transcription (Audio → Text)

SyGra supports dedicated transcription models such as Whisper and gpt-4o-transcribe. Audio inputs are correctly routed using input_type: audio, enabling speech-based dataset creation, enrichment, and evaluation pipelines.

🗣️ Text-to-Speech (Text → Audio)

SyGra supports text-to-speech models for scalable audio generation. Workflows can emit audio artifacts using output_type: audio, enabling voice dataset publishing and conversational speech synthesis at scale.

🖼️ Image Generation & Editing

SyGra integrates image generation and editing endpoints. Generated images are stored as managed artifacts and returned as file paths for downstream use in multimodal datasets and evaluation workflows.

🎙️ GPT-4o Audio (Audio ↔ Audio)

SyGra supports GPT-4o audio preview models for unified audio-in and audio-out workflows. This enables conversational voice datasets and audio-to-audio generation using a chat-based API.

🏢 ServiceNow Instance Integration

SyGra can read from and write to ServiceNow tables as both data sources and sinks. This enables end-to-end enterprise pipelines for enrichment, analysis, and synthetic data generation.

🔗 Multi-Dataset Joins & Aliasing

SyGra supports multiple datasets as sources and sinks within a single workflow. Join strategies include primary, cross, random, sequential, column-based joins, and vertical stacking.
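
To make the join strategies more concrete, here is a minimal Python sketch of what four of them could mean for two small lists of records. It works on plain lists of dicts and is not SyGra's implementation: the function names and the exact semantics of each strategy are assumptions made for illustration, and the "primary" strategy is left out because its behavior is not described here.

```python
import random
from itertools import product


def cross_join(left, right):
    """Pair every left record with every right record."""
    return [{**a, **b} for a, b in product(left, right)]


def sequential_join(left, right):
    """Pair the i-th left record with the i-th right record, cycling right."""
    return [{**a, **right[i % len(right)]} for i, a in enumerate(left)]


def random_join(left, right, seed=0):
    """Attach one randomly chosen right record to each left record."""
    rng = random.Random(seed)
    return [{**a, **rng.choice(right)} for a in left]


def column_join(left, right, left_key, right_key):
    """Match records whose key columns hold the same value."""
    index = {}
    for b in right:
        index.setdefault(b[right_key], []).append(b)
    return [{**a, **b} for a in left for b in index.get(a[left_key], [])]


def vertical_stack(left, right):
    """Concatenate two datasets row-wise."""
    return list(left) + list(right)


# Hypothetical sample data, used only to exercise the helpers above.
personas = [{"persona": "admin"}, {"persona": "analyst"}]
tickets = [
    {"ticket": "T1", "owner": "admin"},
    {"ticket": "T2", "owner": "admin"},
    {"ticket": "T3", "owner": "analyst"},
]
matched = column_join(personas, tickets, "persona", "owner")
```

A cross join grows multiplicatively with dataset size, so sequential or random pairing is usually the cheaper choice when the goal is only to attach some context to each record.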

🛠️ Tool Support in LLM Nodes (First-Class Tool Calling)

SyGra 2.0.0 adds first-class tool calling directly within LLM nodes. This enables workflows to generate structured tool calls without agent nodes and unlocks tool-call traces suitable for supervised fine-tuning. Evaluation workflows can now validate whether the correct tool and parameters were invoked.
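
As a rough illustration of the evaluation step, the sketch below checks whether a predicted tool call matches an expected one. It assumes each call is recorded as a dict with a name and arguments, where the arguments may arrive as a JSON-encoded string (as in OpenAI-style chat APIs). The function name and record shape are hypothetical and are not taken from SyGra.

```python
import json


def tool_call_matches(predicted, expected):
    """Return True when the tool name and the parsed arguments both agree."""
    if predicted.get("name") != expected.get("name"):
        return False
    args = predicted.get("arguments", {})
    if isinstance(args, str):
        # Arguments serialized as JSON text are parsed before comparison.
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            return False
    return args == expected.get("arguments", {})


# Hypothetical example records.
expected_call = {"name": "get_weather", "arguments": {"city": "Paris"}}
predicted_call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
is_correct = tool_call_matches(predicted_call, expected_call)
```

Exact equality on arguments is the strictest possible check. A real evaluation might relax it, for example by ignoring optional parameters or normalizing string case.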

🧹 Semantic Deduplication

SyGra includes embedding-based semantic deduplication for removing near-duplicates. By default, it uses the performance-optimized similarity search provided by the LangGraph vector store; all-pairs cosine similarity is available for smaller datasets. Removing near-duplicates helps keep the resulting dataset diverse.
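
The all-pairs cosine similarity idea can be sketched in a few lines of plain Python. This is a toy version under stated assumptions, not SyGra's code: the embeddings are hand-written 2-D vectors rather than model outputs, the 0.9 threshold is arbitrary, and the greedy keep-first policy is just one reasonable choice.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_all_pairs(embeddings, threshold=0.9):
    """Greedy dedup: keep an item only if it is below the similarity
    threshold against every item already kept. Returns kept indices."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings: the second vector is a near-duplicate of the first.
toy_embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept_indices = dedup_all_pairs(toy_embeddings)
```

Comparing every pair costs quadratic time in the number of records, which is why an indexed similarity search is the more practical default for large datasets.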

🔁 Self-Refinement Recipe

SyGra includes a reusable self-refinement subgraph recipe combining generation, judging, and iterative refinement. Reflection trajectories are captured for training, evaluation, and analysis use cases.
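
The generate, judge, refine loop can be pictured with the following sketch. It is a minimal illustration under assumptions, not the recipe shipped with SyGra: the three callables stand in for LLM nodes, and the names, the pass score, and the trajectory record format are all hypothetical.

```python
def self_refine(task, generate, judge, refine, max_rounds=3, pass_score=0.8):
    """Generate a draft, then judge and refine it until it passes or rounds
    run out. Returns the last judged draft and the full trajectory."""
    draft = generate(task)
    trajectory = []
    for round_idx in range(max_rounds):
        score, feedback = judge(task, draft)
        trajectory.append(
            {"round": round_idx, "draft": draft, "score": score, "feedback": feedback}
        )
        if score >= pass_score or round_idx == max_rounds - 1:
            break
        draft = refine(task, draft, feedback)
    return draft, trajectory


# Toy stand-ins for model calls, used only to make the loop runnable.
def toy_generate(task):
    return "draft"


def toy_judge(task, draft):
    # Longer drafts score higher in this toy judge.
    return min(len(draft) / 20, 1.0), "add more detail"


def toy_refine(task, draft, feedback):
    return draft + " with more detail"


final_draft, steps = self_refine("describe SyGra", toy_generate, toy_judge, toy_refine)
```

Each trajectory entry keeps the draft alongside its score and feedback, which is the kind of record that can later be used as reflection data for training or analysis.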

📊 Metadata & Observability

SyGra automatically captures rich execution metadata across runs. Metrics include latency percentiles, token usage, node-level costs, and structured artifacts for downstream analysis and optimization.
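
For a sense of how such metrics can be derived from raw per-call records, here is a small sketch using only the Python standard library. The record fields (latency_s, prompt_tokens, completion_tokens) and the flat per-1k-token price are assumptions for illustration and do not reflect SyGra's actual metadata schema.

```python
from statistics import quantiles


def summarize_run(records, usd_per_1k_tokens):
    """Aggregate latency percentiles, token usage, and an estimated cost.
    Requires at least two records, since quantiles() needs two data points."""
    latencies = [r["latency_s"] for r in records]
    # With n=100, quantiles() returns 99 cut points: index 49 is p50, 94 is p95.
    cuts = quantiles(latencies, n=100, method="inclusive")
    total_tokens = sum(r["prompt_tokens"] + r["completion_tokens"] for r in records)
    return {
        "p50_latency_s": cuts[49],
        "p95_latency_s": cuts[94],
        "total_tokens": total_tokens,
        "estimated_cost_usd": total_tokens / 1000 * usd_per_1k_tokens,
    }


# Hypothetical per-call records for a single node.
sample_records = [
    {"latency_s": float(i), "prompt_tokens": 100, "completion_tokens": 50}
    for i in range(1, 6)
]
summary = summarize_run(sample_records, usd_per_1k_tokens=0.002)
```

Grouping the records by node name before calling a function like this would give the node-level breakdown described above.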

🌐 Expanded Provider & Model Integrations

SyGra now defaults to LiteLLM-backed model routing, which simplifies expansion across providers. Explicit integrations include Google Vertex AI and AWS Bedrock, with support for text, image, and audio modalities.

🎯 Summary

SyGra 2.0.0 delivers UI-first workflow design, multimodal generation, broad provider support, and enterprise-grade observability and evaluation, making it easier to build, run, and scale high-quality synthetic data pipelines.
