synthkit / README.md
LaelaZ's picture
Upgrade to Gradio 6.16.0
3c6c9bf verified
---
title: synthkit
emoji: 🧪
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
pinned: false
license: mit
---
# synthkit — synthetic data, graded
Generate synthetic LLM data and **grade it** on validity, uniqueness, diversity,
and contamination, with an A+→F headline. This Space is a live, offline demo
(template generation + lexical grading).
The full command-line tool adds LLM-backed instruction→output generation,
fine-tuning output formats (alpaca / sharegpt / openai), and an embedding-based
semantic-dedup axis that catches paraphrase duplicates lexical methods miss.
👉 **Source & docs:** https://github.com/LaelaZorana/synthkit