File size: 684 Bytes
23dc4b1 1b1d946 23dc4b1 3c6c9bf 23dc4b1 1b1d946 23dc4b1 1b1d946 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | ---
title: synthkit
emoji: 🧪
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
pinned: false
license: mit
---
# synthkit — synthetic data, graded
Generate synthetic LLM data and **grade it** on validity, uniqueness, diversity,
and contamination, with an A+→F headline. This Space is a live, offline demo
(template generation + lexical grading).
The full command-line tool adds LLM-backed instruction→output generation,
fine-tuning output formats (alpaca / sharegpt / openai), and an embedding-based
semantic-dedup axis that catches paraphrase duplicates lexical methods miss.
👉 **Source & docs:** https://github.com/LaelaZorana/synthkit
|