---
title: synthkit
emoji: 🧪
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
pinned: false
license: mit
---

# synthkit: synthetic data, graded

Generate synthetic LLM data and **grade it** on validity, uniqueness, diversity, and contamination, with an A+→F headline.

This Space is a live, offline demo (template generation + lexical grading). The full command-line tool adds LLM-backed instruction→output generation, fine-tuning output formats (alpaca / sharegpt / openai), and an embedding-based semantic-dedup axis that catches paraphrase duplicates lexical methods miss.

👉 **Source & docs:** https://github.com/LaelaZorana/synthkit
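
To make the lexical side of the grading concrete, here is a minimal sketch of two lexical metrics: a uniqueness ratio over normalized rows and a distinct-n diversity ratio. It is an illustration only and not synthkit's actual implementation; the function names `normalize`, `uniqueness`, and `distinct_n` and the sample rows are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep only alphanumeric tokens, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def uniqueness(rows: list[str]) -> float:
    """Fraction of rows that are distinct after normalization (hypothetical metric)."""
    if not rows:
        return 0.0
    return len({normalize(r) for r in rows}) / len(rows)


def distinct_n(rows: list[str], n: int = 2) -> float:
    """Distinct n-grams divided by total n-grams across all rows (hypothetical metric)."""
    grams = []
    for row in rows:
        tokens = normalize(row).split()
        grams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(grams)) / len(grams) if grams else 0.0


rows = [
    "Summarize the article in one sentence.",
    "summarize the article in one sentence",        # same row after normalization
    "Give a one-sentence summary of the article.",  # paraphrase of the first row
]

print(uniqueness(rows))     # 2 of 3 rows are distinct after normalization
print(distinct_n(rows, 2))  # share of bigrams that are distinct
```

The third row is a paraphrase of the first, yet it counts as unique here because it shares no normalized form with it. That is the kind of duplicate an embedding-based semantic-dedup axis is meant to catch.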