| --- |
| title: synthkit |
| emoji: 🧪 |
| colorFrom: indigo |
| colorTo: blue |
| sdk: gradio |
| sdk_version: 6.16.0 |
| app_file: app.py |
| pinned: false |
| license: mit |
| --- |
| |
| # synthkit — synthetic data, graded |
|
|
| Generate synthetic LLM data and **grade it** on validity, uniqueness, diversity, |
| and contamination, with an A+→F headline. This Space is a live, offline demo |
| (template generation + lexical grading). |
|
|
| The full command-line tool adds LLM-backed instruction→output generation, |
| fine-tuning output formats (alpaca / sharegpt / openai), and an embedding-based |
| semantic-dedup axis that catches paraphrase duplicates lexical methods miss. |
|
|
| 👉 **Source & docs:** https://github.com/LaelaZorana/synthkit |
|
|