File size: 684 Bytes
23dc4b1
1b1d946
 
 
 
23dc4b1
3c6c9bf
23dc4b1
 
1b1d946
23dc4b1
 
1b1d946
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
---
title: synthkit
emoji: 🧪
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
pinned: false
license: mit
---

# synthkit — synthetic data, graded

Generate synthetic LLM data and **grade it** on validity, uniqueness, diversity,
and contamination, with an A+→F headline. This Space is a live, offline demo
(template generation + lexical grading).

The full command-line tool adds LLM-backed instruction→output generation,
fine-tuning output formats (alpaca / sharegpt / openai), and an embedding-based
semantic-dedup axis that catches paraphrase duplicates lexical methods miss.

👉 **Source & docs:** https://github.com/LaelaZorana/synthkit