Add speed benchmarks table
tabgan-synthetic-data.md CHANGED (+23 -2)
|
@@ -181,7 +181,7 @@ pip install tabgan
|
|
| 181 |
|
| 182 |
## Benchmarks
|
| 183 |
|
| 184 |
-
Normalized ROC AUC
|
| 185 |
|
| 186 |
| Dataset | CTGAN | Forest Diffusion | Random |
|
| 187 |
|---------|-------|-------------------|--------|
|
|
@@ -189,7 +189,28 @@ Normalized ROC AUC comparison across 6 datasets:
|
|
| 189 |
| Adult Census | 0.689 | 0.712 | 0.523 |
|
| 190 |
| Telecom | 0.814 | 0.799 | 0.548 |
|
| 191 |
|
| 192 |
-
*Higher is better.
|
| 193 |
|
| 194 |
## What's Next
|
| 195 |
|
|
|
|
| 181 |
|
| 182 |
## Benchmarks
|
| 183 |
|
| 184 |
+
### Quality (Normalized ROC AUC)
|
| 185 |
|
| 186 |
| Dataset | CTGAN | Forest Diffusion | Random |
|
| 187 |
|---------|-------|-------------------|--------|
|
|
|
|
| 189 |
| Adult Census | 0.689 | 0.712 | 0.523 |
|
| 190 |
| Telecom | 0.814 | 0.799 | 0.548 |
|
| 191 |
|
| 192 |
+
*Higher is better.*
|
| 193 |
+
|
| 194 |
+
### Speed (generation time, 1000 rows, 8 features)
|
| 195 |
+
|
| 196 |
+
| Generator | Time | Notes |
|
| 197 |
+
|-----------|------|-------|
|
| 198 |
+
| **Random Baseline** | ~0.1s | Instant — just resampling |
|
| 199 |
+
| **CTGAN (GAN)** | ~1–10s | Fast, depends on epochs |
|
| 200 |
+
| **Forest Diffusion** | ~30–120s | High quality, but slower |
|
| 201 |
+
| **LLM (GReaT)** | ~5–30min | Best for text columns, GPU recommended |
|
| 202 |
+
|
| 203 |
+
Every `generate_data_pipe()` call now records per-step timing in `generator.last_timing_`:
|
| 204 |
+
|
| 205 |
+
```python
|
| 206 |
+
gen = GANGenerator(gen_x_times=1.1)
|
| 207 |
+
synthetic, _ = gen.generate_data_pipe(train, target, test)
|
| 208 |
+
print(gen.last_timing_)
|
| 209 |
+
# {'preprocess': 0.001, 'generation': 2.3, 'postprocess': 0.01,
|
| 210 |
+
# 'adversarial_filtering': 0.15, 'total': 2.46}
|
| 211 |
+
```
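For readers wondering how a dictionary shaped like `last_timing_` could be produced, here is a minimal sketch of per-step timing with `time.perf_counter`. It is an illustration only, not tabgan's actual implementation: `StepTimer`, its methods, and the placeholder workloads are hypothetical names introduced here, and only the step keys are borrowed from the example output above.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Hypothetical helper (not part of tabgan): collects wall-clock time per named step."""

    def __init__(self):
        self.timings = {}

    @contextmanager
    def step(self, name):
        # Measure the elapsed time of the enclosed block, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name] = time.perf_counter() - start

    def summary(self):
        # Return a copy of the per-step timings plus their sum under 'total'.
        report = dict(self.timings)
        report["total"] = sum(self.timings.values())
        return report


timer = StepTimer()

# Placeholder workloads standing in for the real pipeline stages.
with timer.step("preprocess"):
    rows = [[i, i * 2] for i in range(1000)]
with timer.step("generation"):
    synthetic = [row[:] for row in rows]
with timer.step("postprocess"):
    synthetic = [row for row in synthetic if row[0] % 2 == 0]
with timer.step("adversarial_filtering"):
    synthetic = synthetic[:100]

report = timer.summary()
print(report)
```

Using `perf_counter` rather than `time.time` avoids jumps from system clock adjustments, which matters when individual steps take only milliseconds.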
|
| 212 |
+
|
| 213 |
+
*Full benchmarks in the [README](https://github.com/Diyago/Tabular-data-generation).*
|
| 214 |
|
| 215 |
## What's Next
|
| 216 |
|