InsafQ committed
Commit 9561d41 · verified · 1 parent: ae8bd5f

Add speed benchmarks table

Files changed (1)
  1. tabgan-synthetic-data.md +23 -2
tabgan-synthetic-data.md CHANGED
@@ -181,7 +181,7 @@ pip install tabgan
 
  ## Benchmarks
 
- Normalized ROC AUC comparison across 6 datasets:
+ ### Quality (Normalized ROC AUC)
 
  | Dataset | CTGAN | Forest Diffusion | Random |
  |---------|-------|-------------------|--------|
@@ -189,7 +189,28 @@ Normalized ROC AUC comparison across 6 datasets:
  | Adult Census | 0.689 | 0.712 | 0.523 |
  | Telecom | 0.814 | 0.799 | 0.548 |
 
- *Higher is better. Full benchmarks in the [README](https://github.com/Diyago/Tabular-data-generation).*
+ *Higher is better.*
+
+ ### Speed (generation time, 1000 rows, 8 features)
+
+ | Generator | Time | Notes |
+ |-----------|------|-------|
+ | **Random Baseline** | ~0.1s | Instant — just resampling |
+ | **CTGAN (GAN)** | ~1–10s | Fast, depends on epochs |
+ | **Forest Diffusion** | ~30–120s | High quality, but slower |
+ | **LLM (GReaT)** | ~5–30min | Best for text columns, GPU recommended |
+
+ Every `generate_data_pipe()` call now records per-step timing in `generator.last_timing_`:
+
+ ```python
+ gen = GANGenerator(gen_x_times=1.1)
+ synthetic, _ = gen.generate_data_pipe(train, target, test)
+ print(gen.last_timing_)
+ # {'preprocess': 0.001, 'generation': 2.3, 'postprocess': 0.01,
+ # 'adversarial_filtering': 0.15, 'total': 2.46}
+ ```
+
+ *Full benchmarks in the [README](https://github.com/Diyago/Tabular-data-generation).*
 
  ## What's Next
 
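The speed table attributes the Random Baseline's near-instant time to "just resampling". A minimal sketch, assuming that means drawing training rows uniformly at random with replacement, shows why no training step is involved. This is an illustration only, not tabgan's baseline code; `resample_baseline` is a hypothetical name, and the toy table simply matches the benchmark's stated shape (1000 rows, 8 features).

```python
import random
import time
from typing import List, Sequence

Row = List[float]


def resample_baseline(train: Sequence[Row], n_rows: int, seed: int = 0) -> List[Row]:
    """Hypothetical baseline: draw n_rows rows from train, uniformly, with replacement."""
    rng = random.Random(seed)
    # Copy each sampled row so edits to the output never mutate the training data.
    return [list(row) for row in rng.choices(train, k=n_rows)]


# Toy table with the benchmark's shape: 1000 rows x 8 numeric features.
data_rng = random.Random(42)
train = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(1000)]

start = time.perf_counter()
synthetic = resample_baseline(train, n_rows=1100)
elapsed = time.perf_counter() - start
print(f"{len(synthetic)} rows in {elapsed:.4f}s")
```

Because the only work is one pass of random draws, the cost grows linearly with the number of output rows and has no training schedule, unlike the epoch-dependent CTGAN row.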
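The `last_timing_` dict in the diff maps each pipeline step to seconds, plus a `total`. One way to collect such a dict is to wrap every step with `time.perf_counter()`. The sketch below assumes nothing about tabgan's internals: `TimedPipeline`, `run`, and the placeholder step functions are hypothetical, and only the step names are borrowed from the example output in the diff.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# A pipeline step: a name plus a function that transforms the data.
Step = Tuple[str, Callable[[Any], Any]]


class TimedPipeline:
    """Hypothetical pipeline that records per-step wall-clock seconds."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.last_timing_: Dict[str, float] = {}

    def run(self, data: Any) -> Any:
        timing: Dict[str, float] = {}
        total_start = time.perf_counter()
        for name, fn in self.steps:
            start = time.perf_counter()
            data = fn(data)
            timing[name] = time.perf_counter() - start
        timing["total"] = time.perf_counter() - total_start
        # Replace rather than update, so each call reports only its own run.
        self.last_timing_ = timing
        return data


# Placeholder steps named after the keys in the diff's example output.
pipe = TimedPipeline([
    ("preprocess", lambda rows: [r for r in rows if r is not None]),
    ("generation", lambda rows: rows * 2),  # stand-in for a real generator
    ("postprocess", lambda rows: sorted(rows)),
    ("adversarial_filtering", lambda rows: rows[: len(rows) // 2]),  # stand-in: keeps first half
])
result = pipe.run([3, None, 1, 2])
print(result)  # [1, 1, 2]
print(pipe.last_timing_)
```

Timing `total` with its own timer, rather than summing the steps, also counts the small overhead between steps.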