Spaces:

betterdataai
/

IRG

Running

App Files Files Community

IRG / baselines /README.md

Zilong-Zhao

first commit

c4ac745 21 days ago

preview code

raw

history blame contribute delete

1.47 kB

Baseline Models

Baseline Models Summary

Model	Implementation
SDV-IND (`ind`)	SDV SDK modified
ClavaDDPM	GitHub

Preprocessing and Baseline Experiments

To align the capability of all baseline models without being limited by the trivial and niche data processing capabilities (e.g., missing data, date time), we preprocess all datasets such that:

Datetime values are converted to numeric values by difference to its minimum value in seconds.
Except for FK columns, all missing values are filled
- Numeric values are filled by a special value smaller than the global minimum and an additional NULL indicator Boolean column is inserted
- Categorical values are filled by a special category "" (so this should not be an existing category in the real data)

Nevertheless, we still regard the capability to handle standard categorical and numeric values as requirement of any model. Relevant data preprocessing and normalization will be done by the models. All evaluation will be applied on this processed results too. Metadata of the processing is saved after execution of preprocess.py of a dataset.

Specific processing required for each baseline is shown under the corresponding directory.