Spaces:
Running
Running
General Information about Datasets
General Data Processing Principles
- We make sure that the relational schema is 100% accurate. Invalid data may be removed or converted to N/A.
- Missing data are removed except for FK.
- Date time columns are converted to numeric data.
Information of this processing is found in processor.json under each folder.
As for data processing on baseline models,
- Composite PKs are ignored.
- Composite FKs are converted to singular FKs by inserting auxiliary singular corresponding PK or candidate key.
- NULL FKs are removed by inserting NULL parent.
Implementation of Baselines
The data schema used for each baseline, including our model, IRG, can be found in schema/ directory under the folder
for each dataset.