# General Information about Datasets

## General Data Processing Principles

1. The relational schema is enforced exactly. Invalid data may be removed or converted to N/A.
2. Missing data are removed, except in foreign key (FK) columns.
3. Datetime columns are converted to numeric data.

Details of this processing are recorded in `processor.json` under each dataset's folder.
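
The principles above can be illustrated with a minimal sketch. It is not the repository's actual processing code: the function names, the row-as-dict representation, the choice to drop a whole row when a non-FK value is missing, and the use of Unix epoch seconds as the numeric encoding are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def datetime_to_numeric(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Convert a datetime string to seconds since the Unix epoch.

    Assumption: timestamps are treated as UTC and encoded as epoch seconds.
    """
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def clean_rows(rows, fk_columns, datetime_columns):
    """Drop rows with missing non-FK values and make datetime columns numeric.

    Assumption: "removing missing data" is modeled here as dropping the row.
    """
    cleaned = []
    for row in rows:
        # A missing value is tolerated only in a foreign key column.
        if any(v is None and c not in fk_columns for c, v in row.items()):
            continue
        new_row = dict(row)
        for col in datetime_columns:
            if new_row[col] is not None:
                new_row[col] = datetime_to_numeric(new_row[col])
        cleaned.append(new_row)
    return cleaned
```

For example, calling `clean_rows(rows, {"customer_id"}, ["placed_at"])` on a hypothetical orders table keeps a row whose `customer_id` is missing but drops a row whose `placed_at` is missing.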

For the baseline models, the data are further processed as follows:

1. Composite PKs are ignored.
2. Composite FKs are converted to single-column FKs by inserting an auxiliary single-column PK or candidate key into the referenced table.
3. NULL FKs are removed by inserting a NULL parent row for them to reference.
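
The FK handling above can be sketched as follows. This is an illustrative reading, not the repository's implementation: the helper names, the `_aux_id` and `_aux_fk` column names, the `-1` placeholder key, and the decision to drop the original composite FK columns from the child table are all hypothetical.

```python
def add_surrogate_key(parent_rows, key_columns, surrogate="_aux_id"):
    """Add a single-column key to each parent row.

    Returns the new parent rows and a lookup from composite key to surrogate.
    """
    lookup = {}
    out = []
    for i, row in enumerate(parent_rows):
        new_row = dict(row)
        new_row[surrogate] = i
        lookup[tuple(row[c] for c in key_columns)] = i
        out.append(new_row)
    return out, lookup


def replace_composite_fk(child_rows, fk_columns, lookup, single_fk="_aux_fk"):
    """Replace a composite FK in the child table with one FK column."""
    out = []
    for row in child_rows:
        # Assumption: the original composite FK columns are dropped.
        new_row = {c: v for c, v in row.items() if c not in fk_columns}
        key = tuple(row[c] for c in fk_columns)
        # A NULL in any component is treated as a NULL FK.
        is_null = any(v is None for v in key)
        new_row[single_fk] = None if is_null else lookup[key]
        out.append(new_row)
    return out


def fill_null_fk(parent_rows, child_rows, pk, fk, null_id=-1):
    """Insert a placeholder ("NULL") parent and point NULL FKs at it."""
    parents = [dict(r) for r in parent_rows]
    if any(r[fk] is None for r in child_rows):
        # The placeholder has no real attribute values, only a key.
        placeholder = {c: None for c in parent_rows[0]} if parent_rows else {}
        placeholder[pk] = null_id
        parents.append(placeholder)
    children = [
        dict(r, **{fk: null_id}) if r[fk] is None else dict(r)
        for r in child_rows
    ]
    return parents, children
```

Applying `add_surrogate_key`, then `replace_composite_fk`, then `fill_null_fk` leaves every child row with a single non-NULL FK column that points at an existing parent row.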

## Implementation of Baselines

The data schema used for each baseline, including our model, IRG, can be found in the `schema/` directory under the
folder for each dataset.