Update static/schema_help.md
static/schema_help.md CHANGED (+19 -24)
@@ -1,31 +1,26 @@
-

-

-
-`data/raw/nodes.csv`

-
-- node identity
-- prompt/template content
-- parent lineage
-- metadata tags
-- timestamps
-- provenance links

-
-`data/raw/runs.csv`

-
-
-- node_id link
-- model + token usage
-- outputs + result_json
-- score + status
-- timestamps + triggering source

-
-Hugging Face’s CSV dataset builder requires:
-**all CSV files in the same config/split must share identical columns**.

-
+MindsEye Dataset Configs (Important)

+If you have:

+nodes.csv (Prompt Evolution Nodes: immutable cognitive states)

+runs.csv (Execution Records: each reasoning run)

+They SHOULD NOT be in the same dataset builder config unless they share identical columns.

+Correct options:
+Option A (recommended): Two configs

+config: nodes -> data/raw/nodes.csv

+config: runs -> data/raw/runs.csv
+
+Option B: Two splits
+
+split: nodes -> nodes.csv
+
+split: runs -> runs.csv
+
+If HF says:
+“All the data files must have the same columns…”
+it means it thinks multiple CSVs belong to the same table.
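
The identical-columns rule described in this change can be checked locally before uploading. The following is a minimal sketch in Python using only the standard library. The helper names `read_header`, `group_by_columns`, and `needs_separate_configs` are hypothetical and are not part of the Hugging Face `datasets` library. It assumes each CSV has a header row, and it compares headers by exact name and order, which may be stricter than what the builder actually enforces.

```python
import csv
from collections import defaultdict


def read_header(path):
    """Return the first row of a CSV file as a tuple of column names."""
    with open(path, newline="", encoding="utf-8") as f:
        # next(..., ()) yields an empty tuple for an empty file
        return tuple(next(csv.reader(f), ()))


def group_by_columns(paths):
    """Group CSV paths by header (hypothetical helper).

    Files in the same group share identical columns and could sit in one
    config; files in different groups are candidates for separate configs.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[read_header(path)].append(str(path))
    return dict(groups)


def needs_separate_configs(paths):
    """Return True if the given CSV files do not all share one header."""
    return len(group_by_columns(paths)) > 1
```

Calling `needs_separate_configs(["data/raw/nodes.csv", "data/raw/runs.csv"])` on files with different headers would return `True`, which corresponds to the situation where the two files should not share a dataset builder config.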