Update README.md
README.md
CHANGED
@@ -64,7 +64,7 @@ Data construction summary:
 - Compile the source with compile(source, '<string>', 'exec') under CPython 3.12; skip samples that raise SyntaxError/ValueError.
 - Extract raw bytecode bytes from compiled_code.co_code and convert to a list of integers in [0, 255].
 - Save as JSON Lines: one sample per line, each line a JSON array of integers.
-- Cap: up to 100,000 samples in this release.
+- Cap: up to 100,000,000 samples in this release (10% split off for validation).
 
 Notes: Bytecode format is Python-version dependent (these samples use CPython 3.12, 2-byte instructions). No extra normalization or dedup beyond the source dataset. Any truncation/padding or chunking is handled at training time.
 
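
The construction steps listed in the README hunk above can be sketched roughly as follows. This is a minimal illustration, not the dataset's actual build script: the function names `source_to_bytecode` and `write_jsonl`, the `cap` parameter, and the output path are hypothetical. Note that `co_code` of the compiled module holds only the top-level instructions; bodies of nested functions and classes live in separate code objects under `co_consts`, and this sketch, like the README description, does not descend into them.

```python
import json
from typing import Iterable, List, Optional


def source_to_bytecode(source: str) -> Optional[List[int]]:
    """Compile one source string; return its top-level co_code as ints in [0, 255].

    Returns None for samples that fail to compile, which the caller skips.
    """
    try:
        code = compile(source, "<string>", "exec")
    except (SyntaxError, ValueError):
        return None
    # Iterating over a bytes object yields integers in [0, 255].
    return list(code.co_code)


def write_jsonl(sources: Iterable[str], path: str, cap: int = 100_000_000) -> int:
    """Write one JSON array of integers per line, stopping once `cap` samples are written.

    `cap` is a hypothetical parameter mirroring the "Cap" bullet in the README.
    Returns the number of samples actually written.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as fh:
        for source in sources:
            if written >= cap:
                break
            ints = source_to_bytecode(source)
            if ints is None:
                continue  # sample raised SyntaxError/ValueError
            fh.write(json.dumps(ints) + "\n")
            written += 1
    return written
```

Because the byte values depend on the interpreter version, running this under anything other than CPython 3.12 would produce sequences that do not match the released samples.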