lucapernice committed on
Commit 79f6e35 · verified · 1 Parent(s): ce6b4f3

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -64,7 +64,7 @@ Data construction summary:
  - Compile the source with compile(source, '<string>', 'exec') under CPython 3.12; skip samples that raise SyntaxError/ValueError.
  - Extract raw bytecode bytes from compiled_code.co_code and convert to a list of integers in [0, 255].
  - Save as JSON Lines: one sample per line, each line a JSON array of integers.
- - Cap: up to 100,000 samples in this release.
+ - Cap: up to 100,000,000 samples in this release (10% split off for validation).
 
  Notes: Bytecode format is Python-version dependent (these samples use CPython 3.12, 2-byte instructions). No extra normalization or dedup beyond the source dataset. Any truncation/padding or chunking is handled at training time.
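
For readers of this diff, the data construction steps in the README hunk (compile, read `co_code`, convert to integers, write JSON Lines) can be sketched roughly as below. This is a minimal illustration, not the dataset author's actual script: the helper names `source_to_bytecode_ints` and `write_jsonl` and the in-memory sample list are hypothetical, and the exact byte values depend on the interpreter version that runs it (the README states CPython 3.12).

```python
import io
import json


def source_to_bytecode_ints(source):
    """Compile one source string and return its bytecode as a list of ints.

    Returns None when the sample does not compile, mirroring the README's
    "skip samples that raise SyntaxError/ValueError" rule.
    (Hypothetical helper name, for illustration only.)
    """
    try:
        compiled_code = compile(source, "<string>", "exec")
    except (SyntaxError, ValueError):
        return None
    # list() over a bytes object yields integers in [0, 255].
    return list(compiled_code.co_code)


def write_jsonl(sources, out):
    """Write one JSON array of integers per line; return how many were written."""
    written = 0
    for source in sources:
        ints = source_to_bytecode_ints(source)
        if ints is None:
            continue  # sample failed to compile, so it is skipped
        out.write(json.dumps(ints) + "\n")
        written += 1
    return written


# Tiny in-memory demo: the second sample is deliberately invalid and is skipped.
samples = ["x = 1 + 2\n", "def f(:\n", "print('hi')\n"]
buffer = io.StringIO()
count = write_jsonl(samples, buffer)
rows = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

Two points worth keeping in mind when reading the README text against this sketch: `co_code` on the module-level code object holds only the module-level instructions (nested function bodies are separate code objects stored in `co_consts`), and each row has an even length because CPython instructions are 2-byte units, as the Notes line says.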