User01110 committed on
Commit 6b49457 · verified · 1 Parent(s): d821e28

Upload checkpoint step 1,000

Files changed (4)
  1. README.md +21 -35
  2. config.json +1 -1
  3. generation_config.json +1 -1
  4. model.safetensors +1 -1
README.md CHANGED
@@ -8,20 +8,15 @@ base_model: SupraLabs/Supra-1.5-50M-Base-exp
  base_model_relation: finetune
  datasets:
  - nvidia/Nemotron-SFT-Instruction-Following-Chat-v2
- - Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned
- - MBZUAI/LaMini-instruction
- - ketchup123/tulu-gsm8k-openmath-instruct-100k-LF
- - NecroMOnk/khan-math-linear_algebra
- - endurasolution/ron-math-dataset
- - User01110/math-curated-dataset
  - microsoft/orca-math-word-problems-200k
  - TIGER-Lab/MathInstruct
- - openai/gsm8k
- - EleutherAI/arithmetic
+ - User01110/math-curated-dataset
  - Programming-Language/codeagent-python
- - jan-hq/multiturn_programming_binarized
  - Cutecat6152/python-data-basic
  - flytech/python-codes-25k
+ - QuixiAI/open-instruct-uncensored
+ - openai/gsm8k
+ - EleutherAI/arithmetic
  tags:
  - sft
  - exact-loss-trainer
@@ -44,11 +39,11 @@ This is an experimental instruction SFT run from `SupraLabs/Supra-1.5-50M-Base-e
  | Base revision | `main` |
  | Output repo | `User01110/testing-50M` |
  | Sequence length | 1024 |
- | Max optimizer steps | 20,000 |
+ | Max optimizer steps | 10,000 |
  | Per-device batch size | 128 |
  | Gradient accumulation | 4 |
- | Sample presentations per GPU | 10,240,000 |
- | Max token slots per GPU | 10,485,760,000 |
+ | Sample presentations per GPU | 5,120,000 |
+ | Max token slots per GPU | 5,242,880,000 |
  | Learning rate | 2.00e-04 |
  | Warmup steps | 100 |
  | Weight decay | 0.05 |
@@ -59,9 +54,9 @@ This is an experimental instruction SFT run from `SupraLabs/Supra-1.5-50M-Base-e
  | Prompt format | ChatML |
  | System prompt | `You are a helpful assistant.` |

- The stream randomly mixes math, coding, and conversation-heavy instruction sources. Sources are reopened after exhaustion and keep relooping until the 20,000-step training cap finishes.
+ The stream randomly mixes the selected instruction, math, and coding sources. Sources are reopened after exhaustion and keep relooping until the 10,000-step training cap finishes, except `Cutecat6152/python-data-basic`, which is capped at 3 passes.

- Listed source rows before relooping: 35,728,143. The 20,000-step training budget presents 10,240,000 examples per GPU.
+ Listed source rows before relooping: 3,718,915. The 10,000-step training budget presents 5,120,000 examples per GPU.

  ## Prompt Template Compatibility

@@ -129,30 +124,21 @@ print(text)

  | Dataset | Config | Split | Rows | Schema | Mapping | Pass policy |
  | --- | --- | --- | ---: | --- | --- | --- |
- | nvidia/Nemotron-SFT-Instruction-Following-Chat-v2 | default | reasoning_off | 1,068,273 | messages[{role, content}], uuid, license, used_in, reasoning | ChatML conversation turns; reasoning_off split only | reloops until max_steps |
- | Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | General-Distillation | train | 187,794 | conversations[{from, value}], input, output, domain, meta | human/gpt turns; assistant <think> blocks stripped | reloops until max_steps |
- | Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | General-Math | train | 76,727 | conversations[{from, value}], input, output, domain, meta | human/gpt turns; assistant <think> blocks stripped | reloops until max_steps |
- | Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | MultilingualSTEM | train | 89,997 | conversations[{from, value}], input, output, domain, meta | human/gpt turns; assistant <think> blocks stripped | reloops until max_steps |
- | Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | PHD-Science | train | 103,307 | conversations[{from, value}], input, output, domain, meta | human/gpt turns; assistant <think> blocks stripped | reloops until max_steps |
- | MBZUAI/LaMini-instruction | default | train | 2,585,615 | instruction, response, instruction_source | instruction -> response | reloops until max_steps |
- | ketchup123/tulu-gsm8k-openmath-instruct-100k-LF | default | train | 100,000 | conversations[{role, content}] | math conversations to ChatML turns | reloops until max_steps |
- | NecroMOnk/khan-math-linear_algebra | default | train | 1,295,000 | messages[{role, content}], topic, subtopic | math tutor messages to ChatML turns | reloops until max_steps |
- | endurasolution/ron-math-dataset | default | train | 29,226,764 | instruction, input, output | instruction + optional input -> output | reloops until max_steps |
- | User01110/math-curated-dataset | default | train | 50,944 | id, source, prompt, index, model, response, chatml | prompt -> response; ignores source ChatML column and rebuilds clean ChatML | reloops until max_steps |
- | microsoft/orca-math-word-problems-200k | default | train | 200,035 | question, answer | question -> answer | reloops until max_steps |
- | TIGER-Lab/MathInstruct | default | train | 262,039 | source, instruction, output | instruction -> output | reloops until max_steps |
- | openai/gsm8k | main | train | 7,473 | question, answer | question -> answer | reloops until max_steps |
- | openai/gsm8k | socratic | train | 7,473 | question, answer | question -> answer | reloops until max_steps |
- | EleutherAI/arithmetic | 10 validation subsets | validation | 20,000 | context, completion | direct parquet URLs to avoid dataset-script loader failure | reloops until max_steps |
- | Programming-Language/codeagent-python | default | train | 296,837 | prompt, response | prompt -> response | reloops until max_steps |
- | jan-hq/multiturn_programming_binarized | default | train | 100,139 | messages[{role, content}] | single/multiturn programming messages; all assistant spans labeled | reloops until max_steps |
- | Cutecat6152/python-data-basic | default | train | 100 | id, instruction, response | instruction -> response | reloops until max_steps |
- | flytech/python-codes-25k | default | train | 49,626 | instruction, input, output, text | instruction + optional input -> output | reloops until max_steps |
+ | nvidia/Nemotron-SFT-Instruction-Following-Chat-v2 | default | reasoning_off | 1,068,273 | messages[{role, content, reasoning_content}] | user/assistant message pairs; reasoning_off only | reloops until max_steps |
+ | microsoft/orca-math-word-problems-200k | default | train | 200,035 | question, answer | user=question; assistant=answer | reloops until max_steps |
+ | TIGER-Lab/MathInstruct | default | train | 262,039 | source, instruction, output | user=instruction; assistant=output | reloops until max_steps |
+ | User01110/math-curated-dataset | default | train | 50,944 | id, source, prompt, index, model, response, chatml | user=prompt; assistant=response; rebuilds clean ChatML | reloops until max_steps |
+ | Programming-Language/codeagent-python | default | train | 296,837 | prompt, response | user=prompt; assistant=response | reloops until max_steps |
+ | Cutecat6152/python-data-basic | default | train | 100 | id, instruction, response | user=instruction; assistant=response | max 3 passes, 300 presentations max |
+ | flytech/python-codes-25k | default | train | 49,626 | instruction, input, output, text | user=instruction plus optional Input block; assistant=output | reloops until max_steps |
+ | QuixiAI/open-instruct-uncensored | default | train | 1,756,115 | dataset, id, messages[{role, content}] | user/assistant message pairs | reloops until max_steps |
+ | openai/gsm8k | main | train | 7,473 | question, answer | user=question; assistant=answer | reloops until max_steps |
+ | openai/gsm8k | socratic | train | 7,473 | question, answer | user=question; assistant=answer | reloops until max_steps |
+ | EleutherAI/arithmetic | 10 validation subsets | validation raw JSONL | 20,000 | context, completion | user=context with trailing Answer: stripped; assistant=completion | reloops until max_steps |

  ## Notes

  - Dataset schemas and row counts were checked through Hugging Face Dataset Viewer metadata where available.
  - Multiturn/message datasets carry all assistant spans into the collator, so user/system text remains masked from step 0 while every assistant turn is supervised.
- - Kimi assistant text has `<think>...</think>` blocks stripped before tokenization.
- - Streaming source open/read failures are retried and reopened. Normal stream exhaustion reopens that source and continues mixing it until `max_steps`.
+ - Streaming source open/read failures are retried and reopened. Normal stream exhaustion reopens that source and continues mixing it until `max_steps`; `python-data-basic` is dropped after 3 completed passes.
  - RoPE buffers and tokenizer/model load are verified during final export.
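The per-GPU budget figures and the listed row total in the updated README can be cross-checked with a few lines of arithmetic. The sketch below assumes that "sample presentations per GPU" means optimizer steps × per-device batch size × gradient accumulation, and that "token slots" means presentations × sequence length; the README does not state these formulas, but they reproduce its numbers. The function name `training_budget` and the list `new_rows` are illustrative, not part of the repository.

```python
def training_budget(max_steps, per_device_batch, grad_accum, seq_len):
    """Return (examples per GPU, token slots per GPU) under the assumed formulas."""
    examples = max_steps * per_device_batch * grad_accum
    return examples, examples * seq_len


# Row counts copied from the "+" rows of the dataset table in this commit.
new_rows = [
    1_068_273,  # nvidia/Nemotron-SFT-Instruction-Following-Chat-v2
    200_035,    # microsoft/orca-math-word-problems-200k
    262_039,    # TIGER-Lab/MathInstruct
    50_944,     # User01110/math-curated-dataset
    296_837,    # Programming-Language/codeagent-python
    100,        # Cutecat6152/python-data-basic
    49_626,     # flytech/python-codes-25k
    1_756_115,  # QuixiAI/open-instruct-uncensored
    7_473,      # openai/gsm8k (main)
    7_473,      # openai/gsm8k (socratic)
    20_000,     # EleutherAI/arithmetic
]

examples, token_slots = training_budget(10_000, 128, 4, 1024)
print(f"{examples:,} examples, {token_slots:,} token slots")
print(f"{sum(new_rows):,} listed rows")
```

Under these assumptions the new values (5,120,000 and 5,242,880,000) and the old ones (10,240,000 and 10,485,760,000 at 20,000 steps) both come out as stated, and the row counts sum to 3,718,915.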
config.json CHANGED
@@ -28,7 +28,7 @@
  "type": "linear"
  },
  "tie_word_embeddings": true,
- "transformers_version": "5.10.2",
+ "transformers_version": "5.12.0",
  "use_cache": false,
  "vocab_size": 32002
  }
generation_config.json CHANGED
@@ -5,5 +5,5 @@
  2
  ],
  "pad_token_id": 1,
- "transformers_version": "5.10.2"
+ "transformers_version": "5.12.0"
  }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2ee07e46362d64e4e89969031e909dea6d6b8254d7a2eacced172cdcdf884e2d
+ oid sha256:8ade2681aa6046c53eca3ef8df1515d0f0d44fa21462b533b22ca535010392e0
  size 207161232
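The `model.safetensors` entry is a Git LFS pointer, so the diff shows only a new `oid` while `size` is unchanged: the weights differ but the tensor layout is the same byte length. A minimal sketch of reading such a pointer and checking a downloaded blob against it follows. The helper names `parse_lfs_pointer` and `blob_matches_pointer` are hypothetical, and the check takes the whole blob as bytes for brevity; a roughly 207 MB file would more sensibly be hashed in chunks.

```python
import hashlib

OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:2ee07e46362d64e4e89969031e909dea6d6b8254d7a2eacced172cdcdf884e2d
size 207161232
"""

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8ade2681aa6046c53eca3ef8df1515d0f0d44fa21462b533b22ca535010392e0
size 207161232
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def blob_matches_pointer(data, pointer):
    """True if the blob has the pointer's byte length and content hash."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


old, new = parse_lfs_pointer(OLD_POINTER), parse_lfs_pointer(NEW_POINTER)
print("same size:", old["size"] == new["size"])
print("same content hash:", old["digest"] == new["digest"])
```

Comparing the two parsed pointers reproduces what the diff shows: the size is identical and only the content hash changed.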