melihcatal committed on
Commit
e426ee7
·
verified ·
1 Parent(s): 9ba7718

Add files using upload-large-folder tool

Files changed (28)
  1. .gitattributes +1 -0
  2. qwen3-4b-instruct/dp8/adapter/README.md +207 -0
  3. qwen3-4b-instruct/dp8/adapter/adapter_config.json +46 -0
  4. qwen3-4b-instruct/dp8/adapter/adapter_model.safetensors +3 -0
  5. qwen3-4b-instruct/dp8/audit_results.json +137 -0
  6. qwen3-4b-instruct/dp8/audit_scores.npz +3 -0
  7. qwen3-4b-instruct/dp8/canary_meta.json +0 -0
  8. qwen3-4b-instruct/dp8/codecarbon.csv +2 -0
  9. qwen3-4b-instruct/dp8/epochs/epoch_001/adapter/README.md +207 -0
  10. qwen3-4b-instruct/dp8/epochs/epoch_001/adapter/adapter_config.json +46 -0
  11. qwen3-4b-instruct/dp8/epochs/epoch_001/adapter/adapter_model.safetensors +3 -0
  12. qwen3-4b-instruct/dp8/epochs/epoch_001/audit_results.json +137 -0
  13. qwen3-4b-instruct/dp8/epochs/epoch_001/audit_scores.npz +3 -0
  14. qwen3-4b-instruct/dp8/epochs/epoch_002/adapter/README.md +207 -0
  15. qwen3-4b-instruct/dp8/epochs/epoch_002/adapter/adapter_config.json +46 -0
  16. qwen3-4b-instruct/dp8/epochs/epoch_002/adapter/adapter_model.safetensors +3 -0
  17. qwen3-4b-instruct/dp8/epochs/epoch_002/audit_results.json +137 -0
  18. qwen3-4b-instruct/dp8/epochs/epoch_002/audit_scores.npz +3 -0
  19. qwen3-4b-instruct/dp8/metrics.jsonl +27 -0
  20. qwen3-4b-instruct/dp8/pretrain_lm_head.pt +3 -0
  21. qwen3-4b-instruct/dp8/resolved_config.yaml +101 -0
  22. qwen3-4b-instruct/dp8/scalars.csv +358 -0
  23. qwen3-4b-instruct/dp8/summary.json +72 -0
  24. qwen3-4b-instruct/dp8/tensorboard/events.out.tfevents.1773764448.7b654b6988b0.41500.0 +3 -0
  25. qwen3-4b-instruct/dp8/tokenizer/chat_template.jinja +61 -0
  26. qwen3-4b-instruct/dp8/tokenizer/tokenizer.json +3 -0
  27. qwen3-4b-instruct/dp8/tokenizer/tokenizer_config.json +516 -0
  28. qwen3-4b-instruct/dp8/train.log +21 -0
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  qwen3-4b-instruct/base/tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ qwen3-4b-instruct/dp8/tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
qwen3-4b-instruct/dp8/adapter/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: Qwen/Qwen3-4B-Instruct-2507
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.1
qwen3-4b-instruct/dp8/adapter/adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen3-4B-Instruct-2507",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": true,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": [
+     "lm_head",
+     "embed_tokens"
+   ],
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "v_proj",
+     "q_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
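For orientation, the update scale implied by this config can be worked out from `lora_alpha` and `r`. The sketch below assumes the standard LoRA convention (scale = `lora_alpha / r`, or `lora_alpha / sqrt(r)` when `use_rslora` is true). The dict copies only the relevant fields from the file above, and the helper name `lora_scale` is illustrative, not part of PEFT.

```python
import math

# Relevant fields copied from adapter_config.json above.
config = {
    "r": 16,
    "lora_alpha": 32,
    "use_rslora": False,
    "target_modules": ["k_proj", "v_proj", "q_proj", "o_proj"],
    "modules_to_save": ["lm_head", "embed_tokens"],
}


def lora_scale(cfg: dict) -> float:
    """Scale applied to the low-rank update B @ A (standard LoRA / rsLoRA convention)."""
    if cfg.get("use_rslora"):
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


scale = lora_scale(config)
print(f"LoRA scale for r={config['r']}, alpha={config['lora_alpha']}: {scale}")
```

`modules_to_save` lists modules stored in full rather than as low-rank deltas, which is one plausible reason the adapter file is several gigabytes.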
qwen3-4b-instruct/dp8/adapter/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19061f24801c4d66921418e3cb1135de75978cb4d228db814b0755fbaada6bc4
+ size 4721857072
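The three added lines are a Git LFS pointer, not the weights themselves. As a rough illustration, the sketch below parses the pointer text shown above into its fields and converts the size to GiB; the function name `parse_lfs_pointer` is hypothetical.

```python
# Pointer text as committed for adapter_model.safetensors (see above).
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:19061f24801c4d66921418e3cb1135de75978cb4d228db814b0755fbaada6bc4
size 4721857072
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
print(f"{pointer['hash_algo']} object, {pointer['size_bytes'] / 2**30:.2f} GiB")
```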
qwen3-4b-instruct/dp8/audit_results.json ADDED
@@ -0,0 +1,137 @@
+ {
+   "delta": 1e-05,
+   "num_canaries": 500,
+   "num_members": 250,
+   "paper_guess_fraction": 0.2,
+   "paper_guess_steps": 20,
+   "loss": {
+     "auc": 0.515088,
+     "empirical_epsilon": {
+       "0.05": 0.0,
+       "0.01": 0.0
+     },
+     "empirical_epsilon_details": {
+       "0.05": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       },
+       "0.01": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       }
+     }
+   },
+   "embedding": {
+     "auc": 0.516208,
+     "empirical_epsilon": {
+       "0.05": 0.0,
+       "0.01": 0.0
+     },
+     "empirical_epsilon_details": {
+       "0.05": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       },
+       "0.01": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       }
+     }
+   }
+ }
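The `candidate_num_guesses` grid is consistent with the header fields: `paper_guess_fraction` × `num_canaries` gives at most 100 guesses, split into `paper_guess_steps` = 20 equal steps. The sketch below reproduces the grid under that assumption (the generating rule is inferred from the numbers, not taken from the audit code) and prints how far each AUC sits from the 0.5 expected of random guessing. The helper name `guess_grid` is illustrative.

```python
# Header fields and AUCs copied from audit_results.json above.
audit = {
    "num_canaries": 500,
    "paper_guess_fraction": 0.2,
    "paper_guess_steps": 20,
    "auc": {"loss": 0.515088, "embedding": 0.516208},
}


def guess_grid(num_canaries: int, fraction: float, steps: int) -> list[int]:
    """Evenly spaced guess counts up to fraction * num_canaries (assumed rule)."""
    max_guesses = round(fraction * num_canaries)
    step = max_guesses // steps
    return [step * i for i in range(1, steps + 1)]


grid = guess_grid(
    audit["num_canaries"],
    audit["paper_guess_fraction"],
    audit["paper_guess_steps"],
)
print(grid)

for attack, auc in audit["auc"].items():
    print(f"{attack}: AUC {auc:.3f} ({auc - 0.5:+.3f} vs. chance)")
```

Both attacks report `num_guesses: 0` and an empirical epsilon of 0.0 at either significance level, so this audit did not establish a non-zero lower bound.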
qwen3-4b-instruct/dp8/audit_scores.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:126c74d41f98d08626da6ca1f818fce7bb97a1b2a1dad6827e8baa20288a484f
+ size 12784
qwen3-4b-instruct/dp8/canary_meta.json ADDED
The diff for this file is too large to render. See raw diff
 
qwen3-4b-instruct/dp8/codecarbon.csv ADDED
@@ -0,0 +1,2 @@
+ timestamp,project_name,run_id,experiment_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,water_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,cpu_utilization_percent,gpu_utilization_percent,ram_utilization_percent,ram_used_gb,on_cloud,pue,wue
+ 2026-03-17T17:05:44,codedp-qwen3-4b-instruct-cpt-dp8,2b4560e3-78e0-4790-b542-38dc1a784383,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,2693.6014373912476,0.09826528580395394,3.648100436830916e-05,72.03163066573623,3108.4298352234105,54.0,0.051919376660011826,2.3238466076869315,0.03891114561875117,2.414677129965695,0.0,Sweden,SWE,östergötland county,,,Linux-6.8.0-94-generic-x86_64-with-glibc2.39,3.11.0,3.2.3,256,AMD EPYC 9554 64-Core Processor,8,8 x NVIDIA H200,16.1885,58.594,1511.49019241333,machine,3.719566840926081,83.2715646004481,5.423711725167903,82.01636290603649,N,1.0,0.0
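The single CodeCarbon row is easier to read once a few columns are pulled out and cross-checked. The sketch below copies selected values from the row above and assumes CodeCarbon's usual units (seconds for `duration`, kg CO2eq for `emissions`, kWh for the energy columns). It checks that the component energies add up to `energy_consumed` and that `emissions_rate` matches `emissions / duration`.

```python
import math

# Selected columns copied from the codecarbon.csv row above.
# Units are assumed from CodeCarbon's documented output format.
row = {
    "duration": 2693.6014373912476,           # seconds
    "emissions": 0.09826528580395394,         # kg CO2eq
    "emissions_rate": 3.648100436830916e-05,  # kg CO2eq per second
    "cpu_energy": 0.051919376660011826,       # kWh
    "gpu_energy": 2.3238466076869315,         # kWh
    "ram_energy": 0.03891114561875117,        # kWh
    "energy_consumed": 2.414677129965695,     # kWh
}

component_sum = row["cpu_energy"] + row["gpu_energy"] + row["ram_energy"]
derived_rate = row["emissions"] / row["duration"]

energy_ok = math.isclose(component_sum, row["energy_consumed"], rel_tol=1e-6)
rate_ok = math.isclose(derived_rate, row["emissions_rate"], rel_tol=1e-6)

minutes = row["duration"] / 60
gpu_share = row["gpu_energy"] / row["energy_consumed"]
avg_power_kw = row["energy_consumed"] / (row["duration"] / 3600)

print(f"run time: {minutes:.1f} min")
print(f"energy: {row['energy_consumed']:.3f} kWh (GPU share {gpu_share:.1%})")
print(f"average power: {avg_power_kw:.2f} kW")
print(f"emissions: {row['emissions'] * 1000:.1f} g CO2eq")
print(f"energy columns consistent: {energy_ok}, rate consistent: {rate_ok}")
```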
qwen3-4b-instruct/dp8/epochs/epoch_001/adapter/README.md ADDED
@@ -0,0 +1,207 @@
(Content identical to qwen3-4b-instruct/dp8/adapter/README.md above.)
qwen3-4b-instruct/dp8/epochs/epoch_001/adapter/adapter_config.json ADDED
@@ -0,0 +1,46 @@
(Content identical to qwen3-4b-instruct/dp8/adapter/adapter_config.json above.)
qwen3-4b-instruct/dp8/epochs/epoch_001/adapter/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b93a52a9ec5792a1c961bcf1a280df58fe64e837ff061ecca5e022aacaaaa07a
+ size 4721857072
qwen3-4b-instruct/dp8/epochs/epoch_001/audit_results.json ADDED
@@ -0,0 +1,137 @@
+ {
+   "delta": 1e-05,
+   "num_canaries": 500,
+   "num_members": 250,
+   "paper_guess_fraction": 0.2,
+   "paper_guess_steps": 20,
+   "loss": {
+     "auc": 0.51404,
+     "empirical_epsilon": {
+       "0.05": 0.0,
+       "0.01": 0.0
+     },
+     "empirical_epsilon_details": {
+       "0.05": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       },
+       "0.01": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       }
+     }
+   },
+   "embedding": {
+     "auc": 0.512816,
+     "empirical_epsilon": {
+       "0.05": 0.0,
+       "0.01": 0.0
+     },
+     "empirical_epsilon_details": {
+       "0.05": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       },
+       "0.01": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       }
+     }
+   }
+ }
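To see how the audit changes between the epoch 1 checkpoint and the top-level result, the AUCs can be compared side by side. The sketch below uses the values from the two `audit_results.json` files shown above; the label `final` for the top-level `dp8/audit_results.json` is an assumption for illustration. The differences are small, and without the per-example scores nothing here indicates whether they are statistically meaningful.

```python
# AUC values copied from the two audit_results.json files shown above.
auc = {
    "epoch_001": {"loss": 0.51404, "embedding": 0.512816},
    "final": {"loss": 0.515088, "embedding": 0.516208},  # top-level dp8/ result
}

deltas = {
    attack: auc["final"][attack] - auc["epoch_001"][attack]
    for attack in ("loss", "embedding")
}

for attack, delta in deltas.items():
    print(
        f"{attack}: {auc['epoch_001'][attack]:.6f} -> "
        f"{auc['final'][attack]:.6f} ({delta:+.6f})"
    )
```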
qwen3-4b-instruct/dp8/epochs/epoch_001/audit_scores.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da624cffb9a22496e27c1ea983a7a58ba6d30e9c7f48429125a4c2c7801f61f2
+ size 12784
qwen3-4b-instruct/dp8/epochs/epoch_002/adapter/README.md ADDED
@@ -0,0 +1,207 @@
(Content identical to qwen3-4b-instruct/dp8/adapter/README.md above.)
qwen3-4b-instruct/dp8/epochs/epoch_002/adapter/adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen3-4B-Instruct-2507",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": true,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": [
+     "lm_head",
+     "embed_tokens"
+   ],
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "v_proj",
+     "q_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
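As a reading aid for the `adapter_config.json` above, the sketch below works out the effective LoRA scaling factor implied by `lora_alpha`, `r`, and `use_rslora`. It assumes the usual LoRA convention (`alpha / r`, or `alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The `ADAPTER_CONFIG` dict and the `lora_scaling` helper are illustrative names, not part of this commit.

```python
import json

# Subset of the adapter_config.json fields shown in the diff, copied verbatim.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 32,
  "r": 16,
  "use_rslora": false,
  "target_modules": ["k_proj", "v_proj", "q_proj", "o_proj"],
  "modules_to_save": ["lm_head", "embed_tokens"]
}
""")


def lora_scaling(cfg: dict) -> float:
    """Return the multiplier applied to the low-rank update B @ A.

    Assumes the common convention: alpha / r, or alpha / sqrt(r) with rsLoRA.
    """
    r = cfg["r"]
    if cfg.get("use_rslora", False):
        return cfg["lora_alpha"] / (r ** 0.5)
    return cfg["lora_alpha"] / r


if __name__ == "__main__":
    print("LoRA scaling:", lora_scaling(ADAPTER_CONFIG))
    print("Adapted projections:", sorted(ADAPTER_CONFIG["target_modules"]))
```

With these values the low-rank update on the four attention projections is scaled by `32 / 16`, while the modules listed under `modules_to_save` are stored in full rather than as low-rank factors.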
qwen3-4b-instruct/dp8/epochs/epoch_002/adapter/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19061f24801c4d66921418e3cb1135de75978cb4d228db814b0755fbaada6bc4
+ size 4721857072
qwen3-4b-instruct/dp8/epochs/epoch_002/audit_results.json ADDED
@@ -0,0 +1,137 @@
+ {
+   "delta": 1e-05,
+   "num_canaries": 500,
+   "num_members": 250,
+   "paper_guess_fraction": 0.2,
+   "paper_guess_steps": 20,
+   "loss": {
+     "auc": 0.515088,
+     "empirical_epsilon": {
+       "0.05": 0.0,
+       "0.01": 0.0
+     },
+     "empirical_epsilon_details": {
+       "0.05": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       },
+       "0.01": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       }
+     }
+   },
+   "embedding": {
+     "auc": 0.516208,
+     "empirical_epsilon": {
+       "0.05": 0.0,
+       "0.01": 0.0
+     },
+     "empirical_epsilon_details": {
+       "0.05": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       },
+       "0.01": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       }
+     }
+   }
+ }
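To put the `auc` values in `audit_results.json` (about 0.515 for both the `loss` and `embedding` scores) in context, the sketch below computes a membership-inference AUC by direct pairwise comparison. This is the textbook ROC AUC definition and is not necessarily how the audit tooling in this repository computes it. Reading `"direction": "lower"` as "a lower score is more member-like" is an assumption, and the toy scores are made up for illustration.

```python
def membership_auc(member_scores, nonmember_scores, direction="lower"):
    """Probability that a random member looks more member-like than a random
    non-member, with ties counted as 0.5 (the Mann-Whitney form of ROC AUC).

    direction="lower" treats smaller scores (e.g. lower loss) as evidence of
    membership; direction="higher" treats larger scores that way.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m == n:
                wins += 0.5
            elif (m < n) == (direction == "lower"):
                wins += 1.0
    return wins / (len(member_scores) * len(nonmember_scores))


if __name__ == "__main__":
    # Hypothetical canary losses: perfectly separated vs. fully interleaved.
    print(membership_auc([1.0, 2.0], [3.0, 4.0]))  # attacker separates perfectly
    print(membership_auc([1.0, 4.0], [2.0, 3.0]))  # attacker is at chance level
```

Under this definition an AUC of 0.5 means the scores carry no membership signal, so values near 0.515 sit close to chance, consistent with the reported empirical epsilon of 0.0.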
qwen3-4b-instruct/dp8/epochs/epoch_002/audit_scores.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:126c74d41f98d08626da6ca1f818fce7bb97a1b2a1dad6827e8baa20288a484f
+ size 12784
qwen3-4b-instruct/dp8/metrics.jsonl ADDED
@@ -0,0 +1,27 @@
+ {"timestamp": 1773764685.4650462, "event": "train_step", "step": 10, "epoch": 1, "metrics": {"train/step_loss": 1.6114869984713467, "train/step_real_loss": 1.1970022171735764, "train/lr": 0.0002, "train/step_canary_loss": 14.875, "perf/step_duration_sec": 13.257156527135521, "perf/samples_per_sec": 4.978443142381804, "perf/tokens_per_sec": 3881.0735842701297, "perf/logical_batch_size": 66.0, "perf/logical_token_count": 51452.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.5080663593969574, "system/cuda_memory_allocated_gb": 12.68057107925415, "system/cuda_max_memory_allocated_gb": 86.2213044166565}}
+ {"timestamp": 1773764819.8299851, "event": "train_step", "step": 20, "epoch": 1, "metrics": {"train/step_loss": 1.7427907831528608, "train/step_real_loss": 1.0802308320999146, "train/lr": 0.0001984111204336116, "train/step_canary_loss": 12.34375, "perf/step_duration_sec": 13.233567330986261, "perf/samples_per_sec": 5.138448182507728, "perf/tokens_per_sec": 4095.645463116454, "perf/logical_batch_size": 68.0, "perf/logical_token_count": 54200.0, "perf/physical_batches": 9.0, "privacy/epsilon": 3.1012868027106864, "system/cuda_memory_allocated_gb": 13.261098384857178, "system/cuda_max_memory_allocated_gb": 86.2213044166565}}
+ {"timestamp": 1773764953.865748, "event": "train_step", "step": 30, "epoch": 1, "metrics": {"train/step_loss": 1.00910531390797, "train/step_real_loss": 1.0406398549675941, "train/lr": 0.0001936949724999762, "train/step_canary_loss": 0.0, "perf/step_duration_sec": 14.175612474791706, "perf/samples_per_sec": 4.51479610590444, "perf/tokens_per_sec": 3616.84548665354, "perf/logical_batch_size": 64.0, "perf/logical_token_count": 51271.0, "perf/physical_batches": 10.0, "privacy/epsilon": 3.577863845948693, "system/cuda_memory_allocated_gb": 14.42089033126831, "system/cuda_max_memory_allocated_gb": 86.2213044166565}}
+ {"timestamp": 1773765087.7159872, "event": "train_step", "step": 40, "epoch": 1, "metrics": {"train/step_loss": 1.4425305669957942, "train/step_real_loss": 1.0335080847144127, "train/lr": 0.00018600142402077006, "train/step_canary_loss": 14.53125, "perf/step_duration_sec": 13.013761347159743, "perf/samples_per_sec": 5.071554505984891, "perf/tokens_per_sec": 4077.6835062817304, "perf/logical_batch_size": 66.0, "perf/logical_token_count": 53066.0, "perf/physical_batches": 9.0, "privacy/epsilon": 3.9945146358205394, "system/cuda_memory_allocated_gb": 12.680593967437744, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773765221.5064714, "event": "train_step", "step": 50, "epoch": 1, "metrics": {"train/step_loss": 1.2258210108830379, "train/step_real_loss": 1.0516150891780853, "train/lr": 0.00017557495743542585, "train/step_canary_loss": 12.375, "perf/step_duration_sec": 13.473957392852753, "perf/samples_per_sec": 4.824120939738104, "perf/tokens_per_sec": 3929.2093967940723, "perf/logical_batch_size": 65.0, "perf/logical_token_count": 52942.0, "perf/physical_batches": 9.0, "privacy/epsilon": 4.370419605916781, "system/cuda_memory_allocated_gb": 12.39033031463623, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773765234.2224665, "event": "eval_step", "step": 50, "epoch": 1, "metrics": {"eval/loss": 0.950404628729209, "eval/duration_sec": 12.71370144886896}}
+ {"timestamp": 1773765368.860026, "event": "train_step", "step": 60, "epoch": 1, "metrics": {"train/step_loss": 2.122071954182216, "train/step_real_loss": 1.006563015282154, "train/lr": 0.0001627469007380852, "train/step_canary_loss": 14.020833969116211, "perf/step_duration_sec": 13.671399443875998, "perf/samples_per_sec": 5.120178097887117, "perf/tokens_per_sec": 3658.732977947337, "perf/logical_batch_size": 70.0, "perf/logical_token_count": 50020.0, "perf/physical_batches": 9.0, "privacy/epsilon": 4.716422199842559, "system/cuda_memory_allocated_gb": 13.841647148132324, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773765503.2535436, "event": "train_step", "step": 70, "epoch": 1, "metrics": {"train/step_loss": 1.7725614239187801, "train/step_real_loss": 1.058151200413704, "train/lr": 0.0001479248986720057, "train/step_canary_loss": 13.203125, "perf/step_duration_sec": 13.894904132932425, "perf/samples_per_sec": 4.893880472254044, "perf/tokens_per_sec": 3568.2146149169926, "perf/logical_batch_size": 68.0, "perf/logical_token_count": 49580.0, "perf/physical_batches": 9.0, "privacy/epsilon": 5.042051045310402, "system/cuda_memory_allocated_gb": 13.261121273040771, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773765637.8816426, "event": "train_step", "step": 80, "epoch": 1, "metrics": {"train/step_loss": 1.8267304336323458, "train/step_real_loss": 1.0551588982343674, "train/lr": 0.0001315799587615025, "train/step_canary_loss": 14.171875, "perf/step_duration_sec": 13.812608732841909, "perf/samples_per_sec": 4.923038168620387, "perf/tokens_per_sec": 3637.6908208897057, "perf/logical_batch_size": 68.0, "perf/logical_token_count": 50246.0, "perf/physical_batches": 9.0, "privacy/epsilon": 5.348292001109609, "system/cuda_memory_allocated_gb": 13.261121273040771, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773765772.2237043, "event": "train_step", "step": 90, "epoch": 1, "metrics": {"train/step_loss": 1.5600519109128126, "train/step_real_loss": 1.0277105793356895, "train/lr": 0.00011423148382732853, "train/step_canary_loss": 12.916666984558105, "perf/step_duration_sec": 13.590938247274607, "perf/samples_per_sec": 4.929755310560367, "perf/tokens_per_sec": 3897.8913034663583, "perf/logical_batch_size": 67.0, "perf/logical_token_count": 52976.0, "perf/physical_batches": 9.0, "privacy/epsilon": 5.640478844621371, "system/cuda_memory_allocated_gb": 12.970857620239258, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773765818.3224468, "event": "train_epoch", "step": 93, "epoch": 1, "metrics": {"train/epoch_loss": 1.667893763698923, "train/epoch_real_loss": 1.0501683052070176, "train/epoch_canary_loss": 13.414085814260666, "perf/epoch_duration_sec": 1255.631331067998, "perf/epoch_samples_per_sec": 39.58486760419035, "perf/epoch_tokens_per_sec": 30085.320480071918, "perf/epoch_samples": 49704.0, "perf/epoch_tokens": 37776071.0, "system/cuda_epoch_peak_memory_gb": 86.22135162353516, "eval/loss": 0.9280200686592323, "eval/duration_sec": 12.726754495874047, "privacy/epsilon": 5.725566085147695}}
+ {"timestamp": 1773765831.5765529, "event": "audit_epoch", "step": 93, "epoch": 1, "metrics": {"audit/delta": 1e-05, "audit/num_canaries": 500.0, "audit/num_members": 250.0, "audit/paper_guess_fraction": 0.2, "audit/paper_guess_steps": 20.0, "audit/loss/auc": 0.51404, "audit/loss/empirical_epsilon/0.05": 0.0, "audit/loss/empirical_epsilon/0.01": 0.0, "audit/loss/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/correct_guesses": 0.0, "audit/embedding/auc": 0.512816, "audit/embedding/empirical_epsilon/0.05": 0.0, "audit/embedding/empirical_epsilon/0.01": 0.0, "audit/embedding/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/correct_guesses": 0.0, "perf/audit_duration_sec": 6.853809644002467}}
+ {"timestamp": 1773765927.2307582, "event": "train_step", "step": 100, "epoch": 2, "metrics": {"train/step_loss": 2.1885286586385377, "train/step_real_loss": 0.9171566888689995, "train/lr": 9.643076661610196e-05, "train/step_canary_loss": 13.812500953674316, "perf/step_duration_sec": 13.692489622160792, "perf/samples_per_sec": 5.18532436096129, "perf/tokens_per_sec": 3906.0098985534178, "perf/logical_batch_size": 71.0, "perf/logical_token_count": 53483.0, "perf/physical_batches": 9.0, "privacy/epsilon": 5.920610858546996, "system/cuda_memory_allocated_gb": 14.131910800933838, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773765939.9789443, "event": "eval_step", "step": 100, "epoch": 2, "metrics": {"eval/loss": 0.9270543383482176, "eval/duration_sec": 12.745669181924313}}
+ {"timestamp": 1773766074.3579476, "event": "train_step", "step": 110, "epoch": 2, "metrics": {"train/step_loss": 1.4203428788618608, "train/step_real_loss": 1.016486406326294, "train/lr": 7.874347104470234e-05, "train/step_canary_loss": 14.34375, "perf/step_duration_sec": 13.423757336102426, "perf/samples_per_sec": 4.916656219827274, "perf/tokens_per_sec": 3954.481496565319, "perf/logical_batch_size": 66.0, "perf/logical_token_count": 53084.0, "perf/physical_batches": 9.0, "privacy/epsilon": 6.190670402749493, "system/cuda_memory_allocated_gb": 12.680593967437744, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773766208.938867, "event": "train_step", "step": 120, "epoch": 2, "metrics": {"train/step_loss": 1.590441582807854, "train/step_real_loss": 1.0116732195019722, "train/lr": 6.173165676349103e-05, "train/step_canary_loss": 13.9375, "perf/step_duration_sec": 13.283335603773594, "perf/samples_per_sec": 5.0439138179243415, "perf/tokens_per_sec": 4092.8725752103383, "perf/logical_batch_size": 67.0, "perf/logical_token_count": 54367.0, "perf/physical_batches": 9.0, "privacy/epsilon": 6.450768498470402, "system/cuda_memory_allocated_gb": 12.970857620239258, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773766343.2955682, "event": "train_step", "step": 130, "epoch": 2, "metrics": {"train/step_loss": 1.8971429631329966, "train/step_real_loss": 0.9516072571277618, "train/lr": 4.593591825444028e-05, "train/step_canary_loss": 14.0, "perf/step_duration_sec": 13.135221335105598, "perf/samples_per_sec": 5.253051946341283, "perf/tokens_per_sec": 4244.389841456128, "perf/logical_batch_size": 69.0, "perf/logical_token_count": 55751.0, "perf/physical_batches": 9.0, "privacy/epsilon": 6.701680965161658, "system/cuda_memory_allocated_gb": 13.55138349533081, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773766477.2471156, "event": "train_step", "step": 140, "epoch": 2, "metrics": {"train/step_loss": 1.62191776731121, "train/step_real_loss": 1.027046725153923, "train/lr": 3.185820604061088e-05, "train/step_canary_loss": 14.3125, "perf/step_duration_sec": 13.460042576305568, "perf/samples_per_sec": 4.9776959931719444, "perf/tokens_per_sec": 4005.113631282184, "perf/logical_batch_size": 67.0, "perf/logical_token_count": 53909.0, "perf/physical_batches": 9.0, "privacy/epsilon": 6.94699659418226, "system/cuda_memory_allocated_gb": 12.970857620239258, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773766610.9435787, "event": "train_step", "step": 150, "epoch": 2, "metrics": {"train/step_loss": 1.9888183966926907, "train/step_real_loss": 1.0191947892308235, "train/lr": 1.994587590756397e-05, "train/step_canary_loss": 14.40000057220459, "perf/step_duration_sec": 13.144390502013266, "perf/samples_per_sec": 5.249387561137322, "perf/tokens_per_sec": 3851.1485178598896, "perf/logical_batch_size": 69.0, "perf/logical_token_count": 50621.0, "perf/physical_batches": 9.0, "privacy/epsilon": 7.185356422932143, "system/cuda_memory_allocated_gb": 13.55138349533081, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773766623.702012, "event": "eval_step", "step": 150, "epoch": 2, "metrics": {"eval/loss": 0.9247592026606585, "eval/duration_sec": 12.756307526025921}}
+ {"timestamp": 1773766758.1413558, "event": "train_step", "step": 160, "epoch": 2, "metrics": {"train/step_loss": 1.475486863743175, "train/step_real_loss": 1.1006973907351494, "train/lr": 1.057747301402887e-05, "train/step_canary_loss": 13.46875, "perf/step_duration_sec": 13.887811011634767, "perf/samples_per_sec": 4.752368817858142, "perf/tokens_per_sec": 3180.198806204889, "perf/logical_batch_size": 66.0, "perf/logical_token_count": 44166.0, "perf/physical_batches": 9.0, "privacy/epsilon": 7.416930400989618, "system/cuda_memory_allocated_gb": 12.680593967437744, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773766893.2984576, "event": "train_step", "step": 170, "epoch": 2, "metrics": {"train/step_loss": 1.6393254835214188, "train/step_real_loss": 1.0540594905614853, "train/lr": 4.050702638550275e-06, "train/step_canary_loss": 14.125, "perf/step_duration_sec": 13.72155143506825, "perf/samples_per_sec": 4.882829781825378, "perf/tokens_per_sec": 3613.58555077656, "perf/logical_batch_size": 67.0, "perf/logical_token_count": 49584.0, "perf/physical_batches": 9.0, "privacy/epsilon": 7.645014003428424, "system/cuda_memory_allocated_gb": 12.970857620239258, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773767029.0999126, "event": "train_step", "step": 180, "epoch": 2, "metrics": {"train/step_loss": 1.7291738425984102, "train/step_real_loss": 1.0520909577608109, "train/lr": 5.729698228102653e-07, "train/step_canary_loss": 12.5625, "perf/step_duration_sec": 13.660048387013376, "perf/samples_per_sec": 4.978020434001367, "perf/tokens_per_sec": 3874.5104336758286, "perf/logical_batch_size": 68.0, "perf/logical_token_count": 52926.0, "perf/physical_batches": 9.0, "privacy/epsilon": 7.8658492724205455, "system/cuda_memory_allocated_gb": 13.261121273040771, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773767117.390551, "event": "train_epoch", "step": 186, "epoch": 2, "metrics": {"train/epoch_loss": 1.5804422382361127, "train/epoch_real_loss": 1.005713254121899, "train/epoch_canary_loss": 13.023172873210777, "perf/epoch_duration_sec": 1272.894041202031, "perf/epoch_samples_per_sec": 38.91918604098259, "perf/epoch_tokens_per_sec": 29671.033705472055, "perf/epoch_samples": 49540.0, "perf/epoch_tokens": 37768082.0, "system/cuda_epoch_peak_memory_gb": 86.22137594223022, "eval/loss": 0.924621483645378, "eval/duration_sec": 12.757435038685799, "privacy/epsilon": 7.996749609735891}}
+ {"timestamp": 1773767130.595887, "event": "audit_epoch", "step": 186, "epoch": 2, "metrics": {"audit/delta": 1e-05, "audit/num_canaries": 500.0, "audit/num_members": 250.0, "audit/paper_guess_fraction": 0.2, "audit/paper_guess_steps": 20.0, "audit/loss/auc": 0.515088, "audit/loss/empirical_epsilon/0.05": 0.0, "audit/loss/empirical_epsilon/0.01": 0.0, "audit/loss/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/correct_guesses": 0.0, "audit/embedding/auc": 0.516208, "audit/embedding/empirical_epsilon/0.05": 0.0, "audit/embedding/empirical_epsilon/0.01": 0.0, "audit/embedding/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/correct_guesses": 0.0, "perf/audit_duration_sec": 6.922967464663088}}
+ {"timestamp": 1773767143.9571598, "event": "audit_final", "step": 186, "epoch": 2, "metrics": {"audit/delta": 1e-05, "audit/num_canaries": 500.0, "audit/num_members": 250.0, "audit/paper_guess_fraction": 0.2, "audit/paper_guess_steps": 20.0, "audit/loss/auc": 0.515088, "audit/loss/empirical_epsilon/0.05": 0.0, "audit/loss/empirical_epsilon/0.01": 0.0, "audit/loss/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/correct_guesses": 0.0, "audit/embedding/auc": 0.516208, "audit/embedding/empirical_epsilon/0.05": 0.0, "audit/embedding/empirical_epsilon/0.01": 0.0, "audit/embedding/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/correct_guesses": 0.0}}
+ {"timestamp": 1773767144.5062578, "event": "energy_final", "step": 186, "epoch": null, "metrics": {"energy/codecarbon/duration": 2693.6014373912476, "energy/codecarbon/emissions": 0.09826528580395394, "energy/codecarbon/emissions_rate": 3.648100436830916e-05, "energy/codecarbon/cpu_power": 72.03163066573623, "energy/codecarbon/gpu_power": 3108.4298352234105, "energy/codecarbon/ram_power": 54.0, "energy/codecarbon/cpu_energy": 0.051919376660011826, "energy/codecarbon/gpu_energy": 2.3238466076869315, "energy/codecarbon/ram_energy": 0.03891114561875117, "energy/codecarbon/energy_consumed": 2.414677129965695, "energy/codecarbon/water_consumed": 0.0, "energy/codecarbon/cpu_count": 256.0, "energy/codecarbon/gpu_count": 8.0, "energy/codecarbon/longitude": 16.1885, "energy/codecarbon/latitude": 58.594, "energy/codecarbon/ram_total_size": 1511.49019241333, "energy/codecarbon/cpu_utilization_percent": 3.719566840926081, "energy/codecarbon/gpu_utilization_percent": 83.2715646004481, "energy/codecarbon/ram_utilization_percent": 5.423711725167903, "energy/codecarbon/ram_used_gb": 82.01636290603649, "energy/codecarbon/pue": 1.0, "energy/codecarbon/wue": 0.0}}
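The `energy_final` record at the end of `metrics.jsonl` can be sanity-checked with simple arithmetic. The sketch below copies the relevant values from that record and checks two relations: that the CPU, GPU, and RAM energy terms add up to `energy_consumed`, and that `emissions_rate` equals `emissions / duration`. The units (kWh, kg CO2eq, seconds) follow CodeCarbon's usual conventions and are an assumption here, as is the helper name `check_energy_record`.

```python
import math

# Values copied from the "energy_final" event in metrics.jsonl.
ENERGY = {
    "duration": 2693.6014373912476,             # assumed seconds
    "emissions": 0.09826528580395394,           # assumed kg CO2eq
    "emissions_rate": 3.648100436830916e-05,    # assumed kg CO2eq per second
    "cpu_energy": 0.051919376660011826,         # assumed kWh
    "gpu_energy": 2.3238466076869315,           # assumed kWh
    "ram_energy": 0.03891114561875117,          # assumed kWh
    "energy_consumed": 2.414677129965695,       # assumed kWh
}


def check_energy_record(rec: dict) -> dict:
    """Recompute the derived fields and report whether they match the log."""
    total = rec["cpu_energy"] + rec["gpu_energy"] + rec["ram_energy"]
    rate = rec["emissions"] / rec["duration"]
    return {
        "component_sum": total,
        "sum_matches": math.isclose(total, rec["energy_consumed"], rel_tol=1e-9),
        "rate": rate,
        "rate_matches": math.isclose(rate, rec["emissions_rate"], rel_tol=1e-6),
        "gpu_share": rec["gpu_energy"] / rec["energy_consumed"],
    }


if __name__ == "__main__":
    for key, value in check_energy_record(ENERGY).items():
        print(f"{key}: {value}")
```

Both relations hold for the logged values, and the GPU term accounts for roughly 96% of the total energy reported for the run.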
qwen3-4b-instruct/dp8/pretrain_lm_head.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc44b7d60b8e2cf912e4233ff02bc57bb7e91f7a3ba6aa8ea10b7767ca29954a
+ size 779106920
qwen3-4b-instruct/dp8/resolved_config.yaml ADDED
@@ -0,0 +1,101 @@
+ model:
+   name: Qwen/Qwen3-4B-Instruct-2507
+   tokenizer_name: Qwen/Qwen3-4B-Instruct-2507
+   max_length: 1024
+   dtype: bfloat16
+   trust_remote_code: true
+   use_fast_tokenizer: true
+   cache_dir: null
+   local_files_only: false
+   low_cpu_mem_usage: true
+   tie_word_embeddings: true
+   gradient_checkpointing: false
+   use_chat_template: false
+ dataset:
+   name: melihcatal/codedp-cpt
+   split: train
+   mode: cpt
+   text_column: text
+   validation_ratio: 0.05
+   max_samples: -1
+ lora:
+   enabled: true
+   r: 16
+   alpha: 32
+   dropout: 0.05
+   target_modules:
+   - q_proj
+   - k_proj
+   - v_proj
+   - o_proj
+   modules_to_save:
+   - lm_head
+   bias: none
+ training:
+   seed: 42
+   epochs: 2
+   warmup_steps: null
+   warmup_ratio: 0.05
+   mixed_precision: false
+   mixed_precision_dtype: bfloat16
+   batch_size: 8
+   eval_batch_size: 8
+   eval_every_steps: 50
+   eval_every_epochs: 1
+   learning_rate: 0.0002
+   optimizer: adamw
+   lr_scheduler: cosine
+   adam_beta1: 0.9
+   adam_beta2: 0.999
+   adam_epsilon: 1.0e-08
+   sgd_momentum: 0.9
+   weight_decay: 0.01
+   max_grad_norm: 1.0
+   log_every: 10
+   gradient_accumulation_steps: 8
+   num_workers: 4
+   output_dir: runs/cpt/qwen3-4b-instruct/dp8
+ distributed:
+   strategy: dpddp
+   backend: nccl
+   devices: null
+ dp:
+   module_validator: auto
+   target_delta: 1.0e-05
+   noise_multiplier: null
+   max_grad_norm: 1.0
+   grad_sample_mode: hooks
+   secure_mode: false
+   enabled: true
+   target_epsilon: 8.0
+   clipping: flat
+ audit:
+   enabled: true
+   run_every_epoch: true
+   epoch_device: cuda
+   q_canary: auto
+   num_canaries: 500
+   prefix_length: 49
+   num_digits: 12
+   batch_size: 32
+   delta: 1.0e-05
+   p_values:
+   - 0.05
+   - 0.01
+   paper_guess_fraction: 0.2
+   paper_guess_steps: 20
+   enable_holdout_empirical_epsilon: false
+   holdout_seed: 42
+   tie_seed: 42
+ tracking:
+   enabled: true
+   tensorboard: true
+   wandb: false
+   wandb_project: codedp-finetune-h200-audit
+   wandb_run_name: qwen3-4b-instruct-cpt-dp8
+   wandb_mode: online
+   codecarbon: true
+   codecarbon_output_file: codecarbon.csv
+   codecarbon_measure_power_secs: 15
+   codecarbon_country_iso_code: null
+   codecarbon_project_name: codedp-qwen3-4b-instruct-cpt-dp8
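The `training` block above sets `lr_scheduler: cosine`, `learning_rate: 0.0002`, and `warmup_ratio: 0.05`. The sketch below shows a standard linear-warmup plus cosine-decay schedule and compares it with `train/lr` values logged in `metrics.jsonl`. The warmup length of 10 steps and the total of 186 steps are inferred from the logs (186 is the final logged step), not stated in the config, and the `cosine_lr` helper is illustrative rather than the training code used for this run.

```python
import math


def cosine_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# train/lr values copied from metrics.jsonl, keyed by optimizer step.
LOGGED_LR = {
    10: 0.0002,
    20: 0.0001984111204336116,
    100: 9.643076661610196e-05,
    180: 5.729698228102653e-07,
}

if __name__ == "__main__":
    for step, logged in LOGGED_LR.items():
        # warmup_steps=10 and total_steps=186 are assumptions inferred from the log.
        predicted = cosine_lr(step, peak_lr=2e-4, warmup_steps=10, total_steps=186)
        print(f"step {step:3d}: logged={logged:.6e} predicted={predicted:.6e}")
```

Under these assumptions the predicted values track the logged learning rates closely, which suggests the run used a plain cosine decay after a short warmup.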
qwen3-4b-instruct/dp8/scalars.csv ADDED
@@ -0,0 +1,358 @@
+ timestamp,event,step,epoch,key,value
+ 1773764685.4650462,train_step,10,1,train/step_loss,1.6114869984713467
+ 1773764685.4650462,train_step,10,1,train/step_real_loss,1.1970022171735764
+ 1773764685.4650462,train_step,10,1,train/lr,0.0002
+ 1773764685.4650462,train_step,10,1,train/step_canary_loss,14.875
+ 1773764685.4650462,train_step,10,1,perf/step_duration_sec,13.257156527135521
+ 1773764685.4650462,train_step,10,1,perf/samples_per_sec,4.978443142381804
+ 1773764685.4650462,train_step,10,1,perf/tokens_per_sec,3881.0735842701297
+ 1773764685.4650462,train_step,10,1,perf/logical_batch_size,66.0
+ 1773764685.4650462,train_step,10,1,perf/logical_token_count,51452.0
+ 1773764685.4650462,train_step,10,1,perf/physical_batches,9.0
+ 1773764685.4650462,train_step,10,1,privacy/epsilon,2.5080663593969574
+ 1773764685.4650462,train_step,10,1,system/cuda_memory_allocated_gb,12.68057107925415
+ 1773764685.4650462,train_step,10,1,system/cuda_max_memory_allocated_gb,86.2213044166565
+ 1773764819.8299851,train_step,20,1,train/step_loss,1.7427907831528608
+ 1773764819.8299851,train_step,20,1,train/step_real_loss,1.0802308320999146
+ 1773764819.8299851,train_step,20,1,train/lr,0.0001984111204336116
+ 1773764819.8299851,train_step,20,1,train/step_canary_loss,12.34375
+ 1773764819.8299851,train_step,20,1,perf/step_duration_sec,13.233567330986261
+ 1773764819.8299851,train_step,20,1,perf/samples_per_sec,5.138448182507728
+ 1773764819.8299851,train_step,20,1,perf/tokens_per_sec,4095.645463116454
+ 1773764819.8299851,train_step,20,1,perf/logical_batch_size,68.0
+ 1773764819.8299851,train_step,20,1,perf/logical_token_count,54200.0
+ 1773764819.8299851,train_step,20,1,perf/physical_batches,9.0
+ 1773764819.8299851,train_step,20,1,privacy/epsilon,3.1012868027106864
+ 1773764819.8299851,train_step,20,1,system/cuda_memory_allocated_gb,13.261098384857178
+ 1773764819.8299851,train_step,20,1,system/cuda_max_memory_allocated_gb,86.2213044166565
+ 1773764953.865748,train_step,30,1,train/step_loss,1.00910531390797
+ 1773764953.865748,train_step,30,1,train/step_real_loss,1.0406398549675941
+ 1773764953.865748,train_step,30,1,train/lr,0.0001936949724999762
+ 1773764953.865748,train_step,30,1,train/step_canary_loss,0.0
+ 1773764953.865748,train_step,30,1,perf/step_duration_sec,14.175612474791706
+ 1773764953.865748,train_step,30,1,perf/samples_per_sec,4.51479610590444
+ 1773764953.865748,train_step,30,1,perf/tokens_per_sec,3616.84548665354
+ 1773764953.865748,train_step,30,1,perf/logical_batch_size,64.0
+ 1773764953.865748,train_step,30,1,perf/logical_token_count,51271.0
+ 1773764953.865748,train_step,30,1,perf/physical_batches,10.0
+ 1773764953.865748,train_step,30,1,privacy/epsilon,3.577863845948693
+ 1773764953.865748,train_step,30,1,system/cuda_memory_allocated_gb,14.42089033126831
+ 1773764953.865748,train_step,30,1,system/cuda_max_memory_allocated_gb,86.2213044166565
+ 1773765087.7159872,train_step,40,1,train/step_loss,1.4425305669957942
+ 1773765087.7159872,train_step,40,1,train/step_real_loss,1.0335080847144127
+ 1773765087.7159872,train_step,40,1,train/lr,0.00018600142402077006
+ 1773765087.7159872,train_step,40,1,train/step_canary_loss,14.53125
+ 1773765087.7159872,train_step,40,1,perf/step_duration_sec,13.013761347159743
+ 1773765087.7159872,train_step,40,1,perf/samples_per_sec,5.071554505984891
+ 1773765087.7159872,train_step,40,1,perf/tokens_per_sec,4077.6835062817304
+ 1773765087.7159872,train_step,40,1,perf/logical_batch_size,66.0
+ 1773765087.7159872,train_step,40,1,perf/logical_token_count,53066.0
+ 1773765087.7159872,train_step,40,1,perf/physical_batches,9.0
+ 1773765087.7159872,train_step,40,1,privacy/epsilon,3.9945146358205394
+ 1773765087.7159872,train_step,40,1,system/cuda_memory_allocated_gb,12.680593967437744
+ 1773765087.7159872,train_step,40,1,system/cuda_max_memory_allocated_gb,86.22135162353516
+ 1773765221.5064714,train_step,50,1,train/step_loss,1.2258210108830379
+ 1773765221.5064714,train_step,50,1,train/step_real_loss,1.0516150891780853
+ 1773765221.5064714,train_step,50,1,train/lr,0.00017557495743542585
+ 1773765221.5064714,train_step,50,1,train/step_canary_loss,12.375
+ 1773765221.5064714,train_step,50,1,perf/step_duration_sec,13.473957392852753
+ 1773765221.5064714,train_step,50,1,perf/samples_per_sec,4.824120939738104
+ 1773765221.5064714,train_step,50,1,perf/tokens_per_sec,3929.2093967940723
+ 1773765221.5064714,train_step,50,1,perf/logical_batch_size,65.0
+ 1773765221.5064714,train_step,50,1,perf/logical_token_count,52942.0
+ 1773765221.5064714,train_step,50,1,perf/physical_batches,9.0
+ 1773765221.5064714,train_step,50,1,privacy/epsilon,4.370419605916781
+ 1773765221.5064714,train_step,50,1,system/cuda_memory_allocated_gb,12.39033031463623
+ 1773765221.5064714,train_step,50,1,system/cuda_max_memory_allocated_gb,86.22135162353516
+ 1773765234.2224665,eval_step,50,1,eval/loss,0.950404628729209
+ 1773765234.2224665,eval_step,50,1,eval/duration_sec,12.71370144886896
+ 1773765368.860026,train_step,60,1,train/step_loss,2.122071954182216
+ 1773765368.860026,train_step,60,1,train/step_real_loss,1.006563015282154
+ 1773765368.860026,train_step,60,1,train/lr,0.0001627469007380852
+ 1773765368.860026,train_step,60,1,train/step_canary_loss,14.020833969116211
+ 1773765368.860026,train_step,60,1,perf/step_duration_sec,13.671399443875998
+ 1773765368.860026,train_step,60,1,perf/samples_per_sec,5.120178097887117
+ 1773765368.860026,train_step,60,1,perf/tokens_per_sec,3658.732977947337
+ 1773765368.860026,train_step,60,1,perf/logical_batch_size,70.0
+ 1773765368.860026,train_step,60,1,perf/logical_token_count,50020.0
+ 1773765368.860026,train_step,60,1,perf/physical_batches,9.0
+ 1773765368.860026,train_step,60,1,privacy/epsilon,4.716422199842559
+ 1773765368.860026,train_step,60,1,system/cuda_memory_allocated_gb,13.841647148132324
+ 1773765368.860026,train_step,60,1,system/cuda_max_memory_allocated_gb,86.22135162353516
+ 1773765503.2535436,train_step,70,1,train/step_loss,1.7725614239187801
+ 1773765503.2535436,train_step,70,1,train/step_real_loss,1.058151200413704
+ 1773765503.2535436,train_step,70,1,train/lr,0.0001479248986720057
+ 1773765503.2535436,train_step,70,1,train/step_canary_loss,13.203125
+ 1773765503.2535436,train_step,70,1,perf/step_duration_sec,13.894904132932425
+ 1773765503.2535436,train_step,70,1,perf/samples_per_sec,4.893880472254044
+ 1773765503.2535436,train_step,70,1,perf/tokens_per_sec,3568.2146149169926
+ 1773765503.2535436,train_step,70,1,perf/logical_batch_size,68.0
+ 1773765503.2535436,train_step,70,1,perf/logical_token_count,49580.0
+ 1773765503.2535436,train_step,70,1,perf/physical_batches,9.0
+ 1773765503.2535436,train_step,70,1,privacy/epsilon,5.042051045310402
+ 1773765503.2535436,train_step,70,1,system/cuda_memory_allocated_gb,13.261121273040771
+ 1773765503.2535436,train_step,70,1,system/cuda_max_memory_allocated_gb,86.22135162353516
+ 1773765637.8816426,train_step,80,1,train/step_loss,1.8267304336323458
+ 1773765637.8816426,train_step,80,1,train/step_real_loss,1.0551588982343674
+ 1773765637.8816426,train_step,80,1,train/lr,0.0001315799587615025
+ 1773765637.8816426,train_step,80,1,train/step_canary_loss,14.171875
+ 1773765637.8816426,train_step,80,1,perf/step_duration_sec,13.812608732841909
+ 1773765637.8816426,train_step,80,1,perf/samples_per_sec,4.923038168620387
+ 1773765637.8816426,train_step,80,1,perf/tokens_per_sec,3637.6908208897057
+ 1773765637.8816426,train_step,80,1,perf/logical_batch_size,68.0
+ 1773765637.8816426,train_step,80,1,perf/logical_token_count,50246.0
+ 1773765637.8816426,train_step,80,1,perf/physical_batches,9.0
+ 1773765637.8816426,train_step,80,1,privacy/epsilon,5.348292001109609
+ 1773765637.8816426,train_step,80,1,system/cuda_memory_allocated_gb,13.261121273040771
107
+ 1773765637.8816426,train_step,80,1,system/cuda_max_memory_allocated_gb,86.22135162353516
108
+ 1773765772.2237043,train_step,90,1,train/step_loss,1.5600519109128126
109
+ 1773765772.2237043,train_step,90,1,train/step_real_loss,1.0277105793356895
110
+ 1773765772.2237043,train_step,90,1,train/lr,0.00011423148382732853
111
+ 1773765772.2237043,train_step,90,1,train/step_canary_loss,12.916666984558105
112
+ 1773765772.2237043,train_step,90,1,perf/step_duration_sec,13.590938247274607
113
+ 1773765772.2237043,train_step,90,1,perf/samples_per_sec,4.929755310560367
114
+ 1773765772.2237043,train_step,90,1,perf/tokens_per_sec,3897.8913034663583
115
+ 1773765772.2237043,train_step,90,1,perf/logical_batch_size,67.0
116
+ 1773765772.2237043,train_step,90,1,perf/logical_token_count,52976.0
117
+ 1773765772.2237043,train_step,90,1,perf/physical_batches,9.0
118
+ 1773765772.2237043,train_step,90,1,privacy/epsilon,5.640478844621371
119
+ 1773765772.2237043,train_step,90,1,system/cuda_memory_allocated_gb,12.970857620239258
120
+ 1773765772.2237043,train_step,90,1,system/cuda_max_memory_allocated_gb,86.22135162353516
121
+ 1773765818.3224468,train_epoch,93,1,train/epoch_loss,1.667893763698923
122
+ 1773765818.3224468,train_epoch,93,1,train/epoch_real_loss,1.0501683052070176
123
+ 1773765818.3224468,train_epoch,93,1,train/epoch_canary_loss,13.414085814260666
124
+ 1773765818.3224468,train_epoch,93,1,perf/epoch_duration_sec,1255.631331067998
125
+ 1773765818.3224468,train_epoch,93,1,perf/epoch_samples_per_sec,39.58486760419035
126
+ 1773765818.3224468,train_epoch,93,1,perf/epoch_tokens_per_sec,30085.320480071918
127
+ 1773765818.3224468,train_epoch,93,1,perf/epoch_samples,49704.0
128
+ 1773765818.3224468,train_epoch,93,1,perf/epoch_tokens,37776071.0
129
+ 1773765818.3224468,train_epoch,93,1,system/cuda_epoch_peak_memory_gb,86.22135162353516
130
+ 1773765818.3224468,train_epoch,93,1,eval/loss,0.9280200686592323
131
+ 1773765818.3224468,train_epoch,93,1,eval/duration_sec,12.726754495874047
132
+ 1773765818.3224468,train_epoch,93,1,privacy/epsilon,5.725566085147695
133
+ 1773765831.5765529,audit_epoch,93,1,audit/delta,1e-05
134
+ 1773765831.5765529,audit_epoch,93,1,audit/num_canaries,500.0
135
+ 1773765831.5765529,audit_epoch,93,1,audit/num_members,250.0
136
+ 1773765831.5765529,audit_epoch,93,1,audit/paper_guess_fraction,0.2
137
+ 1773765831.5765529,audit_epoch,93,1,audit/paper_guess_steps,20.0
138
+ 1773765831.5765529,audit_epoch,93,1,audit/loss/auc,0.51404
139
+ 1773765831.5765529,audit_epoch,93,1,audit/loss/empirical_epsilon/0.05,0.0
140
+ 1773765831.5765529,audit_epoch,93,1,audit/loss/empirical_epsilon/0.01,0.0
141
+ 1773765831.5765529,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.05/epsilon,0.0
142
+ 1773765831.5765529,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.05/num_guesses,0.0
143
+ 1773765831.5765529,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.05/correct_guesses,0.0
144
+ 1773765831.5765529,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.01/epsilon,0.0
145
+ 1773765831.5765529,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.01/num_guesses,0.0
146
+ 1773765831.5765529,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.01/correct_guesses,0.0
147
+ 1773765831.5765529,audit_epoch,93,1,audit/embedding/auc,0.512816
148
+ 1773765831.5765529,audit_epoch,93,1,audit/embedding/empirical_epsilon/0.05,0.0
149
+ 1773765831.5765529,audit_epoch,93,1,audit/embedding/empirical_epsilon/0.01,0.0
150
+ 1773765831.5765529,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.05/epsilon,0.0
151
+ 1773765831.5765529,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.05/num_guesses,0.0
152
+ 1773765831.5765529,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.05/correct_guesses,0.0
153
+ 1773765831.5765529,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.01/epsilon,0.0
154
+ 1773765831.5765529,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.01/num_guesses,0.0
155
+ 1773765831.5765529,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.01/correct_guesses,0.0
156
+ 1773765831.5765529,audit_epoch,93,1,perf/audit_duration_sec,6.853809644002467
157
+ 1773765927.2307582,train_step,100,2,train/step_loss,2.1885286586385377
158
+ 1773765927.2307582,train_step,100,2,train/step_real_loss,0.9171566888689995
159
+ 1773765927.2307582,train_step,100,2,train/lr,9.643076661610196e-05
160
+ 1773765927.2307582,train_step,100,2,train/step_canary_loss,13.812500953674316
161
+ 1773765927.2307582,train_step,100,2,perf/step_duration_sec,13.692489622160792
162
+ 1773765927.2307582,train_step,100,2,perf/samples_per_sec,5.18532436096129
163
+ 1773765927.2307582,train_step,100,2,perf/tokens_per_sec,3906.0098985534178
164
+ 1773765927.2307582,train_step,100,2,perf/logical_batch_size,71.0
165
+ 1773765927.2307582,train_step,100,2,perf/logical_token_count,53483.0
166
+ 1773765927.2307582,train_step,100,2,perf/physical_batches,9.0
167
+ 1773765927.2307582,train_step,100,2,privacy/epsilon,5.920610858546996
168
+ 1773765927.2307582,train_step,100,2,system/cuda_memory_allocated_gb,14.131910800933838
169
+ 1773765927.2307582,train_step,100,2,system/cuda_max_memory_allocated_gb,86.22135162353516
170
+ 1773765939.9789443,eval_step,100,2,eval/loss,0.9270543383482176
171
+ 1773765939.9789443,eval_step,100,2,eval/duration_sec,12.745669181924313
172
+ 1773766074.3579476,train_step,110,2,train/step_loss,1.4203428788618608
173
+ 1773766074.3579476,train_step,110,2,train/step_real_loss,1.016486406326294
174
+ 1773766074.3579476,train_step,110,2,train/lr,7.874347104470234e-05
175
+ 1773766074.3579476,train_step,110,2,train/step_canary_loss,14.34375
176
+ 1773766074.3579476,train_step,110,2,perf/step_duration_sec,13.423757336102426
177
+ 1773766074.3579476,train_step,110,2,perf/samples_per_sec,4.916656219827274
178
+ 1773766074.3579476,train_step,110,2,perf/tokens_per_sec,3954.481496565319
179
+ 1773766074.3579476,train_step,110,2,perf/logical_batch_size,66.0
180
+ 1773766074.3579476,train_step,110,2,perf/logical_token_count,53084.0
181
+ 1773766074.3579476,train_step,110,2,perf/physical_batches,9.0
182
+ 1773766074.3579476,train_step,110,2,privacy/epsilon,6.190670402749493
183
+ 1773766074.3579476,train_step,110,2,system/cuda_memory_allocated_gb,12.680593967437744
184
+ 1773766074.3579476,train_step,110,2,system/cuda_max_memory_allocated_gb,86.22137594223022
185
+ 1773766208.938867,train_step,120,2,train/step_loss,1.590441582807854
186
+ 1773766208.938867,train_step,120,2,train/step_real_loss,1.0116732195019722
187
+ 1773766208.938867,train_step,120,2,train/lr,6.173165676349103e-05
188
+ 1773766208.938867,train_step,120,2,train/step_canary_loss,13.9375
189
+ 1773766208.938867,train_step,120,2,perf/step_duration_sec,13.283335603773594
190
+ 1773766208.938867,train_step,120,2,perf/samples_per_sec,5.0439138179243415
191
+ 1773766208.938867,train_step,120,2,perf/tokens_per_sec,4092.8725752103383
192
+ 1773766208.938867,train_step,120,2,perf/logical_batch_size,67.0
193
+ 1773766208.938867,train_step,120,2,perf/logical_token_count,54367.0
194
+ 1773766208.938867,train_step,120,2,perf/physical_batches,9.0
195
+ 1773766208.938867,train_step,120,2,privacy/epsilon,6.450768498470402
196
+ 1773766208.938867,train_step,120,2,system/cuda_memory_allocated_gb,12.970857620239258
197
+ 1773766208.938867,train_step,120,2,system/cuda_max_memory_allocated_gb,86.22137594223022
198
+ 1773766343.2955682,train_step,130,2,train/step_loss,1.8971429631329966
199
+ 1773766343.2955682,train_step,130,2,train/step_real_loss,0.9516072571277618
200
+ 1773766343.2955682,train_step,130,2,train/lr,4.593591825444028e-05
201
+ 1773766343.2955682,train_step,130,2,train/step_canary_loss,14.0
202
+ 1773766343.2955682,train_step,130,2,perf/step_duration_sec,13.135221335105598
203
+ 1773766343.2955682,train_step,130,2,perf/samples_per_sec,5.253051946341283
204
+ 1773766343.2955682,train_step,130,2,perf/tokens_per_sec,4244.389841456128
205
+ 1773766343.2955682,train_step,130,2,perf/logical_batch_size,69.0
206
+ 1773766343.2955682,train_step,130,2,perf/logical_token_count,55751.0
207
+ 1773766343.2955682,train_step,130,2,perf/physical_batches,9.0
208
+ 1773766343.2955682,train_step,130,2,privacy/epsilon,6.701680965161658
209
+ 1773766343.2955682,train_step,130,2,system/cuda_memory_allocated_gb,13.55138349533081
210
+ 1773766343.2955682,train_step,130,2,system/cuda_max_memory_allocated_gb,86.22137594223022
211
+ 1773766477.2471156,train_step,140,2,train/step_loss,1.62191776731121
212
+ 1773766477.2471156,train_step,140,2,train/step_real_loss,1.027046725153923
213
+ 1773766477.2471156,train_step,140,2,train/lr,3.185820604061088e-05
214
+ 1773766477.2471156,train_step,140,2,train/step_canary_loss,14.3125
215
+ 1773766477.2471156,train_step,140,2,perf/step_duration_sec,13.460042576305568
216
+ 1773766477.2471156,train_step,140,2,perf/samples_per_sec,4.9776959931719444
217
+ 1773766477.2471156,train_step,140,2,perf/tokens_per_sec,4005.113631282184
218
+ 1773766477.2471156,train_step,140,2,perf/logical_batch_size,67.0
219
+ 1773766477.2471156,train_step,140,2,perf/logical_token_count,53909.0
220
+ 1773766477.2471156,train_step,140,2,perf/physical_batches,9.0
221
+ 1773766477.2471156,train_step,140,2,privacy/epsilon,6.94699659418226
222
+ 1773766477.2471156,train_step,140,2,system/cuda_memory_allocated_gb,12.970857620239258
223
+ 1773766477.2471156,train_step,140,2,system/cuda_max_memory_allocated_gb,86.22137594223022
224
+ 1773766610.9435787,train_step,150,2,train/step_loss,1.9888183966926907
225
+ 1773766610.9435787,train_step,150,2,train/step_real_loss,1.0191947892308235
226
+ 1773766610.9435787,train_step,150,2,train/lr,1.994587590756397e-05
227
+ 1773766610.9435787,train_step,150,2,train/step_canary_loss,14.40000057220459
228
+ 1773766610.9435787,train_step,150,2,perf/step_duration_sec,13.144390502013266
229
+ 1773766610.9435787,train_step,150,2,perf/samples_per_sec,5.249387561137322
230
+ 1773766610.9435787,train_step,150,2,perf/tokens_per_sec,3851.1485178598896
231
+ 1773766610.9435787,train_step,150,2,perf/logical_batch_size,69.0
232
+ 1773766610.9435787,train_step,150,2,perf/logical_token_count,50621.0
233
+ 1773766610.9435787,train_step,150,2,perf/physical_batches,9.0
234
+ 1773766610.9435787,train_step,150,2,privacy/epsilon,7.185356422932143
235
+ 1773766610.9435787,train_step,150,2,system/cuda_memory_allocated_gb,13.55138349533081
236
+ 1773766610.9435787,train_step,150,2,system/cuda_max_memory_allocated_gb,86.22137594223022
237
+ 1773766623.702012,eval_step,150,2,eval/loss,0.9247592026606585
238
+ 1773766623.702012,eval_step,150,2,eval/duration_sec,12.756307526025921
239
+ 1773766758.1413558,train_step,160,2,train/step_loss,1.475486863743175
240
+ 1773766758.1413558,train_step,160,2,train/step_real_loss,1.1006973907351494
241
+ 1773766758.1413558,train_step,160,2,train/lr,1.057747301402887e-05
242
+ 1773766758.1413558,train_step,160,2,train/step_canary_loss,13.46875
243
+ 1773766758.1413558,train_step,160,2,perf/step_duration_sec,13.887811011634767
244
+ 1773766758.1413558,train_step,160,2,perf/samples_per_sec,4.752368817858142
245
+ 1773766758.1413558,train_step,160,2,perf/tokens_per_sec,3180.198806204889
246
+ 1773766758.1413558,train_step,160,2,perf/logical_batch_size,66.0
247
+ 1773766758.1413558,train_step,160,2,perf/logical_token_count,44166.0
248
+ 1773766758.1413558,train_step,160,2,perf/physical_batches,9.0
249
+ 1773766758.1413558,train_step,160,2,privacy/epsilon,7.416930400989618
250
+ 1773766758.1413558,train_step,160,2,system/cuda_memory_allocated_gb,12.680593967437744
251
+ 1773766758.1413558,train_step,160,2,system/cuda_max_memory_allocated_gb,86.22137594223022
252
+ 1773766893.2984576,train_step,170,2,train/step_loss,1.6393254835214188
253
+ 1773766893.2984576,train_step,170,2,train/step_real_loss,1.0540594905614853
254
+ 1773766893.2984576,train_step,170,2,train/lr,4.050702638550275e-06
255
+ 1773766893.2984576,train_step,170,2,train/step_canary_loss,14.125
256
+ 1773766893.2984576,train_step,170,2,perf/step_duration_sec,13.72155143506825
257
+ 1773766893.2984576,train_step,170,2,perf/samples_per_sec,4.882829781825378
258
+ 1773766893.2984576,train_step,170,2,perf/tokens_per_sec,3613.58555077656
259
+ 1773766893.2984576,train_step,170,2,perf/logical_batch_size,67.0
260
+ 1773766893.2984576,train_step,170,2,perf/logical_token_count,49584.0
261
+ 1773766893.2984576,train_step,170,2,perf/physical_batches,9.0
262
+ 1773766893.2984576,train_step,170,2,privacy/epsilon,7.645014003428424
263
+ 1773766893.2984576,train_step,170,2,system/cuda_memory_allocated_gb,12.970857620239258
264
+ 1773766893.2984576,train_step,170,2,system/cuda_max_memory_allocated_gb,86.22137594223022
265
+ 1773767029.0999126,train_step,180,2,train/step_loss,1.7291738425984102
266
+ 1773767029.0999126,train_step,180,2,train/step_real_loss,1.0520909577608109
267
+ 1773767029.0999126,train_step,180,2,train/lr,5.729698228102653e-07
268
+ 1773767029.0999126,train_step,180,2,train/step_canary_loss,12.5625
269
+ 1773767029.0999126,train_step,180,2,perf/step_duration_sec,13.660048387013376
270
+ 1773767029.0999126,train_step,180,2,perf/samples_per_sec,4.978020434001367
271
+ 1773767029.0999126,train_step,180,2,perf/tokens_per_sec,3874.5104336758286
272
+ 1773767029.0999126,train_step,180,2,perf/logical_batch_size,68.0
273
+ 1773767029.0999126,train_step,180,2,perf/logical_token_count,52926.0
274
+ 1773767029.0999126,train_step,180,2,perf/physical_batches,9.0
275
+ 1773767029.0999126,train_step,180,2,privacy/epsilon,7.8658492724205455
276
+ 1773767029.0999126,train_step,180,2,system/cuda_memory_allocated_gb,13.261121273040771
277
+ 1773767029.0999126,train_step,180,2,system/cuda_max_memory_allocated_gb,86.22137594223022
278
+ 1773767117.390551,train_epoch,186,2,train/epoch_loss,1.5804422382361127
279
+ 1773767117.390551,train_epoch,186,2,train/epoch_real_loss,1.005713254121899
280
+ 1773767117.390551,train_epoch,186,2,train/epoch_canary_loss,13.023172873210777
281
+ 1773767117.390551,train_epoch,186,2,perf/epoch_duration_sec,1272.894041202031
282
+ 1773767117.390551,train_epoch,186,2,perf/epoch_samples_per_sec,38.91918604098259
283
+ 1773767117.390551,train_epoch,186,2,perf/epoch_tokens_per_sec,29671.033705472055
284
+ 1773767117.390551,train_epoch,186,2,perf/epoch_samples,49540.0
285
+ 1773767117.390551,train_epoch,186,2,perf/epoch_tokens,37768082.0
286
+ 1773767117.390551,train_epoch,186,2,system/cuda_epoch_peak_memory_gb,86.22137594223022
287
+ 1773767117.390551,train_epoch,186,2,eval/loss,0.924621483645378
288
+ 1773767117.390551,train_epoch,186,2,eval/duration_sec,12.757435038685799
289
+ 1773767117.390551,train_epoch,186,2,privacy/epsilon,7.996749609735891
290
+ 1773767130.595887,audit_epoch,186,2,audit/delta,1e-05
291
+ 1773767130.595887,audit_epoch,186,2,audit/num_canaries,500.0
292
+ 1773767130.595887,audit_epoch,186,2,audit/num_members,250.0
293
+ 1773767130.595887,audit_epoch,186,2,audit/paper_guess_fraction,0.2
294
+ 1773767130.595887,audit_epoch,186,2,audit/paper_guess_steps,20.0
295
+ 1773767130.595887,audit_epoch,186,2,audit/loss/auc,0.515088
296
+ 1773767130.595887,audit_epoch,186,2,audit/loss/empirical_epsilon/0.05,0.0
297
+ 1773767130.595887,audit_epoch,186,2,audit/loss/empirical_epsilon/0.01,0.0
298
+ 1773767130.595887,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.05/epsilon,0.0
299
+ 1773767130.595887,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.05/num_guesses,0.0
300
+ 1773767130.595887,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.05/correct_guesses,0.0
301
+ 1773767130.595887,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.01/epsilon,0.0
302
+ 1773767130.595887,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.01/num_guesses,0.0
303
+ 1773767130.595887,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.01/correct_guesses,0.0
304
+ 1773767130.595887,audit_epoch,186,2,audit/embedding/auc,0.516208
305
+ 1773767130.595887,audit_epoch,186,2,audit/embedding/empirical_epsilon/0.05,0.0
306
+ 1773767130.595887,audit_epoch,186,2,audit/embedding/empirical_epsilon/0.01,0.0
307
+ 1773767130.595887,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.05/epsilon,0.0
308
+ 1773767130.595887,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.05/num_guesses,0.0
309
+ 1773767130.595887,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.05/correct_guesses,0.0
310
+ 1773767130.595887,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.01/epsilon,0.0
311
+ 1773767130.595887,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.01/num_guesses,0.0
312
+ 1773767130.595887,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.01/correct_guesses,0.0
313
+ 1773767130.595887,audit_epoch,186,2,perf/audit_duration_sec,6.922967464663088
314
+ 1773767143.9571598,audit_final,186,2,audit/delta,1e-05
315
+ 1773767143.9571598,audit_final,186,2,audit/num_canaries,500.0
316
+ 1773767143.9571598,audit_final,186,2,audit/num_members,250.0
317
+ 1773767143.9571598,audit_final,186,2,audit/paper_guess_fraction,0.2
318
+ 1773767143.9571598,audit_final,186,2,audit/paper_guess_steps,20.0
319
+ 1773767143.9571598,audit_final,186,2,audit/loss/auc,0.515088
320
+ 1773767143.9571598,audit_final,186,2,audit/loss/empirical_epsilon/0.05,0.0
321
+ 1773767143.9571598,audit_final,186,2,audit/loss/empirical_epsilon/0.01,0.0
322
+ 1773767143.9571598,audit_final,186,2,audit/loss/empirical_epsilon_details/0.05/epsilon,0.0
323
+ 1773767143.9571598,audit_final,186,2,audit/loss/empirical_epsilon_details/0.05/num_guesses,0.0
324
+ 1773767143.9571598,audit_final,186,2,audit/loss/empirical_epsilon_details/0.05/correct_guesses,0.0
325
+ 1773767143.9571598,audit_final,186,2,audit/loss/empirical_epsilon_details/0.01/epsilon,0.0
326
+ 1773767143.9571598,audit_final,186,2,audit/loss/empirical_epsilon_details/0.01/num_guesses,0.0
327
+ 1773767143.9571598,audit_final,186,2,audit/loss/empirical_epsilon_details/0.01/correct_guesses,0.0
328
+ 1773767143.9571598,audit_final,186,2,audit/embedding/auc,0.516208
329
+ 1773767143.9571598,audit_final,186,2,audit/embedding/empirical_epsilon/0.05,0.0
330
+ 1773767143.9571598,audit_final,186,2,audit/embedding/empirical_epsilon/0.01,0.0
331
+ 1773767143.9571598,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.05/epsilon,0.0
332
+ 1773767143.9571598,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.05/num_guesses,0.0
333
+ 1773767143.9571598,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.05/correct_guesses,0.0
334
+ 1773767143.9571598,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.01/epsilon,0.0
335
+ 1773767143.9571598,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.01/num_guesses,0.0
336
+ 1773767143.9571598,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.01/correct_guesses,0.0
337
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/duration,2693.6014373912476
338
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/emissions,0.09826528580395394
339
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/emissions_rate,3.648100436830916e-05
340
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/cpu_power,72.03163066573623
341
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/gpu_power,3108.4298352234105
342
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/ram_power,54.0
343
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/cpu_energy,0.051919376660011826
344
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/gpu_energy,2.3238466076869315
345
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/ram_energy,0.03891114561875117
346
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/energy_consumed,2.414677129965695
347
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/water_consumed,0.0
348
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/cpu_count,256.0
349
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/gpu_count,8.0
350
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/longitude,16.1885
351
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/latitude,58.594
352
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/ram_total_size,1511.49019241333
353
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/cpu_utilization_percent,3.719566840926081
354
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/gpu_utilization_percent,83.2715646004481
355
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/ram_utilization_percent,5.423711725167903
356
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/ram_used_gb,82.01636290603649
357
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/pue,1.0
358
+ 1773767144.5062578,energy_final,186,,energy/codecarbon/wue,0.0
qwen3-4b-instruct/dp8/summary.json ADDED
@@ -0,0 +1,72 @@
1
+ {
2
+ "audit/delta": 1e-05,
3
+ "audit/embedding/auc": 0.516208,
4
+ "audit/embedding/empirical_epsilon/0.01": 0.0,
5
+ "audit/embedding/empirical_epsilon/0.05": 0.0,
6
+ "audit/embedding/empirical_epsilon_details/0.01/correct_guesses": 0.0,
7
+ "audit/embedding/empirical_epsilon_details/0.01/epsilon": 0.0,
8
+ "audit/embedding/empirical_epsilon_details/0.01/num_guesses": 0.0,
9
+ "audit/embedding/empirical_epsilon_details/0.05/correct_guesses": 0.0,
10
+ "audit/embedding/empirical_epsilon_details/0.05/epsilon": 0.0,
11
+ "audit/embedding/empirical_epsilon_details/0.05/num_guesses": 0.0,
12
+ "audit/loss/auc": 0.515088,
13
+ "audit/loss/empirical_epsilon/0.01": 0.0,
14
+ "audit/loss/empirical_epsilon/0.05": 0.0,
15
+ "audit/loss/empirical_epsilon_details/0.01/correct_guesses": 0.0,
16
+ "audit/loss/empirical_epsilon_details/0.01/epsilon": 0.0,
17
+ "audit/loss/empirical_epsilon_details/0.01/num_guesses": 0.0,
18
+ "audit/loss/empirical_epsilon_details/0.05/correct_guesses": 0.0,
19
+ "audit/loss/empirical_epsilon_details/0.05/epsilon": 0.0,
20
+ "audit/loss/empirical_epsilon_details/0.05/num_guesses": 0.0,
21
+ "audit/num_canaries": 500.0,
22
+ "audit/num_members": 250.0,
23
+ "audit/paper_guess_fraction": 0.2,
24
+ "audit/paper_guess_steps": 20.0,
25
+ "energy/codecarbon/cpu_count": 256.0,
26
+ "energy/codecarbon/cpu_energy": 0.051919376660011826,
27
+ "energy/codecarbon/cpu_power": 72.03163066573623,
28
+ "energy/codecarbon/cpu_utilization_percent": 3.719566840926081,
29
+ "energy/codecarbon/duration": 2693.6014373912476,
30
+ "energy/codecarbon/emissions": 0.09826528580395394,
31
+ "energy/codecarbon/emissions_rate": 3.648100436830916e-05,
32
+ "energy/codecarbon/energy_consumed": 2.414677129965695,
33
+ "energy/codecarbon/gpu_count": 8.0,
34
+ "energy/codecarbon/gpu_energy": 2.3238466076869315,
35
+ "energy/codecarbon/gpu_power": 3108.4298352234105,
36
+ "energy/codecarbon/gpu_utilization_percent": 83.2715646004481,
37
+ "energy/codecarbon/latitude": 58.594,
38
+ "energy/codecarbon/longitude": 16.1885,
39
+ "energy/codecarbon/pue": 1.0,
40
+ "energy/codecarbon/ram_energy": 0.03891114561875117,
41
+ "energy/codecarbon/ram_power": 54.0,
42
+ "energy/codecarbon/ram_total_size": 1511.49019241333,
43
+ "energy/codecarbon/ram_used_gb": 82.01636290603649,
44
+ "energy/codecarbon/ram_utilization_percent": 5.423711725167903,
45
+ "energy/codecarbon/water_consumed": 0.0,
46
+ "energy/codecarbon/wue": 0.0,
47
+ "eval/duration_sec": 12.757435038685799,
48
+ "eval/loss": 0.924621483645378,
49
+ "perf/audit_duration_sec": 6.922967464663088,
50
+ "perf/epoch_duration_sec": 1272.894041202031,
51
+ "perf/epoch_samples": 49540.0,
52
+ "perf/epoch_samples_per_sec": 38.91918604098259,
53
+ "perf/epoch_tokens": 37768082.0,
54
+ "perf/epoch_tokens_per_sec": 29671.033705472055,
55
+ "perf/logical_batch_size": 68.0,
56
+ "perf/logical_token_count": 52926.0,
57
+ "perf/physical_batches": 9.0,
58
+ "perf/samples_per_sec": 4.978020434001367,
59
+ "perf/step_duration_sec": 13.660048387013376,
60
+ "perf/tokens_per_sec": 3874.5104336758286,
61
+ "privacy/epsilon": 7.996749609735891,
62
+ "system/cuda_epoch_peak_memory_gb": 86.22137594223022,
63
+ "system/cuda_max_memory_allocated_gb": 86.22137594223022,
64
+ "system/cuda_memory_allocated_gb": 13.261121273040771,
65
+ "train/epoch_canary_loss": 13.023172873210777,
66
+ "train/epoch_loss": 1.5804422382361127,
67
+ "train/epoch_real_loss": 1.005713254121899,
68
+ "train/lr": 5.729698228102653e-07,
69
+ "train/step_canary_loss": 12.5625,
70
+ "train/step_loss": 1.7291738425984102,
71
+ "train/step_real_loss": 1.0520909577608109
72
+ }
qwen3-4b-instruct/dp8/tensorboard/events.out.tfevents.1773764448.7b654b6988b0.41500.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:eff1ed8df15d5771e60d52c8dab89e065560c7481fb911bfbbce08dd4f2d2e70
3
+ size 25026
qwen3-4b-instruct/dp8/tokenizer/chat_template.jinja ADDED
@@ -0,0 +1,61 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- for message in messages %}
18
+ {%- if message.content is string %}
19
+ {%- set content = message.content %}
20
+ {%- else %}
21
+ {%- set content = '' %}
22
+ {%- endif %}
23
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
24
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
25
+ {%- elif message.role == "assistant" %}
26
+ {{- '<|im_start|>' + message.role + '\n' + content }}
27
+ {%- if message.tool_calls %}
28
+ {%- for tool_call in message.tool_calls %}
29
+ {%- if (loop.first and content) or (not loop.first) %}
30
+ {{- '\n' }}
31
+ {%- endif %}
32
+ {%- if tool_call.function %}
33
+ {%- set tool_call = tool_call.function %}
34
+ {%- endif %}
35
+ {{- '<tool_call>\n{"name": "' }}
36
+ {{- tool_call.name }}
37
+ {{- '", "arguments": ' }}
38
+ {%- if tool_call.arguments is string %}
39
+ {{- tool_call.arguments }}
40
+ {%- else %}
41
+ {{- tool_call.arguments | tojson }}
42
+ {%- endif %}
43
+ {{- '}\n</tool_call>' }}
44
+ {%- endfor %}
45
+ {%- endif %}
46
+ {{- '<|im_end|>\n' }}
47
+ {%- elif message.role == "tool" %}
48
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
49
+ {{- '<|im_start|>user' }}
50
+ {%- endif %}
51
+ {{- '\n<tool_response>\n' }}
52
+ {{- content }}
53
+ {{- '\n</tool_response>' }}
54
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
55
+ {{- '<|im_end|>\n' }}
56
+ {%- endif %}
57
+ {%- endif %}
58
+ {%- endfor %}
59
+ {%- if add_generation_prompt %}
60
+ {{- '<|im_start|>assistant\n' }}
61
+ {%- endif %}
qwen3-4b-instruct/dp8/tokenizer/tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e9c8aef460c70c1e1c32afe895f455856c0075e5706f06e6d80b2f581137715
3
+ size 11517150
qwen3-4b-instruct/dp8/tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,516 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": null,
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "<|im_end|>",
7
+ "errors": "replace",
8
+ "extra_special_tokens": [
9
+ "865331112869",
10
+ "569765693871",
11
+ "485177821815",
12
+ "135441121756",
13
+ "367459894796",
14
+ "877482678543",
15
+ "457919547633",
16
+ "765474393376",
17
+ "114848338811",
18
+ "746285987371",
19
+ "649291669397",
20
+ "927914615679",
21
+ "445925149649",
22
+ "691587454538",
23
+ "143777992227",
24
+ "997981281989",
25
+ "425949483533",
26
+ "982993456429",
27
+ "718726519731",
28
+ "172599315861",
29
+ "643489267333",
30
+ "282322838685",
31
+ "781653545886",
32
+ "796415361892",
33
+ "841991688488",
34
+ "211411365397",
35
+ "698218415444",
36
+ "355977139358",
37
+ "682564697312",
38
+ "383837596997",
39
+ "689362171782",
40
+ "749966767285",
41
+ "753159165157",
42
+ "795693824762",
43
+ "669689115557",
44
+ "327491773134",
45
+ "983569279932",
46
+ "612128769512",
47
+ "374327157578",
48
+ "311632789559",
49
+ "523918658846",
50
+ "765981581453",
51
+ "794825141891",
52
+ "873898736873",
53
+ "447445629421",
54
+ "473822473819",
55
+ "181439694557",
56
+ "592538279337",
57
+ "668134915514",
58
+ "643692393748",
59
+ "696651276628",
60
+ "853859348234",
61
+ "778466723723",
62
+ "929826356991",
63
+ "272362973463",
64
+ "694235616268",
65
+ "281673864127",
66
+ "479676316326",
67
+ "646979124677",
68
+ "922327493433",
69
+ "883685933161",
70
+ "264259917554",
71
+ "836746273134",
72
+ "658481324922",
73
+ "481884157827",
74
+ "587787496812",
75
+ "579184949249",
76
+ "912193598348",
77
+ "529679678956",
78
+ "795838284624",
79
+ "159337222655",
80
+ "173781362446",
81
+ "773687856563",
82
+ "535787224917",
83
+ "351885857332",
84
+ "578827344666",
85
+ "198462689911",
86
+ "722618266242",
87
+ "952872416512",
88
+ "517778845323",
89
+ "749665846687",
90
+ "661436365453",
91
+ "259666844669",
92
+ "242851284913",
93
+ "514532995959",
94
+ "161588262349",
95
+ "742765629356",
96
+ "225164373623",
97
+ "676539973863",
98
+ "826214551218",
99
+ "182345464792",
100
+ "232776999554",
101
+ "337326533813",
102
+ "676676697292",
103
+ "929185622831",
104
+ "545512344383",
105
+ "499444466686",
106
+ "314697386682",
107
+ "517379856925",
108
+ "379557332953",
109
+ "614797267726",
110
+ "429781429464",
111
+ "922466849763",
112
+ "721737645236",
113
+ "479227349997",
114
+ "136931728327",
115
+ "259533577263",
116
+ "488538864842",
117
+ "937495658852",
118
+ "489991411364",
119
+ "499148455254",
120
+ "441373944925",
121
+ "899151413682",
122
+ "467893531755",
123
+ "527117488925",
124
+ "928335588653",
125
+ "374439448821",
126
+ "879425227932",
127
+ "867678158885",
128
+ "399749397872",
129
+ "129693547287",
130
+ "689285841825",
131
+ "771619544974",
132
+ "724883568652",
133
+ "516968424863",
134
+ "733737988257",
135
+ "852347289392",
136
+ "296953381169",
137
+ "377273562477",
138
+ "262296912232",
139
+ "547149832394",
140
+ "298464134954",
141
+ "216667245274",
142
+ "843998562287",
143
+ "572154333646",
144
+ "124589118494",
145
+ "841824384614",
146
+ "232896526252",
147
+ "295448593321",
148
+ "123741461297",
149
+ "653573457168",
150
+ "196735786156",
151
+ "377338713663",
152
+ "964342468552",
153
+ "586855179568",
154
+ "484773717614",
155
+ "894885246797",
156
+ "677896358599",
157
+ "848845611563",
158
+ "851852651677",
159
+ "398549545767",
160
+ "454244839926",
161
+ "799364566435",
162
+ "967114116556",
163
+ "817378986438",
164
+ "233795848681",
165
+ "824387273757",
166
+ "916198946615",
167
+ "563117729724",
168
+ "951794811935",
169
+ "374598961236",
170
+ "922867396683",
171
+ "765737843639",
172
+ "175469284871",
173
+ "231853711778",
174
+ "662426712668",
175
+ "711412347158",
176
+ "753466987363",
177
+ "513361312532",
178
+ "712992815957",
179
+ "971621888444",
180
+ "829235161526",
181
+ "585544633356",
182
+ "582471228164",
183
+ "678666359123",
184
+ "557533689478",
185
+ "632962475133",
186
+ "484489193824",
187
+ "489562189822",
188
+ "589547936288",
189
+ "363214487524",
190
+ "244885399387",
191
+ "431751228368",
192
+ "433581868192",
193
+ "486391569221",
194
+ "185438575221",
195
+ "126574388585",
196
+ "741757479784",
197
+ "529854679937",
198
+ "996116119839",
199
+ "616248973917",
200
+ "763531783491",
201
+ "955456118295",
202
+ "364196983365",
203
+ "195792996468",
204
+ "151859598873",
205
+ "399223169721",
206
+ "938488813964",
207
+ "961981959227",
208
+ "183368827562",
209
+ "533417736566",
210
+ "786391632558",
211
+ "665661658354",
212
+ "693281533643",
213
+ "475794684356",
214
+ "652154162978",
215
+ "753233719644",
216
+ "668514843129",
217
+ "819162623892",
218
+ "941169431859",
219
+ "877385381798",
220
+ "752644929761",
221
+ "881136466196",
222
+ "275597777299",
223
+ "731681792655",
224
+ "961133895172",
225
+ "864718285734",
226
+ "963852916563",
227
+ "319584985416",
228
+ "563365646341",
229
+ "811371928234",
230
+ "837131396371",
231
+ "267514771964",
232
+ "944513428457",
233
+ "117298239631",
234
+ "158142752582",
235
+ "252867443568",
236
+ "839269684865",
237
+ "612788593128",
238
+ "145669731981",
239
+ "121557291859",
240
+ "245416776926",
241
+ "799417897197",
242
+ "997958836435",
243
+ "892336777248",
244
+ "158929292238",
245
+ "581976444672",
246
+ "897784492783",
247
+ "492373714791",
248
+ "512659818733",
249
+ "881112998642",
250
+ "619454958782",
251
+ "431149748713",
252
+ "624221476921",
253
+ "125866399464",
254
+ "339882449689",
255
+ "186198784585",
256
+ "943193294691",
257
+ "955668961269",
258
+ "232787996724",
259
+ "215671314196",
260
+ "286173241916",
261
+ "745977673725",
262
+ "556976448182",
263
+ "599961512792",
264
+ "766294538337",
265
+ "934912591213",
266
+ "295118729589",
267
+ "529455466433",
268
+ "196119929397",
269
+ "379571934299",
270
+ "251789649997",
271
+ "564544131355",
272
+ "244371196654",
273
+ "384598329253",
274
+ "887753195844",
275
+ "364947325679",
276
+ "655517954651",
277
+ "673948786567",
278
+ "857231548835",
279
+ "816115936673",
280
+ "644234165531",
281
+ "182782912224",
282
+ "234316622259",
283
+ "421369185549",
284
+ "434632855397",
285
+ "921889371893",
286
+ "415956914763",
287
+ "598916996413",
288
+ "773671349113",
289
+ "952465217972",
290
+ "117657531962",
291
+ "729825168745",
292
+ "691315125346",
293
+ "768461952319",
294
+ "664847713559",
295
+ "953267689786",
296
+ "886464195129",
297
+ "824488329416",
298
+ "837873762491",
299
+ "532833541879",
300
+ "669183782449",
301
+ "941976537588",
302
+ "739394546916",
303
+ "267954879268",
304
+ "637551427887",
305
+ "217756494954",
306
+ "524444658383",
307
+ "117783274348",
308
+ "138218735276",
309
+ "814611949491",
310
+ "711641973413",
311
+ "499156317423",
312
+ "515856611931",
313
+ "454164859837",
314
+ "345271433112",
315
+ "462294118988",
316
+ "511785788222",
317
+ "497294727353",
318
+ "866519986723",
319
+ "334513529294",
320
+ "549946382131",
321
+ "284445431422",
322
+ "396521188476",
323
+ "421435255895",
324
+ "133373659361",
325
+ "322683334381",
326
+ "228358422847",
327
+ "291762694874",
328
+ "143182978129",
329
+ "511923256573",
330
+ "327158398268",
331
+ "879764613759",
332
+ "564395222747",
333
+ "451161679736",
334
+ "538631466654",
335
+ "221762325616",
336
+ "218391991184",
337
+ "322589379462",
338
+ "876537814263",
339
+ "152676556624",
340
+ "332522971941",
341
+ "884354318946",
342
+ "513349618943",
343
+ "116639746413",
344
+ "635185846287",
345
+ "993832498489",
346
+ "813981174797",
347
+ "438745114173",
348
+ "983493951323",
349
+ "724492262421",
350
+ "622553389126",
351
+ "889965243135",
352
+ "364492359246",
353
+ "154962668224",
354
+ "179564995814",
355
+ "418412875665",
356
+ "718951851413",
357
+ "699446724178",
358
+ "624266421831",
359
+ "815458725125",
360
+ "455423278865",
361
+ "393741199486",
362
+ "328552864359",
363
+ "211662639865",
364
+ "218784516525",
365
+ "762486672996",
366
+ "142799718159",
367
+ "858146415154",
368
+ "767858144912",
369
+ "571317457151",
370
+ "635127952696",
371
+ "116427191984",
372
+ "268921994538",
373
+ "523937669294",
374
+ "165429152138",
375
+ "739246183345",
376
+ "591464355756",
377
+ "212985874612",
378
+ "191887635211",
379
+ "967214577653",
380
+ "119342152414",
381
+ "946444632795",
382
+ "618423867817",
383
+ "228565148417",
384
+ "729116422489",
385
+ "527874729936",
386
+ "739784153482",
387
+ "387763951128",
388
+ "331369926711",
389
+ "562716493614",
390
+ "739667844957",
391
+ "562389434565",
392
+ "256497188281",
393
+ "859927364588",
394
+ "417668946583",
395
+ "357621613582",
396
+ "438435178228",
397
+ "485692541169",
398
+ "825815739116",
399
+ "342221452223",
400
+ "697747991249",
401
+ "716763689965",
402
+ "141499982867",
403
+ "818479319499",
404
+ "336813343298",
405
+ "594688742928",
406
+ "472129283475",
407
+ "514354144759",
408
+ "349249721685",
409
+ "546276298359",
410
+ "353755529131",
411
+ "315534574435",
412
+ "523723475786",
413
+ "215826764872",
414
+ "367968398551",
415
+ "569853653352",
416
+ "389715484387",
417
+ "293847485454",
418
+ "714738141818",
419
+ "178478368922",
420
+ "581493616981",
421
+ "589439538674",
422
+ "846657726193",
423
+ "722339992679",
424
+ "138154781148",
425
+ "757785319772",
426
+ "492516914298",
427
+ "919181521716",
428
+ "985781138935",
429
+ "476969195485",
430
+ "313145133463",
431
+ "758963111966",
432
+ "147541537162",
433
+ "557163366873",
434
+ "144373897488",
435
+ "522515164754",
436
+ "724964923582",
437
+ "284776712475",
438
+ "375429755114",
439
+ "181233596124",
440
+ "948585673431",
441
+ "243165586174",
442
+ "396847976144",
443
+ "997724962668",
444
+ "558837194455",
445
+ "163165456396",
446
+ "378749551722",
447
+ "161238482259",
448
+ "754978243758",
449
+ "195388849133",
450
+ "229775525672",
451
+ "262437452884",
452
+ "441377892146",
453
+ "451885565366",
454
+ "981277526855",
455
+ "762495822823",
456
+ "368763327262",
457
+ "757422791351",
458
+ "636324136426",
459
+ "214193645583",
460
+ "412843856172",
461
+ "179386156569",
462
+ "756916173536",
463
+ "892697125149",
464
+ "625334487352",
465
+ "941861857715",
466
+ "887417525236",
467
+ "649516938598",
468
+ "717628619782",
469
+ "438124184139",
470
+ "547563892268",
471
+ "856317483891",
472
+ "313313831273",
473
+ "371496153876",
474
+ "587541149322",
475
+ "265847332563",
476
+ "449549215429",
477
+ "163497196769",
478
+ "861342291298",
479
+ "268433315926",
480
+ "774679513717",
481
+ "851254219729",
482
+ "583527834464",
483
+ "488496781997",
484
+ "556814553861",
485
+ "482829231639",
486
+ "618878266619",
487
+ "147444452794",
488
+ "949235426629",
489
+ "357299947518",
490
+ "175528632226",
491
+ "645527857972",
492
+ "186872457894",
493
+ "552738847828",
494
+ "626748382482",
495
+ "921894985642",
496
+ "943878645871",
497
+ "859289776479",
498
+ "614583493135",
499
+ "933775286797",
500
+ "332234613346",
501
+ "325196781219",
502
+ "142526557681",
503
+ "356722692178",
504
+ "449318681694",
505
+ "687284547244",
506
+ "947262995132",
507
+ "893974619684",
508
+ "797238311233"
509
+ ],
510
+ "is_local": false,
511
+ "model_max_length": 1010000,
512
+ "pad_token": "<|endoftext|>",
513
+ "split_special_tokens": false,
514
+ "tokenizer_class": "Qwen2Tokenizer",
515
+ "unk_token": null
516
+ }
qwen3-4b-instruct/dp8/train.log ADDED
@@ -0,0 +1,21 @@
+ 2026-03-17 16:24:45,463 [INFO] new_opacus_codex.train_steps: epoch=1 step=10 loss=1.6904
+ 2026-03-17 16:26:59,829 [INFO] new_opacus_codex.train_steps: epoch=1 step=20 loss=1.6599
+ 2026-03-17 16:29:13,865 [INFO] new_opacus_codex.train_steps: epoch=1 step=30 loss=1.6978
+ 2026-03-17 16:31:27,715 [INFO] new_opacus_codex.train_steps: epoch=1 step=40 loss=1.5809
+ 2026-03-17 16:33:41,506 [INFO] new_opacus_codex.train_steps: epoch=1 step=50 loss=1.5804
+ 2026-03-17 16:33:54,222 [INFO] new_opacus_codex.train_steps: eval event=eval_step epoch=1 step=50 eval_loss=0.9504 duration_sec=12.71
+ 2026-03-17 16:36:08,859 [INFO] new_opacus_codex.train_steps: epoch=1 step=60 loss=1.5633
+ 2026-03-17 16:38:23,253 [INFO] new_opacus_codex.train_steps: epoch=1 step=70 loss=1.7269
+ 2026-03-17 16:40:37,881 [INFO] new_opacus_codex.train_steps: epoch=1 step=80 loss=1.7133
+ 2026-03-17 16:42:52,223 [INFO] new_opacus_codex.train_steps: epoch=1 step=90 loss=1.5771
+ 2026-03-17 16:45:27,230 [INFO] new_opacus_codex.train_steps: epoch=2 step=100 loss=1.7272
+ 2026-03-17 16:45:39,978 [INFO] new_opacus_codex.train_steps: eval event=eval_step epoch=2 step=100 eval_loss=0.9271 duration_sec=12.75
+ 2026-03-17 16:47:54,357 [INFO] new_opacus_codex.train_steps: epoch=2 step=110 loss=1.7404
+ 2026-03-17 16:50:08,938 [INFO] new_opacus_codex.train_steps: epoch=2 step=120 loss=1.5583
+ 2026-03-17 16:52:23,295 [INFO] new_opacus_codex.train_steps: epoch=2 step=130 loss=1.5209
+ 2026-03-17 16:54:37,246 [INFO] new_opacus_codex.train_steps: epoch=2 step=140 loss=1.7010
+ 2026-03-17 16:56:50,943 [INFO] new_opacus_codex.train_steps: epoch=2 step=150 loss=1.5696
+ 2026-03-17 16:57:03,701 [INFO] new_opacus_codex.train_steps: eval event=eval_step epoch=2 step=150 eval_loss=0.9248 duration_sec=12.76
+ 2026-03-17 16:59:18,140 [INFO] new_opacus_codex.train_steps: epoch=2 step=160 loss=1.3749
+ 2026-03-17 17:01:33,298 [INFO] new_opacus_codex.train_steps: epoch=2 step=170 loss=1.6490
+ 2026-03-17 17:03:49,099 [INFO] new_opacus_codex.train_steps: epoch=2 step=180 loss=1.6707