melihcatal committed on
Commit
25fb4d4
·
verified ·
1 Parent(s): 07fde2b

Upload folder using huggingface_hub

Files changed (28)
  1. .gitattributes +1 -0
  2. qwen3-4b-instruct/dp3/adapter/README.md +207 -0
  3. qwen3-4b-instruct/dp3/adapter/adapter_config.json +46 -0
  4. qwen3-4b-instruct/dp3/adapter/adapter_model.safetensors +3 -0
  5. qwen3-4b-instruct/dp3/audit_results.json +137 -0
  6. qwen3-4b-instruct/dp3/audit_scores.npz +3 -0
  7. qwen3-4b-instruct/dp3/canary_meta.json +0 -0
  8. qwen3-4b-instruct/dp3/codecarbon.csv +2 -0
  9. qwen3-4b-instruct/dp3/epochs/epoch_001/adapter/README.md +207 -0
  10. qwen3-4b-instruct/dp3/epochs/epoch_001/adapter/adapter_config.json +46 -0
  11. qwen3-4b-instruct/dp3/epochs/epoch_001/adapter/adapter_model.safetensors +3 -0
  12. qwen3-4b-instruct/dp3/epochs/epoch_001/audit_results.json +137 -0
  13. qwen3-4b-instruct/dp3/epochs/epoch_001/audit_scores.npz +3 -0
  14. qwen3-4b-instruct/dp3/epochs/epoch_002/adapter/README.md +207 -0
  15. qwen3-4b-instruct/dp3/epochs/epoch_002/adapter/adapter_config.json +46 -0
  16. qwen3-4b-instruct/dp3/epochs/epoch_002/adapter/adapter_model.safetensors +3 -0
  17. qwen3-4b-instruct/dp3/epochs/epoch_002/audit_results.json +137 -0
  18. qwen3-4b-instruct/dp3/epochs/epoch_002/audit_scores.npz +3 -0
  19. qwen3-4b-instruct/dp3/metrics.jsonl +27 -0
  20. qwen3-4b-instruct/dp3/pretrain_lm_head.pt +3 -0
  21. qwen3-4b-instruct/dp3/resolved_config.yaml +101 -0
  22. qwen3-4b-instruct/dp3/scalars.csv +358 -0
  23. qwen3-4b-instruct/dp3/summary.json +72 -0
  24. qwen3-4b-instruct/dp3/tensorboard/events.out.tfevents.1773811283.7b654b6988b0.3379.0 +3 -0
  25. qwen3-4b-instruct/dp3/tokenizer/chat_template.jinja +61 -0
  26. qwen3-4b-instruct/dp3/tokenizer/tokenizer.json +3 -0
  27. qwen3-4b-instruct/dp3/tokenizer/tokenizer_config.json +516 -0
  28. qwen3-4b-instruct/dp3/train.log +21 -0
.gitattributes CHANGED
@@ -36,3 +36,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
36
  qwen3-4b-instruct/base/tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
37
  qwen3-4b-instruct/dp8/tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
38
  tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
39
+ qwen3-4b-instruct/dp3/tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
qwen3-4b-instruct/dp3/adapter/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: Qwen/Qwen3-4B-Instruct-2507
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.18.1
qwen3-4b-instruct/dp3/adapter/adapter_config.json ADDED
@@ -0,0 +1,46 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "Qwen/Qwen3-4B-Instruct-2507",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": true,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 32,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.05,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": [
25
+ "lm_head",
26
+ "embed_tokens"
27
+ ],
28
+ "peft_type": "LORA",
29
+ "peft_version": "0.18.1",
30
+ "qalora_group_size": 16,
31
+ "r": 16,
32
+ "rank_pattern": {},
33
+ "revision": null,
34
+ "target_modules": [
35
+ "o_proj",
36
+ "v_proj",
37
+ "q_proj",
38
+ "k_proj"
39
+ ],
40
+ "target_parameters": null,
41
+ "task_type": "CAUSAL_LM",
42
+ "trainable_token_indices": null,
43
+ "use_dora": false,
44
+ "use_qalora": false,
45
+ "use_rslora": false
46
+ }
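As a reading aid for the config above, the effective LoRA scaling factor can be worked out from `lora_alpha`, `r`, and `use_rslora`. The sketch below assumes the standard LoRA convention (scale = alpha / r, or alpha / sqrt(r) when rank-stabilized LoRA is enabled). The dict is a hand-copied subset of the fields in this file, and `lora_scaling` is a hypothetical helper written for illustration, not a PEFT function.

```python
import math

# Subset of adapter_config.json, copied by hand from the diff above.
adapter_config = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "use_rslora": False,
    "target_modules": ["o_proj", "v_proj", "q_proj", "k_proj"],
    "modules_to_save": ["lm_head", "embed_tokens"],
}


def lora_scaling(cfg: dict) -> float:
    """Hypothetical helper: scale applied to the low-rank update B @ A."""
    if cfg.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


scaling = lora_scaling(adapter_config)
print(f"LoRA scaling: {scaling}")
print("LoRA-adapted modules:", sorted(adapter_config["target_modules"]))
print("Fully saved modules:", adapter_config["modules_to_save"])
```

Because `modules_to_save` lists `lm_head` and `embed_tokens`, full copies of those modules are stored next to the low-rank weights, which likely accounts for much of the multi-gigabyte adapter file in this commit.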
qwen3-4b-instruct/dp3/adapter/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d0844d1d1b84e1c37902c9865b63a922a40f22a74e52fc9fe297a73324652cde
3
+ size 4721857072
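The three lines above are a Git LFS pointer, not the weights themselves. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and the pointer text is copied from this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d0844d1d1b84e1c37902c9865b63a922a40f22a74e52fc9fe297a73324652cde
size 4721857072
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["digest"][:12], f"{pointer['size_bytes'] / 1e9:.2f} GB")
```

Comparing the `oid` digests of two pointers is a cheap way to check whether two tracked files are byte-identical without downloading either one.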
qwen3-4b-instruct/dp3/audit_results.json ADDED
@@ -0,0 +1,137 @@
1
+ {
2
+ "delta": 1e-05,
3
+ "num_canaries": 500,
4
+ "num_members": 250,
5
+ "paper_guess_fraction": 0.2,
6
+ "paper_guess_steps": 20,
7
+ "loss": {
8
+ "auc": 0.504792,
9
+ "empirical_epsilon": {
10
+ "0.05": 0.0,
11
+ "0.01": 0.0
12
+ },
13
+ "empirical_epsilon_details": {
14
+ "0.05": {
15
+ "epsilon": 0.0,
16
+ "num_guesses": 0,
17
+ "correct_guesses": 0,
18
+ "candidate_num_guesses": [
19
+ 5,
20
+ 10,
21
+ 15,
22
+ 20,
23
+ 25,
24
+ 30,
25
+ 35,
26
+ 40,
27
+ 45,
28
+ 50,
29
+ 55,
30
+ 60,
31
+ 65,
32
+ 70,
33
+ 75,
34
+ 80,
35
+ 85,
36
+ 90,
37
+ 95,
38
+ 100
39
+ ],
40
+ "direction": "lower"
41
+ },
42
+ "0.01": {
43
+ "epsilon": 0.0,
44
+ "num_guesses": 0,
45
+ "correct_guesses": 0,
46
+ "candidate_num_guesses": [
47
+ 5,
48
+ 10,
49
+ 15,
50
+ 20,
51
+ 25,
52
+ 30,
53
+ 35,
54
+ 40,
55
+ 45,
56
+ 50,
57
+ 55,
58
+ 60,
59
+ 65,
60
+ 70,
61
+ 75,
62
+ 80,
63
+ 85,
64
+ 90,
65
+ 95,
66
+ 100
67
+ ],
68
+ "direction": "lower"
69
+ }
70
+ }
71
+ },
72
+ "embedding": {
73
+ "auc": 0.51504,
74
+ "empirical_epsilon": {
75
+ "0.05": 0.0,
76
+ "0.01": 0.0
77
+ },
78
+ "empirical_epsilon_details": {
79
+ "0.05": {
80
+ "epsilon": 0.0,
81
+ "num_guesses": 0,
82
+ "correct_guesses": 0,
83
+ "candidate_num_guesses": [
84
+ 5,
85
+ 10,
86
+ 15,
87
+ 20,
88
+ 25,
89
+ 30,
90
+ 35,
91
+ 40,
92
+ 45,
93
+ 50,
94
+ 55,
95
+ 60,
96
+ 65,
97
+ 70,
98
+ 75,
99
+ 80,
100
+ 85,
101
+ 90,
102
+ 95,
103
+ 100
104
+ ],
105
+ "direction": "lower"
106
+ },
107
+ "0.01": {
108
+ "epsilon": 0.0,
109
+ "num_guesses": 0,
110
+ "correct_guesses": 0,
111
+ "candidate_num_guesses": [
112
+ 5,
113
+ 10,
114
+ 15,
115
+ 20,
116
+ 25,
117
+ 30,
118
+ 35,
119
+ 40,
120
+ 45,
121
+ 50,
122
+ 55,
123
+ 60,
124
+ 65,
125
+ 70,
126
+ 75,
127
+ 80,
128
+ 85,
129
+ 90,
130
+ 95,
131
+ 100
132
+ ],
133
+ "direction": "lower"
134
+ }
135
+ }
136
+ }
137
+ }
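A short sketch of how the audit file above might be summarized. The JSON string is an abbreviated hand copy (the `empirical_epsilon_details` blocks are omitted), and the reading of the fields is an assumption based on their names: `auc` is taken to be the ROC AUC of a membership-style attack, where 0.5 corresponds to chance.

```python
import json

# Abbreviated hand copy of audit_results.json; detail blocks are omitted.
audit_json = """
{
  "delta": 1e-05,
  "num_canaries": 500,
  "num_members": 250,
  "loss": {"auc": 0.504792, "empirical_epsilon": {"0.05": 0.0, "0.01": 0.0}},
  "embedding": {"auc": 0.51504, "empirical_epsilon": {"0.05": 0.0, "0.01": 0.0}}
}
"""

results = json.loads(audit_json)

summary = {}
for attack in ("loss", "embedding"):
    entry = results[attack]
    summary[attack] = {
        # Distance of the attack AUC from the 0.5 chance level.
        "auc_minus_chance": round(entry["auc"] - 0.5, 6),
        # Largest empirical epsilon reported across the listed levels.
        "max_empirical_epsilon": max(entry["empirical_epsilon"].values()),
    }
    print(attack, summary[attack])
```

Under that reading, both attacks sit within about 0.015 of chance and report an empirical epsilon of 0.0 at both listed levels.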
qwen3-4b-instruct/dp3/audit_scores.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3cdc062093b88f95ceb1289319631cbeab3c23e571ca5bbfb90fda14e85498d3
3
+ size 12784
qwen3-4b-instruct/dp3/canary_meta.json ADDED
The diff for this file is too large to render. See raw diff
 
qwen3-4b-instruct/dp3/codecarbon.csv ADDED
@@ -0,0 +1,2 @@
1
+ timestamp,project_name,run_id,experiment_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,water_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,cpu_utilization_percent,gpu_utilization_percent,ram_utilization_percent,ram_used_gb,on_cloud,pue,wue
2
+ 2026-03-18T06:02:23,codedp-qwen3-4b-instruct-cpt-dp3,4b4c965b-253a-402e-94f1-cd15878f8181,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,2457.6702942838892,0.09496692637153915,3.8641036021965845e-05,72.03190690693239,3303.243429373142,54.0,0.04736684855609284,2.25075239893377,0.03550715308856684,2.3336264005784284,0.0,Sweden,SWE,östergötland county,,,Linux-6.8.0-94-generic-x86_64-with-glibc2.39,3.11.0,3.2.3,256,AMD EPYC 9554 64-Core Processor,8,8 x NVIDIA H200,16.1885,58.594,1511.49019241333,machine,3.8756856324191915,94.07749795249795,5.391281211624831,81.57497969388083,N,1.0,0.0
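The single CodeCarbon row above is hard to read in raw form. The sketch below parses a hand-picked subset of its columns and runs two consistency checks: the per-component energies should add up to `energy_consumed`, and `emissions / duration` should match `emissions_rate`. It assumes CodeCarbon's usual units (seconds for `duration`, kWh for the energy columns, kg CO2eq for `emissions`).

```python
import csv
import io

# Hand-picked subset of the codecarbon.csv columns; the original row has many more.
csv_text = """duration,emissions,emissions_rate,cpu_energy,gpu_energy,ram_energy,energy_consumed
2457.6702942838892,0.09496692637153915,3.8641036021965845e-05,0.04736684855609284,2.25075239893377,0.03550715308856684,2.3336264005784284
"""

row = next(csv.DictReader(io.StringIO(csv_text)))
run = {key: float(value) for key, value in row.items()}

# Check 1: CPU + GPU + RAM energy versus the reported total.
component_sum = run["cpu_energy"] + run["gpu_energy"] + run["ram_energy"]
# Check 2: emissions divided by duration versus the reported rate.
implied_rate = run["emissions"] / run["duration"]

print(f"duration: {run['duration'] / 60:.1f} min")
print(f"component energy sum: {component_sum:.6f} (reported {run['energy_consumed']:.6f})")
print(f"emissions / duration: {implied_rate:.3e} (reported {run['emissions_rate']:.3e})")
```

Both checks agree with the reported values for this row, and the GPU column accounts for nearly all of the energy total.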
qwen3-4b-instruct/dp3/epochs/epoch_001/adapter/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: Qwen/Qwen3-4B-Instruct-2507
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.18.1
qwen3-4b-instruct/dp3/epochs/epoch_001/adapter/adapter_config.json ADDED
@@ -0,0 +1,46 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "Qwen/Qwen3-4B-Instruct-2507",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": true,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 32,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.05,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": [
25
+ "lm_head",
26
+ "embed_tokens"
27
+ ],
28
+ "peft_type": "LORA",
29
+ "peft_version": "0.18.1",
30
+ "qalora_group_size": 16,
31
+ "r": 16,
32
+ "rank_pattern": {},
33
+ "revision": null,
34
+ "target_modules": [
35
+ "o_proj",
36
+ "v_proj",
37
+ "q_proj",
38
+ "k_proj"
39
+ ],
40
+ "target_parameters": null,
41
+ "task_type": "CAUSAL_LM",
42
+ "trainable_token_indices": null,
43
+ "use_dora": false,
44
+ "use_qalora": false,
45
+ "use_rslora": false
46
+ }
qwen3-4b-instruct/dp3/epochs/epoch_001/adapter/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f74fff041f7595364c070bb9a1766fbaf33b5ef486cfd83c339ae5646e8b8931
3
+ size 4721857072
qwen3-4b-instruct/dp3/epochs/epoch_001/audit_results.json ADDED
@@ -0,0 +1,137 @@
1
+ {
2
+ "delta": 1e-05,
3
+ "num_canaries": 500,
4
+ "num_members": 250,
5
+ "paper_guess_fraction": 0.2,
6
+ "paper_guess_steps": 20,
7
+ "loss": {
8
+ "auc": 0.505664,
9
+ "empirical_epsilon": {
10
+ "0.05": 0.0,
11
+ "0.01": 0.0
12
+ },
13
+ "empirical_epsilon_details": {
14
+ "0.05": {
15
+ "epsilon": 0.0,
16
+ "num_guesses": 0,
17
+ "correct_guesses": 0,
18
+ "candidate_num_guesses": [
19
+ 5,
20
+ 10,
21
+ 15,
22
+ 20,
23
+ 25,
24
+ 30,
25
+ 35,
26
+ 40,
27
+ 45,
28
+ 50,
29
+ 55,
30
+ 60,
31
+ 65,
32
+ 70,
33
+ 75,
34
+ 80,
35
+ 85,
36
+ 90,
37
+ 95,
38
+ 100
39
+ ],
40
+ "direction": "lower"
41
+ },
42
+ "0.01": {
43
+ "epsilon": 0.0,
44
+ "num_guesses": 0,
45
+ "correct_guesses": 0,
46
+ "candidate_num_guesses": [
47
+ 5,
48
+ 10,
49
+ 15,
50
+ 20,
51
+ 25,
52
+ 30,
53
+ 35,
54
+ 40,
55
+ 45,
56
+ 50,
57
+ 55,
58
+ 60,
59
+ 65,
60
+ 70,
61
+ 75,
62
+ 80,
63
+ 85,
64
+ 90,
65
+ 95,
66
+ 100
67
+ ],
68
+ "direction": "lower"
69
+ }
70
+ }
71
+ },
72
+ "embedding": {
73
+ "auc": 0.51372,
74
+ "empirical_epsilon": {
75
+ "0.05": 0.0,
76
+ "0.01": 0.0
77
+ },
78
+ "empirical_epsilon_details": {
79
+ "0.05": {
80
+ "epsilon": 0.0,
81
+ "num_guesses": 0,
82
+ "correct_guesses": 0,
83
+ "candidate_num_guesses": [
84
+ 5,
85
+ 10,
86
+ 15,
87
+ 20,
88
+ 25,
89
+ 30,
90
+ 35,
91
+ 40,
92
+ 45,
93
+ 50,
94
+ 55,
95
+ 60,
96
+ 65,
97
+ 70,
98
+ 75,
99
+ 80,
100
+ 85,
101
+ 90,
102
+ 95,
103
+ 100
104
+ ],
105
+ "direction": "lower"
106
+ },
107
+ "0.01": {
108
+ "epsilon": 0.0,
109
+ "num_guesses": 0,
110
+ "correct_guesses": 0,
111
+ "candidate_num_guesses": [
112
+ 5,
113
+ 10,
114
+ 15,
115
+ 20,
116
+ 25,
117
+ 30,
118
+ 35,
119
+ 40,
120
+ 45,
121
+ 50,
122
+ 55,
123
+ 60,
124
+ 65,
125
+ 70,
126
+ 75,
127
+ 80,
128
+ 85,
129
+ 90,
130
+ 95,
131
+ 100
132
+ ],
133
+ "direction": "lower"
134
+ }
135
+ }
136
+ }
137
+ }
qwen3-4b-instruct/dp3/epochs/epoch_001/audit_scores.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d5805d6b95eeea5aa8c4e47bf2e70ccba9c96374348e289a8b67bb9f60dced7c
3
+ size 12784
qwen3-4b-instruct/dp3/epochs/epoch_002/adapter/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: Qwen/Qwen3-4B-Instruct-2507
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.1
qwen3-4b-instruct/dp3/epochs/epoch_002/adapter/adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen3-4B-Instruct-2507",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": true,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": [
+     "lm_head",
+     "embed_tokens"
+   ],
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "o_proj",
+     "v_proj",
+     "q_proj",
+     "k_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
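
Reviewer note on the adapter settings above: with `use_rslora` set to `false`, LoRA as implemented in PEFT scales the low-rank update by `lora_alpha / r`, so `lora_alpha` 32 and `r` 16 give a factor of 2. The sketch below is plain arithmetic for checking that value; the helper name `lora_scaling` is hypothetical and is not a PEFT function.

```python
import math


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Scaling applied to the low-rank update B @ A (hypothetical helper).

    Standard LoRA uses lora_alpha / r; the rank-stabilized variant
    uses lora_alpha / sqrt(r) instead.
    """
    if use_rslora:
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


# Values taken from adapter_config.json in this commit.
scaling = lora_scaling(lora_alpha=32, r=16, use_rslora=False)
print(scaling)
```

Had `use_rslora` been enabled, the same `lora_alpha` and `r` would give a larger factor, which is worth keeping in mind when comparing runs.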
qwen3-4b-instruct/dp3/epochs/epoch_002/adapter/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0844d1d1b84e1c37902c9865b63a922a40f22a74e52fc9fe297a73324652cde
+ size 4721857072
qwen3-4b-instruct/dp3/epochs/epoch_002/audit_results.json ADDED
@@ -0,0 +1,137 @@
+ {
+   "delta": 1e-05,
+   "num_canaries": 500,
+   "num_members": 250,
+   "paper_guess_fraction": 0.2,
+   "paper_guess_steps": 20,
+   "loss": {
+     "auc": 0.504792,
+     "empirical_epsilon": {
+       "0.05": 0.0,
+       "0.01": 0.0
+     },
+     "empirical_epsilon_details": {
+       "0.05": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       },
+       "0.01": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       }
+     }
+   },
+   "embedding": {
+     "auc": 0.51504,
+     "empirical_epsilon": {
+       "0.05": 0.0,
+       "0.01": 0.0
+     },
+     "empirical_epsilon_details": {
+       "0.05": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       },
+       "0.01": {
+         "epsilon": 0.0,
+         "num_guesses": 0,
+         "correct_guesses": 0,
+         "candidate_num_guesses": [
+           5,
+           10,
+           15,
+           20,
+           25,
+           30,
+           35,
+           40,
+           45,
+           50,
+           55,
+           60,
+           65,
+           70,
+           75,
+           80,
+           85,
+           90,
+           95,
+           100
+         ],
+         "direction": "lower"
+       }
+     }
+   }
+ }
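
Reviewer note on reading the audit file above: each attack signal (`loss`, `embedding`) carries an `auc` and an `empirical_epsilon` map keyed by p-value. An AUC close to 0.5 means the membership signal is near chance. The sketch below, which assumes only the field names visible in `audit_results.json`, collects those values into rows; the function name `summarize_audit` is hypothetical.

```python
import json


def summarize_audit(audit: dict) -> list:
    """Return (signal, auc, empirical_epsilon) for every block that has an "auc" key."""
    rows = []
    for name, block in audit.items():
        # Top-level scalars such as "delta" or "num_canaries" are skipped.
        if isinstance(block, dict) and "auc" in block:
            rows.append((name, block["auc"], block["empirical_epsilon"]))
    return rows


# Abbreviated copy of the file above: only the fields this sketch reads.
sample = json.loads(
    '{"delta": 1e-05, "num_canaries": 500,'
    ' "loss": {"auc": 0.504792, "empirical_epsilon": {"0.05": 0.0, "0.01": 0.0}},'
    ' "embedding": {"auc": 0.51504, "empirical_epsilon": {"0.05": 0.0, "0.01": 0.0}}}'
)

for signal, auc, eps in summarize_audit(sample):
    print(signal, auc, eps)
```

On the real file, the same function can be applied to `json.load(open(path))`, where `path` points at a local copy of `audit_results.json`.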
qwen3-4b-instruct/dp3/epochs/epoch_002/audit_scores.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cdc062093b88f95ceb1289319631cbeab3c23e571ca5bbfb90fda14e85498d3
+ size 12784
qwen3-4b-instruct/dp3/metrics.jsonl ADDED
@@ -0,0 +1,27 @@
+ {"timestamp": 1773811506.4695234, "event": "train_step", "step": 10, "epoch": 1, "metrics": {"train/step_loss": 1.616579128034187, "train/step_real_loss": 1.2012769132852554, "train/lr": 0.0002, "train/step_canary_loss": 14.90625, "perf/step_duration_sec": 12.063218676950783, "perf/samples_per_sec": 5.471176621054408, "perf/tokens_per_sec": 4265.196659189263, "perf/logical_batch_size": 66.0, "perf/logical_token_count": 51452.0, "perf/physical_batches": 9.0, "privacy/epsilon": 0.7638484077719963, "system/cuda_memory_allocated_gb": 12.68057107925415, "system/cuda_max_memory_allocated_gb": 86.2213044166565}}
+ {"timestamp": 1773811628.0656273, "event": "train_step", "step": 20, "epoch": 1, "metrics": {"train/step_loss": 1.7544667159809786, "train/step_real_loss": 1.0926365107297897, "train/lr": 0.0001984111204336116, "train/step_canary_loss": 12.34375, "perf/step_duration_sec": 11.990010250825435, "perf/samples_per_sec": 5.671387978614834, "perf/tokens_per_sec": 4520.429830013588, "perf/logical_batch_size": 68.0, "perf/logical_token_count": 54200.0, "perf/physical_batches": 9.0, "privacy/epsilon": 1.016371889021752, "system/cuda_memory_allocated_gb": 13.261098384857178, "system/cuda_max_memory_allocated_gb": 86.2213044166565}}
+ {"timestamp": 1773811750.6200142, "event": "train_step", "step": 30, "epoch": 1, "metrics": {"train/step_loss": 1.0279401504632197, "train/step_real_loss": 1.0600632801651955, "train/lr": 0.0001936949724999762, "train/step_canary_loss": 0.0, "perf/step_duration_sec": 12.954613069072366, "perf/samples_per_sec": 4.94032509182328, "perf/tokens_per_sec": 3957.7407466073655, "perf/logical_batch_size": 64.0, "perf/logical_token_count": 51271.0, "perf/physical_batches": 10.0, "privacy/epsilon": 1.2175571913066654, "system/cuda_memory_allocated_gb": 14.42089033126831, "system/cuda_max_memory_allocated_gb": 86.2213044166565}}
+ {"timestamp": 1773811872.468319, "event": "train_step", "step": 40, "epoch": 1, "metrics": {"train/step_loss": 1.4674463272094727, "train/step_real_loss": 1.0572493374347687, "train/lr": 0.00018600142402077006, "train/step_canary_loss": 14.59375, "perf/step_duration_sec": 11.865780014079064, "perf/samples_per_sec": 5.562213349791522, "perf/tokens_per_sec": 4472.188085152074, "perf/logical_batch_size": 66.0, "perf/logical_token_count": 53066.0, "perf/physical_batches": 9.0, "privacy/epsilon": 1.3913063211429064, "system/cuda_memory_allocated_gb": 12.680593967437744, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773811994.1354816, "event": "train_step", "step": 50, "epoch": 1, "metrics": {"train/step_loss": 1.257238564124474, "train/step_real_loss": 1.081570416688919, "train/lr": 0.00017557495743542585, "train/step_canary_loss": 12.5, "perf/step_duration_sec": 12.282207927200943, "perf/samples_per_sec": 5.292208077347962, "perf/tokens_per_sec": 4310.462769707012, "perf/logical_batch_size": 65.0, "perf/logical_token_count": 52942.0, "perf/physical_batches": 9.0, "privacy/epsilon": 1.5474611608458564, "system/cuda_memory_allocated_gb": 12.39033031463623, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773812006.9120464, "event": "eval_step", "step": 50, "epoch": 1, "metrics": {"eval/loss": 0.9748443443423662, "eval/duration_sec": 12.77427957393229}}
+ {"timestamp": 1773812128.636093, "event": "train_step", "step": 60, "epoch": 1, "metrics": {"train/step_loss": 2.1636021409715926, "train/step_real_loss": 1.041244499385357, "train/lr": 0.0001627469007380852, "train/step_canary_loss": 14.135416984558105, "perf/step_duration_sec": 12.324101509992033, "perf/samples_per_sec": 5.67992725013227, "perf/tokens_per_sec": 4058.7137293088017, "perf/logical_batch_size": 70.0, "perf/logical_token_count": 50020.0, "perf/physical_batches": 9.0, "privacy/epsilon": 1.6892902398315597, "system/cuda_memory_allocated_gb": 13.841647148132324, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773812251.0683832, "event": "train_step", "step": 70, "epoch": 1, "metrics": {"train/step_loss": 1.8121652182410746, "train/step_real_loss": 1.0885114818811417, "train/lr": 0.0001479248986720057, "train/step_canary_loss": 13.390625, "perf/step_duration_sec": 12.67668361403048, "perf/samples_per_sec": 5.364178997473597, "perf/tokens_per_sec": 3911.117569040308, "perf/logical_batch_size": 68.0, "perf/logical_token_count": 49580.0, "perf/physical_batches": 9.0, "privacy/epsilon": 1.8217089373472166, "system/cuda_memory_allocated_gb": 13.261121273040771, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773812373.025167, "event": "train_step", "step": 80, "epoch": 1, "metrics": {"train/step_loss": 1.8635356987223906, "train/step_real_loss": 1.08547542989254, "train/lr": 0.0001315799587615025, "train/step_canary_loss": 14.3125, "perf/step_duration_sec": 12.623636846896261, "perf/samples_per_sec": 5.386720231635859, "perf/tokens_per_sec": 3980.3109523349326, "perf/logical_batch_size": 68.0, "perf/logical_token_count": 50246.0, "perf/physical_batches": 9.0, "privacy/epsilon": 1.946082329590544, "system/cuda_memory_allocated_gb": 13.261121273040771, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773812495.4775374, "event": "train_step", "step": 90, "epoch": 1, "metrics": {"train/step_loss": 1.5933425817916642, "train/step_real_loss": 1.0586555153131485, "train/lr": 0.00011423148382732853, "train/step_canary_loss": 13.0, "perf/step_duration_sec": 11.945233571808785, "perf/samples_per_sec": 5.608931763219984, "perf/tokens_per_sec": 4434.907001318536, "perf/logical_batch_size": 67.0, "perf/logical_token_count": 52976.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.06391520460725, "system/cuda_memory_allocated_gb": 12.970857620239258, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773812537.9189491, "event": "train_epoch", "step": 93, "epoch": 1, "metrics": {"train/epoch_loss": 1.6919119858601415, "train/epoch_real_loss": 1.0723268173838774, "train/epoch_canary_loss": 13.472450833857557, "perf/epoch_duration_sec": 1142.3914510426112, "perf/epoch_samples_per_sec": 43.508728951566745, "perf/epoch_tokens_per_sec": 33067.53649593877, "perf/epoch_samples": 49704.0, "perf/epoch_tokens": 37776071.0, "system/cuda_epoch_peak_memory_gb": 86.22135162353516, "eval/loss": 0.9517239639774346, "eval/duration_sec": 12.793823145795614, "privacy/epsilon": 2.0981420051109834}}
+ {"timestamp": 1773812551.2877524, "event": "audit_epoch", "step": 93, "epoch": 1, "metrics": {"audit/delta": 1e-05, "audit/num_canaries": 500.0, "audit/num_members": 250.0, "audit/paper_guess_fraction": 0.2, "audit/paper_guess_steps": 20.0, "audit/loss/auc": 0.505664, "audit/loss/empirical_epsilon/0.05": 0.0, "audit/loss/empirical_epsilon/0.01": 0.0, "audit/loss/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/correct_guesses": 0.0, "audit/embedding/auc": 0.51372, "audit/embedding/empirical_epsilon/0.05": 0.0, "audit/embedding/empirical_epsilon/0.01": 0.0, "audit/embedding/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/correct_guesses": 0.0, "perf/audit_duration_sec": 6.93540578102693}}
+ {"timestamp": 1773812637.181298, "event": "train_step", "step": 100, "epoch": 2, "metrics": {"train/step_loss": 2.226203179695237, "train/step_real_loss": 0.9423502832651138, "train/lr": 9.643076661610196e-05, "train/step_canary_loss": 13.964286804199219, "perf/step_duration_sec": 12.210506019182503, "perf/samples_per_sec": 5.814664837678322, "perf/tokens_per_sec": 4380.080556528868, "perf/logical_batch_size": 71.0, "perf/logical_token_count": 53483.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.176204855129592, "system/cuda_memory_allocated_gb": 14.131910800933838, "system/cuda_max_memory_allocated_gb": 86.22135162353516}}
+ {"timestamp": 1773812649.9187398, "event": "eval_step", "step": 100, "epoch": 2, "metrics": {"eval/loss": 0.9495426886356794, "eval/duration_sec": 12.734526461921632}}
+ {"timestamp": 1773812771.789377, "event": "train_step", "step": 110, "epoch": 2, "metrics": {"train/step_loss": 1.4468027822899097, "train/step_real_loss": 1.0408434942364693, "train/lr": 7.874347104470234e-05, "train/step_canary_loss": 14.4375, "perf/step_duration_sec": 12.185235306154937, "perf/samples_per_sec": 5.416391094775368, "perf/tokens_per_sec": 4356.419770834176, "perf/logical_batch_size": 66.0, "perf/logical_token_count": 53084.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.2837597517001855, "system/cuda_memory_allocated_gb": 12.680593967437744, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773812893.198121, "event": "train_step", "step": 120, "epoch": 2, "metrics": {"train/step_loss": 1.616717566305132, "train/step_real_loss": 1.0352746099233627, "train/lr": 6.173165676349103e-05, "train/step_canary_loss": 14.020833969116211, "perf/step_duration_sec": 12.073515899013728, "perf/samples_per_sec": 5.549336296105193, "perf/tokens_per_sec": 4502.996513587329, "perf/logical_batch_size": 67.0, "perf/logical_token_count": 54367.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.387262501265619, "system/cuda_memory_allocated_gb": 12.970857620239258, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773813014.0904934, "event": "train_step", "step": 130, "epoch": 2, "metrics": {"train/step_loss": 1.9267044274703315, "train/step_real_loss": 0.9727360382676125, "train/lr": 4.593591825444028e-05, "train/step_canary_loss": 14.137499809265137, "perf/step_duration_sec": 11.957372829318047, "perf/samples_per_sec": 5.770498334786406, "perf/tokens_per_sec": 4662.479024096767, "perf/logical_batch_size": 69.0, "perf/logical_token_count": 55751.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.4870228609511393, "system/cuda_memory_allocated_gb": 13.55138349533081, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773813135.2851143, "event": "train_step", "step": 140, "epoch": 2, "metrics": {"train/step_loss": 1.6469186313116728, "train/step_real_loss": 1.0473601296544075, "train/lr": 3.185820604061088e-05, "train/step_canary_loss": 14.4375, "perf/step_duration_sec": 12.27732455311343, "perf/samples_per_sec": 5.4572150235296455, "perf/tokens_per_sec": 4390.940368708353, "perf/logical_batch_size": 67.0, "perf/logical_token_count": 53909.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.5836117621736387, "system/cuda_memory_allocated_gb": 12.970857620239258, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773813257.0352345, "event": "train_step", "step": 150, "epoch": 2, "metrics": {"train/step_loss": 2.0192771579908286, "train/step_real_loss": 1.04129096865654, "train/lr": 1.994587590756397e-05, "train/step_canary_loss": 14.537500381469727, "perf/step_duration_sec": 11.960256013087928, "perf/samples_per_sec": 5.769107277009316, "perf/tokens_per_sec": 4232.434485065052, "perf/logical_batch_size": 69.0, "perf/logical_token_count": 50621.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.6773301521119843, "system/cuda_memory_allocated_gb": 13.55138349533081, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773813269.8009667, "event": "eval_step", "step": 150, "epoch": 2, "metrics": {"eval/loss": 0.9415916662949781, "eval/duration_sec": 12.763515894301236}}
+ {"timestamp": 1773813391.1203394, "event": "train_step", "step": 160, "epoch": 2, "metrics": {"train/step_loss": 1.4993842009342078, "train/step_real_loss": 1.1243649572134018, "train/lr": 1.057747301402887e-05, "train/step_canary_loss": 13.5, "perf/step_duration_sec": 12.41904565365985, "perf/samples_per_sec": 5.314418018952207, "perf/tokens_per_sec": 3556.311912500654, "perf/logical_batch_size": 66.0, "perf/logical_token_count": 44166.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.768504028383432, "system/cuda_memory_allocated_gb": 12.680593967437744, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773813513.5407782, "event": "train_step", "step": 170, "epoch": 2, "metrics": {"train/step_loss": 1.6632740604343699, "train/step_real_loss": 1.0752243772149086, "train/lr": 4.050702638550275e-06, "train/step_canary_loss": 14.208333969116211, "perf/step_duration_sec": 12.597235643770546, "perf/samples_per_sec": 5.318627188904904, "perf/tokens_per_sec": 3936.101649771056, "perf/logical_batch_size": 67.0, "perf/logical_token_count": 49584.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.8571453531213207, "system/cuda_memory_allocated_gb": 12.970857620239258, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773813635.9173324, "event": "train_step", "step": 180, "epoch": 2, "metrics": {"train/step_loss": 1.7563851160161636, "train/step_real_loss": 1.0712373107671738, "train/lr": 5.729698228102653e-07, "train/step_canary_loss": 12.71875, "perf/step_duration_sec": 12.466372530907393, "perf/samples_per_sec": 5.454674150913608, "perf/tokens_per_sec": 4245.501236930199, "perf/logical_batch_size": 68.0, "perf/logical_token_count": 52926.0, "perf/physical_batches": 9.0, "privacy/epsilon": 2.9437684274367397, "system/cuda_memory_allocated_gb": 13.261121273040771, "system/cuda_max_memory_allocated_gb": 86.22137594223022}}
+ {"timestamp": 1773813716.1885753, "event": "train_epoch", "step": 186, "epoch": 2, "metrics": {"train/epoch_loss": 1.6068291228622247, "train/epoch_real_loss": 1.0283400972633716, "train/epoch_canary_loss": 13.124244338677878, "perf/epoch_duration_sec": 1152.0965769039467, "perf/epoch_samples_per_sec": 42.999867366267054, "perf/epoch_tokens_per_sec": 32782.04514893617, "perf/epoch_samples": 49540.0, "perf/epoch_tokens": 37768082.0, "system/cuda_epoch_peak_memory_gb": 86.22137594223022, "eval/loss": 0.9410304912389854, "eval/duration_sec": 12.763936698902398, "privacy/epsilon": 2.9947529815620726}}
+ {"timestamp": 1773813729.4597642, "event": "audit_epoch", "step": 186, "epoch": 2, "metrics": {"audit/delta": 1e-05, "audit/num_canaries": 500.0, "audit/num_members": 250.0, "audit/paper_guess_fraction": 0.2, "audit/paper_guess_steps": 20.0, "audit/loss/auc": 0.504792, "audit/loss/empirical_epsilon/0.05": 0.0, "audit/loss/empirical_epsilon/0.01": 0.0, "audit/loss/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/correct_guesses": 0.0, "audit/embedding/auc": 0.51504, "audit/embedding/empirical_epsilon/0.05": 0.0, "audit/embedding/empirical_epsilon/0.01": 0.0, "audit/embedding/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/correct_guesses": 0.0, "perf/audit_duration_sec": 7.0606719348579645}}
+ {"timestamp": 1773813742.732494, "event": "audit_final", "step": 186, "epoch": 2, "metrics": {"audit/delta": 1e-05, "audit/num_canaries": 500.0, "audit/num_members": 250.0, "audit/paper_guess_fraction": 0.2, "audit/paper_guess_steps": 20.0, "audit/loss/auc": 0.504792, "audit/loss/empirical_epsilon/0.05": 0.0, "audit/loss/empirical_epsilon/0.01": 0.0, "audit/loss/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/loss/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/loss/empirical_epsilon_details/0.01/correct_guesses": 0.0, "audit/embedding/auc": 0.51504, "audit/embedding/empirical_epsilon/0.05": 0.0, "audit/embedding/empirical_epsilon/0.01": 0.0, "audit/embedding/empirical_epsilon_details/0.05/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.05/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.05/correct_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/epsilon": 0.0, "audit/embedding/empirical_epsilon_details/0.01/num_guesses": 0.0, "audit/embedding/empirical_epsilon_details/0.01/correct_guesses": 0.0}}
+ {"timestamp": 1773813743.2978687, "event": "energy_final", "step": 186, "epoch": null, "metrics": {"energy/codecarbon/duration": 2457.6702942838892, "energy/codecarbon/emissions": 0.09496692637153915, "energy/codecarbon/emissions_rate": 3.8641036021965845e-05, "energy/codecarbon/cpu_power": 72.03190690693239, "energy/codecarbon/gpu_power": 3303.243429373142, "energy/codecarbon/ram_power": 54.0, "energy/codecarbon/cpu_energy": 0.04736684855609284, "energy/codecarbon/gpu_energy": 2.25075239893377, "energy/codecarbon/ram_energy": 0.03550715308856684, "energy/codecarbon/energy_consumed": 2.3336264005784284, "energy/codecarbon/water_consumed": 0.0, "energy/codecarbon/cpu_count": 256.0, "energy/codecarbon/gpu_count": 8.0, "energy/codecarbon/longitude": 16.1885, "energy/codecarbon/latitude": 58.594, "energy/codecarbon/ram_total_size": 1511.49019241333, "energy/codecarbon/cpu_utilization_percent": 3.8756856324191915, "energy/codecarbon/gpu_utilization_percent": 94.07749795249795, "energy/codecarbon/ram_utilization_percent": 5.391281211624831, "energy/codecarbon/ram_used_gb": 81.57497969388083, "energy/codecarbon/pue": 1.0, "energy/codecarbon/wue": 0.0}}
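
Reviewer note on consuming `metrics.jsonl`: every line is a standalone JSON object with `event`, `step`, `epoch`, and a `metrics` map, so the privacy budget spent over training can be read from the `train_step` records alone. The sketch below assumes only those field names; `epsilon_by_step` is a hypothetical helper, and the sample lines are abbreviated copies of records from the log above.

```python
import json


def epsilon_by_step(lines) -> list:
    """Return (step, privacy/epsilon) pairs from train_step records, in file order."""
    points = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        record = json.loads(raw)
        if record.get("event") != "train_step":
            continue  # eval, epoch, audit and energy records are ignored here
        eps = record.get("metrics", {}).get("privacy/epsilon")
        if eps is not None:
            points.append((record["step"], eps))
    return points


# Abbreviated records from the log above (only the fields this sketch reads).
sample = [
    '{"event": "train_step", "step": 10, "epoch": 1, "metrics": {"privacy/epsilon": 0.7638484077719963}}',
    '{"event": "eval_step", "step": 50, "epoch": 1, "metrics": {"eval/loss": 0.9748443443423662}}',
    '{"event": "train_step", "step": 20, "epoch": 1, "metrics": {"privacy/epsilon": 1.016371889021752}}',
]

for step, eps in epsilon_by_step(sample):
    print(step, eps)
```

Against the real file, pass an open file handle (for example `open("metrics.jsonl")`) instead of `sample`. The final `train_epoch` record reports `privacy/epsilon` of about 2.995, just under the configured `target_epsilon` of 3.0.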
qwen3-4b-instruct/dp3/pretrain_lm_head.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc44b7d60b8e2cf912e4233ff02bc57bb7e91f7a3ba6aa8ea10b7767ca29954a
+ size 779106920
qwen3-4b-instruct/dp3/resolved_config.yaml ADDED
@@ -0,0 +1,101 @@
+ model:
+   name: Qwen/Qwen3-4B-Instruct-2507
+   tokenizer_name: Qwen/Qwen3-4B-Instruct-2507
+   max_length: 1024
+   dtype: bfloat16
+   trust_remote_code: true
+   use_fast_tokenizer: true
+   cache_dir: null
+   local_files_only: false
+   low_cpu_mem_usage: true
+   tie_word_embeddings: true
+   gradient_checkpointing: false
+   use_chat_template: false
+ dataset:
+   name: melihcatal/codedp-cpt
+   split: train
+   mode: cpt
+   text_column: text
+   validation_ratio: 0.05
+   max_samples: -1
+ lora:
+   enabled: true
+   r: 16
+   alpha: 32
+   dropout: 0.05
+   target_modules:
+   - q_proj
+   - k_proj
+   - v_proj
+   - o_proj
+   modules_to_save:
+   - lm_head
+   bias: none
+ training:
+   seed: 42
+   epochs: 2
+   warmup_steps: null
+   warmup_ratio: 0.05
+   mixed_precision: false
+   mixed_precision_dtype: bfloat16
+   batch_size: 8
+   eval_batch_size: 8
+   eval_every_steps: 50
+   eval_every_epochs: 1
+   learning_rate: 0.0002
+   optimizer: adamw
+   lr_scheduler: cosine
+   adam_beta1: 0.9
+   adam_beta2: 0.999
+   adam_epsilon: 1.0e-08
+   sgd_momentum: 0.9
+   weight_decay: 0.01
+   max_grad_norm: 1.0
+   log_every: 10
+   gradient_accumulation_steps: 8
+   num_workers: 4
+   output_dir: runs/cpt/qwen3-4b-instruct/dp3
+ distributed:
+   strategy: dpddp
+   backend: nccl
+   devices: null
+ dp:
+   module_validator: auto
+   target_delta: 1.0e-05
+   noise_multiplier: null
+   max_grad_norm: 1.0
+   grad_sample_mode: hooks
+   secure_mode: false
+   enabled: true
+   target_epsilon: 3.0
+   clipping: flat
+ audit:
+   enabled: true
+   run_every_epoch: true
+   epoch_device: cuda
+   q_canary: auto
+   num_canaries: 500
+   prefix_length: 49
+   num_digits: 12
+   batch_size: 32
+   delta: 1.0e-05
+   p_values:
+   - 0.05
+   - 0.01
+   paper_guess_fraction: 0.2
+   paper_guess_steps: 20
+   enable_holdout_empirical_epsilon: false
+   holdout_seed: 42
+   tie_seed: 42
+ tracking:
+   enabled: true
+   tensorboard: true
+   wandb: false
+   wandb_project: codedp-finetune-h200-audit
+   wandb_run_name: qwen3-4b-instruct-cpt-dp3
+   wandb_mode: online
+   codecarbon: true
+   codecarbon_output_file: codecarbon.csv
+   codecarbon_measure_power_secs: 15
+   codecarbon_country_iso_code: null
+   codecarbon_project_name: codedp-qwen3-4b-instruct-cpt-dp3
qwen3-4b-instruct/dp3/scalars.csv ADDED
@@ -0,0 +1,358 @@
+ timestamp,event,step,epoch,key,value
+ 1773811506.4695234,train_step,10,1,train/step_loss,1.616579128034187
+ 1773811506.4695234,train_step,10,1,train/step_real_loss,1.2012769132852554
+ 1773811506.4695234,train_step,10,1,train/lr,0.0002
+ 1773811506.4695234,train_step,10,1,train/step_canary_loss,14.90625
+ 1773811506.4695234,train_step,10,1,perf/step_duration_sec,12.063218676950783
+ 1773811506.4695234,train_step,10,1,perf/samples_per_sec,5.471176621054408
+ 1773811506.4695234,train_step,10,1,perf/tokens_per_sec,4265.196659189263
+ 1773811506.4695234,train_step,10,1,perf/logical_batch_size,66.0
+ 1773811506.4695234,train_step,10,1,perf/logical_token_count,51452.0
+ 1773811506.4695234,train_step,10,1,perf/physical_batches,9.0
+ 1773811506.4695234,train_step,10,1,privacy/epsilon,0.7638484077719963
+ 1773811506.4695234,train_step,10,1,system/cuda_memory_allocated_gb,12.68057107925415
+ 1773811506.4695234,train_step,10,1,system/cuda_max_memory_allocated_gb,86.2213044166565
+ 1773811628.0656273,train_step,20,1,train/step_loss,1.7544667159809786
+ 1773811628.0656273,train_step,20,1,train/step_real_loss,1.0926365107297897
+ 1773811628.0656273,train_step,20,1,train/lr,0.0001984111204336116
+ 1773811628.0656273,train_step,20,1,train/step_canary_loss,12.34375
+ 1773811628.0656273,train_step,20,1,perf/step_duration_sec,11.990010250825435
+ 1773811628.0656273,train_step,20,1,perf/samples_per_sec,5.671387978614834
+ 1773811628.0656273,train_step,20,1,perf/tokens_per_sec,4520.429830013588
+ 1773811628.0656273,train_step,20,1,perf/logical_batch_size,68.0
+ 1773811628.0656273,train_step,20,1,perf/logical_token_count,54200.0
+ 1773811628.0656273,train_step,20,1,perf/physical_batches,9.0
+ 1773811628.0656273,train_step,20,1,privacy/epsilon,1.016371889021752
+ 1773811628.0656273,train_step,20,1,system/cuda_memory_allocated_gb,13.261098384857178
+ 1773811628.0656273,train_step,20,1,system/cuda_max_memory_allocated_gb,86.2213044166565
+ 1773811750.6200142,train_step,30,1,train/step_loss,1.0279401504632197
+ 1773811750.6200142,train_step,30,1,train/step_real_loss,1.0600632801651955
+ 1773811750.6200142,train_step,30,1,train/lr,0.0001936949724999762
+ 1773811750.6200142,train_step,30,1,train/step_canary_loss,0.0
+ 1773811750.6200142,train_step,30,1,perf/step_duration_sec,12.954613069072366
+ 1773811750.6200142,train_step,30,1,perf/samples_per_sec,4.94032509182328
+ 1773811750.6200142,train_step,30,1,perf/tokens_per_sec,3957.7407466073655
+ 1773811750.6200142,train_step,30,1,perf/logical_batch_size,64.0
+ 1773811750.6200142,train_step,30,1,perf/logical_token_count,51271.0
+ 1773811750.6200142,train_step,30,1,perf/physical_batches,10.0
+ 1773811750.6200142,train_step,30,1,privacy/epsilon,1.2175571913066654
+ 1773811750.6200142,train_step,30,1,system/cuda_memory_allocated_gb,14.42089033126831
+ 1773811750.6200142,train_step,30,1,system/cuda_max_memory_allocated_gb,86.2213044166565
+ 1773811872.468319,train_step,40,1,train/step_loss,1.4674463272094727
+ 1773811872.468319,train_step,40,1,train/step_real_loss,1.0572493374347687
+ 1773811872.468319,train_step,40,1,train/lr,0.00018600142402077006
+ 1773811872.468319,train_step,40,1,train/step_canary_loss,14.59375
+ 1773811872.468319,train_step,40,1,perf/step_duration_sec,11.865780014079064
+ 1773811872.468319,train_step,40,1,perf/samples_per_sec,5.562213349791522
+ 1773811872.468319,train_step,40,1,perf/tokens_per_sec,4472.188085152074
+ 1773811872.468319,train_step,40,1,perf/logical_batch_size,66.0
+ 1773811872.468319,train_step,40,1,perf/logical_token_count,53066.0
+ 1773811872.468319,train_step,40,1,perf/physical_batches,9.0
+ 1773811872.468319,train_step,40,1,privacy/epsilon,1.3913063211429064
+ 1773811872.468319,train_step,40,1,system/cuda_memory_allocated_gb,12.680593967437744
+ 1773811872.468319,train_step,40,1,system/cuda_max_memory_allocated_gb,86.22135162353516
+ 1773811994.1354816,train_step,50,1,train/step_loss,1.257238564124474
+ 1773811994.1354816,train_step,50,1,train/step_real_loss,1.081570416688919
+ 1773811994.1354816,train_step,50,1,train/lr,0.00017557495743542585
+ 1773811994.1354816,train_step,50,1,train/step_canary_loss,12.5
+ 1773811994.1354816,train_step,50,1,perf/step_duration_sec,12.282207927200943
+ 1773811994.1354816,train_step,50,1,perf/samples_per_sec,5.292208077347962
+ 1773811994.1354816,train_step,50,1,perf/tokens_per_sec,4310.462769707012
+ 1773811994.1354816,train_step,50,1,perf/logical_batch_size,65.0
+ 1773811994.1354816,train_step,50,1,perf/logical_token_count,52942.0
+ 1773811994.1354816,train_step,50,1,perf/physical_batches,9.0
+ 1773811994.1354816,train_step,50,1,privacy/epsilon,1.5474611608458564
+ 1773811994.1354816,train_step,50,1,system/cuda_memory_allocated_gb,12.39033031463623
+ 1773811994.1354816,train_step,50,1,system/cuda_max_memory_allocated_gb,86.22135162353516
+ 1773812006.9120464,eval_step,50,1,eval/loss,0.9748443443423662
+ 1773812006.9120464,eval_step,50,1,eval/duration_sec,12.77427957393229
+ 1773812128.636093,train_step,60,1,train/step_loss,2.1636021409715926
+ 1773812128.636093,train_step,60,1,train/step_real_loss,1.041244499385357
+ 1773812128.636093,train_step,60,1,train/lr,0.0001627469007380852
+ 1773812128.636093,train_step,60,1,train/step_canary_loss,14.135416984558105
+ 1773812128.636093,train_step,60,1,perf/step_duration_sec,12.324101509992033
+ 1773812128.636093,train_step,60,1,perf/samples_per_sec,5.67992725013227
+ 1773812128.636093,train_step,60,1,perf/tokens_per_sec,4058.7137293088017
+ 1773812128.636093,train_step,60,1,perf/logical_batch_size,70.0
+ 1773812128.636093,train_step,60,1,perf/logical_token_count,50020.0
+ 1773812128.636093,train_step,60,1,perf/physical_batches,9.0
+ 1773812128.636093,train_step,60,1,privacy/epsilon,1.6892902398315597
80
+ 1773812128.636093,train_step,60,1,system/cuda_memory_allocated_gb,13.841647148132324
81
+ 1773812128.636093,train_step,60,1,system/cuda_max_memory_allocated_gb,86.22135162353516
82
+ 1773812251.0683832,train_step,70,1,train/step_loss,1.8121652182410746
83
+ 1773812251.0683832,train_step,70,1,train/step_real_loss,1.0885114818811417
84
+ 1773812251.0683832,train_step,70,1,train/lr,0.0001479248986720057
85
+ 1773812251.0683832,train_step,70,1,train/step_canary_loss,13.390625
86
+ 1773812251.0683832,train_step,70,1,perf/step_duration_sec,12.67668361403048
87
+ 1773812251.0683832,train_step,70,1,perf/samples_per_sec,5.364178997473597
88
+ 1773812251.0683832,train_step,70,1,perf/tokens_per_sec,3911.117569040308
89
+ 1773812251.0683832,train_step,70,1,perf/logical_batch_size,68.0
90
+ 1773812251.0683832,train_step,70,1,perf/logical_token_count,49580.0
91
+ 1773812251.0683832,train_step,70,1,perf/physical_batches,9.0
92
+ 1773812251.0683832,train_step,70,1,privacy/epsilon,1.8217089373472166
93
+ 1773812251.0683832,train_step,70,1,system/cuda_memory_allocated_gb,13.261121273040771
94
+ 1773812251.0683832,train_step,70,1,system/cuda_max_memory_allocated_gb,86.22135162353516
95
+ 1773812373.025167,train_step,80,1,train/step_loss,1.8635356987223906
96
+ 1773812373.025167,train_step,80,1,train/step_real_loss,1.08547542989254
97
+ 1773812373.025167,train_step,80,1,train/lr,0.0001315799587615025
98
+ 1773812373.025167,train_step,80,1,train/step_canary_loss,14.3125
99
+ 1773812373.025167,train_step,80,1,perf/step_duration_sec,12.623636846896261
100
+ 1773812373.025167,train_step,80,1,perf/samples_per_sec,5.386720231635859
101
+ 1773812373.025167,train_step,80,1,perf/tokens_per_sec,3980.3109523349326
102
+ 1773812373.025167,train_step,80,1,perf/logical_batch_size,68.0
103
+ 1773812373.025167,train_step,80,1,perf/logical_token_count,50246.0
104
+ 1773812373.025167,train_step,80,1,perf/physical_batches,9.0
105
+ 1773812373.025167,train_step,80,1,privacy/epsilon,1.946082329590544
106
+ 1773812373.025167,train_step,80,1,system/cuda_memory_allocated_gb,13.261121273040771
107
+ 1773812373.025167,train_step,80,1,system/cuda_max_memory_allocated_gb,86.22135162353516
108
+ 1773812495.4775374,train_step,90,1,train/step_loss,1.5933425817916642
109
+ 1773812495.4775374,train_step,90,1,train/step_real_loss,1.0586555153131485
110
+ 1773812495.4775374,train_step,90,1,train/lr,0.00011423148382732853
111
+ 1773812495.4775374,train_step,90,1,train/step_canary_loss,13.0
112
+ 1773812495.4775374,train_step,90,1,perf/step_duration_sec,11.945233571808785
113
+ 1773812495.4775374,train_step,90,1,perf/samples_per_sec,5.608931763219984
114
+ 1773812495.4775374,train_step,90,1,perf/tokens_per_sec,4434.907001318536
115
+ 1773812495.4775374,train_step,90,1,perf/logical_batch_size,67.0
116
+ 1773812495.4775374,train_step,90,1,perf/logical_token_count,52976.0
117
+ 1773812495.4775374,train_step,90,1,perf/physical_batches,9.0
118
+ 1773812495.4775374,train_step,90,1,privacy/epsilon,2.06391520460725
119
+ 1773812495.4775374,train_step,90,1,system/cuda_memory_allocated_gb,12.970857620239258
120
+ 1773812495.4775374,train_step,90,1,system/cuda_max_memory_allocated_gb,86.22135162353516
121
+ 1773812537.9189491,train_epoch,93,1,train/epoch_loss,1.6919119858601415
122
+ 1773812537.9189491,train_epoch,93,1,train/epoch_real_loss,1.0723268173838774
123
+ 1773812537.9189491,train_epoch,93,1,train/epoch_canary_loss,13.472450833857557
124
+ 1773812537.9189491,train_epoch,93,1,perf/epoch_duration_sec,1142.3914510426112
125
+ 1773812537.9189491,train_epoch,93,1,perf/epoch_samples_per_sec,43.508728951566745
126
+ 1773812537.9189491,train_epoch,93,1,perf/epoch_tokens_per_sec,33067.53649593877
127
+ 1773812537.9189491,train_epoch,93,1,perf/epoch_samples,49704.0
128
+ 1773812537.9189491,train_epoch,93,1,perf/epoch_tokens,37776071.0
129
+ 1773812537.9189491,train_epoch,93,1,system/cuda_epoch_peak_memory_gb,86.22135162353516
130
+ 1773812537.9189491,train_epoch,93,1,eval/loss,0.9517239639774346
131
+ 1773812537.9189491,train_epoch,93,1,eval/duration_sec,12.793823145795614
132
+ 1773812537.9189491,train_epoch,93,1,privacy/epsilon,2.0981420051109834
133
+ 1773812551.2877524,audit_epoch,93,1,audit/delta,1e-05
134
+ 1773812551.2877524,audit_epoch,93,1,audit/num_canaries,500.0
135
+ 1773812551.2877524,audit_epoch,93,1,audit/num_members,250.0
136
+ 1773812551.2877524,audit_epoch,93,1,audit/paper_guess_fraction,0.2
137
+ 1773812551.2877524,audit_epoch,93,1,audit/paper_guess_steps,20.0
138
+ 1773812551.2877524,audit_epoch,93,1,audit/loss/auc,0.505664
139
+ 1773812551.2877524,audit_epoch,93,1,audit/loss/empirical_epsilon/0.05,0.0
140
+ 1773812551.2877524,audit_epoch,93,1,audit/loss/empirical_epsilon/0.01,0.0
141
+ 1773812551.2877524,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.05/epsilon,0.0
142
+ 1773812551.2877524,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.05/num_guesses,0.0
143
+ 1773812551.2877524,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.05/correct_guesses,0.0
144
+ 1773812551.2877524,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.01/epsilon,0.0
145
+ 1773812551.2877524,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.01/num_guesses,0.0
146
+ 1773812551.2877524,audit_epoch,93,1,audit/loss/empirical_epsilon_details/0.01/correct_guesses,0.0
147
+ 1773812551.2877524,audit_epoch,93,1,audit/embedding/auc,0.51372
148
+ 1773812551.2877524,audit_epoch,93,1,audit/embedding/empirical_epsilon/0.05,0.0
149
+ 1773812551.2877524,audit_epoch,93,1,audit/embedding/empirical_epsilon/0.01,0.0
150
+ 1773812551.2877524,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.05/epsilon,0.0
151
+ 1773812551.2877524,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.05/num_guesses,0.0
152
+ 1773812551.2877524,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.05/correct_guesses,0.0
153
+ 1773812551.2877524,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.01/epsilon,0.0
154
+ 1773812551.2877524,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.01/num_guesses,0.0
155
+ 1773812551.2877524,audit_epoch,93,1,audit/embedding/empirical_epsilon_details/0.01/correct_guesses,0.0
156
+ 1773812551.2877524,audit_epoch,93,1,perf/audit_duration_sec,6.93540578102693
157
+ 1773812637.181298,train_step,100,2,train/step_loss,2.226203179695237
158
+ 1773812637.181298,train_step,100,2,train/step_real_loss,0.9423502832651138
159
+ 1773812637.181298,train_step,100,2,train/lr,9.643076661610196e-05
160
+ 1773812637.181298,train_step,100,2,train/step_canary_loss,13.964286804199219
161
+ 1773812637.181298,train_step,100,2,perf/step_duration_sec,12.210506019182503
162
+ 1773812637.181298,train_step,100,2,perf/samples_per_sec,5.814664837678322
163
+ 1773812637.181298,train_step,100,2,perf/tokens_per_sec,4380.080556528868
164
+ 1773812637.181298,train_step,100,2,perf/logical_batch_size,71.0
165
+ 1773812637.181298,train_step,100,2,perf/logical_token_count,53483.0
166
+ 1773812637.181298,train_step,100,2,perf/physical_batches,9.0
167
+ 1773812637.181298,train_step,100,2,privacy/epsilon,2.176204855129592
168
+ 1773812637.181298,train_step,100,2,system/cuda_memory_allocated_gb,14.131910800933838
169
+ 1773812637.181298,train_step,100,2,system/cuda_max_memory_allocated_gb,86.22135162353516
170
+ 1773812649.9187398,eval_step,100,2,eval/loss,0.9495426886356794
171
+ 1773812649.9187398,eval_step,100,2,eval/duration_sec,12.734526461921632
172
+ 1773812771.789377,train_step,110,2,train/step_loss,1.4468027822899097
173
+ 1773812771.789377,train_step,110,2,train/step_real_loss,1.0408434942364693
174
+ 1773812771.789377,train_step,110,2,train/lr,7.874347104470234e-05
175
+ 1773812771.789377,train_step,110,2,train/step_canary_loss,14.4375
176
+ 1773812771.789377,train_step,110,2,perf/step_duration_sec,12.185235306154937
177
+ 1773812771.789377,train_step,110,2,perf/samples_per_sec,5.416391094775368
178
+ 1773812771.789377,train_step,110,2,perf/tokens_per_sec,4356.419770834176
179
+ 1773812771.789377,train_step,110,2,perf/logical_batch_size,66.0
180
+ 1773812771.789377,train_step,110,2,perf/logical_token_count,53084.0
181
+ 1773812771.789377,train_step,110,2,perf/physical_batches,9.0
182
+ 1773812771.789377,train_step,110,2,privacy/epsilon,2.2837597517001855
183
+ 1773812771.789377,train_step,110,2,system/cuda_memory_allocated_gb,12.680593967437744
184
+ 1773812771.789377,train_step,110,2,system/cuda_max_memory_allocated_gb,86.22137594223022
185
+ 1773812893.198121,train_step,120,2,train/step_loss,1.616717566305132
186
+ 1773812893.198121,train_step,120,2,train/step_real_loss,1.0352746099233627
187
+ 1773812893.198121,train_step,120,2,train/lr,6.173165676349103e-05
188
+ 1773812893.198121,train_step,120,2,train/step_canary_loss,14.020833969116211
189
+ 1773812893.198121,train_step,120,2,perf/step_duration_sec,12.073515899013728
190
+ 1773812893.198121,train_step,120,2,perf/samples_per_sec,5.549336296105193
191
+ 1773812893.198121,train_step,120,2,perf/tokens_per_sec,4502.996513587329
192
+ 1773812893.198121,train_step,120,2,perf/logical_batch_size,67.0
193
+ 1773812893.198121,train_step,120,2,perf/logical_token_count,54367.0
194
+ 1773812893.198121,train_step,120,2,perf/physical_batches,9.0
195
+ 1773812893.198121,train_step,120,2,privacy/epsilon,2.387262501265619
196
+ 1773812893.198121,train_step,120,2,system/cuda_memory_allocated_gb,12.970857620239258
197
+ 1773812893.198121,train_step,120,2,system/cuda_max_memory_allocated_gb,86.22137594223022
198
+ 1773813014.0904934,train_step,130,2,train/step_loss,1.9267044274703315
199
+ 1773813014.0904934,train_step,130,2,train/step_real_loss,0.9727360382676125
200
+ 1773813014.0904934,train_step,130,2,train/lr,4.593591825444028e-05
201
+ 1773813014.0904934,train_step,130,2,train/step_canary_loss,14.137499809265137
202
+ 1773813014.0904934,train_step,130,2,perf/step_duration_sec,11.957372829318047
203
+ 1773813014.0904934,train_step,130,2,perf/samples_per_sec,5.770498334786406
204
+ 1773813014.0904934,train_step,130,2,perf/tokens_per_sec,4662.479024096767
205
+ 1773813014.0904934,train_step,130,2,perf/logical_batch_size,69.0
206
+ 1773813014.0904934,train_step,130,2,perf/logical_token_count,55751.0
207
+ 1773813014.0904934,train_step,130,2,perf/physical_batches,9.0
208
+ 1773813014.0904934,train_step,130,2,privacy/epsilon,2.4870228609511393
209
+ 1773813014.0904934,train_step,130,2,system/cuda_memory_allocated_gb,13.55138349533081
210
+ 1773813014.0904934,train_step,130,2,system/cuda_max_memory_allocated_gb,86.22137594223022
211
+ 1773813135.2851143,train_step,140,2,train/step_loss,1.6469186313116728
212
+ 1773813135.2851143,train_step,140,2,train/step_real_loss,1.0473601296544075
213
+ 1773813135.2851143,train_step,140,2,train/lr,3.185820604061088e-05
214
+ 1773813135.2851143,train_step,140,2,train/step_canary_loss,14.4375
215
+ 1773813135.2851143,train_step,140,2,perf/step_duration_sec,12.27732455311343
216
+ 1773813135.2851143,train_step,140,2,perf/samples_per_sec,5.4572150235296455
217
+ 1773813135.2851143,train_step,140,2,perf/tokens_per_sec,4390.940368708353
218
+ 1773813135.2851143,train_step,140,2,perf/logical_batch_size,67.0
219
+ 1773813135.2851143,train_step,140,2,perf/logical_token_count,53909.0
220
+ 1773813135.2851143,train_step,140,2,perf/physical_batches,9.0
221
+ 1773813135.2851143,train_step,140,2,privacy/epsilon,2.5836117621736387
222
+ 1773813135.2851143,train_step,140,2,system/cuda_memory_allocated_gb,12.970857620239258
223
+ 1773813135.2851143,train_step,140,2,system/cuda_max_memory_allocated_gb,86.22137594223022
224
+ 1773813257.0352345,train_step,150,2,train/step_loss,2.0192771579908286
225
+ 1773813257.0352345,train_step,150,2,train/step_real_loss,1.04129096865654
226
+ 1773813257.0352345,train_step,150,2,train/lr,1.994587590756397e-05
227
+ 1773813257.0352345,train_step,150,2,train/step_canary_loss,14.537500381469727
228
+ 1773813257.0352345,train_step,150,2,perf/step_duration_sec,11.960256013087928
229
+ 1773813257.0352345,train_step,150,2,perf/samples_per_sec,5.769107277009316
230
+ 1773813257.0352345,train_step,150,2,perf/tokens_per_sec,4232.434485065052
231
+ 1773813257.0352345,train_step,150,2,perf/logical_batch_size,69.0
232
+ 1773813257.0352345,train_step,150,2,perf/logical_token_count,50621.0
233
+ 1773813257.0352345,train_step,150,2,perf/physical_batches,9.0
234
+ 1773813257.0352345,train_step,150,2,privacy/epsilon,2.6773301521119843
235
+ 1773813257.0352345,train_step,150,2,system/cuda_memory_allocated_gb,13.55138349533081
236
+ 1773813257.0352345,train_step,150,2,system/cuda_max_memory_allocated_gb,86.22137594223022
237
+ 1773813269.8009667,eval_step,150,2,eval/loss,0.9415916662949781
238
+ 1773813269.8009667,eval_step,150,2,eval/duration_sec,12.763515894301236
239
+ 1773813391.1203394,train_step,160,2,train/step_loss,1.4993842009342078
240
+ 1773813391.1203394,train_step,160,2,train/step_real_loss,1.1243649572134018
241
+ 1773813391.1203394,train_step,160,2,train/lr,1.057747301402887e-05
242
+ 1773813391.1203394,train_step,160,2,train/step_canary_loss,13.5
243
+ 1773813391.1203394,train_step,160,2,perf/step_duration_sec,12.41904565365985
244
+ 1773813391.1203394,train_step,160,2,perf/samples_per_sec,5.314418018952207
245
+ 1773813391.1203394,train_step,160,2,perf/tokens_per_sec,3556.311912500654
246
+ 1773813391.1203394,train_step,160,2,perf/logical_batch_size,66.0
247
+ 1773813391.1203394,train_step,160,2,perf/logical_token_count,44166.0
248
+ 1773813391.1203394,train_step,160,2,perf/physical_batches,9.0
249
+ 1773813391.1203394,train_step,160,2,privacy/epsilon,2.768504028383432
250
+ 1773813391.1203394,train_step,160,2,system/cuda_memory_allocated_gb,12.680593967437744
251
+ 1773813391.1203394,train_step,160,2,system/cuda_max_memory_allocated_gb,86.22137594223022
252
+ 1773813513.5407782,train_step,170,2,train/step_loss,1.6632740604343699
253
+ 1773813513.5407782,train_step,170,2,train/step_real_loss,1.0752243772149086
254
+ 1773813513.5407782,train_step,170,2,train/lr,4.050702638550275e-06
255
+ 1773813513.5407782,train_step,170,2,train/step_canary_loss,14.208333969116211
256
+ 1773813513.5407782,train_step,170,2,perf/step_duration_sec,12.597235643770546
257
+ 1773813513.5407782,train_step,170,2,perf/samples_per_sec,5.318627188904904
258
+ 1773813513.5407782,train_step,170,2,perf/tokens_per_sec,3936.101649771056
259
+ 1773813513.5407782,train_step,170,2,perf/logical_batch_size,67.0
260
+ 1773813513.5407782,train_step,170,2,perf/logical_token_count,49584.0
261
+ 1773813513.5407782,train_step,170,2,perf/physical_batches,9.0
262
+ 1773813513.5407782,train_step,170,2,privacy/epsilon,2.8571453531213207
263
+ 1773813513.5407782,train_step,170,2,system/cuda_memory_allocated_gb,12.970857620239258
264
+ 1773813513.5407782,train_step,170,2,system/cuda_max_memory_allocated_gb,86.22137594223022
265
+ 1773813635.9173324,train_step,180,2,train/step_loss,1.7563851160161636
266
+ 1773813635.9173324,train_step,180,2,train/step_real_loss,1.0712373107671738
267
+ 1773813635.9173324,train_step,180,2,train/lr,5.729698228102653e-07
268
+ 1773813635.9173324,train_step,180,2,train/step_canary_loss,12.71875
269
+ 1773813635.9173324,train_step,180,2,perf/step_duration_sec,12.466372530907393
270
+ 1773813635.9173324,train_step,180,2,perf/samples_per_sec,5.454674150913608
271
+ 1773813635.9173324,train_step,180,2,perf/tokens_per_sec,4245.501236930199
272
+ 1773813635.9173324,train_step,180,2,perf/logical_batch_size,68.0
273
+ 1773813635.9173324,train_step,180,2,perf/logical_token_count,52926.0
274
+ 1773813635.9173324,train_step,180,2,perf/physical_batches,9.0
275
+ 1773813635.9173324,train_step,180,2,privacy/epsilon,2.9437684274367397
276
+ 1773813635.9173324,train_step,180,2,system/cuda_memory_allocated_gb,13.261121273040771
277
+ 1773813635.9173324,train_step,180,2,system/cuda_max_memory_allocated_gb,86.22137594223022
278
+ 1773813716.1885753,train_epoch,186,2,train/epoch_loss,1.6068291228622247
279
+ 1773813716.1885753,train_epoch,186,2,train/epoch_real_loss,1.0283400972633716
280
+ 1773813716.1885753,train_epoch,186,2,train/epoch_canary_loss,13.124244338677878
281
+ 1773813716.1885753,train_epoch,186,2,perf/epoch_duration_sec,1152.0965769039467
282
+ 1773813716.1885753,train_epoch,186,2,perf/epoch_samples_per_sec,42.999867366267054
283
+ 1773813716.1885753,train_epoch,186,2,perf/epoch_tokens_per_sec,32782.04514893617
284
+ 1773813716.1885753,train_epoch,186,2,perf/epoch_samples,49540.0
285
+ 1773813716.1885753,train_epoch,186,2,perf/epoch_tokens,37768082.0
286
+ 1773813716.1885753,train_epoch,186,2,system/cuda_epoch_peak_memory_gb,86.22137594223022
287
+ 1773813716.1885753,train_epoch,186,2,eval/loss,0.9410304912389854
288
+ 1773813716.1885753,train_epoch,186,2,eval/duration_sec,12.763936698902398
289
+ 1773813716.1885753,train_epoch,186,2,privacy/epsilon,2.9947529815620726
290
+ 1773813729.4597642,audit_epoch,186,2,audit/delta,1e-05
291
+ 1773813729.4597642,audit_epoch,186,2,audit/num_canaries,500.0
292
+ 1773813729.4597642,audit_epoch,186,2,audit/num_members,250.0
293
+ 1773813729.4597642,audit_epoch,186,2,audit/paper_guess_fraction,0.2
294
+ 1773813729.4597642,audit_epoch,186,2,audit/paper_guess_steps,20.0
295
+ 1773813729.4597642,audit_epoch,186,2,audit/loss/auc,0.504792
296
+ 1773813729.4597642,audit_epoch,186,2,audit/loss/empirical_epsilon/0.05,0.0
297
+ 1773813729.4597642,audit_epoch,186,2,audit/loss/empirical_epsilon/0.01,0.0
298
+ 1773813729.4597642,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.05/epsilon,0.0
299
+ 1773813729.4597642,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.05/num_guesses,0.0
300
+ 1773813729.4597642,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.05/correct_guesses,0.0
301
+ 1773813729.4597642,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.01/epsilon,0.0
302
+ 1773813729.4597642,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.01/num_guesses,0.0
303
+ 1773813729.4597642,audit_epoch,186,2,audit/loss/empirical_epsilon_details/0.01/correct_guesses,0.0
304
+ 1773813729.4597642,audit_epoch,186,2,audit/embedding/auc,0.51504
305
+ 1773813729.4597642,audit_epoch,186,2,audit/embedding/empirical_epsilon/0.05,0.0
306
+ 1773813729.4597642,audit_epoch,186,2,audit/embedding/empirical_epsilon/0.01,0.0
307
+ 1773813729.4597642,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.05/epsilon,0.0
308
+ 1773813729.4597642,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.05/num_guesses,0.0
309
+ 1773813729.4597642,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.05/correct_guesses,0.0
310
+ 1773813729.4597642,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.01/epsilon,0.0
311
+ 1773813729.4597642,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.01/num_guesses,0.0
312
+ 1773813729.4597642,audit_epoch,186,2,audit/embedding/empirical_epsilon_details/0.01/correct_guesses,0.0
313
+ 1773813729.4597642,audit_epoch,186,2,perf/audit_duration_sec,7.0606719348579645
314
+ 1773813742.732494,audit_final,186,2,audit/delta,1e-05
315
+ 1773813742.732494,audit_final,186,2,audit/num_canaries,500.0
316
+ 1773813742.732494,audit_final,186,2,audit/num_members,250.0
317
+ 1773813742.732494,audit_final,186,2,audit/paper_guess_fraction,0.2
318
+ 1773813742.732494,audit_final,186,2,audit/paper_guess_steps,20.0
319
+ 1773813742.732494,audit_final,186,2,audit/loss/auc,0.504792
320
+ 1773813742.732494,audit_final,186,2,audit/loss/empirical_epsilon/0.05,0.0
321
+ 1773813742.732494,audit_final,186,2,audit/loss/empirical_epsilon/0.01,0.0
322
+ 1773813742.732494,audit_final,186,2,audit/loss/empirical_epsilon_details/0.05/epsilon,0.0
323
+ 1773813742.732494,audit_final,186,2,audit/loss/empirical_epsilon_details/0.05/num_guesses,0.0
324
+ 1773813742.732494,audit_final,186,2,audit/loss/empirical_epsilon_details/0.05/correct_guesses,0.0
325
+ 1773813742.732494,audit_final,186,2,audit/loss/empirical_epsilon_details/0.01/epsilon,0.0
326
+ 1773813742.732494,audit_final,186,2,audit/loss/empirical_epsilon_details/0.01/num_guesses,0.0
327
+ 1773813742.732494,audit_final,186,2,audit/loss/empirical_epsilon_details/0.01/correct_guesses,0.0
328
+ 1773813742.732494,audit_final,186,2,audit/embedding/auc,0.51504
329
+ 1773813742.732494,audit_final,186,2,audit/embedding/empirical_epsilon/0.05,0.0
330
+ 1773813742.732494,audit_final,186,2,audit/embedding/empirical_epsilon/0.01,0.0
331
+ 1773813742.732494,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.05/epsilon,0.0
332
+ 1773813742.732494,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.05/num_guesses,0.0
333
+ 1773813742.732494,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.05/correct_guesses,0.0
334
+ 1773813742.732494,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.01/epsilon,0.0
335
+ 1773813742.732494,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.01/num_guesses,0.0
336
+ 1773813742.732494,audit_final,186,2,audit/embedding/empirical_epsilon_details/0.01/correct_guesses,0.0
337
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/duration,2457.6702942838892
338
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/emissions,0.09496692637153915
339
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/emissions_rate,3.8641036021965845e-05
340
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/cpu_power,72.03190690693239
341
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/gpu_power,3303.243429373142
342
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/ram_power,54.0
343
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/cpu_energy,0.04736684855609284
344
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/gpu_energy,2.25075239893377
345
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/ram_energy,0.03550715308856684
346
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/energy_consumed,2.3336264005784284
347
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/water_consumed,0.0
348
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/cpu_count,256.0
349
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/gpu_count,8.0
350
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/longitude,16.1885
351
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/latitude,58.594
352
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/ram_total_size,1511.49019241333
353
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/cpu_utilization_percent,3.8756856324191915
354
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/gpu_utilization_percent,94.07749795249795
355
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/ram_utilization_percent,5.391281211624831
356
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/ram_used_gb,81.57497969388083
357
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/pue,1.0
358
+ 1773813743.2978687,energy_final,186,,energy/codecarbon/wue,0.0
qwen3-4b-instruct/dp3/summary.json ADDED
@@ -0,0 +1,72 @@
1
+ {
2
+ "audit/delta": 1e-05,
3
+ "audit/embedding/auc": 0.51504,
4
+ "audit/embedding/empirical_epsilon/0.01": 0.0,
5
+ "audit/embedding/empirical_epsilon/0.05": 0.0,
6
+ "audit/embedding/empirical_epsilon_details/0.01/correct_guesses": 0.0,
7
+ "audit/embedding/empirical_epsilon_details/0.01/epsilon": 0.0,
8
+ "audit/embedding/empirical_epsilon_details/0.01/num_guesses": 0.0,
9
+ "audit/embedding/empirical_epsilon_details/0.05/correct_guesses": 0.0,
10
+ "audit/embedding/empirical_epsilon_details/0.05/epsilon": 0.0,
11
+ "audit/embedding/empirical_epsilon_details/0.05/num_guesses": 0.0,
12
+ "audit/loss/auc": 0.504792,
13
+ "audit/loss/empirical_epsilon/0.01": 0.0,
14
+ "audit/loss/empirical_epsilon/0.05": 0.0,
15
+ "audit/loss/empirical_epsilon_details/0.01/correct_guesses": 0.0,
16
+ "audit/loss/empirical_epsilon_details/0.01/epsilon": 0.0,
17
+ "audit/loss/empirical_epsilon_details/0.01/num_guesses": 0.0,
18
+ "audit/loss/empirical_epsilon_details/0.05/correct_guesses": 0.0,
19
+ "audit/loss/empirical_epsilon_details/0.05/epsilon": 0.0,
20
+ "audit/loss/empirical_epsilon_details/0.05/num_guesses": 0.0,
21
+ "audit/num_canaries": 500.0,
22
+ "audit/num_members": 250.0,
23
+ "audit/paper_guess_fraction": 0.2,
24
+ "audit/paper_guess_steps": 20.0,
25
+ "energy/codecarbon/cpu_count": 256.0,
26
+ "energy/codecarbon/cpu_energy": 0.04736684855609284,
27
+ "energy/codecarbon/cpu_power": 72.03190690693239,
28
+ "energy/codecarbon/cpu_utilization_percent": 3.8756856324191915,
29
+ "energy/codecarbon/duration": 2457.6702942838892,
30
+ "energy/codecarbon/emissions": 0.09496692637153915,
31
+ "energy/codecarbon/emissions_rate": 3.8641036021965845e-05,
32
+ "energy/codecarbon/energy_consumed": 2.3336264005784284,
33
+ "energy/codecarbon/gpu_count": 8.0,
34
+ "energy/codecarbon/gpu_energy": 2.25075239893377,
35
+ "energy/codecarbon/gpu_power": 3303.243429373142,
36
+ "energy/codecarbon/gpu_utilization_percent": 94.07749795249795,
37
+ "energy/codecarbon/latitude": 58.594,
38
+ "energy/codecarbon/longitude": 16.1885,
39
+ "energy/codecarbon/pue": 1.0,
40
+ "energy/codecarbon/ram_energy": 0.03550715308856684,
41
+ "energy/codecarbon/ram_power": 54.0,
42
+ "energy/codecarbon/ram_total_size": 1511.49019241333,
43
+ "energy/codecarbon/ram_used_gb": 81.57497969388083,
44
+ "energy/codecarbon/ram_utilization_percent": 5.391281211624831,
45
+ "energy/codecarbon/water_consumed": 0.0,
46
+ "energy/codecarbon/wue": 0.0,
47
+ "eval/duration_sec": 12.763936698902398,
48
+ "eval/loss": 0.9410304912389854,
49
+ "perf/audit_duration_sec": 7.0606719348579645,
50
+ "perf/epoch_duration_sec": 1152.0965769039467,
51
+ "perf/epoch_samples": 49540.0,
52
+ "perf/epoch_samples_per_sec": 42.999867366267054,
53
+ "perf/epoch_tokens": 37768082.0,
54
+ "perf/epoch_tokens_per_sec": 32782.04514893617,
55
+ "perf/logical_batch_size": 68.0,
56
+ "perf/logical_token_count": 52926.0,
57
+ "perf/physical_batches": 9.0,
58
+ "perf/samples_per_sec": 5.454674150913608,
59
+ "perf/step_duration_sec": 12.466372530907393,
60
+ "perf/tokens_per_sec": 4245.501236930199,
61
+ "privacy/epsilon": 2.9947529815620726,
62
+ "system/cuda_epoch_peak_memory_gb": 86.22137594223022,
63
+ "system/cuda_max_memory_allocated_gb": 86.22137594223022,
64
+ "system/cuda_memory_allocated_gb": 13.261121273040771,
65
+ "train/epoch_canary_loss": 13.124244338677878,
66
+ "train/epoch_loss": 1.6068291228622247,
67
+ "train/epoch_real_loss": 1.0283400972633716,
68
+ "train/lr": 5.729698228102653e-07,
69
+ "train/step_canary_loss": 12.71875,
70
+ "train/step_loss": 1.7563851160161636,
71
+ "train/step_real_loss": 1.0712373107671738
72
+ }
qwen3-4b-instruct/dp3/tensorboard/events.out.tfevents.1773811283.7b654b6988b0.3379.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bf62293c09b26e3b14cebd16fd5677f0089fba642744c9d5e6c8b124fd776594
3
+ size 25026
qwen3-4b-instruct/dp3/tokenizer/chat_template.jinja ADDED
@@ -0,0 +1,61 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- for message in messages %}
18
+ {%- if message.content is string %}
19
+ {%- set content = message.content %}
20
+ {%- else %}
21
+ {%- set content = '' %}
22
+ {%- endif %}
23
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
24
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
25
+ {%- elif message.role == "assistant" %}
26
+ {{- '<|im_start|>' + message.role + '\n' + content }}
27
+ {%- if message.tool_calls %}
28
+ {%- for tool_call in message.tool_calls %}
29
+ {%- if (loop.first and content) or (not loop.first) %}
30
+ {{- '\n' }}
31
+ {%- endif %}
32
+ {%- if tool_call.function %}
33
+ {%- set tool_call = tool_call.function %}
34
+ {%- endif %}
35
+ {{- '<tool_call>\n{"name": "' }}
36
+ {{- tool_call.name }}
37
+ {{- '", "arguments": ' }}
38
+ {%- if tool_call.arguments is string %}
39
+ {{- tool_call.arguments }}
40
+ {%- else %}
41
+ {{- tool_call.arguments | tojson }}
42
+ {%- endif %}
43
+ {{- '}\n</tool_call>' }}
44
+ {%- endfor %}
45
+ {%- endif %}
46
+ {{- '<|im_end|>\n' }}
47
+ {%- elif message.role == "tool" %}
48
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
49
+ {{- '<|im_start|>user' }}
50
+ {%- endif %}
51
+ {{- '\n<tool_response>\n' }}
52
+ {{- content }}
53
+ {{- '\n</tool_response>' }}
54
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
55
+ {{- '<|im_end|>\n' }}
56
+ {%- endif %}
57
+ {%- endif %}
58
+ {%- endfor %}
59
+ {%- if add_generation_prompt %}
60
+ {{- '<|im_start|>assistant\n' }}
61
+ {%- endif %}
qwen3-4b-instruct/dp3/tokenizer/tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e9c8aef460c70c1e1c32afe895f455856c0075e5706f06e6d80b2f581137715
3
+ size 11517150
qwen3-4b-instruct/dp3/tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,516 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": null,
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "<|im_end|>",
7
+ "errors": "replace",
8
+ "extra_special_tokens": [
9
+ "865331112869",
10
+ "569765693871",
11
+ "485177821815",
12
+ "135441121756",
13
+ "367459894796",
14
+ "877482678543",
15
+ "457919547633",
16
+ "765474393376",
17
+ "114848338811",
18
+ "746285987371",
19
+ "649291669397",
20
+ "927914615679",
21
+ "445925149649",
22
+ "691587454538",
23
+ "143777992227",
24
+ "997981281989",
25
+ "425949483533",
26
+ "982993456429",
27
+ "718726519731",
28
+ "172599315861",
29
+ "643489267333",
30
+ "282322838685",
31
+ "781653545886",
32
+ "796415361892",
33
+ "841991688488",
34
+ "211411365397",
35
+ "698218415444",
36
+ "355977139358",
37
+ "682564697312",
38
+ "383837596997",
39
+ "689362171782",
40
+ "749966767285",
41
+ "753159165157",
42
+ "795693824762",
43
+ "669689115557",
44
+ "327491773134",
45
+ "983569279932",
46
+ "612128769512",
47
+ "374327157578",
48
+ "311632789559",
49
+ "523918658846",
50
+ "765981581453",
51
+ "794825141891",
52
+ "873898736873",
53
+ "447445629421",
54
+ "473822473819",
55
+ "181439694557",
56
+ "592538279337",
57
+ "668134915514",
58
+ "643692393748",
59
+ "696651276628",
60
+ "853859348234",
61
+ "778466723723",
62
+ "929826356991",
63
+ "272362973463",
64
+ "694235616268",
65
+ "281673864127",
66
+ "479676316326",
67
+ "646979124677",
68
+ "922327493433",
69
+ "883685933161",
70
+ "264259917554",
71
+ "836746273134",
72
+ "658481324922",
73
+ "481884157827",
74
+ "587787496812",
75
+ "579184949249",
76
+ "912193598348",
77
+ "529679678956",
78
+ "795838284624",
79
+ "159337222655",
80
+ "173781362446",
81
+ "773687856563",
82
+ "535787224917",
83
+ "351885857332",
84
+ "578827344666",
85
+ "198462689911",
86
+ "722618266242",
87
+ "952872416512",
88
+ "517778845323",
89
+ "749665846687",
90
+ "661436365453",
91
+ "259666844669",
92
+ "242851284913",
93
+ "514532995959",
94
+ "161588262349",
95
+ "742765629356",
96
+ "225164373623",
97
+ "676539973863",
98
+ "826214551218",
99
+ "182345464792",
100
+ "232776999554",
101
+ "337326533813",
102
+ "676676697292",
103
+ "929185622831",
104
+ "545512344383",
105
+ "499444466686",
106
+ "314697386682",
107
+ "517379856925",
108
+ "379557332953",
109
+ "614797267726",
110
+ "429781429464",
111
+ "922466849763",
112
+ "721737645236",
113
+ "479227349997",
114
+ "136931728327",
115
+ "259533577263",
116
+ "488538864842",
117
+ "937495658852",
118
+ "489991411364",
119
+ "499148455254",
120
+ "441373944925",
121
+ "899151413682",
122
+ "467893531755",
123
+ "527117488925",
124
+ "928335588653",
125
+ "374439448821",
126
+ "879425227932",
127
+ "867678158885",
128
+ "399749397872",
129
+ "129693547287",
130
+ "689285841825",
131
+ "771619544974",
132
+ "724883568652",
133
+ "516968424863",
134
+ "733737988257",
135
+ "852347289392",
136
+ "296953381169",
137
+ "377273562477",
138
+ "262296912232",
139
+ "547149832394",
140
+ "298464134954",
141
+ "216667245274",
142
+ "843998562287",
143
+ "572154333646",
144
+ "124589118494",
145
+ "841824384614",
146
+ "232896526252",
147
+ "295448593321",
148
+ "123741461297",
149
+ "653573457168",
150
+ "196735786156",
151
+ "377338713663",
152
+ "964342468552",
153
+ "586855179568",
154
+ "484773717614",
155
+ "894885246797",
156
+ "677896358599",
157
+ "848845611563",
158
+ "851852651677",
159
+ "398549545767",
160
+ "454244839926",
161
+ "799364566435",
162
+ "967114116556",
163
+ "817378986438",
164
+ "233795848681",
165
+ "824387273757",
166
+ "916198946615",
167
+ "563117729724",
168
+ "951794811935",
169
+ "374598961236",
170
+ "922867396683",
171
+ "765737843639",
172
+ "175469284871",
173
+ "231853711778",
174
+ "662426712668",
175
+ "711412347158",
176
+ "753466987363",
177
+ "513361312532",
178
+ "712992815957",
179
+ "971621888444",
180
+ "829235161526",
181
+ "585544633356",
182
+ "582471228164",
183
+ "678666359123",
184
+ "557533689478",
185
+ "632962475133",
186
+ "484489193824",
187
+ "489562189822",
188
+ "589547936288",
189
+ "363214487524",
190
+ "244885399387",
191
+ "431751228368",
192
+ "433581868192",
193
+ "486391569221",
194
+ "185438575221",
195
+ "126574388585",
196
+ "741757479784",
197
+ "529854679937",
198
+ "996116119839",
199
+ "616248973917",
200
+ "763531783491",
201
+ "955456118295",
202
+ "364196983365",
203
+ "195792996468",
204
+ "151859598873",
205
+ "399223169721",
206
+ "938488813964",
207
+ "961981959227",
208
+ "183368827562",
209
+ "533417736566",
210
+ "786391632558",
211
+ "665661658354",
212
+ "693281533643",
213
+ "475794684356",
214
+ "652154162978",
215
+ "753233719644",
216
+ "668514843129",
217
+ "819162623892",
218
+ "941169431859",
219
+ "877385381798",
220
+ "752644929761",
221
+ "881136466196",
222
+ "275597777299",
223
+ "731681792655",
224
+ "961133895172",
225
+ "864718285734",
226
+ "963852916563",
227
+ "319584985416",
228
+ "563365646341",
229
+ "811371928234",
230
+ "837131396371",
231
+ "267514771964",
232
+ "944513428457",
233
+ "117298239631",
234
+ "158142752582",
235
+ "252867443568",
236
+ "839269684865",
237
+ "612788593128",
238
+ "145669731981",
239
+ "121557291859",
240
+ "245416776926",
241
+ "799417897197",
242
+ "997958836435",
243
+ "892336777248",
244
+ "158929292238",
245
+ "581976444672",
246
+ "897784492783",
247
+ "492373714791",
248
+ "512659818733",
249
+ "881112998642",
250
+ "619454958782",
251
+ "431149748713",
252
+ "624221476921",
253
+ "125866399464",
254
+ "339882449689",
255
+ "186198784585",
256
+ "943193294691",
257
+ "955668961269",
258
+ "232787996724",
259
+ "215671314196",
260
+ "286173241916",
261
+ "745977673725",
262
+ "556976448182",
263
+ "599961512792",
264
+ "766294538337",
265
+ "934912591213",
266
+ "295118729589",
267
+ "529455466433",
268
+ "196119929397",
269
+ "379571934299",
270
+ "251789649997",
271
+ "564544131355",
272
+ "244371196654",
273
+ "384598329253",
274
+ "887753195844",
275
+ "364947325679",
276
+ "655517954651",
277
+ "673948786567",
278
+ "857231548835",
279
+ "816115936673",
280
+ "644234165531",
281
+ "182782912224",
282
+ "234316622259",
283
+ "421369185549",
284
+ "434632855397",
285
+ "921889371893",
286
+ "415956914763",
287
+ "598916996413",
288
+ "773671349113",
289
+ "952465217972",
290
+ "117657531962",
291
+ "729825168745",
292
+ "691315125346",
293
+ "768461952319",
294
+ "664847713559",
295
+ "953267689786",
296
+ "886464195129",
297
+ "824488329416",
298
+ "837873762491",
299
+ "532833541879",
300
+ "669183782449",
301
+ "941976537588",
302
+ "739394546916",
303
+ "267954879268",
304
+ "637551427887",
305
+ "217756494954",
306
+ "524444658383",
307
+ "117783274348",
308
+ "138218735276",
309
+ "814611949491",
310
+ "711641973413",
311
+ "499156317423",
312
+ "515856611931",
313
+ "454164859837",
314
+ "345271433112",
315
+ "462294118988",
316
+ "511785788222",
317
+ "497294727353",
318
+ "866519986723",
319
+ "334513529294",
320
+ "549946382131",
321
+ "284445431422",
322
+ "396521188476",
323
+ "421435255895",
324
+ "133373659361",
325
+ "322683334381",
326
+ "228358422847",
327
+ "291762694874",
328
+ "143182978129",
329
+ "511923256573",
330
+ "327158398268",
331
+ "879764613759",
332
+ "564395222747",
333
+ "451161679736",
334
+ "538631466654",
335
+ "221762325616",
336
+ "218391991184",
337
+ "322589379462",
338
+ "876537814263",
339
+ "152676556624",
340
+ "332522971941",
341
+ "884354318946",
342
+ "513349618943",
343
+ "116639746413",
344
+ "635185846287",
345
+ "993832498489",
346
+ "813981174797",
347
+ "438745114173",
348
+ "983493951323",
349
+ "724492262421",
350
+ "622553389126",
351
+ "889965243135",
352
+ "364492359246",
353
+ "154962668224",
354
+ "179564995814",
355
+ "418412875665",
356
+ "718951851413",
357
+ "699446724178",
358
+ "624266421831",
359
+ "815458725125",
360
+ "455423278865",
361
+ "393741199486",
362
+ "328552864359",
363
+ "211662639865",
364
+ "218784516525",
365
+ "762486672996",
366
+ "142799718159",
367
+ "858146415154",
368
+ "767858144912",
369
+ "571317457151",
370
+ "635127952696",
371
+ "116427191984",
372
+ "268921994538",
373
+ "523937669294",
374
+ "165429152138",
375
+ "739246183345",
376
+ "591464355756",
377
+ "212985874612",
378
+ "191887635211",
379
+ "967214577653",
380
+ "119342152414",
381
+ "946444632795",
382
+ "618423867817",
383
+ "228565148417",
384
+ "729116422489",
385
+ "527874729936",
386
+ "739784153482",
387
+ "387763951128",
388
+ "331369926711",
389
+ "562716493614",
390
+ "739667844957",
391
+ "562389434565",
392
+ "256497188281",
393
+ "859927364588",
394
+ "417668946583",
395
+ "357621613582",
396
+ "438435178228",
397
+ "485692541169",
398
+ "825815739116",
399
+ "342221452223",
400
+ "697747991249",
401
+ "716763689965",
402
+ "141499982867",
403
+ "818479319499",
404
+ "336813343298",
405
+ "594688742928",
406
+ "472129283475",
407
+ "514354144759",
408
+ "349249721685",
409
+ "546276298359",
410
+ "353755529131",
411
+ "315534574435",
412
+ "523723475786",
413
+ "215826764872",
414
+ "367968398551",
415
+ "569853653352",
416
+ "389715484387",
417
+ "293847485454",
418
+ "714738141818",
419
+ "178478368922",
420
+ "581493616981",
421
+ "589439538674",
422
+ "846657726193",
423
+ "722339992679",
424
+ "138154781148",
425
+ "757785319772",
426
+ "492516914298",
427
+ "919181521716",
428
+ "985781138935",
429
+ "476969195485",
430
+ "313145133463",
431
+ "758963111966",
432
+ "147541537162",
433
+ "557163366873",
434
+ "144373897488",
435
+ "522515164754",
436
+ "724964923582",
437
+ "284776712475",
438
+ "375429755114",
439
+ "181233596124",
440
+ "948585673431",
441
+ "243165586174",
442
+ "396847976144",
443
+ "997724962668",
444
+ "558837194455",
445
+ "163165456396",
446
+ "378749551722",
447
+ "161238482259",
448
+ "754978243758",
449
+ "195388849133",
450
+ "229775525672",
451
+ "262437452884",
452
+ "441377892146",
453
+ "451885565366",
454
+ "981277526855",
455
+ "762495822823",
456
+ "368763327262",
457
+ "757422791351",
458
+ "636324136426",
459
+ "214193645583",
460
+ "412843856172",
461
+ "179386156569",
462
+ "756916173536",
463
+ "892697125149",
464
+ "625334487352",
465
+ "941861857715",
466
+ "887417525236",
467
+ "649516938598",
468
+ "717628619782",
469
+ "438124184139",
470
+ "547563892268",
471
+ "856317483891",
472
+ "313313831273",
473
+ "371496153876",
474
+ "587541149322",
475
+ "265847332563",
476
+ "449549215429",
477
+ "163497196769",
478
+ "861342291298",
479
+ "268433315926",
480
+ "774679513717",
481
+ "851254219729",
482
+ "583527834464",
483
+ "488496781997",
484
+ "556814553861",
485
+ "482829231639",
486
+ "618878266619",
487
+ "147444452794",
488
+ "949235426629",
489
+ "357299947518",
490
+ "175528632226",
491
+ "645527857972",
492
+ "186872457894",
493
+ "552738847828",
494
+ "626748382482",
495
+ "921894985642",
496
+ "943878645871",
497
+ "859289776479",
498
+ "614583493135",
499
+ "933775286797",
500
+ "332234613346",
501
+ "325196781219",
502
+ "142526557681",
503
+ "356722692178",
504
+ "449318681694",
505
+ "687284547244",
506
+ "947262995132",
507
+ "893974619684",
508
+ "797238311233"
509
+ ],
510
+ "is_local": false,
511
+ "model_max_length": 1010000,
512
+ "pad_token": "<|endoftext|>",
513
+ "split_special_tokens": false,
514
+ "tokenizer_class": "Qwen2Tokenizer",
515
+ "unk_token": null
516
+ }
qwen3-4b-instruct/dp3/train.log ADDED
@@ -0,0 +1,21 @@
+ 2026-03-18 05:25:06,469 [INFO] new_opacus_codex.train_steps: epoch=1 step=10 loss=1.6915
+ 2026-03-18 05:27:08,065 [INFO] new_opacus_codex.train_steps: epoch=1 step=20 loss=1.6692
+ 2026-03-18 05:29:10,619 [INFO] new_opacus_codex.train_steps: epoch=1 step=30 loss=1.7137
+ 2026-03-18 05:31:12,467 [INFO] new_opacus_codex.train_steps: epoch=1 step=40 loss=1.6041
+ 2026-03-18 05:33:14,135 [INFO] new_opacus_codex.train_steps: epoch=1 step=50 loss=1.6079
+ 2026-03-18 05:33:26,911 [INFO] new_opacus_codex.train_steps: eval event=eval_step epoch=1 step=50 eval_loss=0.9748 duration_sec=12.77
+ 2026-03-18 05:35:28,635 [INFO] new_opacus_codex.train_steps: epoch=1 step=60 loss=1.5953
+ 2026-03-18 05:37:31,067 [INFO] new_opacus_codex.train_steps: epoch=1 step=70 loss=1.7617
+ 2026-03-18 05:39:33,024 [INFO] new_opacus_codex.train_steps: epoch=1 step=80 loss=1.7465
+ 2026-03-18 05:41:35,477 [INFO] new_opacus_codex.train_steps: epoch=1 step=90 loss=1.6112
+ 2026-03-18 05:43:57,180 [INFO] new_opacus_codex.train_steps: epoch=2 step=100 loss=1.7602
+ 2026-03-18 05:44:09,918 [INFO] new_opacus_codex.train_steps: eval event=eval_step epoch=2 step=100 eval_loss=0.9495 duration_sec=12.73
+ 2026-03-18 05:46:11,788 [INFO] new_opacus_codex.train_steps: epoch=2 step=110 loss=1.7710
+ 2026-03-18 05:48:13,197 [INFO] new_opacus_codex.train_steps: epoch=2 step=120 loss=1.5877
+ 2026-03-18 05:50:14,090 [INFO] new_opacus_codex.train_steps: epoch=2 step=130 loss=1.5471
+ 2026-03-18 05:52:15,284 [INFO] new_opacus_codex.train_steps: epoch=2 step=140 loss=1.7275
+ 2026-03-18 05:54:17,034 [INFO] new_opacus_codex.train_steps: epoch=2 step=150 loss=1.5942
+ 2026-03-18 05:54:29,800 [INFO] new_opacus_codex.train_steps: eval event=eval_step epoch=2 step=150 eval_loss=0.9416 duration_sec=12.76
+ 2026-03-18 05:56:31,119 [INFO] new_opacus_codex.train_steps: epoch=2 step=160 loss=1.3974
+ 2026-03-18 05:58:33,540 [INFO] new_opacus_codex.train_steps: epoch=2 step=170 loss=1.6737
+ 2026-03-18 06:00:35,917 [INFO] new_opacus_codex.train_steps: epoch=2 step=180 loss=1.6957