SirajRLX committed
Commit b9bc317 · verified · 1 Parent(s): 7be9bb6

Add files using upload-large-folder tool

Files changed (50)
  1. best_adapter/README.md +207 -0
  2. best_adapter/adapter_config.json +43 -0
  3. checkpoints/checkpoint-2500/README.md +207 -0
  4. checkpoints/checkpoint-2500/adapter_config.json +43 -0
  5. checkpoints/checkpoint-2500/trainer_state.json +0 -0
  6. checkpoints/checkpoint-3500/README.md +207 -0
  7. checkpoints/checkpoint-3500/trainer_state.json +0 -0
  8. checkpoints/checkpoint-4500/README.md +207 -0
  9. checkpoints/checkpoint-4500/adapter_config.json +43 -0
  10. checkpoints/checkpoint-4500/trainer_state.json +0 -0
  11. checkpoints/checkpoint-5500/README.md +207 -0
  12. checkpoints/checkpoint-5500/adapter_config.json +43 -0
  13. checkpoints/checkpoint-5500/trainer_state.json +0 -0
  14. config_resolved.yaml +102 -0
  15. eval_final.json +8 -0
  16. wandb/debug-internal.log +12 -0
  17. wandb/debug.log +29 -0
  18. wandb/run-20251226_180557-p7wwl5ek/files/config.yaml +173 -0
  19. wandb/run-20251226_180557-p7wwl5ek/files/output.log +34 -0
  20. wandb/run-20251226_180557-p7wwl5ek/files/requirements.txt +104 -0
  21. wandb/run-20251226_180557-p7wwl5ek/files/wandb-metadata.json +47 -0
  22. wandb/run-20251226_180557-p7wwl5ek/files/wandb-summary.json +1 -0
  23. wandb/run-20251226_180557-p7wwl5ek/logs/debug-core.log +14 -0
  24. wandb/run-20251226_180557-p7wwl5ek/logs/debug-internal.log +11 -0
  25. wandb/run-20251226_180557-p7wwl5ek/logs/debug.log +23 -0
  26. wandb/run-20251226_180557-p7wwl5ek/run-p7wwl5ek.wandb +0 -0
  27. wandb/run-20251226_180613-i1cmzyri/files/config.yaml +173 -0
  28. wandb/run-20251226_180613-i1cmzyri/files/output.log +48 -0
  29. wandb/run-20251226_180613-i1cmzyri/files/requirements.txt +104 -0
  30. wandb/run-20251226_180613-i1cmzyri/files/wandb-metadata.json +47 -0
  31. wandb/run-20251226_180613-i1cmzyri/files/wandb-summary.json +1 -0
  32. wandb/run-20251226_180613-i1cmzyri/logs/debug-core.log +14 -0
  33. wandb/run-20251226_180613-i1cmzyri/logs/debug-internal.log +11 -0
  34. wandb/run-20251226_180613-i1cmzyri/logs/debug.log +23 -0
  35. wandb/run-20251226_180702-oordmylf/files/config.yaml +173 -0
  36. wandb/run-20251226_180702-oordmylf/files/output.log +48 -0
  37. wandb/run-20251226_180702-oordmylf/files/requirements.txt +104 -0
  38. wandb/run-20251226_180702-oordmylf/files/wandb-metadata.json +47 -0
  39. wandb/run-20251226_180702-oordmylf/files/wandb-summary.json +1 -0
  40. wandb/run-20251226_180702-oordmylf/logs/debug-core.log +14 -0
  41. wandb/run-20251226_180702-oordmylf/logs/debug-internal.log +11 -0
  42. wandb/run-20251226_180702-oordmylf/logs/debug.log +23 -0
  43. wandb/run-20251226_180808-ny9q48hd/files/config.yaml +630 -0
  44. wandb/run-20251226_180808-ny9q48hd/files/output.log +0 -0
  45. wandb/run-20251226_180808-ny9q48hd/files/requirements.txt +104 -0
  46. wandb/run-20251226_180808-ny9q48hd/files/wandb-metadata.json +47 -0
  47. wandb/run-20251226_180808-ny9q48hd/files/wandb-summary.json +1 -0
  48. wandb/run-20251226_180808-ny9q48hd/logs/debug-core.log +16 -0
  49. wandb/run-20251226_180808-ny9q48hd/logs/debug-internal.log +12 -0
  50. wandb/run-20251226_180808-ny9q48hd/logs/debug.log +29 -0
best_adapter/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: Models/Devstral-Small-2-24B-HS-CPT
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Models/Devstral-Small-2-24B-HS-CPT
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.0
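
The card's "How to Get Started with the Model" section above is still a `[More Information Needed]` placeholder. A minimal loading sketch, assuming local checkouts matching the card metadata, recent `transformers`/`peft` (the card pins PEFT 0.18.0), and hardware that fits the 24B base in bf16; the exact `Auto` class may differ for this multimodal Devstral checkpoint, so treat this as a template rather than the author's tested recipe:

```python
# Hypothetical loading sketch -- not part of the commit. Paths come from the
# card metadata (base_model) and this repo's layout (best_adapter/).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Models/Devstral-Small-2-24B-HS-CPT"  # base_model from the card header
ADAPTER = "best_adapter"                     # LoRA weights committed in this repo

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,  # matches model.torch_dtype in config_resolved.yaml
    device_map="auto",           # requires `accelerate`
)
model = PeftModel.from_pretrained(base, ADAPTER)  # attach the adapter weights
model.eval()
```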
best_adapter/adapter_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "Models/Devstral-Small-2-24B-HS-CPT",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.0",
+   "qalora_group_size": 16,
+   "r": 8,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "v_proj",
+     "q_proj",
+     "o_proj",
+     "k_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
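
For readability, the non-default fields in this JSON correspond to the following `peft` LoraConfig (a sketch mirroring the file above; the JSON is what the trainer actually serialized). Since `use_rslora` is false, each targeted projection gets the standard scaling `lora_alpha / r = 16 / 8 = 2.0`.

```python
# Sketch: adapter_config.json above re-expressed as a PEFT LoraConfig.
# Fields not listed keep their PEFT 0.18 defaults (use_dora=False, etc.).
from peft import LoraConfig

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,                    # LoRA rank
    lora_alpha=16,          # effective scaling lora_alpha / r = 2.0
    lora_dropout=0.05,
    bias="none",
    target_modules=["v_proj", "q_proj", "o_proj", "k_proj"],  # attention projections only
)
```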
checkpoints/checkpoint-2500/README.md ADDED
@@ -0,0 +1,207 @@
(file content identical to best_adapter/README.md above)
checkpoints/checkpoint-2500/adapter_config.json ADDED
@@ -0,0 +1,43 @@
(file content identical to best_adapter/adapter_config.json above)
checkpoints/checkpoint-2500/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoints/checkpoint-3500/README.md ADDED
@@ -0,0 +1,207 @@
(file content identical to best_adapter/README.md above)
checkpoints/checkpoint-3500/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoints/checkpoint-4500/README.md ADDED
@@ -0,0 +1,207 @@
(file content identical to best_adapter/README.md above)
checkpoints/checkpoint-4500/adapter_config.json ADDED
@@ -0,0 +1,43 @@
(file content identical to best_adapter/adapter_config.json above)
checkpoints/checkpoint-4500/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoints/checkpoint-5500/README.md ADDED
@@ -0,0 +1,207 @@
(file content identical to best_adapter/README.md above)
checkpoints/checkpoint-5500/adapter_config.json ADDED
@@ -0,0 +1,43 @@
(file content identical to best_adapter/adapter_config.json above)
checkpoints/checkpoint-5500/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
config_resolved.yaml ADDED
@@ -0,0 +1,102 @@
+ run:
+   run_dir: ./task2file/sft_devstral_24B_v2
+   seed: 42
+   wandb:
+     enabled: true
+     project: sft-training
+     entity: null
+     name: null
+     tags:
+     - sft-lora
+     - 24b-Devstral
+     notes: null
+ model:
+   repo_id: ./Models/Devstral-Small-2-24B-HS-CPT
+   revision: null
+   base_local_dir: base_model
+   trust_remote_code: true
+   tokenizer_use_fast: true
+   device_map: auto
+   torch_dtype: bfloat16
+   use_4bit: false
+   bnb_4bit_quant_type: nf4
+   bnb_4bit_use_double_quant: false
+   bnb_4bit_compute_dtype: bfloat16
+   attn_implementation: null
+ data:
+   train_jsonl: sft_dataset.jsonl
+   eval_jsonl: null
+   eval_split_ratio: 0.1
+   instruction_field: instruction
+   input_field: input
+   output_field: output
+   format_type: custom
+   system_prompt: "You are a Hyperswitch Rust code analyzer. Identify functions/structs\
+     \ that need modification for a given task.\n\n## Output Format\n\n##OUTPUT\nExplain\
+     \ the data flow and why each component must change:\n- Flow: [Input \u2192 Processing\
+     \ \u2192 Output with arrows]\n- For each component: \"The [ComponentName] ([path])\
+     \ must [action] because [reason]\u2014without this, [consequence]\"\n- Explain\
+     \ coupling between components\n\n##SELECT\nmodify::crates/path/to/file.rs::impl::ComponentName\n\
+     add::crates/another/file.rs::function::AnotherComponent\n<EOS>\n\n## Rules\n\n\
+     1. Use full paths: `remove::crates/folder/file.rs::Type::Name`\n2. Use `::` for\
+     \ nested items: `status::StructName::Type::Name`\n3. Always explain \"must change\
+     \ because\" and \"without this\"\n3. Types of components: function, struct, enum,\
+     \ impl, trait\n4. If there is extra information (e.g., enum variants), include\
+     \ that too.\n5. Start with ##OUTPUT, end with ##SELECT, terminate with <EOS>\n\
+     \n## Example\n\n##TASK\nAdd webhook subscription support\n\n##OUTPUT\nThe webhook\
+     \ system routes events via EventClass enum. Flow: webhook \u2192 EventClass \u2192\
+     \ handler \u2192 processing. The EventClass enum (crates/common_enums/src/enums.rs::EventClass)\
+     \ must add Subscriptions variant because it defines event routing\u2014without\
+     \ this, subscription events cannot be processed. The SubscriptionStatus impl (crates/common_enums/src/transformers.rs::SubscriptionStatus)\
+     \ must map to EventType because it converts status to events\u2014without this,\
+     \ status changes don't trigger webhooks. These are coupled: EventClass routes\
+     \ to handlers that use SubscriptionStatus mappings.\n\n##SELECT\ncrates/common_enums/src/enums.rs::EventClass\n\
+     crates/common_enums/src/transformers.rs::SubscriptionStatus\n<EOS>\n"
+   custom_template: '##INSTRUCTION
+
+     {instruction}<|im_end|>
+
+     {input}<|im_end|>
+
+     {output}<|im_end|>'
+   max_length: 2048
+   shuffle: true
+   num_proc: 4
+ peft:
+   enabled: true
+   r: 8
+   lora_alpha: 16
+   lora_dropout: 0.05
+   bias: none
+   target_modules: auto
+ train:
+   num_train_epochs: 6
+   per_device_train_batch_size: 1
+   per_device_eval_batch_size: 1
+   gradient_accumulation_steps: 8
+   learning_rate: 1e-4
+   weight_decay: 0.0
+   warmup_ratio: 0.08
+   lr_scheduler_type: cosine
+   optim: adamw_torch
+   max_grad_norm: 0.8
+   gradient_checkpointing: true
+   logging_steps: 2
+   save_strategy: steps
+   save_steps: 500
+   save_total_limit: 20
+   evaluation_strategy: steps
+   eval_steps: 100
+   load_best_model_at_end: true
+   early_stopping:
+     enabled: true
+     patience: 5
+     min_delta: 0.001
+     metric: eval_loss
+     mode: min
+   resume_from_checkpoint: auto
+ merge:
+   enabled: true
+   merged_dtype: float16
+   max_shard_size: 2GB
+   output_dir: ./Models/Devstral-Small-2-24B-HS-CPT-SFT_v2
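
A small consumer-side sketch (assumes PyYAML; field names exactly as committed above). It spells out the effective batch size implied by the `train` block and flags a YAML quirk visible in the wandb log below: `1e-4` without a decimal point parses as a string, so the learning rate needs a cast.

```python
# Sketch, not from the training code: read the resolved config and derive
# the effective optimizer batch size from the committed hyperparameters.
import yaml

with open("config_resolved.yaml") as f:
    cfg = yaml.safe_load(f)

train = cfg["train"]
# 1 sample per device per forward pass, accumulated over 8 steps
# -> 8 samples per optimizer update (per device).
effective_batch = (
    train["per_device_train_batch_size"] * train["gradient_accumulation_steps"]
)
assert effective_batch == 8

lr = float(train["learning_rate"])  # YAML 1.1 reads `1e-4` as the string "1e-4"
```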
eval_final.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "eval_loss": 0.6706293225288391,
+   "eval_runtime": 511.6513,
+   "eval_samples_per_second": 4.118,
+   "eval_steps_per_second": 4.118,
+   "epoch": 3.2067510548523206,
+   "perplexity": 1.955467553274469
+ }
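
The `perplexity` field is simply the exponential of `eval_loss` (the standard relation for a mean token cross-entropy); a quick check:

```python
# perplexity = exp(mean cross-entropy loss)
import math

eval_loss = 0.6706293225288391
print(math.exp(eval_loss))  # ~1.955467..., matching the "perplexity" field above
```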
wandb/debug-internal.log ADDED
@@ -0,0 +1,12 @@
+ {"time":"2025-12-26T18:08:08.66103332Z","level":"INFO","msg":"stream: starting","core version":"0.23.1"}
+ {"time":"2025-12-26T18:08:08.82172381Z","level":"INFO","msg":"stream: created new stream","id":"ny9q48hd"}
+ {"time":"2025-12-26T18:08:08.821819478Z","level":"INFO","msg":"handler: started","stream_id":"ny9q48hd"}
+ {"time":"2025-12-26T18:08:08.822049155Z","level":"INFO","msg":"stream: started","id":"ny9q48hd"}
+ {"time":"2025-12-26T18:08:08.822072296Z","level":"INFO","msg":"writer: started","stream_id":"ny9q48hd"}
+ {"time":"2025-12-26T18:08:08.822098276Z","level":"INFO","msg":"sender: started","stream_id":"ny9q48hd"}
+ {"time":"2025-12-28T04:02:04.935383596Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
+ {"time":"2025-12-28T04:02:05.045953421Z","level":"INFO","msg":"handler: operation stats","stats":{}}
+ {"time":"2025-12-28T04:02:05.051806259Z","level":"INFO","msg":"stream: closing","id":"ny9q48hd"}
+ {"time":"2025-12-28T04:02:05.051833004Z","level":"INFO","msg":"handler: closed","stream_id":"ny9q48hd"}
+ {"time":"2025-12-28T04:02:05.051917075Z","level":"INFO","msg":"sender: closed","stream_id":"ny9q48hd"}
+ {"time":"2025-12-28T04:02:05.051937152Z","level":"INFO","msg":"stream: closed","id":"ny9q48hd"}
wandb/debug.log ADDED
@@ -0,0 +1,29 @@
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_setup.py:_flush():80] Current SDK version is 0.23.1
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_setup.py:_flush():80] Configure stats pid to 190322
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_setup.py:_flush():80] Loading settings from /root/.config/wandb/settings
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_setup.py:_flush():80] Loading settings from /workspace/wandb/settings
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_setup.py:_flush():80] Loading settings from environment variables
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_init.py:setup_run_log_directory():714] Logging user logs to task2file/sft_devstral_24B_v2/wandb/run-20251226_180808-ny9q48hd/logs/debug.log
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_init.py:setup_run_log_directory():715] Logging internal logs to task2file/sft_devstral_24B_v2/wandb/run-20251226_180808-ny9q48hd/logs/debug-internal.log
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_init.py:init():841] calling init triggers
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_init.py:init():846] wandb.init called with sweep_config: {}
+ config: {'model': {'repo_id': './Models/Devstral-Small-2-24B-HS-CPT', 'revision': None, 'base_local_dir': 'base_model', 'trust_remote_code': True, 'tokenizer_use_fast': True, 'device_map': 'auto', 'torch_dtype': 'bfloat16', 'use_4bit': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'attn_implementation': None}, 'data': {'train_jsonl': 'sft_dataset.jsonl', 'eval_jsonl': None, 'eval_split_ratio': 0.1, 'instruction_field': 'instruction', 'input_field': 'input', 'output_field': 'output', 'format_type': 'custom', 'system_prompt': 'You are a Hyperswitch Rust code analyzer. Identify functions/structs that need modification for a given task.\n\n## Output Format\n\n##OUTPUT\nExplain the data flow and why each component must change:\n- Flow: [Input → Processing → Output with arrows]\n- For each component: "The [ComponentName] ([path]) must [action] because [reason]—without this, [consequence]"\n- Explain coupling between components\n\n##SELECT\nmodify::crates/path/to/file.rs::impl::ComponentName\nadd::crates/another/file.rs::function::AnotherComponent\n<EOS>\n\n## Rules\n\n1. Use full paths: `remove::crates/folder/file.rs::Type::Name`\n2. Use `::` for nested items: `status::StructName::Type::Name`\n3. Always explain "must change because" and "without this"\n3. Types of components: function, struct, enum, impl, trait\n4. If there is extra information (e.g., enum variants), include that too.\n5. Start with ##OUTPUT, end with ##SELECT, terminate with <EOS>\n\n## Example\n\n##TASK\nAdd webhook subscription support\n\n##OUTPUT\nThe webhook system routes events via EventClass enum. Flow: webhook → EventClass → handler → processing. The EventClass enum (crates/common_enums/src/enums.rs::EventClass) must add Subscriptions variant because it defines event routing—without this, subscription events cannot be processed. The SubscriptionStatus impl (crates/common_enums/src/transformers.rs::SubscriptionStatus) must map to EventType because it converts status to events—without this, status changes don\'t trigger webhooks. These are coupled: EventClass routes to handlers that use SubscriptionStatus mappings.\n\n##SELECT\ncrates/common_enums/src/enums.rs::EventClass\ncrates/common_enums/src/transformers.rs::SubscriptionStatus\n<EOS>\n', 'custom_template': '##INSTRUCTION\n{instruction}<|im_end|>\n{input}<|im_end|>\n{output}<|im_end|>', 'max_length': 2048, 'shuffle': True, 'num_proc': 4}, 'peft': {'enabled': True, 'r': 8, 'lora_alpha': 16, 'lora_dropout': 0.05, 'bias': 'none', 'target_modules': 'auto'}, 'train': {'num_train_epochs': 6, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 8, 'learning_rate': '1e-4', 'weight_decay': 0.0, 'warmup_ratio': 0.08, 'lr_scheduler_type': 'cosine', 'optim': 'adamw_torch', 'max_grad_norm': 0.8, 'gradient_checkpointing': True, 'logging_steps': 2, 'save_strategy': 'steps', 'save_steps': 500, 'save_total_limit': 20, 'evaluation_strategy': 'steps', 'eval_steps': 100, 'load_best_model_at_end': True, 'early_stopping': {'enabled': True, 'patience': 5, 'min_delta': 0.001, 'metric': 'eval_loss', 'mode': 'min'}, 'resume_from_checkpoint': 'auto'}, 'run_dir': 'task2file/sft_devstral_24B_v2', '_wandb': {}}
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_init.py:init():889] starting backend
+ 2025-12-26 18:08:08,653 INFO MainThread:190322 [wandb_init.py:init():892] sending inform_init request
+ 2025-12-26 18:08:08,658 INFO MainThread:190322 [wandb_init.py:init():900] backend started and connected
+ 2025-12-26 18:08:08,661 INFO MainThread:190322 [wandb_init.py:init():970] updated telemetry
+ 2025-12-26 18:08:08,662 INFO MainThread:190322 [wandb_init.py:init():994] communicating run to backend with 90.0 second timeout
+ 2025-12-26 18:08:09,021 INFO MainThread:190322 [wandb_init.py:init():1041] starting run threads in backend
+ 2025-12-26 18:08:09,134 INFO MainThread:190322 [wandb_run.py:_console_start():2521] atexit reg
+ 2025-12-26 18:08:09,134 INFO MainThread:190322 [wandb_run.py:_redirect():2369] redirect: wrap_raw
+ 2025-12-26 18:08:09,135 INFO MainThread:190322 [wandb_run.py:_redirect():2438] Wrapping output streams.
+ 2025-12-26 18:08:09,135 INFO MainThread:190322 [wandb_run.py:_redirect():2461] Redirects installed.
+ 2025-12-26 18:08:09,138 INFO MainThread:190322 [wandb_init.py:init():1081] run started, returning control to user process
+ 2025-12-26 18:08:52,955 INFO MainThread:190322 [wandb_run.py:_config_callback():1396] config_cb None None {'peft_config': {'default': {'task_type': 'CAUSAL_LM', 'peft_type': 'LORA', 'auto_mapping': None, 'peft_version': '0.18.0', 'base_model_name_or_path': 'Models/Devstral-Small-2-24B-HS-CPT', 'revision': None, 'inference_mode': False, 'r': 8, 'target_modules': ['v_proj', 'q_proj', 'o_proj', 'k_proj'], 'exclude_modules': None, 'lora_alpha': 16, 'lora_dropout': 0.05, 'fan_in_fan_out': False, 'bias': 'none', 'use_rslora': False, 'modules_to_save': None, 'init_lora_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'rank_pattern': {}, 'alpha_pattern': {}, 'megatron_config': None, 'megatron_core': 'megatron.core', 'trainable_token_indices': None, 'loftq_config': {}, 'eva_config': None, 'corda_config': None, 'use_dora': False, 'alora_invocation_tokens': None, 'use_qalora': False, 'qalora_group_size': 16, 'layer_replication': None, 'runtime_config': {'ephemeral_gpu_offload': False}, 'lora_bias': False, 'target_parameters': None, 'arrow_config': None, 'ensure_weight_tying': False}}, 'image_token_index': 10, 'projector_hidden_act': 'gelu', 'vision_feature_layer': -1, 'vision_config': {'hidden_size': 1024, 'intermediate_size': 4096, 'num_hidden_layers': 24, 'num_attention_heads': 16, 'num_channels': 3, 'patch_size': 14, 'image_size': 1540, 'attention_dropout': 0.0, 'hidden_act': 'silu', 'head_dim': 64, 'initializer_range': 0.02, 'rope_parameters': {'rope_theta': 10000.0, 'rope_type': 'default'}, 'return_dict': True, 'output_hidden_states': False, 'dtype': 'bfloat16', 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'architectures': None, 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': None, 'pad_token_id': None, 'eos_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, '_name_or_path': '', 'model_type': 'pixtral', 'output_attentions': False}, 'text_config': {'vocab_size': 131072, 'max_position_embeddings': 393216, 'hidden_size': 5120, 'intermediate_size': 32768, 'num_hidden_layers': 40, 'num_attention_heads': 32, 'sliding_window': None, 'head_dim': 128, 'num_key_value_heads': 8, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-05, 'use_cache': True, 'attention_dropout': 0.0, 'rope_parameters': {'beta_fast': 32.0, 'beta_slow': 1.0, 'factor': 48.0, 'llama_4_scaling_beta': 0.1, 'mscale': 1.0, 'mscale_all_dim': 1.0, 'original_max_position_embeddings': 8192, 'rope_theta': 100000000.0, 'rope_type': 'yarn', 'type': 'yarn'}, 'return_dict': True, 'output_hidden_states': False, 'dtype': 'bfloat16', 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'architectures': None, 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 11, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, '_name_or_path': '', 'model_type': 'ministral3', 'output_attentions': False}, 'multimodal_projector_bias': False, 'spatial_merge_size': 2, 'return_dict': True, 
'output_hidden_states': False, 'dtype': 'bfloat16', 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'architectures': ['Mistral3ForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': None, 'pad_token_id': None, 'eos_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, '_name_or_path': 'Models/Devstral-Small-2-24B-HS-CPT', 'transformers_version': '5.0.0.dev0', 'model_type': 'mistral3', 'use_cache': False, 'output_attentions': False, 'output_dir': 'task2file/sft_devstral_24B_v2/checkpoints', 'do_train': False, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 8, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 0.0001, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 0.8, 'num_train_epochs': 6.0, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': None, 'warmup_ratio': 0.08, 'warmup_steps': 0.08, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': None, 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 2, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 500, 'save_total_limit': 20, 'enable_jit_checkpoint': False, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'use_cpu': False, 'seed': 42, 'data_seed': None, 'bf16': True, 'fp16': False, 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': -1, 'ddp_backend': None, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 100, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'eval_loss', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'auto_find_batch_size': False, 'full_determinism': False, 'ddp_timeout': 1800, 'torch_compile': False, 
'torch_compile_backend': None, 'torch_compile_mode': None, 'include_num_input_tokens_seen': 'no', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True}
23
+ 2025-12-26 18:08:52,965 INFO MainThread:190322 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 24022764544 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7b8940b75420>>
24
+ 2025-12-26 18:08:52,965 INFO MainThread:190322 [wandb_run.py:_config_callback():1396] config_cb model/num_parameters 24022764544 None
25
+ 2025-12-28 04:02:04,643 INFO MainThread:190322 [wandb_run.py:_finish():2287] finishing run sirajuddin-shaik-007/sft-training/ny9q48hd
26
+ 2025-12-28 04:02:04,645 INFO MainThread:190322 [wandb_run.py:_atexit_cleanup():2486] got exitcode: 0
27
+ 2025-12-28 04:02:04,646 INFO MainThread:190322 [wandb_run.py:_restore():2468] restore
28
+ 2025-12-28 04:02:04,646 INFO MainThread:190322 [wandb_run.py:_restore():2474] restore done
29
+ 2025-12-28 04:02:05,050 INFO MainThread:190322 [wandb_run.py:_footer_sync_info():3862] logging synced files
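As a rough sanity check on the TrainingArguments dumped above, the effective batch size per optimizer step can be worked out as below. This is a hedged back-of-the-envelope sketch, not output from the run; the single-process, model-sharded setup is an assumption read from `device_map: auto` and `local_rank: -1` in the logged config.

```python
# Hedged sketch: effective batch size implied by the logged TrainingArguments.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
# Assumption: one process, with the 24B model sharded across both A100s
# (device_map: auto, local_rank: -1), so no data-parallel replicas.
num_processes = 1

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_processes
print(effective_batch)  # 8 sequences per optimizer step
```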
wandb/run-20251226_180557-p7wwl5ek/files/config.yaml ADDED
@@ -0,0 +1,173 @@
1
+ _wandb:
2
+ value:
3
+ cli_version: 0.23.1
4
+ e:
5
+ d58ijptqwmgcs1za2j0g4cusbym7bptn:
6
+ args:
7
+ - --config
8
+ - trainer-kit/SFT/config_instruct.yaml
9
+ codePath: trainer-kit/SFT/run_instruct.py
10
+ codePathLocal: trainer-kit/SFT/run_instruct.py
11
+ cpu_count: 12
12
+ cpu_count_logical: 24
13
+ cudaVersion: "13.0"
14
+ disk:
15
+ /:
16
+ total: "791251738624"
17
+ used: "385798254592"
18
+ email: shaiksirajuddin9949@gmail.com
19
+ executable: /workspace/llm_finetuning_env/bin/python
20
+ gpu: NVIDIA A100-SXM4-80GB
21
+ gpu_count: 2
22
+ gpu_nvidia:
23
+ - architecture: Ampere
24
+ cudaCores: 6912
25
+ memoryTotal: "85899345920"
26
+ name: NVIDIA A100-SXM4-80GB
27
+ uuid: GPU-989794b0-ec3b-13bf-db9f-3fbe341497ba
28
+ - architecture: Ampere
29
+ cudaCores: 6912
30
+ memoryTotal: "85899345920"
31
+ name: NVIDIA A100-SXM4-80GB
32
+ uuid: GPU-3790aa64-60ef-9eac-b0b1-b278ee8c0d40
33
+ host: a100-2gpu-shell-session-757d587799-mfdvv
34
+ memory:
35
+ total: "359047892992"
36
+ os: Linux-6.12.46+-x86_64-with-glibc2.35
37
+ program: /workspace/trainer-kit/SFT/run_instruct.py
38
+ python: CPython 3.10.12
39
+ root: task2file/sft_devstral_24B_v2
40
+ startedAt: "2025-12-26T18:05:57.725585Z"
41
+ writerId: d58ijptqwmgcs1za2j0g4cusbym7bptn
42
+ m: []
43
+ python_version: 3.10.12
44
+ t:
45
+ "1":
46
+ - 1
47
+ - 11
48
+ - 41
49
+ - 49
50
+ - 51
51
+ - 71
52
+ - 98
53
+ "2":
54
+ - 1
55
+ - 11
56
+ - 41
57
+ - 49
58
+ - 51
59
+ - 71
60
+ - 98
61
+ "3":
62
+ - 15
63
+ - 16
64
+ "4": 3.10.12
65
+ "5": 0.23.1
66
+ "6": 5.0.0.dev0
67
+ "12": 0.23.1
68
+ "13": linux-x86_64
69
+ data:
70
+ value:
71
+ custom_template: |-
72
+ ##INSTRUCTION
73
+ {instruction}<|im_end|>
74
+ {input}<|im_end|>
75
+ {output}<|im_end|>
76
+ eval_jsonl: null
77
+ eval_split_ratio: 0.1
78
+ format_type: custom
79
+ input_field: input
80
+ instruction_field: instruction
81
+ max_length: 2048
82
+ num_proc: 4
83
+ output_field: output
84
+ shuffle: true
85
+ system_prompt: |
86
+ You are a Hyperswitch Rust code analyzer. Identify functions/structs that need modification for a given task.
87
+
88
+ ## Output Format
89
+
90
+ ##OUTPUT
91
+ Explain the data flow and why each component must change:
92
+ - Flow: [Input → Processing → Output with arrows]
93
+ - For each component: "The [ComponentName] ([path]) must [action] because [reason]—without this, [consequence]"
94
+ - Explain coupling between components
95
+
96
+ ##SELECT
97
+ modify::crates/path/to/file.rs::impl::ComponentName
98
+ add::crates/another/file.rs::function::AnotherComponent
99
+ <EOS>
100
+
101
+ ## Rules
102
+
103
+ 1. Use full paths: `remove::crates/folder/file.rs::Type::Name`
104
+ 2. Use `::` for nested items: `status::StructName::Type::Name`
105
+ 3. Always explain "must change because" and "without this"
106
+ 3. Types of components: function, struct, enum, impl, trait
107
+ 4. If there is extra information (e.g., enum variants), include that too.
108
+ 5. Start with ##OUTPUT, end with ##SELECT, terminate with <EOS>
109
+
110
+ ## Example
111
+
112
+ ##TASK
113
+ Add webhook subscription support
114
+
115
+ ##OUTPUT
116
+ The webhook system routes events via EventClass enum. Flow: webhook → EventClass → handler → processing. The EventClass enum (crates/common_enums/src/enums.rs::EventClass) must add Subscriptions variant because it defines event routing—without this, subscription events cannot be processed. The SubscriptionStatus impl (crates/common_enums/src/transformers.rs::SubscriptionStatus) must map to EventType because it converts status to events—without this, status changes don't trigger webhooks. These are coupled: EventClass routes to handlers that use SubscriptionStatus mappings.
117
+
118
+ ##SELECT
119
+ crates/common_enums/src/enums.rs::EventClass
120
+ crates/common_enums/src/transformers.rs::SubscriptionStatus
121
+ <EOS>
122
+ train_jsonl: ./sft_dataset.jsonl
123
+ model:
124
+ value:
125
+ attn_implementation: null
126
+ base_local_dir: base_model
127
+ bnb_4bit_compute_dtype: bfloat16
128
+ bnb_4bit_quant_type: nf4
129
+ bnb_4bit_use_double_quant: false
130
+ device_map: auto
131
+ repo_id: ./Models/Devstral-Small-2-24B-HS-CPT
132
+ revision: null
133
+ tokenizer_use_fast: true
134
+ torch_dtype: bfloat16
135
+ trust_remote_code: true
136
+ use_4bit: false
137
+ peft:
138
+ value:
139
+ bias: none
140
+ enabled: true
141
+ lora_alpha: 16
142
+ lora_dropout: 0.05
143
+ r: 8
144
+ target_modules: auto
145
+ run_dir:
146
+ value: task2file/sft_devstral_24B_v2
147
+ train:
148
+ value:
149
+ early_stopping:
150
+ enabled: true
151
+ metric: eval_loss
152
+ min_delta: 0.001
153
+ mode: min
154
+ patience: 5
155
+ eval_steps: 100
156
+ evaluation_strategy: steps
157
+ gradient_accumulation_steps: 8
158
+ gradient_checkpointing: true
159
+ learning_rate: "1e-4"
160
+ load_best_model_at_end: true
161
+ logging_steps: 2
162
+ lr_scheduler_type: cosine
163
+ max_grad_norm: 0.8
164
+ num_train_epochs: 6
165
+ optim: adamw_torch
166
+ per_device_eval_batch_size: 1
167
+ per_device_train_batch_size: 1
168
+ resume_from_checkpoint: auto
169
+ save_steps: 500
170
+ save_strategy: steps
171
+ save_total_limit: 20
172
+ warmup_ratio: 0.08
173
+ weight_decay: 0
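The `peft` block in the config above (together with the resolved `peft_config` in the debug log earlier) corresponds roughly to a LoRA adapter like the sketch below. This is an illustrative reconstruction, not the training script itself; `target_modules` is written out explicitly because the YAML value `auto` was resolved to the attention projections in the logged `peft_config`.

```python
# Hedged sketch of the LoRA settings logged for this run (r=8, alpha=16, dropout=0.05).
from peft import LoraConfig

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    # "auto" in the YAML resolved to the attention projections in the logged peft_config.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```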
wandb/run-20251226_180557-p7wwl5ek/files/output.log ADDED
@@ -0,0 +1,34 @@
1
+ Wandb initialized: project='sft-training', name='auto-generated'
2
+ [info] Detected Mistral3 model architecture, loading with specific class
3
+ Traceback (most recent call last):
4
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 983, in <module>
5
+ main()
6
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 849, in main
7
+ model, tokenizer = load_base_model_and_tokenizer(cfg, base_dir)
8
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 579, in load_base_model_and_tokenizer
9
+ model = Mistral3ForConditionalGeneration.from_pretrained(
10
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3948, in from_pretrained
11
+ device_map = _get_device_map(model, device_map, max_memory, hf_quantizer)
12
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/transformers/integrations/accelerate.py", line 281, in _get_device_map
13
+ inferred_max_memory = get_balanced_memory(
14
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/transformers/integrations/accelerate.py", line 197, in get_balanced_memory
15
+ max_memory = get_max_memory(max_memory)
16
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 804, in get_max_memory
17
+ _ = torch.tensor([0], device=i)
18
+ KeyboardInterrupt
19
+ Traceback (most recent call last):
20
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 983, in <module>
21
+ main()
22
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 849, in main
23
+ model, tokenizer = load_base_model_and_tokenizer(cfg, base_dir)
24
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 579, in load_base_model_and_tokenizer
25
+ model = Mistral3ForConditionalGeneration.from_pretrained(
26
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3948, in from_pretrained
27
+ device_map = _get_device_map(model, device_map, max_memory, hf_quantizer)
28
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/transformers/integrations/accelerate.py", line 281, in _get_device_map
29
+ inferred_max_memory = get_balanced_memory(
30
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/transformers/integrations/accelerate.py", line 197, in get_balanced_memory
31
+ max_memory = get_max_memory(max_memory)
32
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 804, in get_max_memory
33
+ _ = torch.tensor([0], device=i)
34
+ KeyboardInterrupt
wandb/run-20251226_180557-p7wwl5ek/files/requirements.txt ADDED
@@ -0,0 +1,104 @@
1
+ exceptiongroup==1.3.1
2
+ wheel==0.45.1
3
+ python-dateutil==2.9.0.post0
4
+ nvidia-ml-py==13.580.82
5
+ huggingface_hub==1.2.3
6
+ idna==3.11
7
+ click==8.3.1
8
+ numpy==2.2.6
9
+ httpx==0.28.1
10
+ tokenizers==0.22.1
11
+ sympy==1.13.1
12
+ yarl==1.22.0
13
+ async-timeout==5.0.1
14
+ datasets==4.4.2
15
+ platformdirs==4.5.1
16
+ nvidia-cuda-cupti-cu12==12.1.105
17
+ nvidia-nvtx-cu12==12.1.105
18
+ smmap==5.0.2
19
+ accelerate==1.12.0
20
+ requests==2.32.5
21
+ aiohttp==3.13.2
22
+ bitsandbytes==0.49.0
23
+ nvidia-cublas-cu12==12.1.3.1
24
+ mpmath==1.3.0
25
+ torchaudio==2.5.1+cu121
26
+ nvidia-cuda-runtime-cu12==12.1.105
27
+ typing-inspection==0.4.2
28
+ GitPython==3.1.45
29
+ xxhash==3.6.0
30
+ nvidia-cusolver-cu12==11.4.5.107
31
+ pydantic_core==2.41.5
32
+ six==1.17.0
33
+ torchvision==0.20.1+cu121
34
+ typing_extensions==4.15.0
35
+ triton==3.1.0
36
+ charset-normalizer==3.4.4
37
+ nvitop==1.6.1
38
+ wandb==0.23.1
39
+ regex==2025.11.3
40
+ pip==25.3
41
+ nvidia-cusparse-cu12==12.1.0.106
42
+ pytz==2025.2
43
+ Jinja2==3.1.6
44
+ psutil==7.2.0
45
+ pillow==12.0.0
46
+ packaging==25.0
47
+ safetensors==0.7.0
48
+ sentry-sdk==2.48.0
49
+ gitdb==4.0.12
50
+ httpcore==1.0.9
51
+ setuptools==80.9.0
52
+ nvidia-cufft-cu12==11.0.2.54
53
+ anyio==4.12.0
54
+ transformers==5.0.0.dev0
55
+ pydantic==2.12.5
56
+ fsspec==2025.10.0
57
+ filelock==3.20.0
58
+ PyYAML==6.0.3
59
+ hf-xet==1.2.0
60
+ nvidia-cudnn-cu12==9.1.0.70
61
+ tqdm==4.67.1
62
+ MarkupSafe==2.1.5
63
+ attrs==25.4.0
64
+ nvidia-cuda-nvrtc-cu12==12.1.105
65
+ peft==0.18.0
66
+ aiohappyeyeballs==2.6.1
67
+ networkx==3.4.2
68
+ nvidia-nvjitlink-cu12==12.9.86
69
+ certifi==2025.11.12
70
+ pyarrow==22.0.0
71
+ dill==0.4.0
72
+ protobuf==6.33.2
73
+ aiosignal==1.4.0
74
+ frozenlist==1.8.0
75
+ urllib3==2.6.2
76
+ propcache==0.4.1
77
+ tzdata==2025.3
78
+ pandas==2.3.3
79
+ annotated-types==0.7.0
80
+ shellingham==1.5.4
81
+ nvidia-nccl-cu12==2.21.5
82
+ multidict==6.7.0
83
+ nvidia-curand-cu12==10.3.2.106
84
+ trl==0.26.2
85
+ torch==2.5.1+cu121
86
+ h11==0.16.0
87
+ multiprocess==0.70.18
88
+ typer-slim==0.21.0
89
+ wheel==0.45.1
90
+ tomli==2.0.1
91
+ autocommand==2.2.2
92
+ jaraco.context==5.3.0
93
+ zipp==3.19.2
94
+ packaging==24.2
95
+ inflect==7.3.1
96
+ typing_extensions==4.12.2
97
+ platformdirs==4.2.2
98
+ jaraco.functools==4.0.1
99
+ jaraco.collections==5.1.0
100
+ jaraco.text==3.12.1
101
+ backports.tarfile==1.2.0
102
+ more-itertools==10.3.0
103
+ importlib_metadata==8.0.0
104
+ typeguard==4.3.0
wandb/run-20251226_180557-p7wwl5ek/files/wandb-metadata.json ADDED
@@ -0,0 +1,47 @@
1
+ {
2
+ "os": "Linux-6.12.46+-x86_64-with-glibc2.35",
3
+ "python": "CPython 3.10.12",
4
+ "startedAt": "2025-12-26T18:05:57.725585Z",
5
+ "args": [
6
+ "--config",
7
+ "trainer-kit/SFT/config_instruct.yaml"
8
+ ],
9
+ "program": "/workspace/trainer-kit/SFT/run_instruct.py",
10
+ "codePath": "trainer-kit/SFT/run_instruct.py",
11
+ "codePathLocal": "trainer-kit/SFT/run_instruct.py",
12
+ "email": "shaiksirajuddin9949@gmail.com",
13
+ "root": "task2file/sft_devstral_24B_v2",
14
+ "host": "a100-2gpu-shell-session-757d587799-mfdvv",
15
+ "executable": "/workspace/llm_finetuning_env/bin/python",
16
+ "cpu_count": 12,
17
+ "cpu_count_logical": 24,
18
+ "gpu": "NVIDIA A100-SXM4-80GB",
19
+ "gpu_count": 2,
20
+ "disk": {
21
+ "/": {
22
+ "total": "791251738624",
23
+ "used": "385798254592"
24
+ }
25
+ },
26
+ "memory": {
27
+ "total": "359047892992"
28
+ },
29
+ "gpu_nvidia": [
30
+ {
31
+ "name": "NVIDIA A100-SXM4-80GB",
32
+ "memoryTotal": "85899345920",
33
+ "cudaCores": 6912,
34
+ "architecture": "Ampere",
35
+ "uuid": "GPU-989794b0-ec3b-13bf-db9f-3fbe341497ba"
36
+ },
37
+ {
38
+ "name": "NVIDIA A100-SXM4-80GB",
39
+ "memoryTotal": "85899345920",
40
+ "cudaCores": 6912,
41
+ "architecture": "Ampere",
42
+ "uuid": "GPU-3790aa64-60ef-9eac-b0b1-b278ee8c0d40"
43
+ }
44
+ ],
45
+ "cudaVersion": "13.0",
46
+ "writerId": "d58ijptqwmgcs1za2j0g4cusbym7bptn"
47
+ }
wandb/run-20251226_180557-p7wwl5ek/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
1
+ {"_wandb":{"runtime":2},"_runtime":2}
wandb/run-20251226_180557-p7wwl5ek/logs/debug-core.log ADDED
@@ -0,0 +1,14 @@
1
+ {"time":"2025-12-26T18:05:57.823957218Z","level":"INFO","msg":"main: starting server","port-filename":"/tmp/tmpht47h6p7/port-189168.txt","pid":189168,"log-level":0,"disable-analytics":false,"shutdown-on-parent-exit":false,"enable-dcgm-profiling":false}
2
+ {"time":"2025-12-26T18:05:57.824696455Z","level":"INFO","msg":"server: will exit if parent process dies","ppid":189168}
3
+ {"time":"2025-12-26T18:05:57.82469386Z","level":"INFO","msg":"server: accepting connections","addr":{"Name":"/tmp/wandb-189168-189238-3343268377/socket","Net":"unix"}}
4
+ {"time":"2025-12-26T18:05:58.003542516Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"1(@)"}
5
+ {"time":"2025-12-26T18:05:58.010768581Z","level":"INFO","msg":"handleInformInit: received","streamId":"p7wwl5ek","id":"1(@)"}
6
+ {"time":"2025-12-26T18:05:58.174029705Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"p7wwl5ek","id":"1(@)"}
7
+ {"time":"2025-12-26T18:06:01.288089083Z","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"1(@)"}
8
+ {"time":"2025-12-26T18:06:01.288159025Z","level":"INFO","msg":"connection: closing","id":"1(@)"}
9
+ {"time":"2025-12-26T18:06:01.288200541Z","level":"INFO","msg":"server is shutting down"}
10
+ {"time":"2025-12-26T18:06:01.288263567Z","level":"INFO","msg":"connection: closed successfully","id":"1(@)"}
11
+ {"time":"2025-12-26T18:06:01.288324658Z","level":"INFO","msg":"server: listener closed","addr":{"Name":"/tmp/wandb-189168-189238-3343268377/socket","Net":"unix"}}
12
+ {"time":"2025-12-26T18:06:01.731707539Z","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"1(@)"}
13
+ {"time":"2025-12-26T18:06:01.731736159Z","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"1(@)"}
14
+ {"time":"2025-12-26T18:06:01.731744711Z","level":"INFO","msg":"server is closed"}
wandb/run-20251226_180557-p7wwl5ek/logs/debug-internal.log ADDED
@@ -0,0 +1,11 @@
1
+ {"time":"2025-12-26T18:05:58.010910802Z","level":"INFO","msg":"stream: starting","core version":"0.23.1"}
2
+ {"time":"2025-12-26T18:05:58.173852132Z","level":"INFO","msg":"stream: created new stream","id":"p7wwl5ek"}
3
+ {"time":"2025-12-26T18:05:58.173936115Z","level":"INFO","msg":"handler: started","stream_id":"p7wwl5ek"}
4
+ {"time":"2025-12-26T18:05:58.174019933Z","level":"INFO","msg":"stream: started","id":"p7wwl5ek"}
5
+ {"time":"2025-12-26T18:05:58.174038448Z","level":"INFO","msg":"writer: started","stream_id":"p7wwl5ek"}
6
+ {"time":"2025-12-26T18:05:58.174048363Z","level":"INFO","msg":"sender: started","stream_id":"p7wwl5ek"}
7
+ {"time":"2025-12-26T18:06:01.288165843Z","level":"INFO","msg":"stream: closing","id":"p7wwl5ek"}
8
+ {"time":"2025-12-26T18:06:01.633870412Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
9
+ {"time":"2025-12-26T18:06:01.730892428Z","level":"INFO","msg":"handler: closed","stream_id":"p7wwl5ek"}
10
+ {"time":"2025-12-26T18:06:01.730977697Z","level":"INFO","msg":"sender: closed","stream_id":"p7wwl5ek"}
11
+ {"time":"2025-12-26T18:06:01.730985259Z","level":"INFO","msg":"stream: closed","id":"p7wwl5ek"}
wandb/run-20251226_180557-p7wwl5ek/logs/debug.log ADDED
@@ -0,0 +1,23 @@
1
+ 2025-12-26 18:05:57,727 INFO MainThread:189168 [wandb_setup.py:_flush():80] Current SDK version is 0.23.1
2
+ 2025-12-26 18:05:57,727 INFO MainThread:189168 [wandb_setup.py:_flush():80] Configure stats pid to 189168
3
+ 2025-12-26 18:05:57,727 INFO MainThread:189168 [wandb_setup.py:_flush():80] Loading settings from /root/.config/wandb/settings
4
+ 2025-12-26 18:05:57,727 INFO MainThread:189168 [wandb_setup.py:_flush():80] Loading settings from /workspace/wandb/settings
5
+ 2025-12-26 18:05:57,727 INFO MainThread:189168 [wandb_setup.py:_flush():80] Loading settings from environment variables
6
+ 2025-12-26 18:05:57,727 INFO MainThread:189168 [wandb_init.py:setup_run_log_directory():714] Logging user logs to task2file/sft_devstral_24B_v2/wandb/run-20251226_180557-p7wwl5ek/logs/debug.log
7
+ 2025-12-26 18:05:57,727 INFO MainThread:189168 [wandb_init.py:setup_run_log_directory():715] Logging internal logs to task2file/sft_devstral_24B_v2/wandb/run-20251226_180557-p7wwl5ek/logs/debug-internal.log
8
+ 2025-12-26 18:05:57,727 INFO MainThread:189168 [wandb_init.py:init():841] calling init triggers
9
+ 2025-12-26 18:05:57,727 INFO MainThread:189168 [wandb_init.py:init():846] wandb.init called with sweep_config: {}
10
+ config: {'model': {'repo_id': './Models/Devstral-Small-2-24B-HS-CPT', 'revision': None, 'base_local_dir': 'base_model', 'trust_remote_code': True, 'tokenizer_use_fast': True, 'device_map': 'auto', 'torch_dtype': 'bfloat16', 'use_4bit': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'attn_implementation': None}, 'data': {'train_jsonl': './sft_dataset.jsonl', 'eval_jsonl': None, 'eval_split_ratio': 0.1, 'instruction_field': 'instruction', 'input_field': 'input', 'output_field': 'output', 'format_type': 'custom', 'system_prompt': 'You are a Hyperswitch Rust code analyzer. Identify functions/structs that need modification for a given task.\n\n## Output Format\n\n##OUTPUT\nExplain the data flow and why each component must change:\n- Flow: [Input → Processing → Output with arrows]\n- For each component: "The [ComponentName] ([path]) must [action] because [reason]—without this, [consequence]"\n- Explain coupling between components\n\n##SELECT\nmodify::crates/path/to/file.rs::impl::ComponentName\nadd::crates/another/file.rs::function::AnotherComponent\n<EOS>\n\n## Rules\n\n1. Use full paths: `remove::crates/folder/file.rs::Type::Name`\n2. Use `::` for nested items: `status::StructName::Type::Name`\n3. Always explain "must change because" and "without this"\n3. Types of components: function, struct, enum, impl, trait\n4. If there is extra information (e.g., enum variants), include that too.\n5. Start with ##OUTPUT, end with ##SELECT, terminate with <EOS>\n\n## Example\n\n##TASK\nAdd webhook subscription support\n\n##OUTPUT\nThe webhook system routes events via EventClass enum. Flow: webhook → EventClass → handler → processing. The EventClass enum (crates/common_enums/src/enums.rs::EventClass) must add Subscriptions variant because it defines event routing—without this, subscription events cannot be processed. The SubscriptionStatus impl (crates/common_enums/src/transformers.rs::SubscriptionStatus) must map to EventType because it converts status to events—without this, status changes don\'t trigger webhooks. These are coupled: EventClass routes to handlers that use SubscriptionStatus mappings.\n\n##SELECT\ncrates/common_enums/src/enums.rs::EventClass\ncrates/common_enums/src/transformers.rs::SubscriptionStatus\n<EOS>\n', 'custom_template': '##INSTRUCTION\n{instruction}<|im_end|>\n{input}<|im_end|>\n{output}<|im_end|>', 'max_length': 2048, 'shuffle': True, 'num_proc': 4}, 'peft': {'enabled': True, 'r': 8, 'lora_alpha': 16, 'lora_dropout': 0.05, 'bias': 'none', 'target_modules': 'auto'}, 'train': {'num_train_epochs': 6, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 8, 'learning_rate': '1e-4', 'weight_decay': 0.0, 'warmup_ratio': 0.08, 'lr_scheduler_type': 'cosine', 'optim': 'adamw_torch', 'max_grad_norm': 0.8, 'gradient_checkpointing': True, 'logging_steps': 2, 'save_strategy': 'steps', 'save_steps': 500, 'save_total_limit': 20, 'evaluation_strategy': 'steps', 'eval_steps': 100, 'load_best_model_at_end': True, 'early_stopping': {'enabled': True, 'patience': 5, 'min_delta': 0.001, 'metric': 'eval_loss', 'mode': 'min'}, 'resume_from_checkpoint': 'auto'}, 'run_dir': 'task2file/sft_devstral_24B_v2', '_wandb': {}}
11
+ 2025-12-26 18:05:57,727 INFO MainThread:189168 [wandb_init.py:init():889] starting backend
12
+ 2025-12-26 18:05:58,003 INFO MainThread:189168 [wandb_init.py:init():892] sending inform_init request
13
+ 2025-12-26 18:05:58,008 INFO MainThread:189168 [wandb_init.py:init():900] backend started and connected
14
+ 2025-12-26 18:05:58,010 INFO MainThread:189168 [wandb_init.py:init():970] updated telemetry
15
+ 2025-12-26 18:05:58,011 INFO MainThread:189168 [wandb_init.py:init():994] communicating run to backend with 90.0 second timeout
16
+ 2025-12-26 18:05:58,366 INFO MainThread:189168 [wandb_init.py:init():1041] starting run threads in backend
17
+ 2025-12-26 18:05:58,481 INFO MainThread:189168 [wandb_run.py:_console_start():2521] atexit reg
18
+ 2025-12-26 18:05:58,481 INFO MainThread:189168 [wandb_run.py:_redirect():2369] redirect: wrap_raw
19
+ 2025-12-26 18:05:58,481 INFO MainThread:189168 [wandb_run.py:_redirect():2438] Wrapping output streams.
20
+ 2025-12-26 18:05:58,481 INFO MainThread:189168 [wandb_run.py:_redirect():2461] Redirects installed.
21
+ 2025-12-26 18:05:58,485 INFO MainThread:189168 [wandb_init.py:init():1081] run started, returning control to user process
22
+ 2025-12-26 18:06:01,288 INFO wandb-AsyncioManager-main:189168 [service_client.py:_forward_responses():80] Reached EOF.
23
+ 2025-12-26 18:06:01,288 INFO wandb-AsyncioManager-main:189168 [mailbox.py:close():137] Closing mailbox, abandoning 1 handles.
wandb/run-20251226_180557-p7wwl5ek/run-p7wwl5ek.wandb ADDED
Binary file (8.22 kB).
 
wandb/run-20251226_180613-i1cmzyri/files/config.yaml ADDED
@@ -0,0 +1,173 @@
1
+ _wandb:
2
+ value:
3
+ cli_version: 0.23.1
4
+ e:
5
+ qgjoaibnrh2irresqyfv8dka3f0628ti:
6
+ args:
7
+ - --config
8
+ - trainer-kit/SFT/config_instruct.yaml
9
+ codePath: trainer-kit/SFT/run_instruct.py
10
+ codePathLocal: trainer-kit/SFT/run_instruct.py
11
+ cpu_count: 12
12
+ cpu_count_logical: 24
13
+ cudaVersion: "13.0"
14
+ disk:
15
+ /:
16
+ total: "791251738624"
17
+ used: "386025496576"
18
+ email: shaiksirajuddin9949@gmail.com
19
+ executable: /workspace/llm_finetuning_env/bin/python
20
+ gpu: NVIDIA A100-SXM4-80GB
21
+ gpu_count: 2
22
+ gpu_nvidia:
23
+ - architecture: Ampere
24
+ cudaCores: 6912
25
+ memoryTotal: "85899345920"
26
+ name: NVIDIA A100-SXM4-80GB
27
+ uuid: GPU-989794b0-ec3b-13bf-db9f-3fbe341497ba
28
+ - architecture: Ampere
29
+ cudaCores: 6912
30
+ memoryTotal: "85899345920"
31
+ name: NVIDIA A100-SXM4-80GB
32
+ uuid: GPU-3790aa64-60ef-9eac-b0b1-b278ee8c0d40
33
+ host: a100-2gpu-shell-session-757d587799-mfdvv
34
+ memory:
35
+ total: "359047892992"
36
+ os: Linux-6.12.46+-x86_64-with-glibc2.35
37
+ program: /workspace/trainer-kit/SFT/run_instruct.py
38
+ python: CPython 3.10.12
39
+ root: task2file/sft_devstral_24B_v2
40
+ startedAt: "2025-12-26T18:06:13.427654Z"
41
+ writerId: qgjoaibnrh2irresqyfv8dka3f0628ti
42
+ m: []
43
+ python_version: 3.10.12
44
+ t:
45
+ "1":
46
+ - 1
47
+ - 11
48
+ - 41
49
+ - 49
50
+ - 51
51
+ - 71
52
+ - 98
53
+ "2":
54
+ - 1
55
+ - 11
56
+ - 41
57
+ - 49
58
+ - 51
59
+ - 71
60
+ - 98
61
+ "3":
62
+ - 15
63
+ - 16
64
+ "4": 3.10.12
65
+ "5": 0.23.1
66
+ "6": 5.0.0.dev0
67
+ "12": 0.23.1
68
+ "13": linux-x86_64
69
+ data:
70
+ value:
71
+ custom_template: |-
72
+ ##INSTRUCTION
73
+ {instruction}<|im_end|>
74
+ {input}<|im_end|>
75
+ {output}<|im_end|>
76
+ eval_jsonl: null
77
+ eval_split_ratio: 0.1
78
+ format_type: custom
79
+ input_field: input
80
+ instruction_field: instruction
81
+ max_length: 2048
82
+ num_proc: 4
83
+ output_field: output
84
+ shuffle: true
85
+ system_prompt: |
86
+ You are a Hyperswitch Rust code analyzer. Identify functions/structs that need modification for a given task.
87
+
88
+ ## Output Format
89
+
90
+ ##OUTPUT
91
+ Explain the data flow and why each component must change:
92
+ - Flow: [Input → Processing → Output with arrows]
93
+ - For each component: "The [ComponentName] ([path]) must [action] because [reason]—without this, [consequence]"
94
+ - Explain coupling between components
95
+
96
+ ##SELECT
97
+ modify::crates/path/to/file.rs::impl::ComponentName
98
+ add::crates/another/file.rs::function::AnotherComponent
99
+ <EOS>
100
+
101
+ ## Rules
102
+
103
+ 1. Use full paths: `remove::crates/folder/file.rs::Type::Name`
104
+ 2. Use `::` for nested items: `status::StructName::Type::Name`
105
+ 3. Always explain "must change because" and "without this"
106
+ 3. Types of components: function, struct, enum, impl, trait
107
+ 4. If there is extra information (e.g., enum variants), include that too.
108
+ 5. Start with ##OUTPUT, end with ##SELECT, terminate with <EOS>
109
+
110
+ ## Example
111
+
112
+ ##TASK
113
+ Add webhook subscription support
114
+
115
+ ##OUTPUT
116
+ The webhook system routes events via EventClass enum. Flow: webhook → EventClass → handler → processing. The EventClass enum (crates/common_enums/src/enums.rs::EventClass) must add Subscriptions variant because it defines event routing—without this, subscription events cannot be processed. The SubscriptionStatus impl (crates/common_enums/src/transformers.rs::SubscriptionStatus) must map to EventType because it converts status to events—without this, status changes don't trigger webhooks. These are coupled: EventClass routes to handlers that use SubscriptionStatus mappings.
117
+
118
+ ##SELECT
119
+ crates/common_enums/src/enums.rs::EventClass
120
+ crates/common_enums/src/transformers.rs::SubscriptionStatus
121
+ <EOS>
122
+ train_jsonl: ./sft_dataset.jsonl
123
+ model:
124
+ value:
125
+ attn_implementation: null
126
+ base_local_dir: base_model
127
+ bnb_4bit_compute_dtype: bfloat16
128
+ bnb_4bit_quant_type: nf4
129
+ bnb_4bit_use_double_quant: false
130
+ device_map: auto
131
+ repo_id: ./Models/Devstral-Small-2-24B-HS-CPT
132
+ revision: null
133
+ tokenizer_use_fast: true
134
+ torch_dtype: bfloat16
135
+ trust_remote_code: true
136
+ use_4bit: false
137
+ peft:
138
+ value:
139
+ bias: none
140
+ enabled: true
141
+ lora_alpha: 16
142
+ lora_dropout: 0.05
143
+ r: 8
144
+ target_modules: auto
145
+ run_dir:
146
+ value: task2file/sft_devstral_24B_v2
147
+ train:
148
+ value:
149
+ early_stopping:
150
+ enabled: true
151
+ metric: eval_loss
152
+ min_delta: 0.001
153
+ mode: min
154
+ patience: 5
155
+ eval_steps: 100
156
+ evaluation_strategy: steps
157
+ gradient_accumulation_steps: 8
158
+ gradient_checkpointing: true
159
+ learning_rate: "1e-4"
160
+ load_best_model_at_end: true
161
+ logging_steps: 2
162
+ lr_scheduler_type: cosine
163
+ max_grad_norm: 0.8
164
+ num_train_epochs: 6
165
+ optim: adamw_torch
166
+ per_device_eval_batch_size: 1
167
+ per_device_train_batch_size: 1
168
+ resume_from_checkpoint: auto
169
+ save_steps: 500
170
+ save_strategy: steps
171
+ save_total_limit: 20
172
+ warmup_ratio: 0.08
173
+ weight_decay: 0
wandb/run-20251226_180613-i1cmzyri/files/output.log ADDED
@@ -0,0 +1,48 @@
1
+ Wandb initialized: project='sft-training', name='auto-generated'
2
+ [info] Detected Mistral3 model architecture, loading with specific class
3
+ Loading weights: 100%|█| 585/585 [00:12<00:00, 46.92it/s, Materializing param=model.vision_tower.transfor
4
+ [info] Ensuring all parameters are materialized...
5
+ Traceback (most recent call last):
6
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 983, in <module>
7
+ main()
8
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 852, in main
9
+ train_ds, eval_ds = build_datasets(cfg, tokenizer)
10
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 414, in build_datasets
11
+ ds = load_dataset("json", data_files={"train": train_path})
12
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 1492, in load_dataset
13
+ builder_instance = load_dataset_builder(
14
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 1137, in load_dataset_builder
15
+ dataset_module = dataset_module_factory(
16
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 913, in dataset_module_factory
17
+ ).get_module()
18
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 527, in get_module
19
+ data_files = DataFilesDict.from_patterns(
20
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 708, in from_patterns
21
+ else DataFilesList.from_patterns(
22
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 601, in from_patterns
23
+ resolve_pattern(
24
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 390, in resolve_pattern
25
+ raise FileNotFoundError(error_msg)
26
+ FileNotFoundError: Unable to find '/workspace/./sft_dataset.jsonl'
27
+ Traceback (most recent call last):
28
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 983, in <module>
29
+ main()
30
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 852, in main
31
+ train_ds, eval_ds = build_datasets(cfg, tokenizer)
32
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 414, in build_datasets
33
+ ds = load_dataset("json", data_files={"train": train_path})
34
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 1492, in load_dataset
35
+ builder_instance = load_dataset_builder(
36
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 1137, in load_dataset_builder
37
+ dataset_module = dataset_module_factory(
38
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 913, in dataset_module_factory
39
+ ).get_module()
40
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 527, in get_module
41
+ data_files = DataFilesDict.from_patterns(
42
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 708, in from_patterns
43
+ else DataFilesList.from_patterns(
44
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 601, in from_patterns
45
+ resolve_pattern(
46
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 390, in resolve_pattern
47
+ raise FileNotFoundError(error_msg)
48
+ FileNotFoundError: Unable to find '/workspace/./sft_dataset.jsonl'
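The traceback above shows `load_dataset` failing because the relative `train_jsonl` value (`./sft_dataset.jsonl`) is resolved against the working directory (`/workspace`) rather than the directory that actually holds the file. A minimal sketch of one way to make such a path robust is below; anchoring against an explicit base directory is an assumption for illustration, not what `run_instruct.py` does.

```python
# Hedged sketch: resolve a relative dataset path against a known base directory
# before handing it to load_dataset, so the run does not depend on the CWD.
from pathlib import Path
from datasets import load_dataset

def load_train_split(train_jsonl: str, base_dir: str):
    path = Path(train_jsonl)
    if not path.is_absolute():
        path = (Path(base_dir) / path).resolve()  # anchor relative paths explicitly
    if not path.exists():
        raise FileNotFoundError(f"train_jsonl not found: {path}")
    return load_dataset("json", data_files={"train": str(path)})["train"]
```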
wandb/run-20251226_180613-i1cmzyri/files/requirements.txt ADDED
@@ -0,0 +1,104 @@
1
+ exceptiongroup==1.3.1
2
+ wheel==0.45.1
3
+ python-dateutil==2.9.0.post0
4
+ nvidia-ml-py==13.580.82
5
+ huggingface_hub==1.2.3
6
+ idna==3.11
7
+ click==8.3.1
8
+ numpy==2.2.6
9
+ httpx==0.28.1
10
+ tokenizers==0.22.1
11
+ sympy==1.13.1
12
+ yarl==1.22.0
13
+ async-timeout==5.0.1
14
+ datasets==4.4.2
15
+ platformdirs==4.5.1
16
+ nvidia-cuda-cupti-cu12==12.1.105
17
+ nvidia-nvtx-cu12==12.1.105
18
+ smmap==5.0.2
19
+ accelerate==1.12.0
20
+ requests==2.32.5
21
+ aiohttp==3.13.2
22
+ bitsandbytes==0.49.0
23
+ nvidia-cublas-cu12==12.1.3.1
24
+ mpmath==1.3.0
25
+ torchaudio==2.5.1+cu121
26
+ nvidia-cuda-runtime-cu12==12.1.105
27
+ typing-inspection==0.4.2
28
+ GitPython==3.1.45
29
+ xxhash==3.6.0
30
+ nvidia-cusolver-cu12==11.4.5.107
31
+ pydantic_core==2.41.5
32
+ six==1.17.0
33
+ torchvision==0.20.1+cu121
34
+ typing_extensions==4.15.0
35
+ triton==3.1.0
36
+ charset-normalizer==3.4.4
37
+ nvitop==1.6.1
38
+ wandb==0.23.1
39
+ regex==2025.11.3
40
+ pip==25.3
41
+ nvidia-cusparse-cu12==12.1.0.106
42
+ pytz==2025.2
43
+ Jinja2==3.1.6
44
+ psutil==7.2.0
45
+ pillow==12.0.0
46
+ packaging==25.0
47
+ safetensors==0.7.0
48
+ sentry-sdk==2.48.0
49
+ gitdb==4.0.12
50
+ httpcore==1.0.9
51
+ setuptools==80.9.0
52
+ nvidia-cufft-cu12==11.0.2.54
53
+ anyio==4.12.0
54
+ transformers==5.0.0.dev0
55
+ pydantic==2.12.5
56
+ fsspec==2025.10.0
57
+ filelock==3.20.0
58
+ PyYAML==6.0.3
59
+ hf-xet==1.2.0
60
+ nvidia-cudnn-cu12==9.1.0.70
61
+ tqdm==4.67.1
62
+ MarkupSafe==2.1.5
63
+ attrs==25.4.0
64
+ nvidia-cuda-nvrtc-cu12==12.1.105
65
+ peft==0.18.0
66
+ aiohappyeyeballs==2.6.1
67
+ networkx==3.4.2
68
+ nvidia-nvjitlink-cu12==12.9.86
69
+ certifi==2025.11.12
70
+ pyarrow==22.0.0
71
+ dill==0.4.0
72
+ protobuf==6.33.2
73
+ aiosignal==1.4.0
74
+ frozenlist==1.8.0
75
+ urllib3==2.6.2
76
+ propcache==0.4.1
77
+ tzdata==2025.3
78
+ pandas==2.3.3
79
+ annotated-types==0.7.0
80
+ shellingham==1.5.4
81
+ nvidia-nccl-cu12==2.21.5
82
+ multidict==6.7.0
83
+ nvidia-curand-cu12==10.3.2.106
84
+ trl==0.26.2
85
+ torch==2.5.1+cu121
86
+ h11==0.16.0
87
+ multiprocess==0.70.18
88
+ typer-slim==0.21.0
89
+ wheel==0.45.1
90
+ tomli==2.0.1
91
+ autocommand==2.2.2
92
+ jaraco.context==5.3.0
93
+ zipp==3.19.2
94
+ packaging==24.2
95
+ inflect==7.3.1
96
+ typing_extensions==4.12.2
97
+ platformdirs==4.2.2
98
+ jaraco.functools==4.0.1
99
+ jaraco.collections==5.1.0
100
+ jaraco.text==3.12.1
101
+ backports.tarfile==1.2.0
102
+ more-itertools==10.3.0
103
+ importlib_metadata==8.0.0
104
+ typeguard==4.3.0
wandb/run-20251226_180613-i1cmzyri/files/wandb-metadata.json ADDED
@@ -0,0 +1,47 @@
1
+ {
2
+ "os": "Linux-6.12.46+-x86_64-with-glibc2.35",
3
+ "python": "CPython 3.10.12",
4
+ "startedAt": "2025-12-26T18:06:13.427654Z",
5
+ "args": [
6
+ "--config",
7
+ "trainer-kit/SFT/config_instruct.yaml"
8
+ ],
9
+ "program": "/workspace/trainer-kit/SFT/run_instruct.py",
10
+ "codePath": "trainer-kit/SFT/run_instruct.py",
11
+ "codePathLocal": "trainer-kit/SFT/run_instruct.py",
12
+ "email": "shaiksirajuddin9949@gmail.com",
13
+ "root": "task2file/sft_devstral_24B_v2",
14
+ "host": "a100-2gpu-shell-session-757d587799-mfdvv",
15
+ "executable": "/workspace/llm_finetuning_env/bin/python",
16
+ "cpu_count": 12,
17
+ "cpu_count_logical": 24,
18
+ "gpu": "NVIDIA A100-SXM4-80GB",
19
+ "gpu_count": 2,
20
+ "disk": {
21
+ "/": {
22
+ "total": "791251738624",
23
+ "used": "386025496576"
24
+ }
25
+ },
26
+ "memory": {
27
+ "total": "359047892992"
28
+ },
29
+ "gpu_nvidia": [
30
+ {
31
+ "name": "NVIDIA A100-SXM4-80GB",
32
+ "memoryTotal": "85899345920",
33
+ "cudaCores": 6912,
34
+ "architecture": "Ampere",
35
+ "uuid": "GPU-989794b0-ec3b-13bf-db9f-3fbe341497ba"
36
+ },
37
+ {
38
+ "name": "NVIDIA A100-SXM4-80GB",
39
+ "memoryTotal": "85899345920",
40
+ "cudaCores": 6912,
41
+ "architecture": "Ampere",
42
+ "uuid": "GPU-3790aa64-60ef-9eac-b0b1-b278ee8c0d40"
43
+ }
44
+ ],
45
+ "cudaVersion": "13.0",
46
+ "writerId": "qgjoaibnrh2irresqyfv8dka3f0628ti"
47
+ }
wandb/run-20251226_180613-i1cmzyri/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
1
+ {"_wandb":{"runtime":19},"_runtime":19}
wandb/run-20251226_180613-i1cmzyri/logs/debug-core.log ADDED
@@ -0,0 +1,14 @@
1
+ {"time":"2025-12-26T18:06:13.512689006Z","level":"INFO","msg":"main: starting server","port-filename":"/tmp/tmpaql7q9l6/port-189374.txt","pid":189374,"log-level":0,"disable-analytics":false,"shutdown-on-parent-exit":false,"enable-dcgm-profiling":false}
2
+ {"time":"2025-12-26T18:06:13.513337738Z","level":"INFO","msg":"server: will exit if parent process dies","ppid":189374}
3
+ {"time":"2025-12-26T18:06:13.513339589Z","level":"INFO","msg":"server: accepting connections","addr":{"Name":"/tmp/wandb-189374-189434-2072401818/socket","Net":"unix"}}
4
+ {"time":"2025-12-26T18:06:13.697155944Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"1(@)"}
5
+ {"time":"2025-12-26T18:06:13.703520039Z","level":"INFO","msg":"handleInformInit: received","streamId":"i1cmzyri","id":"1(@)"}
6
+ {"time":"2025-12-26T18:06:13.861987964Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"i1cmzyri","id":"1(@)"}
7
+ {"time":"2025-12-26T18:06:33.265409361Z","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"1(@)"}
8
+ {"time":"2025-12-26T18:06:33.265466167Z","level":"INFO","msg":"connection: closing","id":"1(@)"}
9
+ {"time":"2025-12-26T18:06:33.265521907Z","level":"INFO","msg":"connection: closed successfully","id":"1(@)"}
10
+ {"time":"2025-12-26T18:06:33.265530544Z","level":"INFO","msg":"server is shutting down"}
11
+ {"time":"2025-12-26T18:06:33.265687334Z","level":"INFO","msg":"server: listener closed","addr":{"Name":"/tmp/wandb-189374-189434-2072401818/socket","Net":"unix"}}
12
+ {"time":"2025-12-26T18:06:33.573303185Z","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"1(@)"}
13
+ {"time":"2025-12-26T18:06:33.573347477Z","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"1(@)"}
14
+ {"time":"2025-12-26T18:06:33.573360632Z","level":"INFO","msg":"server is closed"}
wandb/run-20251226_180613-i1cmzyri/logs/debug-internal.log ADDED
@@ -0,0 +1,11 @@
1
+ {"time":"2025-12-26T18:06:13.703642155Z","level":"INFO","msg":"stream: starting","core version":"0.23.1"}
2
+ {"time":"2025-12-26T18:06:13.861777831Z","level":"INFO","msg":"stream: created new stream","id":"i1cmzyri"}
3
+ {"time":"2025-12-26T18:06:13.861855775Z","level":"INFO","msg":"handler: started","stream_id":"i1cmzyri"}
4
+ {"time":"2025-12-26T18:06:13.861978087Z","level":"INFO","msg":"stream: started","id":"i1cmzyri"}
5
+ {"time":"2025-12-26T18:06:13.862005472Z","level":"INFO","msg":"writer: started","stream_id":"i1cmzyri"}
6
+ {"time":"2025-12-26T18:06:13.862018215Z","level":"INFO","msg":"sender: started","stream_id":"i1cmzyri"}
7
+ {"time":"2025-12-26T18:06:33.265479861Z","level":"INFO","msg":"stream: closing","id":"i1cmzyri"}
8
+ {"time":"2025-12-26T18:06:33.464312202Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
9
+ {"time":"2025-12-26T18:06:33.550123155Z","level":"INFO","msg":"handler: closed","stream_id":"i1cmzyri"}
10
+ {"time":"2025-12-26T18:06:33.550255641Z","level":"INFO","msg":"sender: closed","stream_id":"i1cmzyri"}
11
+ {"time":"2025-12-26T18:06:33.550271731Z","level":"INFO","msg":"stream: closed","id":"i1cmzyri"}
wandb/run-20251226_180613-i1cmzyri/logs/debug.log ADDED
@@ -0,0 +1,23 @@
1
+ 2025-12-26 18:06:13,429 INFO MainThread:189374 [wandb_setup.py:_flush():80] Current SDK version is 0.23.1
2
+ 2025-12-26 18:06:13,429 INFO MainThread:189374 [wandb_setup.py:_flush():80] Configure stats pid to 189374
3
+ 2025-12-26 18:06:13,429 INFO MainThread:189374 [wandb_setup.py:_flush():80] Loading settings from /root/.config/wandb/settings
4
+ 2025-12-26 18:06:13,429 INFO MainThread:189374 [wandb_setup.py:_flush():80] Loading settings from /workspace/wandb/settings
5
+ 2025-12-26 18:06:13,429 INFO MainThread:189374 [wandb_setup.py:_flush():80] Loading settings from environment variables
6
+ 2025-12-26 18:06:13,429 INFO MainThread:189374 [wandb_init.py:setup_run_log_directory():714] Logging user logs to task2file/sft_devstral_24B_v2/wandb/run-20251226_180613-i1cmzyri/logs/debug.log
7
+ 2025-12-26 18:06:13,429 INFO MainThread:189374 [wandb_init.py:setup_run_log_directory():715] Logging internal logs to task2file/sft_devstral_24B_v2/wandb/run-20251226_180613-i1cmzyri/logs/debug-internal.log
8
+ 2025-12-26 18:06:13,429 INFO MainThread:189374 [wandb_init.py:init():841] calling init triggers
9
+ 2025-12-26 18:06:13,429 INFO MainThread:189374 [wandb_init.py:init():846] wandb.init called with sweep_config: {}
10
+ config: {'model': {'repo_id': './Models/Devstral-Small-2-24B-HS-CPT', 'revision': None, 'base_local_dir': 'base_model', 'trust_remote_code': True, 'tokenizer_use_fast': True, 'device_map': 'auto', 'torch_dtype': 'bfloat16', 'use_4bit': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'attn_implementation': None}, 'data': {'train_jsonl': './sft_dataset.jsonl', 'eval_jsonl': None, 'eval_split_ratio': 0.1, 'instruction_field': 'instruction', 'input_field': 'input', 'output_field': 'output', 'format_type': 'custom', 'system_prompt': 'You are a Hyperswitch Rust code analyzer. Identify functions/structs that need modification for a given task.\n\n## Output Format\n\n##OUTPUT\nExplain the data flow and why each component must change:\n- Flow: [Input → Processing → Output with arrows]\n- For each component: "The [ComponentName] ([path]) must [action] because [reason]—without this, [consequence]"\n- Explain coupling between components\n\n##SELECT\nmodify::crates/path/to/file.rs::impl::ComponentName\nadd::crates/another/file.rs::function::AnotherComponent\n<EOS>\n\n## Rules\n\n1. Use full paths: `remove::crates/folder/file.rs::Type::Name`\n2. Use `::` for nested items: `status::StructName::Type::Name`\n3. Always explain "must change because" and "without this"\n3. Types of components: function, struct, enum, impl, trait\n4. If there is extra information (e.g., enum variants), include that too.\n5. Start with ##OUTPUT, end with ##SELECT, terminate with <EOS>\n\n## Example\n\n##TASK\nAdd webhook subscription support\n\n##OUTPUT\nThe webhook system routes events via EventClass enum. Flow: webhook → EventClass → handler → processing. The EventClass enum (crates/common_enums/src/enums.rs::EventClass) must add Subscriptions variant because it defines event routing—without this, subscription events cannot be processed. The SubscriptionStatus impl (crates/common_enums/src/transformers.rs::SubscriptionStatus) must map to EventType because it converts status to events—without this, status changes don\'t trigger webhooks. These are coupled: EventClass routes to handlers that use SubscriptionStatus mappings.\n\n##SELECT\ncrates/common_enums/src/enums.rs::EventClass\ncrates/common_enums/src/transformers.rs::SubscriptionStatus\n<EOS>\n', 'custom_template': '##INSTRUCTION\n{instruction}<|im_end|>\n{input}<|im_end|>\n{output}<|im_end|>', 'max_length': 2048, 'shuffle': True, 'num_proc': 4}, 'peft': {'enabled': True, 'r': 8, 'lora_alpha': 16, 'lora_dropout': 0.05, 'bias': 'none', 'target_modules': 'auto'}, 'train': {'num_train_epochs': 6, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 8, 'learning_rate': '1e-4', 'weight_decay': 0.0, 'warmup_ratio': 0.08, 'lr_scheduler_type': 'cosine', 'optim': 'adamw_torch', 'max_grad_norm': 0.8, 'gradient_checkpointing': True, 'logging_steps': 2, 'save_strategy': 'steps', 'save_steps': 500, 'save_total_limit': 20, 'evaluation_strategy': 'steps', 'eval_steps': 100, 'load_best_model_at_end': True, 'early_stopping': {'enabled': True, 'patience': 5, 'min_delta': 0.001, 'metric': 'eval_loss', 'mode': 'min'}, 'resume_from_checkpoint': 'auto'}, 'run_dir': 'task2file/sft_devstral_24B_v2', '_wandb': {}}
11
+ 2025-12-26 18:06:13,429 INFO MainThread:189374 [wandb_init.py:init():889] starting backend
12
+ 2025-12-26 18:06:13,697 INFO MainThread:189374 [wandb_init.py:init():892] sending inform_init request
13
+ 2025-12-26 18:06:13,701 INFO MainThread:189374 [wandb_init.py:init():900] backend started and connected
14
+ 2025-12-26 18:06:13,703 INFO MainThread:189374 [wandb_init.py:init():970] updated telemetry
15
+ 2025-12-26 18:06:13,704 INFO MainThread:189374 [wandb_init.py:init():994] communicating run to backend with 90.0 second timeout
16
+ 2025-12-26 18:06:14,110 INFO MainThread:189374 [wandb_init.py:init():1041] starting run threads in backend
17
+ 2025-12-26 18:06:14,225 INFO MainThread:189374 [wandb_run.py:_console_start():2521] atexit reg
18
+ 2025-12-26 18:06:14,226 INFO MainThread:189374 [wandb_run.py:_redirect():2369] redirect: wrap_raw
19
+ 2025-12-26 18:06:14,226 INFO MainThread:189374 [wandb_run.py:_redirect():2438] Wrapping output streams.
20
+ 2025-12-26 18:06:14,226 INFO MainThread:189374 [wandb_run.py:_redirect():2461] Redirects installed.
21
+ 2025-12-26 18:06:14,230 INFO MainThread:189374 [wandb_init.py:init():1081] run started, returning control to user process
22
+ 2025-12-26 18:06:33,265 INFO wandb-AsyncioManager-main:189374 [service_client.py:_forward_responses():80] Reached EOF.
23
+ 2025-12-26 18:06:33,265 INFO wandb-AsyncioManager-main:189374 [mailbox.py:close():137] Closing mailbox, abandoning 1 handles.
wandb/run-20251226_180702-oordmylf/files/config.yaml ADDED
@@ -0,0 +1,173 @@
1
+ _wandb:
2
+ value:
3
+ cli_version: 0.23.1
4
+ e:
5
+ b1hui6k7d05xwq8cyz4bv453ig8gyf1q:
6
+ args:
7
+ - --config
8
+ - trainer-kit/SFT/config_instruct.yaml
9
+ codePath: trainer-kit/SFT/run_instruct.py
10
+ codePathLocal: trainer-kit/SFT/run_instruct.py
11
+ cpu_count: 12
12
+ cpu_count_logical: 24
13
+ cudaVersion: "13.0"
14
+ disk:
15
+ /:
16
+ total: "791251738624"
17
+ used: "386726633472"
18
+ email: shaiksirajuddin9949@gmail.com
19
+ executable: /workspace/llm_finetuning_env/bin/python
20
+ gpu: NVIDIA A100-SXM4-80GB
21
+ gpu_count: 2
22
+ gpu_nvidia:
23
+ - architecture: Ampere
24
+ cudaCores: 6912
25
+ memoryTotal: "85899345920"
26
+ name: NVIDIA A100-SXM4-80GB
27
+ uuid: GPU-989794b0-ec3b-13bf-db9f-3fbe341497ba
28
+ - architecture: Ampere
29
+ cudaCores: 6912
30
+ memoryTotal: "85899345920"
31
+ name: NVIDIA A100-SXM4-80GB
32
+ uuid: GPU-3790aa64-60ef-9eac-b0b1-b278ee8c0d40
33
+ host: a100-2gpu-shell-session-757d587799-mfdvv
34
+ memory:
35
+ total: "359047892992"
36
+ os: Linux-6.12.46+-x86_64-with-glibc2.35
37
+ program: /workspace/trainer-kit/SFT/run_instruct.py
38
+ python: CPython 3.10.12
39
+ root: task2file/sft_devstral_24B_v2
40
+ startedAt: "2025-12-26T18:07:02.184185Z"
41
+ writerId: b1hui6k7d05xwq8cyz4bv453ig8gyf1q
42
+ m: []
43
+ python_version: 3.10.12
44
+ t:
45
+ "1":
46
+ - 1
47
+ - 11
48
+ - 41
49
+ - 49
50
+ - 51
51
+ - 71
52
+ - 98
53
+ "2":
54
+ - 1
55
+ - 11
56
+ - 41
57
+ - 49
58
+ - 51
59
+ - 71
60
+ - 98
61
+ "3":
62
+ - 15
63
+ - 16
64
+ "4": 3.10.12
65
+ "5": 0.23.1
66
+ "6": 5.0.0.dev0
67
+ "12": 0.23.1
68
+ "13": linux-x86_64
69
+ data:
70
+ value:
71
+ custom_template: |-
72
+ ##INSTRUCTION
73
+ {instruction}<|im_end|>
74
+ {input}<|im_end|>
75
+ {output}<|im_end|>
76
+ eval_jsonl: null
77
+ eval_split_ratio: 0.1
78
+ format_type: custom
79
+ input_field: input
80
+ instruction_field: instruction
81
+ max_length: 2048
82
+ num_proc: 4
83
+ output_field: output
84
+ shuffle: true
85
+ system_prompt: |
86
+ You are a Hyperswitch Rust code analyzer. Identify functions/structs that need modification for a given task.
87
+
88
+ ## Output Format
89
+
90
+ ##OUTPUT
91
+ Explain the data flow and why each component must change:
92
+ - Flow: [Input → Processing → Output with arrows]
93
+ - For each component: "The [ComponentName] ([path]) must [action] because [reason]—without this, [consequence]"
94
+ - Explain coupling between components
95
+
96
+ ##SELECT
97
+ modify::crates/path/to/file.rs::impl::ComponentName
98
+ add::crates/another/file.rs::function::AnotherComponent
99
+ <EOS>
100
+
101
+ ## Rules
102
+
103
+ 1. Use full paths: `remove::crates/folder/file.rs::Type::Name`
104
+ 2. Use `::` for nested items: `status::StructName::Type::Name`
105
+ 3. Always explain "must change because" and "without this"
106
+ 3. Types of components: function, struct, enum, impl, trait
107
+ 4. If there is extra information (e.g., enum variants), include that too.
108
+ 5. Start with ##OUTPUT, end with ##SELECT, terminate with <EOS>
109
+
110
+ ## Example
111
+
112
+ ##TASK
113
+ Add webhook subscription support
114
+
115
+ ##OUTPUT
116
+ The webhook system routes events via EventClass enum. Flow: webhook → EventClass → handler → processing. The EventClass enum (crates/common_enums/src/enums.rs::EventClass) must add Subscriptions variant because it defines event routing—without this, subscription events cannot be processed. The SubscriptionStatus impl (crates/common_enums/src/transformers.rs::SubscriptionStatus) must map to EventType because it converts status to events—without this, status changes don't trigger webhooks. These are coupled: EventClass routes to handlers that use SubscriptionStatus mappings.
117
+
118
+ ##SELECT
119
+ crates/common_enums/src/enums.rs::EventClass
120
+ crates/common_enums/src/transformers.rs::SubscriptionStatus
121
+ <EOS>
122
+ train_jsonl: sft_dataset.jsonl
123
+ model:
124
+ value:
125
+ attn_implementation: null
126
+ base_local_dir: base_model
127
+ bnb_4bit_compute_dtype: bfloat16
128
+ bnb_4bit_quant_type: nf4
129
+ bnb_4bit_use_double_quant: false
130
+ device_map: auto
131
+ repo_id: ./Models/Devstral-Small-2-24B-HS-CPT
132
+ revision: null
133
+ tokenizer_use_fast: true
134
+ torch_dtype: bfloat16
135
+ trust_remote_code: true
136
+ use_4bit: false
137
+ peft:
138
+ value:
139
+ bias: none
140
+ enabled: true
141
+ lora_alpha: 16
142
+ lora_dropout: 0.05
143
+ r: 8
144
+ target_modules: auto
145
+ run_dir:
146
+ value: task2file/sft_devstral_24B_v2
147
+ train:
148
+ value:
149
+ early_stopping:
150
+ enabled: true
151
+ metric: eval_loss
152
+ min_delta: 0.001
153
+ mode: min
154
+ patience: 5
155
+ eval_steps: 100
156
+ evaluation_strategy: steps
157
+ gradient_accumulation_steps: 8
158
+ gradient_checkpointing: true
159
+ learning_rate: "1e-4"
160
+ load_best_model_at_end: true
161
+ logging_steps: 2
162
+ lr_scheduler_type: cosine
163
+ max_grad_norm: 0.8
164
+ num_train_epochs: 6
165
+ optim: adamw_torch
166
+ per_device_eval_batch_size: 1
167
+ per_device_train_batch_size: 1
168
+ resume_from_checkpoint: auto
169
+ save_steps: 500
170
+ save_strategy: steps
171
+ save_total_limit: 20
172
+ warmup_ratio: 0.08
173
+ weight_decay: 0
wandb/run-20251226_180702-oordmylf/files/output.log ADDED
@@ -0,0 +1,48 @@
1
+ Wandb initialized: project='sft-training', name='auto-generated'
2
+ [info] Detected Mistral3 model architecture, loading with specific class
3
+ Loading weights: 100%|█| 585/585 [00:12<00:00, 47.03it/s, Materializing param=model.vision_tower.transfor
4
+ [info] Ensuring all parameters are materialized...
5
+ Traceback (most recent call last):
6
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 983, in <module>
7
+ main()
8
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 852, in main
9
+ train_ds, eval_ds = build_datasets(cfg, tokenizer)
10
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 414, in build_datasets
11
+ ds = load_dataset("json", data_files={"train": train_path})
12
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 1492, in load_dataset
13
+ builder_instance = load_dataset_builder(
14
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 1137, in load_dataset_builder
15
+ dataset_module = dataset_module_factory(
16
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 913, in dataset_module_factory
17
+ ).get_module()
18
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 527, in get_module
19
+ data_files = DataFilesDict.from_patterns(
20
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 708, in from_patterns
21
+ else DataFilesList.from_patterns(
22
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 601, in from_patterns
23
+ resolve_pattern(
24
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 390, in resolve_pattern
25
+ raise FileNotFoundError(error_msg)
26
+ FileNotFoundError: Unable to find '/workspace/sft_dataset.jsonl'
27
+ Traceback (most recent call last):
28
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 983, in <module>
29
+ main()
30
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 852, in main
31
+ train_ds, eval_ds = build_datasets(cfg, tokenizer)
32
+ File "/workspace/trainer-kit/SFT/run_instruct.py", line 414, in build_datasets
33
+ ds = load_dataset("json", data_files={"train": train_path})
34
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 1492, in load_dataset
35
+ builder_instance = load_dataset_builder(
36
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 1137, in load_dataset_builder
37
+ dataset_module = dataset_module_factory(
38
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 913, in dataset_module_factory
39
+ ).get_module()
40
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/load.py", line 527, in get_module
41
+ data_files = DataFilesDict.from_patterns(
42
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 708, in from_patterns
43
+ else DataFilesList.from_patterns(
44
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 601, in from_patterns
45
+ resolve_pattern(
46
+ File "/workspace/llm_finetuning_env/lib/python3.10/site-packages/datasets/data_files.py", line 390, in resolve_pattern
47
+ raise FileNotFoundError(error_msg)
48
+ FileNotFoundError: Unable to find '/workspace/sft_dataset.jsonl'
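Both tracebacks (printed twice in the captured stream) reduce to the same cause: `train_jsonl: sft_dataset.jsonl` is a relative path, so `load_dataset` resolved it against the working directory and looked for `/workspace/sft_dataset.jsonl`, which did not exist at that point; the later run-20251226_180808 proceeded, so the file was evidently put in place. A hedged sketch of a fail-fast check one could run before calling `load_dataset`; `resolve_train_jsonl` and its arguments are illustrative, not functions from `run_instruct.py`:

```python
# Illustrative sketch (not from run_instruct.py): resolve the dataset path
# up front and fail with a message that names every location that was tried.
from pathlib import Path

def resolve_train_jsonl(train_jsonl: str, config_path: str) -> str:
    """Try the path as given, then relative to the config file's directory."""
    candidates = [Path(train_jsonl), Path(config_path).parent / train_jsonl]
    for candidate in candidates:
        if candidate.is_file():
            return str(candidate.resolve())
    raise FileNotFoundError(
        f"train_jsonl not found; tried {[str(c) for c in candidates]}"
    )

# e.g. resolve_train_jsonl("sft_dataset.jsonl", "trainer-kit/SFT/config_instruct.yaml")
```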
wandb/run-20251226_180702-oordmylf/files/requirements.txt ADDED
@@ -0,0 +1,104 @@
+ exceptiongroup==1.3.1
+ wheel==0.45.1
+ python-dateutil==2.9.0.post0
+ nvidia-ml-py==13.580.82
+ huggingface_hub==1.2.3
+ idna==3.11
+ click==8.3.1
+ numpy==2.2.6
+ httpx==0.28.1
+ tokenizers==0.22.1
+ sympy==1.13.1
+ yarl==1.22.0
+ async-timeout==5.0.1
+ datasets==4.4.2
+ platformdirs==4.5.1
+ nvidia-cuda-cupti-cu12==12.1.105
+ nvidia-nvtx-cu12==12.1.105
+ smmap==5.0.2
+ accelerate==1.12.0
+ requests==2.32.5
+ aiohttp==3.13.2
+ bitsandbytes==0.49.0
+ nvidia-cublas-cu12==12.1.3.1
+ mpmath==1.3.0
+ torchaudio==2.5.1+cu121
+ nvidia-cuda-runtime-cu12==12.1.105
+ typing-inspection==0.4.2
+ GitPython==3.1.45
+ xxhash==3.6.0
+ nvidia-cusolver-cu12==11.4.5.107
+ pydantic_core==2.41.5
+ six==1.17.0
+ torchvision==0.20.1+cu121
+ typing_extensions==4.15.0
+ triton==3.1.0
+ charset-normalizer==3.4.4
+ nvitop==1.6.1
+ wandb==0.23.1
+ regex==2025.11.3
+ pip==25.3
+ nvidia-cusparse-cu12==12.1.0.106
+ pytz==2025.2
+ Jinja2==3.1.6
+ psutil==7.2.0
+ pillow==12.0.0
+ packaging==25.0
+ safetensors==0.7.0
+ sentry-sdk==2.48.0
+ gitdb==4.0.12
+ httpcore==1.0.9
+ setuptools==80.9.0
+ nvidia-cufft-cu12==11.0.2.54
+ anyio==4.12.0
+ transformers==5.0.0.dev0
+ pydantic==2.12.5
+ fsspec==2025.10.0
+ filelock==3.20.0
+ PyYAML==6.0.3
+ hf-xet==1.2.0
+ nvidia-cudnn-cu12==9.1.0.70
+ tqdm==4.67.1
+ MarkupSafe==2.1.5
+ attrs==25.4.0
+ nvidia-cuda-nvrtc-cu12==12.1.105
+ peft==0.18.0
+ aiohappyeyeballs==2.6.1
+ networkx==3.4.2
+ nvidia-nvjitlink-cu12==12.9.86
+ certifi==2025.11.12
+ pyarrow==22.0.0
+ dill==0.4.0
+ protobuf==6.33.2
+ aiosignal==1.4.0
+ frozenlist==1.8.0
+ urllib3==2.6.2
+ propcache==0.4.1
+ tzdata==2025.3
+ pandas==2.3.3
+ annotated-types==0.7.0
+ shellingham==1.5.4
+ nvidia-nccl-cu12==2.21.5
+ multidict==6.7.0
+ nvidia-curand-cu12==10.3.2.106
+ trl==0.26.2
+ torch==2.5.1+cu121
+ h11==0.16.0
+ multiprocess==0.70.18
+ typer-slim==0.21.0
+ wheel==0.45.1
+ tomli==2.0.1
+ autocommand==2.2.2
+ jaraco.context==5.3.0
+ zipp==3.19.2
+ packaging==24.2
+ inflect==7.3.1
+ typing_extensions==4.12.2
+ platformdirs==4.2.2
+ jaraco.functools==4.0.1
+ jaraco.collections==5.1.0
+ jaraco.text==3.12.1
+ backports.tarfile==1.2.0
+ more-itertools==10.3.0
+ importlib_metadata==8.0.0
+ typeguard==4.3.0
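Several of these pins are unusual (`transformers==5.0.0.dev0`, torch cu121 builds on a CUDA 13.0 host per the metadata below), so it is worth confirming a fresh environment matches the captured one before resuming from these checkpoints. A sketch, with the pin list trimmed to the packages that matter most here:

```python
# Sketch: compare a few of the pins above against the active environment.
from importlib.metadata import PackageNotFoundError, version

pins = {
    "torch": "2.5.1+cu121",
    "transformers": "5.0.0.dev0",
    "peft": "0.18.0",
    "trl": "0.26.2",
    "datasets": "4.4.2",
}

for package, expected in pins.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        installed = "not installed"
    marker = "ok" if installed == expected else f"MISMATCH ({installed})"
    print(f"{package}=={expected}: {marker}")
```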
wandb/run-20251226_180702-oordmylf/files/wandb-metadata.json ADDED
@@ -0,0 +1,47 @@
+ {
+   "os": "Linux-6.12.46+-x86_64-with-glibc2.35",
+   "python": "CPython 3.10.12",
+   "startedAt": "2025-12-26T18:07:02.184185Z",
+   "args": [
+     "--config",
+     "trainer-kit/SFT/config_instruct.yaml"
+   ],
+   "program": "/workspace/trainer-kit/SFT/run_instruct.py",
+   "codePath": "trainer-kit/SFT/run_instruct.py",
+   "codePathLocal": "trainer-kit/SFT/run_instruct.py",
+   "email": "shaiksirajuddin9949@gmail.com",
+   "root": "task2file/sft_devstral_24B_v2",
+   "host": "a100-2gpu-shell-session-757d587799-mfdvv",
+   "executable": "/workspace/llm_finetuning_env/bin/python",
+   "cpu_count": 12,
+   "cpu_count_logical": 24,
+   "gpu": "NVIDIA A100-SXM4-80GB",
+   "gpu_count": 2,
+   "disk": {
+     "/": {
+       "total": "791251738624",
+       "used": "386726633472"
+     }
+   },
+   "memory": {
+     "total": "359047892992"
+   },
+   "gpu_nvidia": [
+     {
+       "name": "NVIDIA A100-SXM4-80GB",
+       "memoryTotal": "85899345920",
+       "cudaCores": 6912,
+       "architecture": "Ampere",
+       "uuid": "GPU-989794b0-ec3b-13bf-db9f-3fbe341497ba"
+     },
+     {
+       "name": "NVIDIA A100-SXM4-80GB",
+       "memoryTotal": "85899345920",
+       "cudaCores": 6912,
+       "architecture": "Ampere",
+       "uuid": "GPU-3790aa64-60ef-9eac-b0b1-b278ee8c0d40"
+     }
+   ],
+   "cudaVersion": "13.0",
+   "writerId": "b1hui6k7d05xwq8cyz4bv453ig8gyf1q"
+ }
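A back-of-envelope check that the hardware above can hold the model: the resolved run config later in this upload reports `model/num_parameters = 24022764544`, and the model is loaded in bfloat16 with `device_map: auto` across the two 80 GiB cards. The sketch ignores optimizer state and activations, so it is a lower bound only:

```python
# Rough memory arithmetic from the metadata above; a sanity check, not a plan.
num_parameters = 24_022_764_544          # model/num_parameters (run config)
bytes_per_param = 2                      # bfloat16
per_gpu_bytes = 85_899_345_920           # "memoryTotal" per A100 above

weights_gib = num_parameters * bytes_per_param / 2**30
print(f"bf16 weights: {weights_gib:.1f} GiB "
      f"vs 2 x {per_gpu_bytes / 2**30:.0f} GiB of GPU memory")
# ~44.7 GiB of frozen weights, sharded by device_map="auto"; LoRA training
# adds only the small adapter weights plus their gradients/optimizer state.
```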
wandb/run-20251226_180702-oordmylf/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
+ {"_wandb":{"runtime":19},"_runtime":19}
wandb/run-20251226_180702-oordmylf/logs/debug-core.log ADDED
@@ -0,0 +1,14 @@
+ {"time":"2025-12-26T18:07:02.269040435Z","level":"INFO","msg":"main: starting server","port-filename":"/tmp/tmpn17hlpsl/port-189808.txt","pid":189808,"log-level":0,"disable-analytics":false,"shutdown-on-parent-exit":false,"enable-dcgm-profiling":false}
+ {"time":"2025-12-26T18:07:02.269706147Z","level":"INFO","msg":"server: will exit if parent process dies","ppid":189808}
+ {"time":"2025-12-26T18:07:02.26970836Z","level":"INFO","msg":"server: accepting connections","addr":{"Name":"/tmp/wandb-189808-189898-3164802245/socket","Net":"unix"}}
+ {"time":"2025-12-26T18:07:02.451619132Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"1(@)"}
+ {"time":"2025-12-26T18:07:02.458183004Z","level":"INFO","msg":"handleInformInit: received","streamId":"oordmylf","id":"1(@)"}
+ {"time":"2025-12-26T18:07:02.645894292Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"oordmylf","id":"1(@)"}
+ {"time":"2025-12-26T18:07:21.83310619Z","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"1(@)"}
+ {"time":"2025-12-26T18:07:21.833169189Z","level":"INFO","msg":"server is shutting down"}
+ {"time":"2025-12-26T18:07:21.833173516Z","level":"INFO","msg":"connection: closing","id":"1(@)"}
+ {"time":"2025-12-26T18:07:21.833268292Z","level":"INFO","msg":"server: listener closed","addr":{"Name":"/tmp/wandb-189808-189898-3164802245/socket","Net":"unix"}}
+ {"time":"2025-12-26T18:07:21.833292467Z","level":"INFO","msg":"connection: closed successfully","id":"1(@)"}
+ {"time":"2025-12-26T18:07:22.158467464Z","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"1(@)"}
+ {"time":"2025-12-26T18:07:22.158501256Z","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"1(@)"}
+ {"time":"2025-12-26T18:07:22.158518889Z","level":"INFO","msg":"server is closed"}
wandb/run-20251226_180702-oordmylf/logs/debug-internal.log ADDED
@@ -0,0 +1,11 @@
+ {"time":"2025-12-26T18:07:02.458313987Z","level":"INFO","msg":"stream: starting","core version":"0.23.1"}
+ {"time":"2025-12-26T18:07:02.645556611Z","level":"INFO","msg":"stream: created new stream","id":"oordmylf"}
+ {"time":"2025-12-26T18:07:02.645689372Z","level":"INFO","msg":"handler: started","stream_id":"oordmylf"}
+ {"time":"2025-12-26T18:07:02.645880522Z","level":"INFO","msg":"stream: started","id":"oordmylf"}
+ {"time":"2025-12-26T18:07:02.64591008Z","level":"INFO","msg":"writer: started","stream_id":"oordmylf"}
+ {"time":"2025-12-26T18:07:02.64593173Z","level":"INFO","msg":"sender: started","stream_id":"oordmylf"}
+ {"time":"2025-12-26T18:07:21.833167126Z","level":"INFO","msg":"stream: closing","id":"oordmylf"}
+ {"time":"2025-12-26T18:07:22.023419324Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
+ {"time":"2025-12-26T18:07:22.157584616Z","level":"INFO","msg":"handler: closed","stream_id":"oordmylf"}
+ {"time":"2025-12-26T18:07:22.157683147Z","level":"INFO","msg":"sender: closed","stream_id":"oordmylf"}
+ {"time":"2025-12-26T18:07:22.157690597Z","level":"INFO","msg":"stream: closed","id":"oordmylf"}
wandb/run-20251226_180702-oordmylf/logs/debug.log ADDED
@@ -0,0 +1,23 @@
+ 2025-12-26 18:07:02,185 INFO MainThread:189808 [wandb_setup.py:_flush():80] Current SDK version is 0.23.1
+ 2025-12-26 18:07:02,185 INFO MainThread:189808 [wandb_setup.py:_flush():80] Configure stats pid to 189808
+ 2025-12-26 18:07:02,186 INFO MainThread:189808 [wandb_setup.py:_flush():80] Loading settings from /root/.config/wandb/settings
+ 2025-12-26 18:07:02,186 INFO MainThread:189808 [wandb_setup.py:_flush():80] Loading settings from /workspace/wandb/settings
+ 2025-12-26 18:07:02,186 INFO MainThread:189808 [wandb_setup.py:_flush():80] Loading settings from environment variables
+ 2025-12-26 18:07:02,186 INFO MainThread:189808 [wandb_init.py:setup_run_log_directory():714] Logging user logs to task2file/sft_devstral_24B_v2/wandb/run-20251226_180702-oordmylf/logs/debug.log
+ 2025-12-26 18:07:02,186 INFO MainThread:189808 [wandb_init.py:setup_run_log_directory():715] Logging internal logs to task2file/sft_devstral_24B_v2/wandb/run-20251226_180702-oordmylf/logs/debug-internal.log
+ 2025-12-26 18:07:02,186 INFO MainThread:189808 [wandb_init.py:init():841] calling init triggers
+ 2025-12-26 18:07:02,186 INFO MainThread:189808 [wandb_init.py:init():846] wandb.init called with sweep_config: {}
+ config: {'model': {'repo_id': './Models/Devstral-Small-2-24B-HS-CPT', 'revision': None, 'base_local_dir': 'base_model', 'trust_remote_code': True, 'tokenizer_use_fast': True, 'device_map': 'auto', 'torch_dtype': 'bfloat16', 'use_4bit': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'attn_implementation': None}, 'data': {'train_jsonl': 'sft_dataset.jsonl', 'eval_jsonl': None, 'eval_split_ratio': 0.1, 'instruction_field': 'instruction', 'input_field': 'input', 'output_field': 'output', 'format_type': 'custom', 'system_prompt': 'You are a Hyperswitch Rust code analyzer. Identify functions/structs that need modification for a given task.\n\n## Output Format\n\n##OUTPUT\nExplain the data flow and why each component must change:\n- Flow: [Input → Processing → Output with arrows]\n- For each component: "The [ComponentName] ([path]) must [action] because [reason]—without this, [consequence]"\n- Explain coupling between components\n\n##SELECT\nmodify::crates/path/to/file.rs::impl::ComponentName\nadd::crates/another/file.rs::function::AnotherComponent\n<EOS>\n\n## Rules\n\n1. Use full paths: `remove::crates/folder/file.rs::Type::Name`\n2. Use `::` for nested items: `status::StructName::Type::Name`\n3. Always explain "must change because" and "without this"\n3. Types of components: function, struct, enum, impl, trait\n4. If there is extra information (e.g., enum variants), include that too.\n5. Start with ##OUTPUT, end with ##SELECT, terminate with <EOS>\n\n## Example\n\n##TASK\nAdd webhook subscription support\n\n##OUTPUT\nThe webhook system routes events via EventClass enum. Flow: webhook → EventClass → handler → processing. The EventClass enum (crates/common_enums/src/enums.rs::EventClass) must add Subscriptions variant because it defines event routing—without this, subscription events cannot be processed. The SubscriptionStatus impl (crates/common_enums/src/transformers.rs::SubscriptionStatus) must map to EventType because it converts status to events—without this, status changes don\'t trigger webhooks. These are coupled: EventClass routes to handlers that use SubscriptionStatus mappings.\n\n##SELECT\ncrates/common_enums/src/enums.rs::EventClass\ncrates/common_enums/src/transformers.rs::SubscriptionStatus\n<EOS>\n', 'custom_template': '##INSTRUCTION\n{instruction}<|im_end|>\n{input}<|im_end|>\n{output}<|im_end|>', 'max_length': 2048, 'shuffle': True, 'num_proc': 4}, 'peft': {'enabled': True, 'r': 8, 'lora_alpha': 16, 'lora_dropout': 0.05, 'bias': 'none', 'target_modules': 'auto'}, 'train': {'num_train_epochs': 6, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 8, 'learning_rate': '1e-4', 'weight_decay': 0.0, 'warmup_ratio': 0.08, 'lr_scheduler_type': 'cosine', 'optim': 'adamw_torch', 'max_grad_norm': 0.8, 'gradient_checkpointing': True, 'logging_steps': 2, 'save_strategy': 'steps', 'save_steps': 500, 'save_total_limit': 20, 'evaluation_strategy': 'steps', 'eval_steps': 100, 'load_best_model_at_end': True, 'early_stopping': {'enabled': True, 'patience': 5, 'min_delta': 0.001, 'metric': 'eval_loss', 'mode': 'min'}, 'resume_from_checkpoint': 'auto'}, 'run_dir': 'task2file/sft_devstral_24B_v2', '_wandb': {}}
+ 2025-12-26 18:07:02,186 INFO MainThread:189808 [wandb_init.py:init():889] starting backend
+ 2025-12-26 18:07:02,451 INFO MainThread:189808 [wandb_init.py:init():892] sending inform_init request
+ 2025-12-26 18:07:02,456 INFO MainThread:189808 [wandb_init.py:init():900] backend started and connected
+ 2025-12-26 18:07:02,459 INFO MainThread:189808 [wandb_init.py:init():970] updated telemetry
+ 2025-12-26 18:07:02,460 INFO MainThread:189808 [wandb_init.py:init():994] communicating run to backend with 90.0 second timeout
+ 2025-12-26 18:07:02,828 INFO MainThread:189808 [wandb_init.py:init():1041] starting run threads in backend
+ 2025-12-26 18:07:02,938 INFO MainThread:189808 [wandb_run.py:_console_start():2521] atexit reg
+ 2025-12-26 18:07:02,938 INFO MainThread:189808 [wandb_run.py:_redirect():2369] redirect: wrap_raw
+ 2025-12-26 18:07:02,938 INFO MainThread:189808 [wandb_run.py:_redirect():2438] Wrapping output streams.
+ 2025-12-26 18:07:02,938 INFO MainThread:189808 [wandb_run.py:_redirect():2461] Redirects installed.
+ 2025-12-26 18:07:02,942 INFO MainThread:189808 [wandb_init.py:init():1081] run started, returning control to user process
+ 2025-12-26 18:07:21,833 INFO wandb-AsyncioManager-main:189808 [service_client.py:_forward_responses():80] Reached EOF.
+ 2025-12-26 18:07:21,833 INFO wandb-AsyncioManager-main:189808 [mailbox.py:close():137] Closing mailbox, abandoning 1 handles.
wandb/run-20251226_180808-ny9q48hd/files/config.yaml ADDED
@@ -0,0 +1,630 @@
+ _name_or_path:
+   value: Models/Devstral-Small-2-24B-HS-CPT
+ _wandb:
+   value:
+     cli_version: 0.23.1
+     e:
+       k9a7zk6glrk7p1134kg1guh8b7acqlob:
+         args:
+           - --config
+           - trainer-kit/SFT/config_instruct.yaml
+         codePath: trainer-kit/SFT/run_instruct.py
+         codePathLocal: trainer-kit/SFT/run_instruct.py
+         cpu_count: 12
+         cpu_count_logical: 24
+         cudaVersion: "13.0"
+         disk:
+           /:
+             total: "791251738624"
+             used: "387681259520"
+         email: shaiksirajuddin9949@gmail.com
+         executable: /workspace/llm_finetuning_env/bin/python
+         gpu: NVIDIA A100-SXM4-80GB
+         gpu_count: 2
+         gpu_nvidia:
+           - architecture: Ampere
+             cudaCores: 6912
+             memoryTotal: "85899345920"
+             name: NVIDIA A100-SXM4-80GB
+             uuid: GPU-989794b0-ec3b-13bf-db9f-3fbe341497ba
+           - architecture: Ampere
+             cudaCores: 6912
+             memoryTotal: "85899345920"
+             name: NVIDIA A100-SXM4-80GB
+             uuid: GPU-3790aa64-60ef-9eac-b0b1-b278ee8c0d40
+         host: a100-2gpu-shell-session-757d587799-mfdvv
+         memory:
+           total: "359047892992"
+         os: Linux-6.12.46+-x86_64-with-glibc2.35
+         program: /workspace/trainer-kit/SFT/run_instruct.py
+         python: CPython 3.10.12
+         root: task2file/sft_devstral_24B_v2
+         startedAt: "2025-12-26T18:08:08.383305Z"
+         writerId: k9a7zk6glrk7p1134kg1guh8b7acqlob
+     m:
+       - "1": train/global_step
+         "6":
+           - 3
+         "7": []
+       - "2": '*'
+         "5": 1
+         "6":
+           - 1
+         "7": []
+     python_version: 3.10.12
+     t:
+       "1":
+         - 1
+         - 11
+         - 41
+         - 49
+         - 51
+         - 71
+         - 98
+       "2":
+         - 1
+         - 11
+         - 41
+         - 49
+         - 51
+         - 71
+         - 98
+       "3":
+         - 2
+         - 7
+         - 15
+         - 16
+         - 19
+         - 62
+         - 66
+       "4": 3.10.12
+       "5": 0.23.1
+       "6": 5.0.0.dev0
+       "9":
+         "1": transformers_trainer
+       "12": 0.23.1
+       "13": linux-x86_64
+ accelerator_config:
+   value:
+     dispatch_batches: null
+     even_batches: true
+     gradient_accumulation_kwargs: null
+     non_blocking: false
+     split_batches: false
+     use_seedable_sampler: true
+ adam_beta1:
+   value: 0.9
+ adam_beta2:
+   value: 0.999
+ adam_epsilon:
+   value: 1e-08
+ add_cross_attention:
+   value: false
+ architectures:
+   value:
+     - Mistral3ForConditionalGeneration
+ auto_find_batch_size:
+   value: false
+ average_tokens_across_devices:
+   value: true
+ batch_eval_metrics:
+   value: false
+ bf16:
+   value: true
+ bf16_full_eval:
+   value: false
+ bos_token_id:
+   value: null
+ chunk_size_feed_forward:
+   value: 0
+ cross_attention_hidden_size:
+   value: null
+ data:
+   value:
+     custom_template: |-
+       ##INSTRUCTION
+       {instruction}<|im_end|>
+       {input}<|im_end|>
+       {output}<|im_end|>
+     eval_jsonl: null
+     eval_split_ratio: 0.1
+     format_type: custom
+     input_field: input
+     instruction_field: instruction
+     max_length: 2048
+     num_proc: 4
+     output_field: output
+     shuffle: true
+     system_prompt: |
+       You are a Hyperswitch Rust code analyzer. Identify functions/structs that need modification for a given task.
+
+       ## Output Format
+
+       ##OUTPUT
+       Explain the data flow and why each component must change:
+       - Flow: [Input → Processing → Output with arrows]
+       - For each component: "The [ComponentName] ([path]) must [action] because [reason]—without this, [consequence]"
+       - Explain coupling between components
+
+       ##SELECT
+       modify::crates/path/to/file.rs::impl::ComponentName
+       add::crates/another/file.rs::function::AnotherComponent
+       <EOS>
+
+       ## Rules
+
+       1. Use full paths: `remove::crates/folder/file.rs::Type::Name`
+       2. Use `::` for nested items: `status::StructName::Type::Name`
+       3. Always explain "must change because" and "without this"
+       3. Types of components: function, struct, enum, impl, trait
+       4. If there is extra information (e.g., enum variants), include that too.
+       5. Start with ##OUTPUT, end with ##SELECT, terminate with <EOS>
+
+       ## Example
+
+       ##TASK
+       Add webhook subscription support
+
+       ##OUTPUT
+       The webhook system routes events via EventClass enum. Flow: webhook → EventClass → handler → processing. The EventClass enum (crates/common_enums/src/enums.rs::EventClass) must add Subscriptions variant because it defines event routing—without this, subscription events cannot be processed. The SubscriptionStatus impl (crates/common_enums/src/transformers.rs::SubscriptionStatus) must map to EventType because it converts status to events—without this, status changes don't trigger webhooks. These are coupled: EventClass routes to handlers that use SubscriptionStatus mappings.
+
+       ##SELECT
+       crates/common_enums/src/enums.rs::EventClass
+       crates/common_enums/src/transformers.rs::SubscriptionStatus
+       <EOS>
+     train_jsonl: sft_dataset.jsonl
+ data_seed:
+   value: null
+ dataloader_drop_last:
+   value: false
+ dataloader_num_workers:
+   value: 0
+ dataloader_persistent_workers:
+   value: false
+ dataloader_pin_memory:
+   value: true
+ dataloader_prefetch_factor:
+   value: null
+ ddp_backend:
+   value: null
+ ddp_broadcast_buffers:
+   value: null
+ ddp_bucket_cap_mb:
+   value: null
+ ddp_find_unused_parameters:
+   value: null
+ ddp_timeout:
+   value: 1800
+ debug:
+   value: []
+ decoder_start_token_id:
+   value: null
+ deepspeed:
+   value: null
+ disable_tqdm:
+   value: false
+ do_eval:
+   value: true
+ do_predict:
+   value: false
+ do_train:
+   value: false
+ dtype:
+   value: bfloat16
+ enable_jit_checkpoint:
+   value: false
+ eos_token_id:
+   value: null
+ eval_accumulation_steps:
+   value: null
+ eval_delay:
+   value: 0
+ eval_do_concat_batches:
+   value: true
+ eval_on_start:
+   value: false
+ eval_steps:
+   value: 100
+ eval_strategy:
+   value: steps
+ eval_use_gather_object:
+   value: false
+ finetuning_task:
+   value: null
+ fp16:
+   value: false
+ fp16_full_eval:
+   value: false
+ fsdp:
+   value: []
+ fsdp_config:
+   value:
+     min_num_params: 0
+     xla: false
+     xla_fsdp_grad_ckpt: false
+     xla_fsdp_v2: false
+ full_determinism:
+   value: false
+ gradient_accumulation_steps:
+   value: 8
+ gradient_checkpointing:
+   value: false
+ gradient_checkpointing_kwargs:
+   value: null
+ greater_is_better:
+   value: false
+ group_by_length:
+   value: false
+ hub_always_push:
+   value: false
+ hub_model_id:
+   value: null
+ hub_private_repo:
+   value: null
+ hub_revision:
+   value: null
+ hub_strategy:
+   value: every_save
+ hub_token:
+   value: <HUB_TOKEN>
+ id2label:
+   value:
+     "0": LABEL_0
+     "1": LABEL_1
+ ignore_data_skip:
+   value: false
+ image_token_index:
+   value: 10
+ include_for_metrics:
+   value: []
+ include_num_input_tokens_seen:
+   value: "no"
+ is_decoder:
+   value: false
+ is_encoder_decoder:
+   value: false
+ label_names:
+   value: null
+ label_smoothing_factor:
+   value: 0
+ label2id:
+   value:
+     LABEL_0: 0
+     LABEL_1: 1
+ learning_rate:
+   value: 0.0001
+ length_column_name:
+   value: length
+ liger_kernel_config:
+   value: null
+ load_best_model_at_end:
+   value: true
+ local_rank:
+   value: -1
+ log_level:
+   value: passive
+ log_level_replica:
+   value: warning
+ log_on_each_node:
+   value: true
+ logging_dir:
+   value: null
+ logging_first_step:
+   value: false
+ logging_nan_inf_filter:
+   value: true
+ logging_steps:
+   value: 2
+ logging_strategy:
+   value: steps
+ lr_scheduler_kwargs:
+   value: null
+ lr_scheduler_type:
+   value: cosine
+ max_grad_norm:
+   value: 0.8
+ max_steps:
+   value: -1
+ metric_for_best_model:
+   value: eval_loss
+ model:
+   value:
+     attn_implementation: null
+     base_local_dir: base_model
+     bnb_4bit_compute_dtype: bfloat16
+     bnb_4bit_quant_type: nf4
+     bnb_4bit_use_double_quant: false
+     device_map: auto
+     repo_id: ./Models/Devstral-Small-2-24B-HS-CPT
+     revision: null
+     tokenizer_use_fast: true
+     torch_dtype: bfloat16
+     trust_remote_code: true
+     use_4bit: false
+ model/num_parameters:
+   value: 24022764544
+ model_type:
+   value: mistral3
+ multimodal_projector_bias:
+   value: false
+ neftune_noise_alpha:
+   value: null
+ num_train_epochs:
+   value: 6
+ optim:
+   value: adamw_torch
+ optim_args:
+   value: null
+ optim_target_modules:
+   value: null
+ output_attentions:
+   value: false
+ output_dir:
+   value: task2file/sft_devstral_24B_v2/checkpoints
+ output_hidden_states:
+   value: false
+ pad_token_id:
+   value: null
+ parallelism_config:
+   value: null
+ peft:
+   value:
+     bias: none
+     enabled: true
+     lora_alpha: 16
+     lora_dropout: 0.05
+     r: 8
+     target_modules: auto
+ peft_config:
+   value:
+     default:
+       alora_invocation_tokens: null
+       arrow_config: null
+       auto_mapping: null
+       base_model_name_or_path: Models/Devstral-Small-2-24B-HS-CPT
+       bias: none
+       corda_config: null
+       ensure_weight_tying: false
+       eva_config: null
+       exclude_modules: null
+       fan_in_fan_out: false
+       inference_mode: false
+       init_lora_weights: true
+       layer_replication: null
+       layers_pattern: null
+       layers_to_transform: null
+       lora_alpha: 16
+       lora_bias: false
+       lora_dropout: 0.05
+       megatron_config: null
+       megatron_core: megatron.core
+       modules_to_save: null
+       peft_type: LORA
+       peft_version: 0.18.0
+       qalora_group_size: 16
+       r: 8
+       revision: null
+       runtime_config:
+         ephemeral_gpu_offload: false
+       target_modules:
+         - v_proj
+         - q_proj
+         - o_proj
+         - k_proj
+       target_parameters: null
+       task_type: CAUSAL_LM
+       trainable_token_indices: null
+       use_dora: false
+       use_qalora: false
+       use_rslora: false
+ per_device_eval_batch_size:
+   value: 1
+ per_device_train_batch_size:
+   value: 1
+ prediction_loss_only:
+   value: false
+ prefix:
+   value: null
+ problem_type:
+   value: null
+ project:
+   value: huggingface
+ projector_hidden_act:
+   value: gelu
+ push_to_hub:
+   value: false
+ remove_unused_columns:
+   value: false
+ report_to:
+   value:
+     - wandb
+ restore_callback_states_from_checkpoint:
+   value: false
+ resume_from_checkpoint:
+   value: null
+ return_dict:
+   value: true
+ run_dir:
+   value: task2file/sft_devstral_24B_v2
+ run_name:
+   value: null
+ save_on_each_node:
+   value: false
+ save_only_model:
+   value: false
+ save_steps:
+   value: 500
+ save_strategy:
+   value: steps
+ save_total_limit:
+   value: 20
+ seed:
+   value: 42
+ sep_token_id:
+   value: null
+ skip_memory_metrics:
+   value: true
+ spatial_merge_size:
+   value: 2
+ task_specific_params:
+   value: null
+ text_config:
+   value:
+     _name_or_path: ""
+     add_cross_attention: false
+     architectures: null
+     attention_dropout: 0
+     bos_token_id: 1
+     chunk_size_feed_forward: 0
+     cross_attention_hidden_size: null
+     decoder_start_token_id: null
+     dtype: bfloat16
+     eos_token_id: 2
+     finetuning_task: null
+     head_dim: 128
+     hidden_act: silu
+     hidden_size: 5120
+     id2label:
+       "0": LABEL_0
+       "1": LABEL_1
+     initializer_range: 0.02
+     intermediate_size: 32768
+     is_decoder: false
+     is_encoder_decoder: false
+     label2id:
+       LABEL_0: 0
+       LABEL_1: 1
+     max_position_embeddings: 393216
+     model_type: ministral3
+     num_attention_heads: 32
+     num_hidden_layers: 40
+     num_key_value_heads: 8
+     output_attentions: false
+     output_hidden_states: false
+     pad_token_id: 11
+     prefix: null
+     problem_type: null
+     return_dict: true
+     rms_norm_eps: 1e-05
+     rope_parameters:
+       beta_fast: 32
+       beta_slow: 1
+       factor: 48
+       llama_4_scaling_beta: 0.1
+       mscale: 1
+       mscale_all_dim: 1
+       original_max_position_embeddings: 8192
+       rope_theta: 1e+08
+       rope_type: yarn
+       type: yarn
+     sep_token_id: null
+     sliding_window: null
+     task_specific_params: null
+     tie_word_embeddings: false
+     tokenizer_class: null
+     use_cache: true
+     vocab_size: 131072
+ tf32:
+   value: null
+ tie_word_embeddings:
+   value: false
+ tokenizer_class:
+   value: null
+ torch_compile:
+   value: false
+ torch_compile_backend:
+   value: null
+ torch_compile_mode:
+   value: null
+ torch_empty_cache_steps:
+   value: null
+ trackio_space_id:
+   value: trackio
+ train:
+   value:
+     early_stopping:
+       enabled: true
+       metric: eval_loss
+       min_delta: 0.001
+       mode: min
+       patience: 5
+     eval_steps: 100
+     evaluation_strategy: steps
+     gradient_accumulation_steps: 8
+     gradient_checkpointing: true
+     learning_rate: "1e-4"
+     load_best_model_at_end: true
+     logging_steps: 2
+     lr_scheduler_type: cosine
+     max_grad_norm: 0.8
+     num_train_epochs: 6
+     optim: adamw_torch
+     per_device_eval_batch_size: 1
+     per_device_train_batch_size: 1
+     resume_from_checkpoint: auto
+     save_steps: 500
+     save_strategy: steps
+     save_total_limit: 20
+     warmup_ratio: 0.08
+     weight_decay: 0
+ transformers_version:
+   value: 5.0.0.dev0
+ use_cache:
+   value: false
+ use_cpu:
+   value: false
+ use_liger_kernel:
+   value: false
+ vision_config:
+   value:
+     _name_or_path: ""
+     add_cross_attention: false
+     architectures: null
+     attention_dropout: 0
+     bos_token_id: null
+     chunk_size_feed_forward: 0
+     cross_attention_hidden_size: null
+     decoder_start_token_id: null
+     dtype: bfloat16
+     eos_token_id: null
+     finetuning_task: null
+     head_dim: 64
+     hidden_act: silu
+     hidden_size: 1024
+     id2label:
+       "0": LABEL_0
+       "1": LABEL_1
+     image_size: 1540
+     initializer_range: 0.02
+     intermediate_size: 4096
+     is_decoder: false
+     is_encoder_decoder: false
+     label2id:
+       LABEL_0: 0
+       LABEL_1: 1
+     model_type: pixtral
+     num_attention_heads: 16
+     num_channels: 3
+     num_hidden_layers: 24
+     output_attentions: false
+     output_hidden_states: false
+     pad_token_id: null
+     patch_size: 14
+     prefix: null
+     problem_type: null
+     return_dict: true
+     rope_parameters:
+       rope_theta: 10000
+       rope_type: default
+     sep_token_id: null
+     task_specific_params: null
+     tie_word_embeddings: true
+     tokenizer_class: null
+ vision_feature_layer:
+   value: -1
+ warmup_ratio:
+   value: 0.08
+ warmup_steps:
+   value: 0.08
+ weight_decay:
+   value: 0
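The peft_config above pins everything needed to check the adapter size by hand: r=8 LoRA on q_proj/k_proj/v_proj/o_proj over 40 decoder layers, with hidden_size 5120, 32 attention heads and 8 KV heads of head_dim 128. A worked sketch of the count (each LoRA pair adds r * (in_features + out_features) parameters per wrapped linear):

```python
# Sketch: trainable-parameter count implied by the peft_config above.
r = 8
hidden = 5120
q_out = 32 * 128    # q_proj output / o_proj input: num_attention_heads * head_dim
kv_out = 8 * 128    # k_proj / v_proj output: num_key_value_heads * head_dim
layers = 40

per_layer = (
    r * (hidden + q_out)     # q_proj: A is (r x in), B is (out x r)
    + r * (hidden + kv_out)  # k_proj
    + r * (hidden + kv_out)  # v_proj
    + r * (q_out + hidden)   # o_proj
)
total = per_layer * layers
print(f"{total:,} trainable LoRA parameters")
# 9,830,400 -- about 0.04% of the 24,022,764,544 base parameters.
```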
wandb/run-20251226_180808-ny9q48hd/files/output.log ADDED
The diff for this file is too large to render. See raw diff
 
wandb/run-20251226_180808-ny9q48hd/files/requirements.txt ADDED
@@ -0,0 +1,104 @@
+ exceptiongroup==1.3.1
+ wheel==0.45.1
+ python-dateutil==2.9.0.post0
+ nvidia-ml-py==13.580.82
+ huggingface_hub==1.2.3
+ idna==3.11
+ click==8.3.1
+ numpy==2.2.6
+ httpx==0.28.1
+ tokenizers==0.22.1
+ sympy==1.13.1
+ yarl==1.22.0
+ async-timeout==5.0.1
+ datasets==4.4.2
+ platformdirs==4.5.1
+ nvidia-cuda-cupti-cu12==12.1.105
+ nvidia-nvtx-cu12==12.1.105
+ smmap==5.0.2
+ accelerate==1.12.0
+ requests==2.32.5
+ aiohttp==3.13.2
+ bitsandbytes==0.49.0
+ nvidia-cublas-cu12==12.1.3.1
+ mpmath==1.3.0
+ torchaudio==2.5.1+cu121
+ nvidia-cuda-runtime-cu12==12.1.105
+ typing-inspection==0.4.2
+ GitPython==3.1.45
+ xxhash==3.6.0
+ nvidia-cusolver-cu12==11.4.5.107
+ pydantic_core==2.41.5
+ six==1.17.0
+ torchvision==0.20.1+cu121
+ typing_extensions==4.15.0
+ triton==3.1.0
+ charset-normalizer==3.4.4
+ nvitop==1.6.1
+ wandb==0.23.1
+ regex==2025.11.3
+ pip==25.3
+ nvidia-cusparse-cu12==12.1.0.106
+ pytz==2025.2
+ Jinja2==3.1.6
+ psutil==7.2.0
+ pillow==12.0.0
+ packaging==25.0
+ safetensors==0.7.0
+ sentry-sdk==2.48.0
+ gitdb==4.0.12
+ httpcore==1.0.9
+ setuptools==80.9.0
+ nvidia-cufft-cu12==11.0.2.54
+ anyio==4.12.0
+ transformers==5.0.0.dev0
+ pydantic==2.12.5
+ fsspec==2025.10.0
+ filelock==3.20.0
+ PyYAML==6.0.3
+ hf-xet==1.2.0
+ nvidia-cudnn-cu12==9.1.0.70
+ tqdm==4.67.1
+ MarkupSafe==2.1.5
+ attrs==25.4.0
+ nvidia-cuda-nvrtc-cu12==12.1.105
+ peft==0.18.0
+ aiohappyeyeballs==2.6.1
+ networkx==3.4.2
+ nvidia-nvjitlink-cu12==12.9.86
+ certifi==2025.11.12
+ pyarrow==22.0.0
+ dill==0.4.0
+ protobuf==6.33.2
+ aiosignal==1.4.0
+ frozenlist==1.8.0
+ urllib3==2.6.2
+ propcache==0.4.1
+ tzdata==2025.3
+ pandas==2.3.3
+ annotated-types==0.7.0
+ shellingham==1.5.4
+ nvidia-nccl-cu12==2.21.5
+ multidict==6.7.0
+ nvidia-curand-cu12==10.3.2.106
+ trl==0.26.2
+ torch==2.5.1+cu121
+ h11==0.16.0
+ multiprocess==0.70.18
+ typer-slim==0.21.0
+ wheel==0.45.1
+ tomli==2.0.1
+ autocommand==2.2.2
+ jaraco.context==5.3.0
+ zipp==3.19.2
+ packaging==24.2
+ inflect==7.3.1
+ typing_extensions==4.12.2
+ platformdirs==4.2.2
+ jaraco.functools==4.0.1
+ jaraco.collections==5.1.0
+ jaraco.text==3.12.1
+ backports.tarfile==1.2.0
+ more-itertools==10.3.0
+ importlib_metadata==8.0.0
+ typeguard==4.3.0
wandb/run-20251226_180808-ny9q48hd/files/wandb-metadata.json ADDED
@@ -0,0 +1,47 @@
+ {
+   "os": "Linux-6.12.46+-x86_64-with-glibc2.35",
+   "python": "CPython 3.10.12",
+   "startedAt": "2025-12-26T18:08:08.383305Z",
+   "args": [
+     "--config",
+     "trainer-kit/SFT/config_instruct.yaml"
+   ],
+   "program": "/workspace/trainer-kit/SFT/run_instruct.py",
+   "codePath": "trainer-kit/SFT/run_instruct.py",
+   "codePathLocal": "trainer-kit/SFT/run_instruct.py",
+   "email": "shaiksirajuddin9949@gmail.com",
+   "root": "task2file/sft_devstral_24B_v2",
+   "host": "a100-2gpu-shell-session-757d587799-mfdvv",
+   "executable": "/workspace/llm_finetuning_env/bin/python",
+   "cpu_count": 12,
+   "cpu_count_logical": 24,
+   "gpu": "NVIDIA A100-SXM4-80GB",
+   "gpu_count": 2,
+   "disk": {
+     "/": {
+       "total": "791251738624",
+       "used": "387681259520"
+     }
+   },
+   "memory": {
+     "total": "359047892992"
+   },
+   "gpu_nvidia": [
+     {
+       "name": "NVIDIA A100-SXM4-80GB",
+       "memoryTotal": "85899345920",
+       "cudaCores": 6912,
+       "architecture": "Ampere",
+       "uuid": "GPU-989794b0-ec3b-13bf-db9f-3fbe341497ba"
+     },
+     {
+       "name": "NVIDIA A100-SXM4-80GB",
+       "memoryTotal": "85899345920",
+       "cudaCores": 6912,
+       "architecture": "Ampere",
+       "uuid": "GPU-3790aa64-60ef-9eac-b0b1-b278ee8c0d40"
+     }
+   ],
+   "cudaVersion": "13.0",
+   "writerId": "k9a7zk6glrk7p1134kg1guh8b7acqlob"
+ }
wandb/run-20251226_180808-ny9q48hd/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
+ {"eval/runtime":511.6513,"_timestamp":1.766894424089547e+09,"train/learning_rate":5.0960525730763455e-05,"train_runtime":121379.3179,"_wandb":{"runtime":122035},"eval/samples_per_second":4.118,"train_loss":0.6813117427967097,"total_flos":7.892056292508187e+18,"train/loss":0.5559571981430054,"eval/steps_per_second":4.118,"_step":3877,"train/global_step":7600,"train_samples_per_second":0.937,"train/grad_norm":1.2778210639953613,"train_steps_per_second":0.117,"eval/loss":0.6706293225288391,"_runtime":122035,"train/epoch":3.2067510548523206}
wandb/run-20251226_180808-ny9q48hd/logs/debug-core.log ADDED
@@ -0,0 +1,16 @@
+ {"time":"2025-12-26T18:08:08.471552693Z","level":"INFO","msg":"main: starting server","port-filename":"/tmp/tmp_2pxhrkk/port-190322.txt","pid":190322,"log-level":0,"disable-analytics":false,"shutdown-on-parent-exit":false,"enable-dcgm-profiling":false}
+ {"time":"2025-12-26T18:08:08.472277502Z","level":"INFO","msg":"server: will exit if parent process dies","ppid":190322}
+ {"time":"2025-12-26T18:08:08.472253441Z","level":"INFO","msg":"server: accepting connections","addr":{"Name":"/tmp/wandb-190322-190397-1549185738/socket","Net":"unix"}}
+ {"time":"2025-12-26T18:08:08.653761101Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"1(@)"}
+ {"time":"2025-12-26T18:08:08.660887612Z","level":"INFO","msg":"handleInformInit: received","streamId":"ny9q48hd","id":"1(@)"}
+ {"time":"2025-12-26T18:08:08.822064564Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"ny9q48hd","id":"1(@)"}
+ {"time":"2025-12-28T04:02:05.051125329Z","level":"INFO","msg":"handleInformFinish: finish message received","streamId":"ny9q48hd","id":"1(@)"}
+ {"time":"2025-12-28T04:02:05.052538266Z","level":"INFO","msg":"handleInformFinish: stream closed","streamId":"ny9q48hd","id":"1(@)"}
+ {"time":"2025-12-28T04:02:05.107259931Z","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"1(@)"}
+ {"time":"2025-12-28T04:02:05.107301964Z","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"1(@)"}
+ {"time":"2025-12-28T04:02:05.107312563Z","level":"INFO","msg":"connection: closing","id":"1(@)"}
+ {"time":"2025-12-28T04:02:05.107351058Z","level":"INFO","msg":"connection: closed successfully","id":"1(@)"}
+ {"time":"2025-12-28T04:02:05.107355378Z","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"1(@)"}
+ {"time":"2025-12-28T04:02:05.107365658Z","level":"INFO","msg":"server is shutting down"}
+ {"time":"2025-12-28T04:02:05.107515239Z","level":"INFO","msg":"server: listener closed","addr":{"Name":"/tmp/wandb-190322-190397-1549185738/socket","Net":"unix"}}
+ {"time":"2025-12-28T04:02:05.107566032Z","level":"INFO","msg":"server is closed"}
wandb/run-20251226_180808-ny9q48hd/logs/debug-internal.log ADDED
@@ -0,0 +1,12 @@
+ {"time":"2025-12-26T18:08:08.66103332Z","level":"INFO","msg":"stream: starting","core version":"0.23.1"}
+ {"time":"2025-12-26T18:08:08.82172381Z","level":"INFO","msg":"stream: created new stream","id":"ny9q48hd"}
+ {"time":"2025-12-26T18:08:08.821819478Z","level":"INFO","msg":"handler: started","stream_id":"ny9q48hd"}
+ {"time":"2025-12-26T18:08:08.822049155Z","level":"INFO","msg":"stream: started","id":"ny9q48hd"}
+ {"time":"2025-12-26T18:08:08.822072296Z","level":"INFO","msg":"writer: started","stream_id":"ny9q48hd"}
+ {"time":"2025-12-26T18:08:08.822098276Z","level":"INFO","msg":"sender: started","stream_id":"ny9q48hd"}
+ {"time":"2025-12-28T04:02:04.935383596Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
+ {"time":"2025-12-28T04:02:05.045953421Z","level":"INFO","msg":"handler: operation stats","stats":{}}
+ {"time":"2025-12-28T04:02:05.051806259Z","level":"INFO","msg":"stream: closing","id":"ny9q48hd"}
+ {"time":"2025-12-28T04:02:05.051833004Z","level":"INFO","msg":"handler: closed","stream_id":"ny9q48hd"}
+ {"time":"2025-12-28T04:02:05.051917075Z","level":"INFO","msg":"sender: closed","stream_id":"ny9q48hd"}
+ {"time":"2025-12-28T04:02:05.051937152Z","level":"INFO","msg":"stream: closed","id":"ny9q48hd"}
wandb/run-20251226_180808-ny9q48hd/logs/debug.log ADDED
@@ -0,0 +1,29 @@
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_setup.py:_flush():80] Current SDK version is 0.23.1
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_setup.py:_flush():80] Configure stats pid to 190322
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_setup.py:_flush():80] Loading settings from /root/.config/wandb/settings
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_setup.py:_flush():80] Loading settings from /workspace/wandb/settings
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_setup.py:_flush():80] Loading settings from environment variables
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_init.py:setup_run_log_directory():714] Logging user logs to task2file/sft_devstral_24B_v2/wandb/run-20251226_180808-ny9q48hd/logs/debug.log
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_init.py:setup_run_log_directory():715] Logging internal logs to task2file/sft_devstral_24B_v2/wandb/run-20251226_180808-ny9q48hd/logs/debug-internal.log
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_init.py:init():841] calling init triggers
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_init.py:init():846] wandb.init called with sweep_config: {}
+ config: {'model': {'repo_id': './Models/Devstral-Small-2-24B-HS-CPT', 'revision': None, 'base_local_dir': 'base_model', 'trust_remote_code': True, 'tokenizer_use_fast': True, 'device_map': 'auto', 'torch_dtype': 'bfloat16', 'use_4bit': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': False, 'bnb_4bit_compute_dtype': 'bfloat16', 'attn_implementation': None}, 'data': {'train_jsonl': 'sft_dataset.jsonl', 'eval_jsonl': None, 'eval_split_ratio': 0.1, 'instruction_field': 'instruction', 'input_field': 'input', 'output_field': 'output', 'format_type': 'custom', 'system_prompt': 'You are a Hyperswitch Rust code analyzer. Identify functions/structs that need modification for a given task.\n\n## Output Format\n\n##OUTPUT\nExplain the data flow and why each component must change:\n- Flow: [Input → Processing → Output with arrows]\n- For each component: "The [ComponentName] ([path]) must [action] because [reason]—without this, [consequence]"\n- Explain coupling between components\n\n##SELECT\nmodify::crates/path/to/file.rs::impl::ComponentName\nadd::crates/another/file.rs::function::AnotherComponent\n<EOS>\n\n## Rules\n\n1. Use full paths: `remove::crates/folder/file.rs::Type::Name`\n2. Use `::` for nested items: `status::StructName::Type::Name`\n3. Always explain "must change because" and "without this"\n3. Types of components: function, struct, enum, impl, trait\n4. If there is extra information (e.g., enum variants), include that too.\n5. Start with ##OUTPUT, end with ##SELECT, terminate with <EOS>\n\n## Example\n\n##TASK\nAdd webhook subscription support\n\n##OUTPUT\nThe webhook system routes events via EventClass enum. Flow: webhook → EventClass → handler → processing. The EventClass enum (crates/common_enums/src/enums.rs::EventClass) must add Subscriptions variant because it defines event routing—without this, subscription events cannot be processed. The SubscriptionStatus impl (crates/common_enums/src/transformers.rs::SubscriptionStatus) must map to EventType because it converts status to events—without this, status changes don\'t trigger webhooks. These are coupled: EventClass routes to handlers that use SubscriptionStatus mappings.\n\n##SELECT\ncrates/common_enums/src/enums.rs::EventClass\ncrates/common_enums/src/transformers.rs::SubscriptionStatus\n<EOS>\n', 'custom_template': '##INSTRUCTION\n{instruction}<|im_end|>\n{input}<|im_end|>\n{output}<|im_end|>', 'max_length': 2048, 'shuffle': True, 'num_proc': 4}, 'peft': {'enabled': True, 'r': 8, 'lora_alpha': 16, 'lora_dropout': 0.05, 'bias': 'none', 'target_modules': 'auto'}, 'train': {'num_train_epochs': 6, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 8, 'learning_rate': '1e-4', 'weight_decay': 0.0, 'warmup_ratio': 0.08, 'lr_scheduler_type': 'cosine', 'optim': 'adamw_torch', 'max_grad_norm': 0.8, 'gradient_checkpointing': True, 'logging_steps': 2, 'save_strategy': 'steps', 'save_steps': 500, 'save_total_limit': 20, 'evaluation_strategy': 'steps', 'eval_steps': 100, 'load_best_model_at_end': True, 'early_stopping': {'enabled': True, 'patience': 5, 'min_delta': 0.001, 'metric': 'eval_loss', 'mode': 'min'}, 'resume_from_checkpoint': 'auto'}, 'run_dir': 'task2file/sft_devstral_24B_v2', '_wandb': {}}
+ 2025-12-26 18:08:08,385 INFO MainThread:190322 [wandb_init.py:init():889] starting backend
+ 2025-12-26 18:08:08,653 INFO MainThread:190322 [wandb_init.py:init():892] sending inform_init request
+ 2025-12-26 18:08:08,658 INFO MainThread:190322 [wandb_init.py:init():900] backend started and connected
+ 2025-12-26 18:08:08,661 INFO MainThread:190322 [wandb_init.py:init():970] updated telemetry
+ 2025-12-26 18:08:08,662 INFO MainThread:190322 [wandb_init.py:init():994] communicating run to backend with 90.0 second timeout
+ 2025-12-26 18:08:09,021 INFO MainThread:190322 [wandb_init.py:init():1041] starting run threads in backend
+ 2025-12-26 18:08:09,134 INFO MainThread:190322 [wandb_run.py:_console_start():2521] atexit reg
+ 2025-12-26 18:08:09,134 INFO MainThread:190322 [wandb_run.py:_redirect():2369] redirect: wrap_raw
+ 2025-12-26 18:08:09,135 INFO MainThread:190322 [wandb_run.py:_redirect():2438] Wrapping output streams.
+ 2025-12-26 18:08:09,135 INFO MainThread:190322 [wandb_run.py:_redirect():2461] Redirects installed.
+ 2025-12-26 18:08:09,138 INFO MainThread:190322 [wandb_init.py:init():1081] run started, returning control to user process
+ 2025-12-26 18:08:52,955 INFO MainThread:190322 [wandb_run.py:_config_callback():1396] config_cb None None {'peft_config': {'default': {'task_type': 'CAUSAL_LM', 'peft_type': 'LORA', 'auto_mapping': None, 'peft_version': '0.18.0', 'base_model_name_or_path': 'Models/Devstral-Small-2-24B-HS-CPT', 'revision': None, 'inference_mode': False, 'r': 8, 'target_modules': ['v_proj', 'q_proj', 'o_proj', 'k_proj'], 'exclude_modules': None, 'lora_alpha': 16, 'lora_dropout': 0.05, 'fan_in_fan_out': False, 'bias': 'none', 'use_rslora': False, 'modules_to_save': None, 'init_lora_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'rank_pattern': {}, 'alpha_pattern': {}, 'megatron_config': None, 'megatron_core': 'megatron.core', 'trainable_token_indices': None, 'loftq_config': {}, 'eva_config': None, 'corda_config': None, 'use_dora': False, 'alora_invocation_tokens': None, 'use_qalora': False, 'qalora_group_size': 16, 'layer_replication': None, 'runtime_config': {'ephemeral_gpu_offload': False}, 'lora_bias': False, 'target_parameters': None, 'arrow_config': None, 'ensure_weight_tying': False}}, 'image_token_index': 10, 'projector_hidden_act': 'gelu', 'vision_feature_layer': -1, 'vision_config': {'hidden_size': 1024, 'intermediate_size': 4096, 'num_hidden_layers': 24, 'num_attention_heads': 16, 'num_channels': 3, 'patch_size': 14, 'image_size': 1540, 'attention_dropout': 0.0, 'hidden_act': 'silu', 'head_dim': 64, 'initializer_range': 0.02, 'rope_parameters': {'rope_theta': 10000.0, 'rope_type': 'default'}, 'return_dict': True, 'output_hidden_states': False, 'dtype': 'bfloat16', 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'architectures': None, 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': None, 'pad_token_id': None, 'eos_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, '_name_or_path': '', 'model_type': 'pixtral', 'output_attentions': False}, 'text_config': {'vocab_size': 131072, 'max_position_embeddings': 393216, 'hidden_size': 5120, 'intermediate_size': 32768, 'num_hidden_layers': 40, 'num_attention_heads': 32, 'sliding_window': None, 'head_dim': 128, 'num_key_value_heads': 8, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-05, 'use_cache': True, 'attention_dropout': 0.0, 'rope_parameters': {'beta_fast': 32.0, 'beta_slow': 1.0, 'factor': 48.0, 'llama_4_scaling_beta': 0.1, 'mscale': 1.0, 'mscale_all_dim': 1.0, 'original_max_position_embeddings': 8192, 'rope_theta': 100000000.0, 'rope_type': 'yarn', 'type': 'yarn'}, 'return_dict': True, 'output_hidden_states': False, 'dtype': 'bfloat16', 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'architectures': None, 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 11, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, '_name_or_path': '', 'model_type': 'ministral3', 'output_attentions': False}, 'multimodal_projector_bias': False, 'spatial_merge_size': 2, 'return_dict': True, 
'output_hidden_states': False, 'dtype': 'bfloat16', 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'architectures': ['Mistral3ForConditionalGeneration'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': None, 'pad_token_id': None, 'eos_token_id': None, 'sep_token_id': None, 'decoder_start_token_id': None, '_name_or_path': 'Models/Devstral-Small-2-24B-HS-CPT', 'transformers_version': '5.0.0.dev0', 'model_type': 'mistral3', 'use_cache': False, 'output_attentions': False, 'output_dir': 'task2file/sft_devstral_24B_v2/checkpoints', 'do_train': False, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'gradient_accumulation_steps': 8, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 0.0001, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 0.8, 'num_train_epochs': 6.0, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': None, 'warmup_ratio': 0.08, 'warmup_steps': 0.08, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': None, 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 2, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 500, 'save_total_limit': 20, 'enable_jit_checkpoint': False, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'use_cpu': False, 'seed': 42, 'data_seed': None, 'bf16': True, 'fp16': False, 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': -1, 'ddp_backend': None, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 100, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'eval_loss', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'auto_find_batch_size': False, 'full_determinism': False, 'ddp_timeout': 1800, 'torch_compile': False, 
'torch_compile_backend': None, 'torch_compile_mode': None, 'include_num_input_tokens_seen': 'no', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True}
+ 2025-12-26 18:08:52,965 INFO MainThread:190322 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 24022764544 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7b8940b75420>>
+ 2025-12-26 18:08:52,965 INFO MainThread:190322 [wandb_run.py:_config_callback():1396] config_cb model/num_parameters 24022764544 None
+ 2025-12-28 04:02:04,643 INFO MainThread:190322 [wandb_run.py:_finish():2287] finishing run sirajuddin-shaik-007/sft-training/ny9q48hd
+ 2025-12-28 04:02:04,645 INFO MainThread:190322 [wandb_run.py:_atexit_cleanup():2486] got exitcode: 0
+ 2025-12-28 04:02:04,646 INFO MainThread:190322 [wandb_run.py:_restore():2468] restore
+ 2025-12-28 04:02:04,646 INFO MainThread:190322 [wandb_run.py:_restore():2474] restore done
+ 2025-12-28 04:02:05,050 INFO MainThread:190322 [wandb_run.py:_footer_sync_info():3862] logging synced files