nvan13 committed on
Commit 46caca0 · verified · 1 Parent(s): 2005915

Upload folder using huggingface_hub

Files changed (36)
  1. .gitattributes +3 -0
  2. llama/output/cms3/ft/README.md +202 -0
  3. llama/output/cms3/ft/adapter_config.json +23 -0
  4. llama/output/cms3/ft/adapter_model.safetensors +3 -0
  5. llama/output/cms3/ft/added_tokens.json +3 -0
  6. llama/output/cms3/ft/special_tokens_map.json +30 -0
  7. llama/output/cms3/ft/tokenizer.json +0 -0
  8. llama/output/cms3/ft/tokenizer.model +3 -0
  9. llama/output/cms3/ft/tokenizer_config.json +51 -0
  10. llama/output/cpr2/ft/adapter_config.json +3 -3
  11. llama/output/cpr2/ft/adapter_model.safetensors +2 -2
  12. llama/test.sh +1 -1
  13. llama/tune.sh +36 -7
  14. llama/wandb/debug-internal.log +12 -0
  15. llama/wandb/debug.log +26 -0
  16. llama/wandb/offline-run-20260113_162154-a4ea78sb/files/requirements.txt +199 -0
  17. llama/wandb/offline-run-20260113_162154-a4ea78sb/logs/debug-core.log +14 -0
  18. llama/wandb/offline-run-20260113_162154-a4ea78sb/logs/debug-internal.log +12 -0
  19. llama/wandb/offline-run-20260113_162154-a4ea78sb/logs/debug.log +26 -0
  20. llama/wandb/offline-run-20260113_162154-a4ea78sb/run-a4ea78sb.wandb +3 -0
  21. llama/wandb/offline-run-20260113_213836-a3j2m1nj/files/requirements.txt +199 -0
  22. llama/wandb/offline-run-20260113_213836-a3j2m1nj/logs/debug-core.log +14 -0
  23. llama/wandb/offline-run-20260113_213836-a3j2m1nj/logs/debug-internal.log +12 -0
  24. llama/wandb/offline-run-20260113_213836-a3j2m1nj/logs/debug.log +26 -0
  25. llama/wandb/offline-run-20260113_213836-a3j2m1nj/run-a3j2m1nj.wandb +3 -0
  26. llama/wandb/offline-run-20260114_165804-73rsvobf/files/requirements.txt +199 -0
  27. llama/wandb/offline-run-20260114_165804-73rsvobf/logs/debug-core.log +14 -0
  28. llama/wandb/offline-run-20260114_165804-73rsvobf/logs/debug-internal.log +12 -0
  29. llama/wandb/offline-run-20260114_165804-73rsvobf/logs/debug.log +26 -0
  30. llama/wandb/offline-run-20260114_165804-73rsvobf/run-73rsvobf.wandb +0 -0
  31. llama/wandb/offline-run-20260114_173548-7ubed6qe/files/requirements.txt +199 -0
  32. llama/wandb/offline-run-20260114_173548-7ubed6qe/logs/debug-core.log +14 -0
  33. llama/wandb/offline-run-20260114_173548-7ubed6qe/logs/debug-internal.log +12 -0
  34. llama/wandb/offline-run-20260114_173548-7ubed6qe/logs/debug.log +26 -0
  35. llama/wandb/offline-run-20260114_173548-7ubed6qe/run-7ubed6qe.wandb +3 -0
  36. llama/wandb/settings +3 -0
.gitattributes CHANGED
@@ -39,3 +39,6 @@ generation/control/ControlNet/font/DejaVuSans.ttf filter=lfs diff=lfs merge=lfs -text
  generation/control/ControlNet/ldm/modules/image_degradation/utils/test.png filter=lfs diff=lfs merge=lfs -text
  llama/data/MetaMathQA-40K.json filter=lfs diff=lfs merge=lfs -text
  llama/data/MetaMathQA.json filter=lfs diff=lfs merge=lfs -text
+ llama/wandb/offline-run-20260113_162154-a4ea78sb/run-a4ea78sb.wandb filter=lfs diff=lfs merge=lfs -text
+ llama/wandb/offline-run-20260113_213836-a3j2m1nj/run-a3j2m1nj.wandb filter=lfs diff=lfs merge=lfs -text
+ llama/wandb/offline-run-20260114_173548-7ubed6qe/run-7ubed6qe.wandb filter=lfs diff=lfs merge=lfs -text
llama/output/cms3/ft/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: meta-llama/Llama-2-7b-hf
+ library_name: peft
+ ---
+ 
+ # Model Card for Model ID
+ 
+ <!-- Provide a quick summary of what the model is/does. -->
+ 
+ 
+ 
+ ## Model Details
+ 
+ ### Model Description
+ 
+ <!-- Provide a longer summary of what this model is. -->
+ 
+ 
+ 
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+ 
+ ### Model Sources [optional]
+ 
+ <!-- Provide the basic links for the model. -->
+ 
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+ 
+ ## Uses
+ 
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+ 
+ ### Direct Use
+ 
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+ 
+ [More Information Needed]
+ 
+ ### Downstream Use [optional]
+ 
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+ 
+ [More Information Needed]
+ 
+ ### Out-of-Scope Use
+ 
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+ 
+ [More Information Needed]
+ 
+ ## Bias, Risks, and Limitations
+ 
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+ 
+ [More Information Needed]
+ 
+ ### Recommendations
+ 
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+ 
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+ 
+ ## How to Get Started with the Model
+ 
+ Use the code below to get started with the model.
+ 
+ [More Information Needed]
+ 
+ ## Training Details
+ 
+ ### Training Data
+ 
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+ 
+ [More Information Needed]
+ 
+ ### Training Procedure
+ 
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+ 
+ #### Preprocessing [optional]
+ 
+ [More Information Needed]
+ 
+ 
+ #### Training Hyperparameters
+ 
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+ 
+ #### Speeds, Sizes, Times [optional]
+ 
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+ 
+ [More Information Needed]
+ 
+ ## Evaluation
+ 
+ <!-- This section describes the evaluation protocols and provides the results. -->
+ 
+ ### Testing Data, Factors & Metrics
+ 
+ #### Testing Data
+ 
+ <!-- This should link to a Dataset Card if possible. -->
+ 
+ [More Information Needed]
+ 
+ #### Factors
+ 
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+ 
+ [More Information Needed]
+ 
+ #### Metrics
+ 
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+ 
+ [More Information Needed]
+ 
+ ### Results
+ 
+ [More Information Needed]
+ 
+ #### Summary
+ 
+ 
+ 
+ ## Model Examination [optional]
+ 
+ <!-- Relevant interpretability work for the model goes here -->
+ 
+ [More Information Needed]
+ 
+ ## Environmental Impact
+ 
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+ 
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+ 
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+ 
+ ## Technical Specifications [optional]
+ 
+ ### Model Architecture and Objective
+ 
+ [More Information Needed]
+ 
+ ### Compute Infrastructure
+ 
+ [More Information Needed]
+ 
+ #### Hardware
+ 
+ [More Information Needed]
+ 
+ #### Software
+ 
+ [More Information Needed]
+ 
+ ## Citation [optional]
+ 
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+ 
+ **BibTeX:**
+ 
+ [More Information Needed]
+ 
+ **APA:**
+ 
+ [More Information Needed]
+ 
+ ## Glossary [optional]
+ 
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+ 
+ [More Information Needed]
+ 
+ ## More Information [optional]
+ 
+ [More Information Needed]
+ 
+ ## Model Card Authors [optional]
+ 
+ [More Information Needed]
+ 
+ ## Model Card Contact
+ 
+ [More Information Needed]
+ ### Framework versions
+ 
+ - PEFT 0.10.0
llama/output/cms3/ft/adapter_config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "meta-llama/Llama-2-7b-hf",
+   "block_share": false,
+   "coft": false,
+   "eps": 0.0001,
+   "inference_mode": true,
+   "init_weights": true,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "module_dropout": 0.0,
+   "modules_to_save": null,
+   "peft_type": "OFT",
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "v_proj",
+     "q_proj"
+   ],
+   "task_type": "CAUSAL_LM"
+ }
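For readers unfamiliar with PEFT's OFT adapter format, the sketch below parses a config like the one added above and checks the fields this commit cares about (`peft_type`, `r`, `target_modules`). The JSON literal is an abridged copy of the file; actually applying the adapter would go through `peft.PeftModel.from_pretrained` against the base model, which is not shown here since it requires downloading Llama-2-7b.

```python
import json

# Abridged copy of llama/output/cms3/ft/adapter_config.json.
config_text = """
{
  "base_model_name_or_path": "meta-llama/Llama-2-7b-hf",
  "peft_type": "OFT",
  "r": 32,
  "target_modules": ["v_proj", "q_proj"],
  "task_type": "CAUSAL_LM"
}
"""

cfg = json.loads(config_text)
assert cfg["peft_type"] == "OFT"                            # orthogonal fine-tuning adapter
assert cfg["r"] == 32                                       # block size, raised from 1 in the cpr2 run
assert set(cfg["target_modules"]) == {"q_proj", "v_proj"}   # attention projections only
print(cfg["base_model_name_or_path"])
```

Only the attention query/value projections are adapted, which is why the adapter checkpoint stays around 1 GB rather than the full 13 GB base model.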
llama/output/cms3/ft/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:08bbd4b1c59c317764a1e946eff8aab4249c6aba2195d714ba8f6900b906f691
+ size 1082171824
llama/output/cms3/ft/added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "[PAD]": 32000
+ }
llama/output/cms3/ft/special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
llama/output/cms3/ft/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
llama/output/cms3/ft/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
llama/output/cms3/ft/tokenizer_config.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": null,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32000": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "</s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "legacy": false,
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "padding_side": "right",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "</s>",
+   "use_default_system_prompt": false
+ }
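A quick sanity check on the tokenizer files above: the stock Llama-2 vocabulary (32000 tokens) gains one `[PAD]` token at id 32000, and `bos_token`/`unk_token` are recorded as `</s>` rather than the usual `<s>`/`<unk>` — that is what the file actually says, not an assumption. The sketch parses an abridged copy of the config and confirms both points:

```python
import json

# Abridged copy of llama/output/cms3/ft/tokenizer_config.json.
tok_cfg = json.loads("""
{
  "added_tokens_decoder": {
    "0": {"content": "<unk>"},
    "1": {"content": "<s>"},
    "2": {"content": "</s>"},
    "32000": {"content": "[PAD]"}
  },
  "bos_token": "</s>",
  "eos_token": "</s>",
  "unk_token": "</s>",
  "pad_token": "[PAD]",
  "model_max_length": 512
}
""")

# bos/eos/unk all map to </s>; only [PAD] is genuinely new.
assert tok_cfg["bos_token"] == tok_cfg["eos_token"] == tok_cfg["unk_token"] == "</s>"
pad_id = max(int(i) for i in tok_cfg["added_tokens_decoder"])
assert pad_id == 32000  # appending [PAD] grows the effective vocab to 32001
```

This matches `added_tokens.json` (`"[PAD]": 32000`) and the `vocab_size: 32001` recorded in the wandb run config further down.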
llama/output/cpr2/ft/adapter_config.json CHANGED
@@ -12,12 +12,12 @@
    "module_dropout": 0.0,
    "modules_to_save": null,
    "peft_type": "OFT",
-   "r": 1,
+   "r": 32,
    "rank_pattern": {},
    "revision": null,
    "target_modules": [
-     "q_proj",
-     "v_proj"
+     "v_proj",
+     "q_proj"
    ],
    "task_type": "CAUSAL_LM"
  }
llama/output/cpr2/ft/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:daede58d9fd4806298d90f9af12ba478c119afab844244f355f35ab3829eb029
- size 1049665904
+ oid sha256:0e77e7b826764e15763f84d06752d0dae06c0a6b97003d20593a0070567e417a
+ size 1082171824
llama/test.sh CHANGED
@@ -2,7 +2,7 @@ BASE_MODEL="meta-llama/Llama-2-7b-hf"
  # OUTPUT="output/cp3e5"
  # OUTPUT="output/cp1e5N"
  # OUTPUT="output/cp1e5N"
- OUTPUT="output/cpr2"
+ OUTPUT="output/cms3"
  python merge_adapter_to_base_model.py --base_mode $BASE_MODEL --adapter $OUTPUT/ft/ --output_path $OUTPUT/merged/
  python inference/gsm8k_inference.py --model $OUTPUT/merged/
  python inference/MATH_inference.py --model $OUTPUT/merged/
llama/tune.sh CHANGED
@@ -89,11 +89,39 @@ export WANDB_PROJECT="HRA_MetaMath395"
  # --report_to "wandb"
  # wandb sync wandb/latest-run
  
- OUTPUT="output/cpr2"
+ # OUTPUT="output/cpr2"
+ # python finetune_32.py \
+ #     --model_name_or_path $BASE_MODEL \
+ #     --output_dir $OUTPUT \
+ #     --hrft_r 1 \
+ #     --init_a 1e-4 \
+ #     --eps 1e-4 \
+ #     --add_orth "none" \
+ #     --lamda 1e-4 \
+ #     --data_path $DATA_PATH \
+ #     --dataset_split "train"\
+ #     --dataset_field query response \
+ #     --num_train_epochs 3 \
+ #     --per_device_train_batch_size 32 \
+ #     --gradient_accumulation_steps 1 \
+ #     --save_strategy "steps" \
+ #     --save_steps 0 \
+ #     --save_total_limit 1 \
+ #     --learning_rate 3e-5 \
+ #     --weight_decay 0. \
+ #     --warmup_ratio 0.005 \
+ #     --lr_scheduler_type "cosine" \
+ #     --logging_steps 200 \
+ #     --bf16 True \
+ #     --tf32 True \
+ #     --report_to "wandb"
+ # wandb sync wandb/latest-run
+ 
+ OUTPUT="output/cms3"
  python finetune_32.py \
      --model_name_or_path $BASE_MODEL \
      --output_dir $OUTPUT \
-     --hrft_r 1 \
+     --hrft_r 32 \
      --init_a 1e-4 \
      --eps 1e-4 \
      --add_orth "none" \
@@ -101,13 +129,13 @@
      --data_path $DATA_PATH \
      --dataset_split "train"\
      --dataset_field query response \
-     --num_train_epochs 3 \
-     --per_device_train_batch_size 32 \
-     --gradient_accumulation_steps 1 \
+     --num_train_epochs 2 \
+     --per_device_train_batch_size 8 \
+     --gradient_accumulation_steps 4 \
      --save_strategy "steps" \
      --save_steps 0 \
      --save_total_limit 1 \
-     --learning_rate 3e-5 \
+     --learning_rate 1e-5 \
      --weight_decay 0. \
      --warmup_ratio 0.005 \
      --lr_scheduler_type "cosine" \
@@ -115,4 +143,5 @@
      --bf16 True \
      --tf32 True \
      --report_to "wandb"
- wandb sync wandb/latest-run
+ date +"%F %T"
+ # wandb sync wandb/latest-run
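Note that the batch-size change in `tune.sh` preserves the effective batch size: the commented-out cpr2 run used 32 examples per device with no accumulation, while the new cms3 run uses 8 per device with 4 accumulation steps. A one-line check (assuming a single-GPU launch, since the script invokes plain `python` rather than a distributed launcher):

```python
# Effective batch = per_device_train_batch_size * gradient_accumulation_steps (* num_gpus).
cpr2 = 32 * 1  # old run: --per_device_train_batch_size 32, --gradient_accumulation_steps 1
cms3 = 8 * 4   # new run: --per_device_train_batch_size 8,  --gradient_accumulation_steps 4
assert cpr2 == cms3 == 32  # same effective batch, lower peak memory per optimizer step
```

Trading per-device batch for accumulation keeps the optimizer schedule comparable while fitting the larger r=32 adapter in memory.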
llama/wandb/debug-internal.log ADDED
@@ -0,0 +1,12 @@
+ {"time":"2026-01-14T17:35:49.006544401+09:00","level":"INFO","msg":"stream: starting","core version":"0.23.0"}
+ {"time":"2026-01-14T17:35:49.149824363+09:00","level":"WARN","msg":"featurechecker: GraphQL client is nil, skipping feature loading"}
+ {"time":"2026-01-14T17:35:49.149873743+09:00","level":"INFO","msg":"stream: created new stream","id":"7ubed6qe"}
+ {"time":"2026-01-14T17:35:49.149898431+09:00","level":"INFO","msg":"handler: started","stream_id":"7ubed6qe"}
+ {"time":"2026-01-14T17:35:49.151612025+09:00","level":"INFO","msg":"stream: started","id":"7ubed6qe"}
+ {"time":"2026-01-14T17:35:49.151616181+09:00","level":"INFO","msg":"writer: started","stream_id":"7ubed6qe"}
+ {"time":"2026-01-14T17:35:49.151631131+09:00","level":"INFO","msg":"sender: started","stream_id":"7ubed6qe"}
+ {"time":"2026-01-14T17:35:49.152375031+09:00","level":"WARN","msg":"runupserter: server does not expand metric globs but the x_server_side_expand_glob_metrics setting is set; ignoring"}
+ {"time":"2026-01-14T21:46:20.466650711+09:00","level":"INFO","msg":"stream: closing","id":"7ubed6qe"}
+ {"time":"2026-01-14T21:46:20.466968327+09:00","level":"INFO","msg":"handler: closed","stream_id":"7ubed6qe"}
+ {"time":"2026-01-14T21:46:20.468836334+09:00","level":"INFO","msg":"sender: closed","stream_id":"7ubed6qe"}
+ {"time":"2026-01-14T21:46:20.468852562+09:00","level":"INFO","msg":"stream: closed","id":"7ubed6qe"}
llama/wandb/debug.log ADDED
@@ -0,0 +1,26 @@
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_setup.py:_flush():80] Current SDK version is 0.23.0
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_setup.py:_flush():80] Configure stats pid to 1752905
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_setup.py:_flush():80] Loading settings from /home/work/.config/wandb/settings
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_setup.py:_flush():80] Loading settings from /home/work/an_nguyen/HRA/llama/wandb/settings
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_setup.py:_flush():80] Loading settings from environment variables
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_init.py:setup_run_log_directory():713] Logging user logs to /home/work/an_nguyen/HRA/llama/wandb/offline-run-20260114_173548-7ubed6qe/logs/debug.log
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_init.py:setup_run_log_directory():714] Logging internal logs to /home/work/an_nguyen/HRA/llama/wandb/offline-run-20260114_173548-7ubed6qe/logs/debug-internal.log
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_init.py:init():840] calling init triggers
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_init.py:init():845] wandb.init called with sweep_config: {}
+ config: {'_wandb': {}}
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_init.py:init():888] starting backend
+ 2026-01-14 17:35:48,984 INFO MainThread:1752905 [wandb_init.py:init():891] sending inform_init request
+ 2026-01-14 17:35:48,997 INFO MainThread:1752905 [wandb_init.py:init():899] backend started and connected
+ 2026-01-14 17:35:48,998 INFO MainThread:1752905 [wandb_init.py:init():969] updated telemetry
+ 2026-01-14 17:35:48,999 INFO MainThread:1752905 [wandb_init.py:init():993] communicating run to backend with 90.0 second timeout
+ 2026-01-14 17:35:49,154 INFO MainThread:1752905 [wandb_init.py:init():1040] starting run threads in backend
+ 2026-01-14 17:35:49,257 INFO MainThread:1752905 [wandb_run.py:_console_start():2504] atexit reg
+ 2026-01-14 17:35:49,257 INFO MainThread:1752905 [wandb_run.py:_redirect():2352] redirect: wrap_raw
+ 2026-01-14 17:35:49,257 INFO MainThread:1752905 [wandb_run.py:_redirect():2421] Wrapping output streams.
+ 2026-01-14 17:35:49,257 INFO MainThread:1752905 [wandb_run.py:_redirect():2444] Redirects installed.
+ 2026-01-14 17:35:49,259 INFO MainThread:1752905 [wandb_init.py:init():1080] run started, returning control to user process
+ 2026-01-14 17:35:49,260 INFO MainThread:1752905 [wandb_run.py:_config_callback():1385] config_cb None None {'peft_config': {'default': {'peft_type': 'OFT', 'auto_mapping': None, 'base_model_name_or_path': 'meta-llama/Llama-2-7b-hf', 'revision': None, 'task_type': 'CAUSAL_LM', 'inference_mode': False, 'rank_pattern': {}, 'alpha_pattern': {}, 'r': 32, 'module_dropout': 0.0, 'target_modules': ['v_proj', 'q_proj'], 'init_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'modules_to_save': None, 'coft': False, 'eps': 0.0001, 'block_share': False}}, 'vocab_size': 32001, 'max_position_embeddings': 4096, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'num_key_value_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-05, 'pretraining_tp': 1, 'use_cache': False, 'rope_theta': 10000.0, 'rope_scaling': None, 'attention_bias': False, 'attention_dropout': 0.0, 'mlp_bias': False, 'head_dim': 128, 'return_dict': True, 'output_hidden_states': False, 'torchscript': False, 'dtype': 'float32', 'pruned_heads': {}, 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'architectures': ['LlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 2, 'pad_token_id': 32000, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 
'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'num_beam_groups': 1, 'diversity_penalty': 0.0, '_name_or_path': 'meta-llama/Llama-2-7b-hf', 'transformers_version': '4.57.3', 'model_type': 'llama', 'tf_legacy_loss': False, 'use_bfloat16': False, 'output_attentions': False, 'output_dir': 'output/cms3', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 4, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 1e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 2.0, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.005, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'output/cms3/runs/Jan14_17-35-42_main1', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 200, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 0.0, 'save_total_limit': 1, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': True, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 
'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_tokens_per_second': False, 
'include_num_input_tokens_seen': 'no', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'model_name_or_path': 'meta-llama/Llama-2-7b-hf', 'adapter_name_or_path': None, 'data_path': './data/MetaMathQA-40K.json', 'dataset_split': 'train', 'dataset_field': ['query', 'response'], 'model_max_length': 512, 'hrft_r': 32, 'init_a': 0.0001, 'eps': 0.0001, 'lamda': 0.0001, 'add_orth': 'none', 'init_weights': True}
+ 2026-01-14 17:35:49,269 INFO MainThread:1752905 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 6746812416 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7f1ba1cabf40>>
+ 2026-01-14 17:35:49,269 INFO MainThread:1752905 [wandb_run.py:_config_callback():1385] config_cb model/num_parameters 6746812416 None
+ 2026-01-14 21:46:20,466 INFO wandb-AsyncioManager-main:1752905 [service_client.py:_forward_responses():80] Reached EOF.
+ 2026-01-14 21:46:20,467 INFO wandb-AsyncioManager-main:1752905 [mailbox.py:close():137] Closing mailbox, abandoning 0 handles.
llama/wandb/offline-run-20260113_162154-a4ea78sb/files/requirements.txt ADDED
@@ -0,0 +1,199 @@
+ setuptools==80.9.0
+ wheel==0.45.1
+ pip==25.3
+ Brotli==1.1.0
+ certifi==2025.11.12
+ charset-normalizer==3.4.4
+ filelock==3.20.0
+ hpack==4.1.0
+ hyperframe==6.1.0
+ idna==3.11
+ MarkupSafe==3.0.3
+ mpmath==1.3.0
+ networkx==3.4.2
+ pycparser==2.22
+ PySocks==1.7.1
+ PyYAML==6.0.3
+ typing_extensions==4.15.0
+ cffi==2.0.0
+ gmpy2==2.2.1
+ h2==4.3.0
+ Jinja2==3.1.6
+ sympy==1.14.0
+ zstandard==0.23.0
+ urllib3==2.5.0
+ requests==2.32.5
+ appdirs==1.4.4
+ rich-toolkit==0.17.0
+ torchaudio==2.9.0
+ triton==3.5.0
+ tqdm==4.67.1
+ safetensors==0.7.0
+ regex==2025.11.3
+ packaging==25.0
+ hf-xet==1.2.0
+ hf-xet==1.2.1
+ huggingface_hub==0.36.0
+ tokenizers==0.22.1
+ pytz==2025.2
+ xxhash==3.6.0
+ tzdata==2025.2
+ six==1.17.0
+ pyarrow-hotfix==0.7
+ pyarrow==22.0.0
+ pyarrow==21.0.0
+ propcache==0.4.1
+ propcache==0.3.1
+ multidict==6.7.0
+ multidict==6.6.3
+ aiohappyeyeballs==2.6.1
+ fsspec==2024.3.1
+ fsspec==2025.10.0
+ frozenlist==1.8.0
+ frozenlist==1.7.0
+ dill==0.3.8
+ dill==0.4.0
+ attrs==25.4.0
+ async-timeout==5.0.1
+ yarl==1.22.0
+ python-dateutil==2.9.0.post0
+ multiprocess==0.70.16
+ multiprocess==0.70.18
+ aiosignal==1.4.0
+ pandas==2.3.3
+ aiohttp==3.13.2
+ pycountry==24.6.1
+ psutil==7.1.3
+ accelerate==1.12.0
+ peft==0.10.0
+ Pygments==2.19.2
+ colorama==0.4.6
+ shellingham==1.5.4
+ sniffio==1.3.1
+ exceptiongroup==1.3.1
+ h11==0.16.0
+ typer-slim==0.20.0
+ anyio==4.12.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ datasets==4.4.1
+ ninja==1.13.0
+ docker-pycreds==0.4.0
+ eval_type_backport==0.3.1
+ platformdirs==4.5.0
+ sentry-sdk==2.47.0
+ annotated-types==0.7.0
+ typing-inspection==0.4.2
+ smmap==5.0.2
+ gitdb==4.0.12
+ GitPython==3.1.45
+ protobuf==6.31.1
+ setproctitle==1.3.6
+ pydantic_core==2.41.5
+ pydantic==2.12.5
+ wandb==0.23.0
+ jsonlines==4.0.0
+ supervisor==4.3.0
+ py-cpuinfo==9.0.0
+ nvidia-ml-py==13.580.82
+ nvidia-cusparselt-cu12==0.7.1
+ fastrlock==0.8.3
+ websockets==15.0.1
+ uvloop==0.22.1
+ tomli==2.3.0
+ tabulate==0.9.0
+ sentencepiece==0.2.1
+ rpds-py==0.30.0
+ rignore==0.7.6
+ pyzmq==27.1.0
+ python-multipart==0.0.20
+ python-json-logger==4.0.0
+ python-dotenv==1.2.1
+ pybase64==1.4.2
+ prometheus_client==0.23.1
+ starlette==0.50.0
+ pillow==12.0.0
+ partial-json-parser==0.2.1.1.post7
+ outlines_core==0.2.11
+ nvidia-nvtx-cu12==12.8.90
+ nvidia-nvshmem-cu12==3.3.20
+ nvidia-nvjitlink-cu12==12.8.93
+ nvidia-nccl-cu12==2.27.5
+ nvidia-curand-cu12==10.3.9.90
+ nvidia-cufile-cu12==1.13.1.3
+ nvidia-cudnn-frontend==1.16.0
+ nvidia-cuda-runtime-cu12==12.8.90
+ nvidia-cuda-nvrtc-cu12==12.8.93
+ nvidia-cuda-cupti-cu12==12.8.90
+ nvidia-cublas-cu12==12.8.4.1
+ numpy==2.2.6
+ msgspec==0.20.0
+ msgpack==1.1.2
+ mdurl==0.1.2
+ loguru==0.7.3
+ llvmlite==0.44.0
+ llguidance==1.3.0
+ lark==1.2.2
+ jmespath==1.0.1
+ jiter==0.12.0
+ interegular==0.3.3
+ httptools==0.7.1
+ fastar==0.8.0
+ einops==0.8.1
+ docstring_parser==0.17.0
+ dnspython==2.8.0
+ distro==1.9.0
+ diskcache==5.6.3
+ cuda-pathfinder==1.3.3
+ cloudpickle==3.1.2
+ rich==14.2.0
+ click==8.2.1
+ cbor2==5.7.1
+ cachetools==6.2.2
+ blake3==1.0.8
+ astor==0.8.1
+ apache-tvm-ffi==0.1.4
+ annotated-doc==0.0.4
+ uvicorn==0.38.0
+ tiktoken==0.12.0
+ scipy==1.15.3
+ referencing==0.37.0
+ opencv-python-headless==4.12.0.88
+ nvidia-cusparse-cu12==12.5.8.93
+ nvidia-cufft-cu12==11.3.3.83
+ nvidia-cudnn-cu12==9.10.2.21
+ numba==0.61.2
+ markdown-it-py==4.0.0
+ gguf==0.17.1
+ email-validator==2.3.0
+ depyf==0.20.0
+ cupy-cuda12x==13.6.0
+ cuda-bindings==13.1.0
+ watchfiles==1.1.1
+ pydantic-extra-types==2.10.6
+ openai-harmony==0.0.8
+ nvidia-cusolver-cu12==11.7.3.90
+ lm-format-enforcer==0.11.3
+ jsonschema-specifications==2025.9.1
+ cuda-python==13.1.0
+ typer==0.20.0
+ transformers==4.57.3
+ torch==2.9.0
+ prometheus-fastapi-instrumentator==7.1.0
183
+ openai==2.9.0
184
+ nvidia-cutlass-dsl==4.3.2
185
+ jsonschema==4.25.1
186
+ fastapi==0.123.10
187
+ anthropic==0.71.0
188
+ xgrammar==0.1.27
189
+ torchvision==0.24.0
190
+ ray==2.52.1
191
+ model-hosting-container-standards==0.1.9
192
+ mistral_common==1.8.6
193
+ flashinfer-python==0.5.3
194
+ fastapi-cloud-cli==0.6.0
195
+ fastapi-cli==0.0.16
196
+ compressed-tensors==0.12.2
197
+ vllm==0.12.0
198
+ Fraction==2.2.0
199
+ DeBERTa==0.1.13
llama/wandb/offline-run-20260113_162154-a4ea78sb/logs/debug-core.log ADDED
@@ -0,0 +1,14 @@
+ {"time":"2026-01-13T16:21:54.904171277+09:00","level":"INFO","msg":"main: starting server","port-filename":"/tmp/tmp76uloz6z/port-478237.txt","pid":478237,"log-level":0,"disable-analytics":false,"shutdown-on-parent-exit":false,"enable-dcgm-profiling":false}
+ {"time":"2026-01-13T16:21:54.904639602+09:00","level":"INFO","msg":"server: will exit if parent process dies","ppid":478237}
+ {"time":"2026-01-13T16:21:54.904620011+09:00","level":"INFO","msg":"server: accepting connections","addr":{"Name":"/tmp/wandb-478237-478489-4043973307/socket","Net":"unix"}}
+ {"time":"2026-01-13T16:21:55.082336692+09:00","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"1(@)"}
+ {"time":"2026-01-13T16:21:55.096600338+09:00","level":"INFO","msg":"handleInformInit: received","streamId":"a4ea78sb","id":"1(@)"}
+ {"time":"2026-01-13T16:21:55.246575771+09:00","level":"INFO","msg":"handleInformInit: stream started","streamId":"a4ea78sb","id":"1(@)"}
+ {"time":"2026-01-13T20:27:02.083772511+09:00","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"1(@)"}
+ {"time":"2026-01-13T20:27:02.083834376+09:00","level":"INFO","msg":"server is shutting down"}
+ {"time":"2026-01-13T20:27:02.083834598+09:00","level":"INFO","msg":"connection: closing","id":"1(@)"}
+ {"time":"2026-01-13T20:27:02.083885849+09:00","level":"INFO","msg":"connection: closed successfully","id":"1(@)"}
+ {"time":"2026-01-13T20:27:02.083928772+09:00","level":"INFO","msg":"server: listener closed","addr":{"Name":"/tmp/wandb-478237-478489-4043973307/socket","Net":"unix"}}
+ {"time":"2026-01-13T20:27:02.084663238+09:00","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"1(@)"}
+ {"time":"2026-01-13T20:27:02.084673431+09:00","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"1(@)"}
+ {"time":"2026-01-13T20:27:02.084679026+09:00","level":"INFO","msg":"server is closed"}
llama/wandb/offline-run-20260113_162154-a4ea78sb/logs/debug-internal.log ADDED
@@ -0,0 +1,12 @@
+ {"time":"2026-01-13T16:21:55.104436934+09:00","level":"INFO","msg":"stream: starting","core version":"0.23.0"}
+ {"time":"2026-01-13T16:21:55.245591634+09:00","level":"WARN","msg":"featurechecker: GraphQL client is nil, skipping feature loading"}
+ {"time":"2026-01-13T16:21:55.245629072+09:00","level":"INFO","msg":"stream: created new stream","id":"a4ea78sb"}
+ {"time":"2026-01-13T16:21:55.245660605+09:00","level":"INFO","msg":"handler: started","stream_id":"a4ea78sb"}
+ {"time":"2026-01-13T16:21:55.246570364+09:00","level":"INFO","msg":"stream: started","id":"a4ea78sb"}
+ {"time":"2026-01-13T16:21:55.246576469+09:00","level":"INFO","msg":"writer: started","stream_id":"a4ea78sb"}
+ {"time":"2026-01-13T16:21:55.24658713+09:00","level":"INFO","msg":"sender: started","stream_id":"a4ea78sb"}
+ {"time":"2026-01-13T16:21:55.246982093+09:00","level":"WARN","msg":"runupserter: server does not expand metric globs but the x_server_side_expand_glob_metrics setting is set; ignoring"}
+ {"time":"2026-01-13T20:27:02.083832579+09:00","level":"INFO","msg":"stream: closing","id":"a4ea78sb"}
+ {"time":"2026-01-13T20:27:02.083995312+09:00","level":"INFO","msg":"handler: closed","stream_id":"a4ea78sb"}
+ {"time":"2026-01-13T20:27:02.084380422+09:00","level":"INFO","msg":"sender: closed","stream_id":"a4ea78sb"}
+ {"time":"2026-01-13T20:27:02.084390327+09:00","level":"INFO","msg":"stream: closed","id":"a4ea78sb"}
llama/wandb/offline-run-20260113_162154-a4ea78sb/logs/debug.log ADDED
@@ -0,0 +1,26 @@
+ 2026-01-13 16:21:54,830 INFO MainThread:478237 [wandb_setup.py:_flush():80] Current SDK version is 0.23.0
+ 2026-01-13 16:21:54,830 INFO MainThread:478237 [wandb_setup.py:_flush():80] Configure stats pid to 478237
+ 2026-01-13 16:21:54,830 INFO MainThread:478237 [wandb_setup.py:_flush():80] Loading settings from /home/work/.config/wandb/settings
+ 2026-01-13 16:21:54,830 INFO MainThread:478237 [wandb_setup.py:_flush():80] Loading settings from /home/work/an_nguyen/HRA/llama/wandb/settings
+ 2026-01-13 16:21:54,830 INFO MainThread:478237 [wandb_setup.py:_flush():80] Loading settings from environment variables
+ 2026-01-13 16:21:54,830 INFO MainThread:478237 [wandb_init.py:setup_run_log_directory():713] Logging user logs to /home/work/an_nguyen/HRA/llama/wandb/offline-run-20260113_162154-a4ea78sb/logs/debug.log
+ 2026-01-13 16:21:54,830 INFO MainThread:478237 [wandb_init.py:setup_run_log_directory():714] Logging internal logs to /home/work/an_nguyen/HRA/llama/wandb/offline-run-20260113_162154-a4ea78sb/logs/debug-internal.log
+ 2026-01-13 16:21:54,830 INFO MainThread:478237 [wandb_init.py:init():840] calling init triggers
+ 2026-01-13 16:21:54,830 INFO MainThread:478237 [wandb_init.py:init():845] wandb.init called with sweep_config: {}
+ config: {'_wandb': {}}
+ 2026-01-13 16:21:54,830 INFO MainThread:478237 [wandb_init.py:init():888] starting backend
+ 2026-01-13 16:21:55,082 INFO MainThread:478237 [wandb_init.py:init():891] sending inform_init request
+ 2026-01-13 16:21:55,095 INFO MainThread:478237 [wandb_init.py:init():899] backend started and connected
+ 2026-01-13 16:21:55,096 INFO MainThread:478237 [wandb_init.py:init():969] updated telemetry
+ 2026-01-13 16:21:55,096 INFO MainThread:478237 [wandb_init.py:init():993] communicating run to backend with 90.0 second timeout
+ 2026-01-13 16:21:55,248 INFO MainThread:478237 [wandb_init.py:init():1040] starting run threads in backend
+ 2026-01-13 16:21:55,351 INFO MainThread:478237 [wandb_run.py:_console_start():2504] atexit reg
+ 2026-01-13 16:21:55,351 INFO MainThread:478237 [wandb_run.py:_redirect():2352] redirect: wrap_raw
+ 2026-01-13 16:21:55,351 INFO MainThread:478237 [wandb_run.py:_redirect():2421] Wrapping output streams.
+ 2026-01-13 16:21:55,351 INFO MainThread:478237 [wandb_run.py:_redirect():2444] Redirects installed.
+ 2026-01-13 16:21:55,352 INFO MainThread:478237 [wandb_init.py:init():1080] run started, returning control to user process
+ 2026-01-13 16:21:55,354 INFO MainThread:478237 [wandb_run.py:_config_callback():1385] config_cb None None {'peft_config': {'default': {'peft_type': 'OFT', 'auto_mapping': None, 'base_model_name_or_path': 'meta-llama/Llama-2-7b-hf', 'revision': None, 'task_type': 'CAUSAL_LM', 'inference_mode': False, 'rank_pattern': {}, 'alpha_pattern': {}, 'r': 32, 'module_dropout': 0.0, 'target_modules': ['v_proj', 'q_proj'], 'init_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'modules_to_save': None, 'coft': False, 'eps': 0.0001, 'block_share': False}}, 'vocab_size': 32001, 'max_position_embeddings': 4096, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'num_key_value_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-05, 'pretraining_tp': 1, 'use_cache': False, 'rope_theta': 10000.0, 'rope_scaling': None, 'attention_bias': False, 'attention_dropout': 0.0, 'mlp_bias': False, 'head_dim': 128, 'return_dict': True, 'output_hidden_states': False, 'torchscript': False, 'dtype': 'float32', 'pruned_heads': {}, 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'architectures': ['LlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 2, 'pad_token_id': 32000, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'num_beam_groups': 1, 'diversity_penalty': 0.0, '_name_or_path': 'meta-llama/Llama-2-7b-hf', 'transformers_version': '4.57.3', 'model_type': 'llama', 'tf_legacy_loss': False, 'use_bfloat16': False, 'output_attentions': False, 'output_dir': 'output/cpr2', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 4, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 1e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 2.0, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.005, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'output/cpr2/runs/Jan13_16-21-48_main1', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 200, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 0.0, 'save_total_limit': 1, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': True, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': 'no', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'model_name_or_path': 'meta-llama/Llama-2-7b-hf', 'adapter_name_or_path': None, 'data_path': './data/MetaMathQA-40K.json', 'dataset_split': 'train', 'dataset_field': ['query', 'response'], 'model_max_length': 512, 'hrft_r': 32, 'init_a': 0.0001, 'eps': 0.0001, 'lamda': 0.0001, 'add_orth': 'none', 'init_weights': True}
+ 2026-01-13 16:21:55,362 INFO MainThread:478237 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 6746812416 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7fd7f1c57f40>>
+ 2026-01-13 16:21:55,362 INFO MainThread:478237 [wandb_run.py:_config_callback():1385] config_cb model/num_parameters 6746812416 None
+ 2026-01-13 20:27:02,083 INFO wandb-AsyncioManager-main:478237 [service_client.py:_forward_responses():80] Reached EOF.
+ 2026-01-13 20:27:02,084 INFO wandb-AsyncioManager-main:478237 [mailbox.py:close():137] Closing mailbox, abandoning 0 handles.
llama/wandb/offline-run-20260113_162154-a4ea78sb/run-a4ea78sb.wandb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bfe9389a013542101753a8627157c2f4f4846c82c6dca4ef43ae47e7d501c863
+ size 2049726
llama/wandb/offline-run-20260113_213836-a3j2m1nj/files/requirements.txt ADDED
@@ -0,0 +1,199 @@
+ setuptools==80.9.0
+ wheel==0.45.1
+ pip==25.3
+ Brotli==1.1.0
+ certifi==2025.11.12
+ charset-normalizer==3.4.4
+ filelock==3.20.0
+ hpack==4.1.0
+ hyperframe==6.1.0
+ idna==3.11
+ MarkupSafe==3.0.3
+ mpmath==1.3.0
+ networkx==3.4.2
+ pycparser==2.22
+ PySocks==1.7.1
+ PyYAML==6.0.3
+ typing_extensions==4.15.0
+ cffi==2.0.0
+ gmpy2==2.2.1
+ h2==4.3.0
+ Jinja2==3.1.6
+ sympy==1.14.0
+ zstandard==0.23.0
+ urllib3==2.5.0
+ requests==2.32.5
+ appdirs==1.4.4
+ rich-toolkit==0.17.0
+ torchaudio==2.9.0
+ triton==3.5.0
+ tqdm==4.67.1
+ safetensors==0.7.0
+ regex==2025.11.3
+ packaging==25.0
+ hf-xet==1.2.0
+ hf-xet==1.2.1
+ huggingface_hub==0.36.0
+ tokenizers==0.22.1
+ pytz==2025.2
+ xxhash==3.6.0
+ tzdata==2025.2
+ six==1.17.0
+ pyarrow-hotfix==0.7
+ pyarrow==22.0.0
+ pyarrow==21.0.0
+ propcache==0.4.1
+ propcache==0.3.1
+ multidict==6.7.0
+ multidict==6.6.3
+ aiohappyeyeballs==2.6.1
+ fsspec==2024.3.1
+ fsspec==2025.10.0
+ frozenlist==1.8.0
+ frozenlist==1.7.0
+ dill==0.3.8
+ dill==0.4.0
+ attrs==25.4.0
+ async-timeout==5.0.1
+ yarl==1.22.0
+ python-dateutil==2.9.0.post0
+ multiprocess==0.70.16
+ multiprocess==0.70.18
+ aiosignal==1.4.0
+ pandas==2.3.3
+ aiohttp==3.13.2
+ pycountry==24.6.1
+ psutil==7.1.3
+ accelerate==1.12.0
+ peft==0.10.0
+ Pygments==2.19.2
+ colorama==0.4.6
+ shellingham==1.5.4
+ sniffio==1.3.1
+ exceptiongroup==1.3.1
+ h11==0.16.0
+ typer-slim==0.20.0
+ anyio==4.12.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ datasets==4.4.1
+ ninja==1.13.0
+ docker-pycreds==0.4.0
+ eval_type_backport==0.3.1
+ platformdirs==4.5.0
+ sentry-sdk==2.47.0
+ annotated-types==0.7.0
+ typing-inspection==0.4.2
+ smmap==5.0.2
+ gitdb==4.0.12
+ GitPython==3.1.45
+ protobuf==6.31.1
+ setproctitle==1.3.6
+ pydantic_core==2.41.5
+ pydantic==2.12.5
+ wandb==0.23.0
+ jsonlines==4.0.0
+ supervisor==4.3.0
+ py-cpuinfo==9.0.0
+ nvidia-ml-py==13.580.82
+ nvidia-cusparselt-cu12==0.7.1
+ fastrlock==0.8.3
+ websockets==15.0.1
+ uvloop==0.22.1
+ tomli==2.3.0
+ tabulate==0.9.0
+ sentencepiece==0.2.1
+ rpds-py==0.30.0
+ rignore==0.7.6
+ pyzmq==27.1.0
+ python-multipart==0.0.20
+ python-json-logger==4.0.0
+ python-dotenv==1.2.1
+ pybase64==1.4.2
+ prometheus_client==0.23.1
+ starlette==0.50.0
+ pillow==12.0.0
+ partial-json-parser==0.2.1.1.post7
+ outlines_core==0.2.11
+ nvidia-nvtx-cu12==12.8.90
+ nvidia-nvshmem-cu12==3.3.20
+ nvidia-nvjitlink-cu12==12.8.93
+ nvidia-nccl-cu12==2.27.5
+ nvidia-curand-cu12==10.3.9.90
+ nvidia-cufile-cu12==1.13.1.3
+ nvidia-cudnn-frontend==1.16.0
+ nvidia-cuda-runtime-cu12==12.8.90
+ nvidia-cuda-nvrtc-cu12==12.8.93
+ nvidia-cuda-cupti-cu12==12.8.90
+ nvidia-cublas-cu12==12.8.4.1
+ numpy==2.2.6
+ msgspec==0.20.0
+ msgpack==1.1.2
+ mdurl==0.1.2
+ loguru==0.7.3
+ llvmlite==0.44.0
+ llguidance==1.3.0
+ lark==1.2.2
+ jmespath==1.0.1
+ jiter==0.12.0
+ interegular==0.3.3
+ httptools==0.7.1
+ fastar==0.8.0
+ einops==0.8.1
+ docstring_parser==0.17.0
+ dnspython==2.8.0
+ distro==1.9.0
+ diskcache==5.6.3
+ cuda-pathfinder==1.3.3
+ cloudpickle==3.1.2
+ rich==14.2.0
+ click==8.2.1
+ cbor2==5.7.1
+ cachetools==6.2.2
+ blake3==1.0.8
+ astor==0.8.1
+ apache-tvm-ffi==0.1.4
+ annotated-doc==0.0.4
+ uvicorn==0.38.0
+ tiktoken==0.12.0
+ scipy==1.15.3
+ referencing==0.37.0
+ opencv-python-headless==4.12.0.88
+ nvidia-cusparse-cu12==12.5.8.93
+ nvidia-cufft-cu12==11.3.3.83
+ nvidia-cudnn-cu12==9.10.2.21
+ numba==0.61.2
+ markdown-it-py==4.0.0
+ gguf==0.17.1
+ email-validator==2.3.0
+ depyf==0.20.0
+ cupy-cuda12x==13.6.0
+ cuda-bindings==13.1.0
+ watchfiles==1.1.1
+ pydantic-extra-types==2.10.6
+ openai-harmony==0.0.8
+ nvidia-cusolver-cu12==11.7.3.90
+ lm-format-enforcer==0.11.3
+ jsonschema-specifications==2025.9.1
+ cuda-python==13.1.0
+ typer==0.20.0
+ transformers==4.57.3
+ torch==2.9.0
+ prometheus-fastapi-instrumentator==7.1.0
+ openai==2.9.0
+ nvidia-cutlass-dsl==4.3.2
+ jsonschema==4.25.1
+ fastapi==0.123.10
+ anthropic==0.71.0
+ xgrammar==0.1.27
+ torchvision==0.24.0
+ ray==2.52.1
+ model-hosting-container-standards==0.1.9
+ mistral_common==1.8.6
+ flashinfer-python==0.5.3
+ fastapi-cloud-cli==0.6.0
+ fastapi-cli==0.0.16
+ compressed-tensors==0.12.2
+ vllm==0.12.0
+ Fraction==2.2.0
+ DeBERTa==0.1.13
llama/wandb/offline-run-20260113_213836-a3j2m1nj/logs/debug-core.log ADDED
@@ -0,0 +1,14 @@
+ {"time":"2026-01-13T21:38:36.612826143+09:00","level":"INFO","msg":"main: starting server","port-filename":"/tmp/tmpmps2xwqt/port-891360.txt","pid":891360,"log-level":0,"disable-analytics":false,"shutdown-on-parent-exit":false,"enable-dcgm-profiling":false}
+ {"time":"2026-01-13T21:38:36.61331276+09:00","level":"INFO","msg":"server: will exit if parent process dies","ppid":891360}
+ {"time":"2026-01-13T21:38:36.613298464+09:00","level":"INFO","msg":"server: accepting connections","addr":{"Name":"/tmp/wandb-891360-891645-1667871202/socket","Net":"unix"}}
+ {"time":"2026-01-13T21:38:36.783607525+09:00","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"1(@)"}
+ {"time":"2026-01-13T21:38:36.79752518+09:00","level":"INFO","msg":"handleInformInit: received","streamId":"a3j2m1nj","id":"1(@)"}
+ {"time":"2026-01-13T21:38:36.974920158+09:00","level":"INFO","msg":"handleInformInit: stream started","streamId":"a3j2m1nj","id":"1(@)"}
+ {"time":"2026-01-14T01:44:49.729274048+09:00","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"1(@)"}
+ {"time":"2026-01-14T01:44:49.7293491+09:00","level":"INFO","msg":"server is shutting down"}
+ {"time":"2026-01-14T01:44:49.729335449+09:00","level":"INFO","msg":"connection: closing","id":"1(@)"}
+ {"time":"2026-01-14T01:44:49.729432172+09:00","level":"INFO","msg":"server: listener closed","addr":{"Name":"/tmp/wandb-891360-891645-1667871202/socket","Net":"unix"}}
+ {"time":"2026-01-14T01:44:49.729474793+09:00","level":"INFO","msg":"connection: closed successfully","id":"1(@)"}
+ {"time":"2026-01-14T01:44:49.730357876+09:00","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"1(@)"}
+ {"time":"2026-01-14T01:44:49.730376141+09:00","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"1(@)"}
+ {"time":"2026-01-14T01:44:49.730383674+09:00","level":"INFO","msg":"server is closed"}
llama/wandb/offline-run-20260113_213836-a3j2m1nj/logs/debug-internal.log ADDED
@@ -0,0 +1,12 @@
+ {"time":"2026-01-13T21:38:36.808750595+09:00","level":"INFO","msg":"stream: starting","core version":"0.23.0"}
+ {"time":"2026-01-13T21:38:36.965984438+09:00","level":"WARN","msg":"featurechecker: GraphQL client is nil, skipping feature loading"}
+ {"time":"2026-01-13T21:38:36.966027626+09:00","level":"INFO","msg":"stream: created new stream","id":"a3j2m1nj"}
+ {"time":"2026-01-13T21:38:36.96613096+09:00","level":"INFO","msg":"handler: started","stream_id":"a3j2m1nj"}
+ {"time":"2026-01-13T21:38:36.974910942+09:00","level":"INFO","msg":"stream: started","id":"a3j2m1nj"}
+ {"time":"2026-01-13T21:38:36.97493371+09:00","level":"INFO","msg":"sender: started","stream_id":"a3j2m1nj"}
+ {"time":"2026-01-13T21:38:36.974933869+09:00","level":"INFO","msg":"writer: started","stream_id":"a3j2m1nj"}
+ {"time":"2026-01-13T21:38:36.975342432+09:00","level":"WARN","msg":"runupserter: server does not expand metric globs but the x_server_side_expand_glob_metrics setting is set; ignoring"}
+ {"time":"2026-01-14T01:44:49.729333759+09:00","level":"INFO","msg":"stream: closing","id":"a3j2m1nj"}
+ {"time":"2026-01-14T01:44:49.729533525+09:00","level":"INFO","msg":"handler: closed","stream_id":"a3j2m1nj"}
+ {"time":"2026-01-14T01:44:49.730075227+09:00","level":"INFO","msg":"sender: closed","stream_id":"a3j2m1nj"}
+ {"time":"2026-01-14T01:44:49.730094075+09:00","level":"INFO","msg":"stream: closed","id":"a3j2m1nj"}
llama/wandb/offline-run-20260113_213836-a3j2m1nj/logs/debug.log ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 2026-01-13 21:38:36,531 INFO MainThread:891360 [wandb_setup.py:_flush():80] Current SDK version is 0.23.0
2
+ 2026-01-13 21:38:36,531 INFO MainThread:891360 [wandb_setup.py:_flush():80] Configure stats pid to 891360
3
+ 2026-01-13 21:38:36,531 INFO MainThread:891360 [wandb_setup.py:_flush():80] Loading settings from /home/work/.config/wandb/settings
4
+ 2026-01-13 21:38:36,531 INFO MainThread:891360 [wandb_setup.py:_flush():80] Loading settings from /home/work/an_nguyen/HRA/llama/wandb/settings
5
+ 2026-01-13 21:38:36,531 INFO MainThread:891360 [wandb_setup.py:_flush():80] Loading settings from environment variables
6
+ 2026-01-13 21:38:36,531 INFO MainThread:891360 [wandb_init.py:setup_run_log_directory():713] Logging user logs to /home/work/an_nguyen/HRA/llama/wandb/offline-run-20260113_213836-a3j2m1nj/logs/debug.log
7
+ 2026-01-13 21:38:36,531 INFO MainThread:891360 [wandb_init.py:setup_run_log_directory():714] Logging internal logs to /home/work/an_nguyen/HRA/llama/wandb/offline-run-20260113_213836-a3j2m1nj/logs/debug-internal.log
8
+ 2026-01-13 21:38:36,531 INFO MainThread:891360 [wandb_init.py:init():840] calling init triggers
9
+ 2026-01-13 21:38:36,531 INFO MainThread:891360 [wandb_init.py:init():845] wandb.init called with sweep_config: {}
10
+ config: {'_wandb': {}}
11
+ 2026-01-13 21:38:36,531 INFO MainThread:891360 [wandb_init.py:init():888] starting backend
12
+ 2026-01-13 21:38:36,783 INFO MainThread:891360 [wandb_init.py:init():891] sending inform_init request
13
+ 2026-01-13 21:38:36,796 INFO MainThread:891360 [wandb_init.py:init():899] backend started and connected
14
+ 2026-01-13 21:38:36,796 INFO MainThread:891360 [wandb_init.py:init():969] updated telemetry
15
+ 2026-01-13 21:38:36,797 INFO MainThread:891360 [wandb_init.py:init():993] communicating run to backend with 90.0 second timeout
16
+ 2026-01-13 21:38:36,976 INFO MainThread:891360 [wandb_init.py:init():1040] starting run threads in backend
17
+ 2026-01-13 21:38:37,082 INFO MainThread:891360 [wandb_run.py:_console_start():2504] atexit reg
18
+ 2026-01-13 21:38:37,082 INFO MainThread:891360 [wandb_run.py:_redirect():2352] redirect: wrap_raw
19
+ 2026-01-13 21:38:37,082 INFO MainThread:891360 [wandb_run.py:_redirect():2421] Wrapping output streams.
20
+ 2026-01-13 21:38:37,082 INFO MainThread:891360 [wandb_run.py:_redirect():2444] Redirects installed.
21
+ 2026-01-13 21:38:37,084 INFO MainThread:891360 [wandb_init.py:init():1080] run started, returning control to user process
22
+ 2026-01-13 21:38:37,085 INFO MainThread:891360 [wandb_run.py:_config_callback():1385] config_cb None None {'peft_config': {'default': {'peft_type': 'OFT', 'auto_mapping': None, 'base_model_name_or_path': 'meta-llama/Llama-2-7b-hf', 'revision': None, 'task_type': 'CAUSAL_LM', 'inference_mode': False, 'rank_pattern': {}, 'alpha_pattern': {}, 'r': 32, 'module_dropout': 0.0, 'target_modules': ['v_proj', 'q_proj'], 'init_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'modules_to_save': None, 'coft': False, 'eps': 0.0001, 'block_share': False}}, 'vocab_size': 32001, 'max_position_embeddings': 4096, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'num_key_value_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-05, 'pretraining_tp': 1, 'use_cache': False, 'rope_theta': 10000.0, 'rope_scaling': None, 'attention_bias': False, 'attention_dropout': 0.0, 'mlp_bias': False, 'head_dim': 128, 'return_dict': True, 'output_hidden_states': False, 'torchscript': False, 'dtype': 'float32', 'pruned_heads': {}, 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'architectures': ['LlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 2, 'pad_token_id': 32000, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 
'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'num_beam_groups': 1, 'diversity_penalty': 0.0, '_name_or_path': 'meta-llama/Llama-2-7b-hf', 'transformers_version': '4.57.3', 'model_type': 'llama', 'tf_legacy_loss': False, 'use_bfloat16': False, 'output_attentions': False, 'output_dir': 'output/cms3', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 4, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 1e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 2.0, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.005, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'output/cms3/runs/Jan13_21-38-29_main1', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 200, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 0.0, 'save_total_limit': 1, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': True, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 
'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_tokens_per_second': False, 
'include_num_input_tokens_seen': 'no', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'model_name_or_path': 'meta-llama/Llama-2-7b-hf', 'adapter_name_or_path': None, 'data_path': './data/MetaMathQA-40K.json', 'dataset_split': 'train', 'dataset_field': ['query', 'response'], 'model_max_length': 512, 'hrft_r': 32, 'init_a': 0.0001, 'eps': 0.0001, 'lamda': 0.0001, 'add_orth': 'none', 'init_weights': True}
+ 2026-01-13 21:38:37,094 INFO MainThread:891360 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 6746812416 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7f2e11cb7fa0>>
+ 2026-01-13 21:38:37,094 INFO MainThread:891360 [wandb_run.py:_config_callback():1385] config_cb model/num_parameters 6746812416 None
+ 2026-01-14 01:44:49,729 INFO wandb-AsyncioManager-main:891360 [service_client.py:_forward_responses():80] Reached EOF.
+ 2026-01-14 01:44:49,729 INFO wandb-AsyncioManager-main:891360 [mailbox.py:close():137] Closing mailbox, abandoning 0 handles.
llama/wandb/offline-run-20260113_213836-a3j2m1nj/run-a3j2m1nj.wandb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a8c64e47b303b43dd85aceddf6406bef9b0ee26cfa09d90f822e95343fb236e
+ size 2101615
llama/wandb/offline-run-20260114_165804-73rsvobf/files/requirements.txt ADDED
@@ -0,0 +1,199 @@
+ setuptools==80.9.0
+ wheel==0.45.1
+ pip==25.3
+ Brotli==1.1.0
+ certifi==2025.11.12
+ charset-normalizer==3.4.4
+ filelock==3.20.0
+ hpack==4.1.0
+ hyperframe==6.1.0
+ idna==3.11
+ MarkupSafe==3.0.3
+ mpmath==1.3.0
+ networkx==3.4.2
+ pycparser==2.22
+ PySocks==1.7.1
+ PyYAML==6.0.3
+ typing_extensions==4.15.0
+ cffi==2.0.0
+ gmpy2==2.2.1
+ h2==4.3.0
+ Jinja2==3.1.6
+ sympy==1.14.0
+ zstandard==0.23.0
+ urllib3==2.5.0
+ requests==2.32.5
+ appdirs==1.4.4
+ rich-toolkit==0.17.0
+ torchaudio==2.9.0
+ triton==3.5.0
+ tqdm==4.67.1
+ safetensors==0.7.0
+ regex==2025.11.3
+ packaging==25.0
+ hf-xet==1.2.0
+ hf-xet==1.2.1
+ huggingface_hub==0.36.0
+ tokenizers==0.22.1
+ pytz==2025.2
+ xxhash==3.6.0
+ tzdata==2025.2
+ six==1.17.0
+ pyarrow-hotfix==0.7
+ pyarrow==22.0.0
+ pyarrow==21.0.0
+ propcache==0.4.1
+ propcache==0.3.1
+ multidict==6.7.0
+ multidict==6.6.3
+ aiohappyeyeballs==2.6.1
+ fsspec==2024.3.1
+ fsspec==2025.10.0
+ frozenlist==1.8.0
+ frozenlist==1.7.0
+ dill==0.3.8
+ dill==0.4.0
+ attrs==25.4.0
+ async-timeout==5.0.1
+ yarl==1.22.0
+ python-dateutil==2.9.0.post0
+ multiprocess==0.70.16
+ multiprocess==0.70.18
+ aiosignal==1.4.0
+ pandas==2.3.3
+ aiohttp==3.13.2
+ pycountry==24.6.1
+ psutil==7.1.3
+ accelerate==1.12.0
+ peft==0.10.0
+ Pygments==2.19.2
+ colorama==0.4.6
+ shellingham==1.5.4
+ sniffio==1.3.1
+ exceptiongroup==1.3.1
+ h11==0.16.0
+ typer-slim==0.20.0
+ anyio==4.12.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ datasets==4.4.1
+ ninja==1.13.0
+ docker-pycreds==0.4.0
+ eval_type_backport==0.3.1
+ platformdirs==4.5.0
+ sentry-sdk==2.47.0
+ annotated-types==0.7.0
+ typing-inspection==0.4.2
+ smmap==5.0.2
+ gitdb==4.0.12
+ GitPython==3.1.45
+ protobuf==6.31.1
+ setproctitle==1.3.6
+ pydantic_core==2.41.5
+ pydantic==2.12.5
+ wandb==0.23.0
+ jsonlines==4.0.0
+ supervisor==4.3.0
+ py-cpuinfo==9.0.0
+ nvidia-ml-py==13.580.82
+ nvidia-cusparselt-cu12==0.7.1
+ fastrlock==0.8.3
+ websockets==15.0.1
+ uvloop==0.22.1
+ tomli==2.3.0
+ tabulate==0.9.0
+ sentencepiece==0.2.1
+ rpds-py==0.30.0
+ rignore==0.7.6
+ pyzmq==27.1.0
+ python-multipart==0.0.20
+ python-json-logger==4.0.0
+ python-dotenv==1.2.1
+ pybase64==1.4.2
+ prometheus_client==0.23.1
+ starlette==0.50.0
+ pillow==12.0.0
+ partial-json-parser==0.2.1.1.post7
+ outlines_core==0.2.11
+ nvidia-nvtx-cu12==12.8.90
+ nvidia-nvshmem-cu12==3.3.20
+ nvidia-nvjitlink-cu12==12.8.93
+ nvidia-nccl-cu12==2.27.5
+ nvidia-curand-cu12==10.3.9.90
+ nvidia-cufile-cu12==1.13.1.3
+ nvidia-cudnn-frontend==1.16.0
+ nvidia-cuda-runtime-cu12==12.8.90
+ nvidia-cuda-nvrtc-cu12==12.8.93
+ nvidia-cuda-cupti-cu12==12.8.90
+ nvidia-cublas-cu12==12.8.4.1
+ numpy==2.2.6
+ msgspec==0.20.0
+ msgpack==1.1.2
+ mdurl==0.1.2
+ loguru==0.7.3
+ llvmlite==0.44.0
+ llguidance==1.3.0
+ lark==1.2.2
+ jmespath==1.0.1
+ jiter==0.12.0
+ interegular==0.3.3
+ httptools==0.7.1
+ fastar==0.8.0
+ einops==0.8.1
+ docstring_parser==0.17.0
+ dnspython==2.8.0
+ distro==1.9.0
+ diskcache==5.6.3
+ cuda-pathfinder==1.3.3
+ cloudpickle==3.1.2
+ rich==14.2.0
+ click==8.2.1
+ cbor2==5.7.1
+ cachetools==6.2.2
+ blake3==1.0.8
+ astor==0.8.1
+ apache-tvm-ffi==0.1.4
+ annotated-doc==0.0.4
+ uvicorn==0.38.0
+ tiktoken==0.12.0
+ scipy==1.15.3
+ referencing==0.37.0
+ opencv-python-headless==4.12.0.88
+ nvidia-cusparse-cu12==12.5.8.93
+ nvidia-cufft-cu12==11.3.3.83
+ nvidia-cudnn-cu12==9.10.2.21
+ numba==0.61.2
+ markdown-it-py==4.0.0
+ gguf==0.17.1
+ email-validator==2.3.0
+ depyf==0.20.0
+ cupy-cuda12x==13.6.0
+ cuda-bindings==13.1.0
+ watchfiles==1.1.1
+ pydantic-extra-types==2.10.6
+ openai-harmony==0.0.8
+ nvidia-cusolver-cu12==11.7.3.90
+ lm-format-enforcer==0.11.3
+ jsonschema-specifications==2025.9.1
+ cuda-python==13.1.0
+ typer==0.20.0
+ transformers==4.57.3
+ torch==2.9.0
+ prometheus-fastapi-instrumentator==7.1.0
+ openai==2.9.0
+ nvidia-cutlass-dsl==4.3.2
+ jsonschema==4.25.1
+ fastapi==0.123.10
+ anthropic==0.71.0
+ xgrammar==0.1.27
+ torchvision==0.24.0
+ ray==2.52.1
+ model-hosting-container-standards==0.1.9
+ mistral_common==1.8.6
+ flashinfer-python==0.5.3
+ fastapi-cloud-cli==0.6.0
+ fastapi-cli==0.0.16
+ compressed-tensors==0.12.2
+ vllm==0.12.0
+ Fraction==2.2.0
+ DeBERTa==0.1.13
llama/wandb/offline-run-20260114_165804-73rsvobf/logs/debug-core.log ADDED
@@ -0,0 +1,14 @@
+ {"time":"2026-01-14T16:58:04.868620839+09:00","level":"INFO","msg":"main: starting server","port-filename":"/tmp/tmpelwrvnf6/port-1741342.txt","pid":1741342,"log-level":0,"disable-analytics":false,"shutdown-on-parent-exit":false,"enable-dcgm-profiling":false}
+ {"time":"2026-01-14T16:58:04.86909634+09:00","level":"INFO","msg":"server: will exit if parent process dies","ppid":1741342}
+ {"time":"2026-01-14T16:58:04.869074421+09:00","level":"INFO","msg":"server: accepting connections","addr":{"Name":"/tmp/wandb-1741342-1741535-2497333284/socket","Net":"unix"}}
+ {"time":"2026-01-14T16:58:05.042266919+09:00","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"1(@)"}
+ {"time":"2026-01-14T16:58:05.057856705+09:00","level":"INFO","msg":"handleInformInit: received","streamId":"73rsvobf","id":"1(@)"}
+ {"time":"2026-01-14T16:58:05.211099004+09:00","level":"INFO","msg":"handleInformInit: stream started","streamId":"73rsvobf","id":"1(@)"}
+ {"time":"2026-01-14T16:59:21.72449823+09:00","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"1(@)"}
+ {"time":"2026-01-14T16:59:21.725623244+09:00","level":"INFO","msg":"connection: closing","id":"1(@)"}
+ {"time":"2026-01-14T16:59:21.725669826+09:00","level":"INFO","msg":"connection: closed successfully","id":"1(@)"}
+ {"time":"2026-01-14T16:59:21.725637604+09:00","level":"INFO","msg":"server is shutting down"}
+ {"time":"2026-01-14T16:59:21.725755633+09:00","level":"INFO","msg":"server: listener closed","addr":{"Name":"/tmp/wandb-1741342-1741535-2497333284/socket","Net":"unix"}}
+ {"time":"2026-01-14T16:59:21.733756261+09:00","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"1(@)"}
+ {"time":"2026-01-14T16:59:21.733773028+09:00","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"1(@)"}
+ {"time":"2026-01-14T16:59:21.733781827+09:00","level":"INFO","msg":"server is closed"}
llama/wandb/offline-run-20260114_165804-73rsvobf/logs/debug-internal.log ADDED
@@ -0,0 +1,12 @@
+ {"time":"2026-01-14T16:58:05.066543155+09:00","level":"INFO","msg":"stream: starting","core version":"0.23.0"}
+ {"time":"2026-01-14T16:58:05.210029267+09:00","level":"WARN","msg":"featurechecker: GraphQL client is nil, skipping feature loading"}
+ {"time":"2026-01-14T16:58:05.210075058+09:00","level":"INFO","msg":"stream: created new stream","id":"73rsvobf"}
+ {"time":"2026-01-14T16:58:05.210149108+09:00","level":"INFO","msg":"handler: started","stream_id":"73rsvobf"}
+ {"time":"2026-01-14T16:58:05.211090279+09:00","level":"INFO","msg":"stream: started","id":"73rsvobf"}
+ {"time":"2026-01-14T16:58:05.211104813+09:00","level":"INFO","msg":"writer: started","stream_id":"73rsvobf"}
+ {"time":"2026-01-14T16:58:05.211113248+09:00","level":"INFO","msg":"sender: started","stream_id":"73rsvobf"}
+ {"time":"2026-01-14T16:58:05.211515731+09:00","level":"WARN","msg":"runupserter: server does not expand metric globs but the x_server_side_expand_glob_metrics setting is set; ignoring"}
+ {"time":"2026-01-14T16:59:21.72563689+09:00","level":"INFO","msg":"stream: closing","id":"73rsvobf"}
+ {"time":"2026-01-14T16:59:21.725868691+09:00","level":"INFO","msg":"handler: closed","stream_id":"73rsvobf"}
+ {"time":"2026-01-14T16:59:21.733478727+09:00","level":"INFO","msg":"sender: closed","stream_id":"73rsvobf"}
+ {"time":"2026-01-14T16:59:21.733495479+09:00","level":"INFO","msg":"stream: closed","id":"73rsvobf"}
llama/wandb/offline-run-20260114_165804-73rsvobf/logs/debug.log ADDED
@@ -0,0 +1,26 @@
+ 2026-01-14 16:58:04,788 INFO MainThread:1741342 [wandb_setup.py:_flush():80] Current SDK version is 0.23.0
+ 2026-01-14 16:58:04,788 INFO MainThread:1741342 [wandb_setup.py:_flush():80] Configure stats pid to 1741342
+ 2026-01-14 16:58:04,788 INFO MainThread:1741342 [wandb_setup.py:_flush():80] Loading settings from /home/work/.config/wandb/settings
+ 2026-01-14 16:58:04,788 INFO MainThread:1741342 [wandb_setup.py:_flush():80] Loading settings from /home/work/an_nguyen/HRA/llama/wandb/settings
+ 2026-01-14 16:58:04,788 INFO MainThread:1741342 [wandb_setup.py:_flush():80] Loading settings from environment variables
+ 2026-01-14 16:58:04,788 INFO MainThread:1741342 [wandb_init.py:setup_run_log_directory():713] Logging user logs to /home/work/an_nguyen/HRA/llama/wandb/offline-run-20260114_165804-73rsvobf/logs/debug.log
+ 2026-01-14 16:58:04,788 INFO MainThread:1741342 [wandb_init.py:setup_run_log_directory():714] Logging internal logs to /home/work/an_nguyen/HRA/llama/wandb/offline-run-20260114_165804-73rsvobf/logs/debug-internal.log
+ 2026-01-14 16:58:04,788 INFO MainThread:1741342 [wandb_init.py:init():840] calling init triggers
+ 2026-01-14 16:58:04,788 INFO MainThread:1741342 [wandb_init.py:init():845] wandb.init called with sweep_config: {}
+ config: {'_wandb': {}}
+ 2026-01-14 16:58:04,789 INFO MainThread:1741342 [wandb_init.py:init():888] starting backend
+ 2026-01-14 16:58:05,042 INFO MainThread:1741342 [wandb_init.py:init():891] sending inform_init request
+ 2026-01-14 16:58:05,056 INFO MainThread:1741342 [wandb_init.py:init():899] backend started and connected
+ 2026-01-14 16:58:05,057 INFO MainThread:1741342 [wandb_init.py:init():969] updated telemetry
+ 2026-01-14 16:58:05,058 INFO MainThread:1741342 [wandb_init.py:init():993] communicating run to backend with 90.0 second timeout
+ 2026-01-14 16:58:05,213 INFO MainThread:1741342 [wandb_init.py:init():1040] starting run threads in backend
+ 2026-01-14 16:58:05,317 INFO MainThread:1741342 [wandb_run.py:_console_start():2504] atexit reg
+ 2026-01-14 16:58:05,317 INFO MainThread:1741342 [wandb_run.py:_redirect():2352] redirect: wrap_raw
+ 2026-01-14 16:58:05,317 INFO MainThread:1741342 [wandb_run.py:_redirect():2421] Wrapping output streams.
+ 2026-01-14 16:58:05,317 INFO MainThread:1741342 [wandb_run.py:_redirect():2444] Redirects installed.
+ 2026-01-14 16:58:05,319 INFO MainThread:1741342 [wandb_init.py:init():1080] run started, returning control to user process
+ 2026-01-14 16:58:05,321 INFO MainThread:1741342 [wandb_run.py:_config_callback():1385] config_cb None None {'peft_config': {'default': {'peft_type': 'OFT', 'auto_mapping': None, 'base_model_name_or_path': 'meta-llama/Llama-2-7b-hf', 'revision': None, 'task_type': 'CAUSAL_LM', 'inference_mode': False, 'rank_pattern': {}, 'alpha_pattern': {}, 'r': 32, 'module_dropout': 0.0, 'target_modules': ['v_proj', 'q_proj'], 'init_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'modules_to_save': None, 'coft': False, 'eps': 0.0001, 'block_share': False}}, 'vocab_size': 32001, 'max_position_embeddings': 4096, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'num_key_value_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-05, 'pretraining_tp': 1, 'use_cache': False, 'rope_theta': 10000.0, 'rope_scaling': None, 'attention_bias': False, 'attention_dropout': 0.0, 'mlp_bias': False, 'head_dim': 128, 'return_dict': True, 'output_hidden_states': False, 'torchscript': False, 'dtype': 'float32', 'pruned_heads': {}, 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'architectures': ['LlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 2, 'pad_token_id': 32000, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 
'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'num_beam_groups': 1, 'diversity_penalty': 0.0, '_name_or_path': 'meta-llama/Llama-2-7b-hf', 'transformers_version': '4.57.3', 'model_type': 'llama', 'tf_legacy_loss': False, 'use_bfloat16': False, 'output_attentions': False, 'output_dir': 'output/cms3', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 4, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 1e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 2.0, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.005, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'output/cms3/runs/Jan14_16-57-58_main1', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 200, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 0.0, 'save_total_limit': 1, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': True, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 
'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_tokens_per_second': False, 
'include_num_input_tokens_seen': 'no', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'model_name_or_path': 'meta-llama/Llama-2-7b-hf', 'adapter_name_or_path': None, 'data_path': './data/MetaMathQA-40K.json', 'dataset_split': 'train', 'dataset_field': ['query', 'response'], 'model_max_length': 512, 'hrft_r': 32, 'init_a': 0.0001, 'eps': 0.0001, 'lamda': 0.0001, 'add_orth': 'none', 'init_weights': True}
+ 2026-01-14 16:58:05,329 INFO MainThread:1741342 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 6746812416 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7f4761caff40>>
+ 2026-01-14 16:58:05,329 INFO MainThread:1741342 [wandb_run.py:_config_callback():1385] config_cb model/num_parameters 6746812416 None
+ 2026-01-14 16:59:21,724 INFO wandb-AsyncioManager-main:1741342 [service_client.py:_forward_responses():80] Reached EOF.
+ 2026-01-14 16:59:21,725 INFO wandb-AsyncioManager-main:1741342 [mailbox.py:close():137] Closing mailbox, abandoning 0 handles.
llama/wandb/offline-run-20260114_165804-73rsvobf/run-73rsvobf.wandb ADDED
Binary file (21.5 kB)
llama/wandb/offline-run-20260114_173548-7ubed6qe/files/requirements.txt ADDED
@@ -0,0 +1,199 @@
+ setuptools==80.9.0
+ wheel==0.45.1
+ pip==25.3
+ Brotli==1.1.0
+ certifi==2025.11.12
+ charset-normalizer==3.4.4
+ filelock==3.20.0
+ hpack==4.1.0
+ hyperframe==6.1.0
+ idna==3.11
+ MarkupSafe==3.0.3
+ mpmath==1.3.0
+ networkx==3.4.2
+ pycparser==2.22
+ PySocks==1.7.1
+ PyYAML==6.0.3
+ typing_extensions==4.15.0
+ cffi==2.0.0
+ gmpy2==2.2.1
+ h2==4.3.0
+ Jinja2==3.1.6
+ sympy==1.14.0
+ zstandard==0.23.0
+ urllib3==2.5.0
+ requests==2.32.5
+ appdirs==1.4.4
+ rich-toolkit==0.17.0
+ torchaudio==2.9.0
+ triton==3.5.0
+ tqdm==4.67.1
+ safetensors==0.7.0
+ regex==2025.11.3
+ packaging==25.0
+ hf-xet==1.2.0
+ hf-xet==1.2.1
+ huggingface_hub==0.36.0
+ tokenizers==0.22.1
+ pytz==2025.2
+ xxhash==3.6.0
+ tzdata==2025.2
+ six==1.17.0
+ pyarrow-hotfix==0.7
+ pyarrow==22.0.0
+ pyarrow==21.0.0
+ propcache==0.4.1
+ propcache==0.3.1
+ multidict==6.7.0
+ multidict==6.6.3
+ aiohappyeyeballs==2.6.1
+ fsspec==2024.3.1
+ fsspec==2025.10.0
+ frozenlist==1.8.0
+ frozenlist==1.7.0
+ dill==0.3.8
+ dill==0.4.0
+ attrs==25.4.0
+ async-timeout==5.0.1
+ yarl==1.22.0
+ python-dateutil==2.9.0.post0
+ multiprocess==0.70.16
+ multiprocess==0.70.18
+ aiosignal==1.4.0
+ pandas==2.3.3
+ aiohttp==3.13.2
+ pycountry==24.6.1
+ psutil==7.1.3
+ accelerate==1.12.0
+ peft==0.10.0
+ Pygments==2.19.2
+ colorama==0.4.6
+ shellingham==1.5.4
+ sniffio==1.3.1
+ exceptiongroup==1.3.1
+ h11==0.16.0
+ typer-slim==0.20.0
+ anyio==4.12.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ datasets==4.4.1
+ ninja==1.13.0
+ docker-pycreds==0.4.0
+ eval_type_backport==0.3.1
+ platformdirs==4.5.0
+ sentry-sdk==2.47.0
+ annotated-types==0.7.0
+ typing-inspection==0.4.2
+ smmap==5.0.2
+ gitdb==4.0.12
+ GitPython==3.1.45
+ protobuf==6.31.1
+ setproctitle==1.3.6
+ pydantic_core==2.41.5
+ pydantic==2.12.5
+ wandb==0.23.0
+ jsonlines==4.0.0
+ supervisor==4.3.0
+ py-cpuinfo==9.0.0
+ nvidia-ml-py==13.580.82
+ nvidia-cusparselt-cu12==0.7.1
+ fastrlock==0.8.3
+ websockets==15.0.1
+ uvloop==0.22.1
+ tomli==2.3.0
+ tabulate==0.9.0
+ sentencepiece==0.2.1
+ rpds-py==0.30.0
+ rignore==0.7.6
+ pyzmq==27.1.0
+ python-multipart==0.0.20
+ python-json-logger==4.0.0
+ python-dotenv==1.2.1
+ pybase64==1.4.2
+ prometheus_client==0.23.1
+ starlette==0.50.0
+ pillow==12.0.0
+ partial-json-parser==0.2.1.1.post7
+ outlines_core==0.2.11
+ nvidia-nvtx-cu12==12.8.90
+ nvidia-nvshmem-cu12==3.3.20
+ nvidia-nvjitlink-cu12==12.8.93
+ nvidia-nccl-cu12==2.27.5
+ nvidia-curand-cu12==10.3.9.90
+ nvidia-cufile-cu12==1.13.1.3
+ nvidia-cudnn-frontend==1.16.0
+ nvidia-cuda-runtime-cu12==12.8.90
+ nvidia-cuda-nvrtc-cu12==12.8.93
+ nvidia-cuda-cupti-cu12==12.8.90
+ nvidia-cublas-cu12==12.8.4.1
+ numpy==2.2.6
+ msgspec==0.20.0
+ msgpack==1.1.2
+ mdurl==0.1.2
+ loguru==0.7.3
+ llvmlite==0.44.0
+ llguidance==1.3.0
+ lark==1.2.2
+ jmespath==1.0.1
+ jiter==0.12.0
+ interegular==0.3.3
+ httptools==0.7.1
+ fastar==0.8.0
+ einops==0.8.1
+ docstring_parser==0.17.0
+ dnspython==2.8.0
+ distro==1.9.0
+ diskcache==5.6.3
+ cuda-pathfinder==1.3.3
+ cloudpickle==3.1.2
+ rich==14.2.0
+ click==8.2.1
+ cbor2==5.7.1
+ cachetools==6.2.2
+ blake3==1.0.8
+ astor==0.8.1
+ apache-tvm-ffi==0.1.4
+ annotated-doc==0.0.4
+ uvicorn==0.38.0
+ tiktoken==0.12.0
+ scipy==1.15.3
+ referencing==0.37.0
+ opencv-python-headless==4.12.0.88
+ nvidia-cusparse-cu12==12.5.8.93
+ nvidia-cufft-cu12==11.3.3.83
+ nvidia-cudnn-cu12==9.10.2.21
+ numba==0.61.2
+ markdown-it-py==4.0.0
+ gguf==0.17.1
+ email-validator==2.3.0
+ depyf==0.20.0
+ cupy-cuda12x==13.6.0
+ cuda-bindings==13.1.0
+ watchfiles==1.1.1
+ pydantic-extra-types==2.10.6
+ openai-harmony==0.0.8
+ nvidia-cusolver-cu12==11.7.3.90
+ lm-format-enforcer==0.11.3
+ jsonschema-specifications==2025.9.1
+ cuda-python==13.1.0
+ typer==0.20.0
+ transformers==4.57.3
+ torch==2.9.0
+ prometheus-fastapi-instrumentator==7.1.0
+ openai==2.9.0
+ nvidia-cutlass-dsl==4.3.2
+ jsonschema==4.25.1
+ fastapi==0.123.10
+ anthropic==0.71.0
+ xgrammar==0.1.27
+ torchvision==0.24.0
+ ray==2.52.1
+ model-hosting-container-standards==0.1.9
+ mistral_common==1.8.6
+ flashinfer-python==0.5.3
+ fastapi-cloud-cli==0.6.0
+ fastapi-cli==0.0.16
+ compressed-tensors==0.12.2
+ vllm==0.12.0
+ Fraction==2.2.0
+ DeBERTa==0.1.13
llama/wandb/offline-run-20260114_173548-7ubed6qe/logs/debug-core.log ADDED
@@ -0,0 +1,14 @@
+ {"time":"2026-01-14T17:35:48.80538283+09:00","level":"INFO","msg":"main: starting server","port-filename":"/tmp/tmph6vv11_n/port-1752905.txt","pid":1752905,"log-level":0,"disable-analytics":false,"shutdown-on-parent-exit":false,"enable-dcgm-profiling":false}
+ {"time":"2026-01-14T17:35:48.805860127+09:00","level":"INFO","msg":"server: will exit if parent process dies","ppid":1752905}
+ {"time":"2026-01-14T17:35:48.805841032+09:00","level":"INFO","msg":"server: accepting connections","addr":{"Name":"/tmp/wandb-1752905-1753049-2894214458/socket","Net":"unix"}}
+ {"time":"2026-01-14T17:35:48.983834145+09:00","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"1(@)"}
+ {"time":"2026-01-14T17:35:48.998863815+09:00","level":"INFO","msg":"handleInformInit: received","streamId":"7ubed6qe","id":"1(@)"}
+ {"time":"2026-01-14T17:35:49.151617773+09:00","level":"INFO","msg":"handleInformInit: stream started","streamId":"7ubed6qe","id":"1(@)"}
+ {"time":"2026-01-14T21:46:20.466585934+09:00","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"1(@)"}
+ {"time":"2026-01-14T21:46:20.466651973+09:00","level":"INFO","msg":"connection: closing","id":"1(@)"}
+ {"time":"2026-01-14T21:46:20.466694957+09:00","level":"INFO","msg":"server is shutting down"}
+ {"time":"2026-01-14T21:46:20.466711701+09:00","level":"INFO","msg":"connection: closed successfully","id":"1(@)"}
+ {"time":"2026-01-14T21:46:20.46679412+09:00","level":"INFO","msg":"server: listener closed","addr":{"Name":"/tmp/wandb-1752905-1753049-2894214458/socket","Net":"unix"}}
+ {"time":"2026-01-14T21:46:20.46924732+09:00","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"1(@)"}
+ {"time":"2026-01-14T21:46:20.469263788+09:00","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"1(@)"}
+ {"time":"2026-01-14T21:46:20.469270741+09:00","level":"INFO","msg":"server is closed"}
llama/wandb/offline-run-20260114_173548-7ubed6qe/logs/debug-internal.log ADDED
@@ -0,0 +1,12 @@
+ {"time":"2026-01-14T17:35:49.006544401+09:00","level":"INFO","msg":"stream: starting","core version":"0.23.0"}
+ {"time":"2026-01-14T17:35:49.149824363+09:00","level":"WARN","msg":"featurechecker: GraphQL client is nil, skipping feature loading"}
+ {"time":"2026-01-14T17:35:49.149873743+09:00","level":"INFO","msg":"stream: created new stream","id":"7ubed6qe"}
+ {"time":"2026-01-14T17:35:49.149898431+09:00","level":"INFO","msg":"handler: started","stream_id":"7ubed6qe"}
+ {"time":"2026-01-14T17:35:49.151612025+09:00","level":"INFO","msg":"stream: started","id":"7ubed6qe"}
+ {"time":"2026-01-14T17:35:49.151616181+09:00","level":"INFO","msg":"writer: started","stream_id":"7ubed6qe"}
+ {"time":"2026-01-14T17:35:49.151631131+09:00","level":"INFO","msg":"sender: started","stream_id":"7ubed6qe"}
+ {"time":"2026-01-14T17:35:49.152375031+09:00","level":"WARN","msg":"runupserter: server does not expand metric globs but the x_server_side_expand_glob_metrics setting is set; ignoring"}
+ {"time":"2026-01-14T21:46:20.466650711+09:00","level":"INFO","msg":"stream: closing","id":"7ubed6qe"}
+ {"time":"2026-01-14T21:46:20.466968327+09:00","level":"INFO","msg":"handler: closed","stream_id":"7ubed6qe"}
+ {"time":"2026-01-14T21:46:20.468836334+09:00","level":"INFO","msg":"sender: closed","stream_id":"7ubed6qe"}
+ {"time":"2026-01-14T21:46:20.468852562+09:00","level":"INFO","msg":"stream: closed","id":"7ubed6qe"}
llama/wandb/offline-run-20260114_173548-7ubed6qe/logs/debug.log ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_setup.py:_flush():80] Current SDK version is 0.23.0
2
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_setup.py:_flush():80] Configure stats pid to 1752905
3
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_setup.py:_flush():80] Loading settings from /home/work/.config/wandb/settings
4
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_setup.py:_flush():80] Loading settings from /home/work/an_nguyen/HRA/llama/wandb/settings
5
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_setup.py:_flush():80] Loading settings from environment variables
6
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_init.py:setup_run_log_directory():713] Logging user logs to /home/work/an_nguyen/HRA/llama/wandb/offline-run-20260114_173548-7ubed6qe/logs/debug.log
7
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_init.py:setup_run_log_directory():714] Logging internal logs to /home/work/an_nguyen/HRA/llama/wandb/offline-run-20260114_173548-7ubed6qe/logs/debug-internal.log
8
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_init.py:init():840] calling init triggers
9
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_init.py:init():845] wandb.init called with sweep_config: {}
10
+ config: {'_wandb': {}}
11
+ 2026-01-14 17:35:48,731 INFO MainThread:1752905 [wandb_init.py:init():888] starting backend
12
+ 2026-01-14 17:35:48,984 INFO MainThread:1752905 [wandb_init.py:init():891] sending inform_init request
13
+ 2026-01-14 17:35:48,997 INFO MainThread:1752905 [wandb_init.py:init():899] backend started and connected
14
+ 2026-01-14 17:35:48,998 INFO MainThread:1752905 [wandb_init.py:init():969] updated telemetry
15
+ 2026-01-14 17:35:48,999 INFO MainThread:1752905 [wandb_init.py:init():993] communicating run to backend with 90.0 second timeout
16
+ 2026-01-14 17:35:49,154 INFO MainThread:1752905 [wandb_init.py:init():1040] starting run threads in backend
17
+ 2026-01-14 17:35:49,257 INFO MainThread:1752905 [wandb_run.py:_console_start():2504] atexit reg
18
+ 2026-01-14 17:35:49,257 INFO MainThread:1752905 [wandb_run.py:_redirect():2352] redirect: wrap_raw
19
+ 2026-01-14 17:35:49,257 INFO MainThread:1752905 [wandb_run.py:_redirect():2421] Wrapping output streams.
20
+ 2026-01-14 17:35:49,257 INFO MainThread:1752905 [wandb_run.py:_redirect():2444] Redirects installed.
+ 2026-01-14 17:35:49,259 INFO MainThread:1752905 [wandb_init.py:init():1080] run started, returning control to user process
+ 2026-01-14 17:35:49,260 INFO MainThread:1752905 [wandb_run.py:_config_callback():1385] config_cb None None {'peft_config': {'default': {'peft_type': 'OFT', 'auto_mapping': None, 'base_model_name_or_path': 'meta-llama/Llama-2-7b-hf', 'revision': None, 'task_type': 'CAUSAL_LM', 'inference_mode': False, 'rank_pattern': {}, 'alpha_pattern': {}, 'r': 32, 'module_dropout': 0.0, 'target_modules': ['v_proj', 'q_proj'], 'init_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'modules_to_save': None, 'coft': False, 'eps': 0.0001, 'block_share': False}}, 'vocab_size': 32001, 'max_position_embeddings': 4096, 'hidden_size': 4096, 'intermediate_size': 11008, 'num_hidden_layers': 32, 'num_attention_heads': 32, 'num_key_value_heads': 32, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-05, 'pretraining_tp': 1, 'use_cache': False, 'rope_theta': 10000.0, 'rope_scaling': None, 'attention_bias': False, 'attention_dropout': 0.0, 'mlp_bias': False, 'head_dim': 128, 'return_dict': True, 'output_hidden_states': False, 'torchscript': False, 'dtype': 'float32', 'pruned_heads': {}, 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'architectures': ['LlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'task_specific_params': None, 'problem_type': None, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 2, 'pad_token_id': 32000, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 
'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'num_beam_groups': 1, 'diversity_penalty': 0.0, '_name_or_path': 'meta-llama/Llama-2-7b-hf', 'transformers_version': '4.57.3', 'model_type': 'llama', 'tf_legacy_loss': False, 'use_bfloat16': False, 'output_attentions': False, 'output_dir': 'output/cms3', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 4, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 1e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 2.0, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.005, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 'output/cms3/runs/Jan14_17-35-42_main1', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 200, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 0.0, 'save_total_limit': 1, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': True, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 
'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_tokens_per_second': False, 
'include_num_input_tokens_seen': 'no', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'model_name_or_path': 'meta-llama/Llama-2-7b-hf', 'adapter_name_or_path': None, 'data_path': './data/MetaMathQA-40K.json', 'dataset_split': 'train', 'dataset_field': ['query', 'response'], 'model_max_length': 512, 'hrft_r': 32, 'init_a': 0.0001, 'eps': 0.0001, 'lamda': 0.0001, 'add_orth': 'none', 'init_weights': True}
+ 2026-01-14 17:35:49,269 INFO MainThread:1752905 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 6746812416 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7f1ba1cabf40>>
+ 2026-01-14 17:35:49,269 INFO MainThread:1752905 [wandb_run.py:_config_callback():1385] config_cb model/num_parameters 6746812416 None
+ 2026-01-14 21:46:20,466 INFO wandb-AsyncioManager-main:1752905 [service_client.py:_forward_responses():80] Reached EOF.
+ 2026-01-14 21:46:20,467 INFO wandb-AsyncioManager-main:1752905 [mailbox.py:close():137] Closing mailbox, abandoning 0 handles.
llama/wandb/offline-run-20260114_173548-7ubed6qe/run-7ubed6qe.wandb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e5c9530847687fcc34a120a91c7775ee55117bf74b57e04f06a07d00a0ce36a
+ size 2021175
llama/wandb/settings ADDED
@@ -0,0 +1,3 @@
+ [default]
+ mode = offline
+