raniero committed on
Commit 2fac470 · verified · 1 Parent(s): a9ded78

upload ares56-test-text @ 2025-09-09T07:55:07.531105Z
README.md CHANGED
@@ -7,12 +7,11 @@ tags: [lora, bittensor, subnet-56, gradients]
 base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
 ---
 
- # ARES56 — Instruction (LoRA)
- LoRA adapter (r=8, alpha=16, dropout=0.05) for TinyLlama-1.1B-Chat-v1.0.
+ # ARES56 — LoRA adapter
 Files included:
- - adapter_model.safetensors
- - adapter_config.json
- - tokenizer_config.json
- - special_tokens_map.json
+ - `adapter_model.safetensors` — SHA256: `6e4a69d350d932f2bf2292a0f6aadad42fcc0ddd7b8a378405faac7f5b7133b3`
+ - `adapter_config.json` — SHA256: `40e1d19b1b1393d8102640ef527ae2ef0184b1d1592066b623af5f890e2c88ad`
+ - `tokenizer_config.json` — SHA256: `27c5ddd03dd5e605959d3a0f6d4dcfc238e5475bbde941e8c358f3776ac1221b`
+ - `special_tokens_map.json` — SHA256: `82d96d7a9e6ced037f12394b7ea6a5b02e6ca87e0d11edaa8d60d9be857ce7db`
 
- Output generated via Axolotl (CPU, quick smoke run). No full checkpoint included.
+ Output generated via Axolotl (CPU / smoke run). No full checkpoint included.
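
The adapter attaches to the stock TinyLlama base model at inference time. A minimal loading sketch with `transformers` and `peft`; the Hub repo id `raniero/ares56-test-text` is an assumption inferred from the commit message, so substitute the actual path:

```python
# Minimal sketch: load the LoRA adapter on top of TinyLlama.
# The repo id "raniero/ares56-test-text" is an assumption; replace it
# with the real Hub path (or a local directory containing the adapter).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Attach the r=8 / alpha=16 adapter weights from this repo.
model = PeftModel.from_pretrained(base, "raniero/ares56-test-text")
model.eval()
```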
chat_template.jinja ADDED
@@ -0,0 +1,15 @@
+ {% for message in messages %}
+ {% if message['role'] == 'user' %}
+ {{ '<|user|>
+ ' + message['content'] + eos_token }}
+ {% elif message['role'] == 'system' %}
+ {{ '<|system|>
+ ' + message['content'] + eos_token }}
+ {% elif message['role'] == 'assistant' %}
+ {{ '<|assistant|>
+ ' + message['content'] + eos_token }}
+ {% endif %}
+ {% if loop.last and add_generation_prompt %}
+ {{ '<|assistant|>' }}
+ {% endif %}
+ {% endfor %}
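
This is the TinyLlama/Zephyr-style role-tag template. Once it is the tokenizer's active chat template, prompts can be rendered with `apply_chat_template`; a small sketch (the message contents are illustrative):

```python
# Sketch: render a prompt using a chat template like the one above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
# add_generation_prompt=True appends the trailing '<|assistant|>' tag
# so generation starts inside the assistant turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```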
checkpoint-10/README.md ADDED
@@ -0,0 +1,208 @@
+ ---
+ base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - axolotl
+ - base_model:adapter:TinyLlama/TinyLlama-1.1B-Chat-v1.0
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.1
checkpoint-10/adapter_config.json ADDED
@@ -0,0 +1,42 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": null,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "gate_proj",
+ "up_proj",
+ "v_proj",
+ "k_proj",
+ "o_proj",
+ "down_proj"
+ ],
+ "target_parameters": [],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
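
With `lora_alpha: 16` and `r: 8` (and rsLoRA disabled), the effective scaling applied to each adapter delta is alpha/r = 2.0. A quick sketch for inspecting this programmatically, assuming the checkpoint directory is available locally as `checkpoint-10`:

```python
# Sketch: read the adapter config and derive the LoRA scaling factor.
from peft import PeftConfig

cfg = PeftConfig.from_pretrained("checkpoint-10")  # local checkpoint dir
print(cfg.base_model_name_or_path)  # TinyLlama/TinyLlama-1.1B-Chat-v1.0
print(cfg.r, cfg.lora_alpha)        # 8, 16
print(cfg.lora_alpha / cfg.r)       # scaling = 2.0 (use_rslora is false)
```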
checkpoint-10/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e4a69d350d932f2bf2292a0f6aadad42fcc0ddd7b8a378405faac7f5b7133b3
+ size 25271744
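
The large binaries are stored as Git LFS pointers; the `oid` is the SHA256 of the real file and matches the checksum listed in the README. A small sketch for verifying a downloaded copy against the pointer (file name as above):

```python
# Sketch: verify a downloaded file against the SHA256 in its LFS pointer.
import hashlib

EXPECTED = "6e4a69d350d932f2bf2292a0f6aadad42fcc0ddd7b8a378405faac7f5b7133b3"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so large files don't load into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

assert sha256_of("adapter_model.safetensors") == EXPECTED
```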
checkpoint-10/chat_template.jinja ADDED
@@ -0,0 +1,15 @@
+ {% for message in messages %}
+ {% if message['role'] == 'user' %}
+ {{ '<|user|>
+ ' + message['content'] + eos_token }}
+ {% elif message['role'] == 'system' %}
+ {{ '<|system|>
+ ' + message['content'] + eos_token }}
+ {% elif message['role'] == 'assistant' %}
+ {{ '<|assistant|>
+ ' + message['content'] + eos_token }}
+ {% endif %}
+ {% if loop.last and add_generation_prompt %}
+ {{ '<|assistant|>' }}
+ {% endif %}
+ {% endfor %}
checkpoint-10/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:366ce750c8c7fcc0ac3f637571b484ed8fb777c9ffc5cf76031dd11968108591
+ size 50712506
checkpoint-10/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:542b676257de67eccbb5ecdff64ef4b633bda6598cafadf9a431a0a32a1d692f
+ size 13990
checkpoint-10/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92895d149359090ddf3ae4a97578a9224915364b8880d14de783a868db38bfc8
+ size 1064
checkpoint-10/special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-10/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-10/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
checkpoint-10/tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "add_prefix_space": null,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "legacy": false,
+ "model_max_length": 2048,
+ "pad_token": "</s>",
+ "padding_side": "right",
+ "sp_model_kwargs": {},
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }
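
Note that `pad_token` is set to the EOS token `</s>` rather than a dedicated pad id, which is common for Llama-family tokenizers that ship without one. A quick sketch of the equivalent setup on the base tokenizer:

```python
# Sketch: mirror "pad_token": "</s>" from tokenizer_config.json above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tok.pad_token = tok.eos_token  # pad aliases EOS, as in this checkpoint
assert tok.pad_token_id == tok.eos_token_id
```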
checkpoint-10/trainer_state.json ADDED
@@ -0,0 +1,134 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.2,
+ "eval_steps": 500,
+ "global_step": 10,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.02,
+ "grad_norm": 5.485438823699951,
+ "learning_rate": 0.0002,
+ "loss": 4.5061,
+ "memory/device_reserved (GiB)": 0.0,
+ "memory/max_active (GiB)": 0.0,
+ "memory/max_allocated (GiB)": 0.0,
+ "step": 1
+ },
+ {
+ "epoch": 0.04,
+ "grad_norm": 4.593176364898682,
+ "learning_rate": 0.00019510565162951537,
+ "loss": 3.7913,
+ "memory/device_reserved (GiB)": 0.0,
+ "memory/max_active (GiB)": 0.0,
+ "memory/max_allocated (GiB)": 0.0,
+ "step": 2
+ },
+ {
+ "epoch": 0.06,
+ "grad_norm": 4.607494354248047,
+ "learning_rate": 0.00018090169943749476,
+ "loss": 3.0368,
+ "memory/device_reserved (GiB)": 0.0,
+ "memory/max_active (GiB)": 0.0,
+ "memory/max_allocated (GiB)": 0.0,
+ "step": 3
+ },
+ {
+ "epoch": 0.08,
+ "grad_norm": 4.247849464416504,
+ "learning_rate": 0.00015877852522924732,
+ "loss": 2.4057,
+ "memory/device_reserved (GiB)": 0.0,
+ "memory/max_active (GiB)": 0.0,
+ "memory/max_allocated (GiB)": 0.0,
+ "step": 4
+ },
+ {
+ "epoch": 0.1,
+ "grad_norm": 3.5455574989318848,
+ "learning_rate": 0.00013090169943749476,
+ "loss": 1.9879,
+ "memory/device_reserved (GiB)": 0.0,
+ "memory/max_active (GiB)": 0.0,
+ "memory/max_allocated (GiB)": 0.0,
+ "step": 5
+ },
+ {
+ "epoch": 0.12,
+ "grad_norm": 3.5534489154815674,
+ "learning_rate": 0.0001,
+ "loss": 1.6576,
+ "memory/device_reserved (GiB)": 0.0,
+ "memory/max_active (GiB)": 0.0,
+ "memory/max_allocated (GiB)": 0.0,
+ "step": 6
+ },
+ {
+ "epoch": 0.14,
+ "grad_norm": 3.670276403427124,
+ "learning_rate": 6.909830056250527e-05,
+ "loss": 1.4126,
+ "memory/device_reserved (GiB)": 0.0,
+ "memory/max_active (GiB)": 0.0,
+ "memory/max_allocated (GiB)": 0.0,
+ "step": 7
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 4.0369062423706055,
+ "learning_rate": 4.12214747707527e-05,
+ "loss": 1.2206,
+ "memory/device_reserved (GiB)": 0.0,
+ "memory/max_active (GiB)": 0.0,
+ "memory/max_allocated (GiB)": 0.0,
+ "step": 8
+ },
+ {
+ "epoch": 0.18,
+ "grad_norm": 4.194610595703125,
+ "learning_rate": 1.9098300562505266e-05,
+ "loss": 1.0935,
+ "memory/device_reserved (GiB)": 0.0,
+ "memory/max_active (GiB)": 0.0,
+ "memory/max_allocated (GiB)": 0.0,
+ "step": 9
+ },
+ {
+ "epoch": 0.2,
+ "grad_norm": 4.174754619598389,
+ "learning_rate": 4.8943483704846475e-06,
+ "loss": 1.0354,
+ "memory/device_reserved (GiB)": 0.0,
+ "memory/max_active (GiB)": 0.0,
+ "memory/max_allocated (GiB)": 0.0,
+ "step": 10
+ }
+ ],
+ "logging_steps": 1,
+ "max_steps": 10,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 10,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 7993499320320.0,
+ "train_batch_size": 1,
+ "trial_name": null,
+ "trial_params": null
+ }
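
The logged learning rates trace a cosine decay from the 2e-4 peak over the 10 steps (`warmup_steps: 0`, `lr_scheduler: cosine`). A sketch that reproduces them from the standard cosine formula, assuming progress is measured as (step-1)/max_steps:

```python
# Sketch: reproduce the learning rates in log_history above.
# lr(step) = peak_lr * 0.5 * (1 + cos(pi * (step - 1) / max_steps))
import math

PEAK_LR, MAX_STEPS = 2e-4, 10

for step in range(1, MAX_STEPS + 1):
    lr = PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * (step - 1) / MAX_STEPS))
    print(step, lr)  # step 1 -> 2e-4 ... step 10 -> 4.8943483704846475e-06
```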
checkpoint-10/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:459c729d5fdea1794088448648c4a7b2bcf30f0c9225aee3b632430c3ba5fb87
+ size 6840
config.json ADDED
@@ -0,0 +1,29 @@
+ {
+ "architectures": [
+ "LlamaForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bos_token_id": 1,
+ "dtype": "float32",
+ "eos_token_id": 2,
+ "head_dim": 64,
+ "hidden_act": "silu",
+ "hidden_size": 2048,
+ "initializer_range": 0.02,
+ "intermediate_size": 5632,
+ "max_position_embeddings": 2048,
+ "mlp_bias": false,
+ "model_type": "llama",
+ "num_attention_heads": 32,
+ "num_hidden_layers": 22,
+ "num_key_value_heads": 4,
+ "pretraining_tp": 1,
+ "rms_norm_eps": 1e-05,
+ "rope_scaling": null,
+ "rope_theta": 10000.0,
+ "tie_word_embeddings": false,
+ "transformers_version": "4.56.1",
+ "use_cache": false,
+ "vocab_size": 32000
+ }
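
These dimensions, together with the adapter config, pin down the trainable parameter count reported in train.log (6,307,840). Each LoRA pair adds r * (d_in + d_out) parameters per targeted projection; a sketch of the arithmetic:

```python
# Sketch: reproduce "trainable params: 6,307,840" from train.log.
# A LoRA pair adds r * (d_in + d_out) params per targeted linear layer.
r, layers = 8, 22
hidden, inter = 2048, 5632
kv_dim = 4 * 64  # num_key_value_heads * head_dim = 256

per_layer = (
    r * (hidden + hidden)    # q_proj: 2048 -> 2048
    + r * (hidden + kv_dim)  # k_proj: 2048 -> 256 (grouped-query attention)
    + r * (hidden + kv_dim)  # v_proj: 2048 -> 256
    + r * (hidden + hidden)  # o_proj: 2048 -> 2048
    + r * (hidden + inter)   # gate_proj: 2048 -> 5632
    + r * (hidden + inter)   # up_proj: 2048 -> 5632
    + r * (inter + hidden)   # down_proj: 5632 -> 2048
)
assert layers * per_layer == 6_307_840
```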
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
train.log ADDED
@@ -0,0 +1,174 @@
+ [2025-09-09 07:47:05,190] [INFO] [axolotl.cli.config.load_cfg:245] [PID:37] [RANK:0] config:
+ {
+ "activation_offloading": false,
+ "adapter": "lora",
+ "attn_implementation": "eager",
+ "axolotl_config_path": "/app/checkpoints/instr-fast-052b/ares56-test-text/train_instr-fast-052b.yml",
+ "base_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+ "base_model_config": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+ "batch_size": 1,
+ "bf16": false,
+ "capabilities": {
+ "bf16": false,
+ "fp8": false,
+ "n_gpu": 1,
+ "n_node": 1
+ },
+ "context_parallel_size": 1,
+ "dataloader_num_workers": 1,
+ "dataloader_pin_memory": true,
+ "dataloader_prefetch_factor": 256,
+ "dataset_processes": 32,
+ "datasets": [
+ {
+ "message_property_mappings": {
+ "content": "content",
+ "role": "role"
+ },
+ "path": "/app/axolotl/data/mini_instruct_50.jsonl",
+ "trust_remote_code": false,
+ "type": "alpaca"
+ }
+ ],
+ "ddp": false,
+ "device": "cpu",
+ "device_map": "auto",
+ "dion_rank_fraction": 1.0,
+ "dion_rank_multiple_of": 1,
+ "env_capabilities": {
+ "torch_version": "2.6.0"
+ },
+ "eval_batch_size": 1,
+ "eval_causal_lm_metrics": [
+ "sacrebleu",
+ "comet",
+ "ter",
+ "chrf"
+ ],
+ "eval_max_new_tokens": 128,
+ "eval_steps": 0,
+ "eval_table_size": 0,
+ "experimental_skip_move_to_device": true,
+ "fp16": false,
+ "gradient_accumulation_steps": 1,
+ "gradient_checkpointing": false,
+ "is_llama_derived_model": true,
+ "learning_rate": 0.0002,
+ "lisa_layers_attribute": "model.layers",
+ "load_best_model_at_end": false,
+ "load_in_4bit": false,
+ "load_in_8bit": false,
+ "local_rank": 0,
+ "logging_steps": 1,
+ "lora_alpha": 16,
+ "lora_dropout": 0.05,
+ "lora_r": 8,
+ "lora_target_modules": [
+ "q_proj",
+ "k_proj",
+ "v_proj",
+ "o_proj",
+ "gate_proj",
+ "up_proj",
+ "down_proj"
+ ],
+ "loraplus_lr_embedding": 1e-06,
+ "lr_scheduler": "cosine",
+ "max_prompt_len": 512,
+ "max_steps": 10,
+ "mean_resizing_embeddings": false,
+ "micro_batch_size": 1,
+ "model_config_type": "llama",
+ "num_epochs": 1.0,
+ "optimizer": "adamw_torch",
+ "output_dir": "/app/checkpoints/instr-fast-052b/ares56-test-text",
+ "pretrain_multipack_attn": true,
+ "profiler_steps_start": 0,
+ "qlora_sharded_model_loading": false,
+ "ray_num_workers": 1,
+ "resources_per_worker": {
+ "GPU": 1
+ },
+ "sample_packing": false,
+ "sample_packing_bin_size": 200,
+ "sample_packing_group_size": 100000,
+ "save_only_model": false,
+ "save_safetensors": true,
+ "save_steps": 10,
+ "save_strategy": "steps",
+ "save_total_limit": 1,
+ "sequence_len": 256,
+ "shuffle_before_merging_datasets": false,
+ "shuffle_merged_datasets": true,
+ "skip_prepare_dataset": false,
+ "streaming_multipack_buffer_size": 10000,
+ "strict": false,
+ "tensor_parallel_size": 1,
+ "tf32": false,
+ "tiled_mlp_use_original_mlp": true,
+ "tokenizer_config": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+ "tokenizer_save_jinja_files": true,
+ "torch_dtype": "torch.float32",
+ "train_on_inputs": false,
+ "trl": {
+ "log_completions": false,
+ "mask_truncated_completions": false,
+ "ref_model_mixup_alpha": 0.9,
+ "ref_model_sync_steps": 64,
+ "scale_rewards": true,
+ "sync_ref_model": false,
+ "use_vllm": false,
+ "vllm_server_host": "0.0.0.0",
+ "vllm_server_port": 8000
+ },
+ "use_ray": false,
+ "val_set_size": 0.0,
+ "vllm": {
+ "device": "auto",
+ "dtype": "auto",
+ "gpu_memory_utilization": 0.9,
+ "host": "0.0.0.0",
+ "port": 8000
+ },
+ "warmup_steps": 0,
+ "weight_decay": 0.0,
+ "world_size": 1
+ }
+ [2025-09-09 07:47:05,871] [INFO] [axolotl.loaders.tokenizer.load_tokenizer:300] [PID:37] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
+ [2025-09-09 07:47:05,871] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:476] [PID:37] [RANK:0] Unable to find prepared dataset in last_run_prepared/103416ae75fe35cf3a7cdd59f8415c5e
+ [2025-09-09 07:47:05,871] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:37] [RANK:0] Loading raw datasets...
+ [2025-09-09 07:47:05,871] [WARNING] [axolotl.utils.data.sft._load_raw_datasets:322] [PID:37] [RANK:0] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset using `axolotl preprocess path/to/config.yml`.
+
+ [2025-09-09 07:47:06,858] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:37] [RANK:0] Loading dataset: /app/axolotl/data/mini_instruct_50.jsonl with base_type: alpaca and prompt_style: None
+
+ [2025-09-09 07:47:07,731] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:218] [PID:37] [RANK:0] min_input_len: 69
+ [2025-09-09 07:47:07,731] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:220] [PID:37] [RANK:0] max_input_len: 71
+
+
+ [2025-09-09 07:47:08,152] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:121] [PID:37] [RANK:0] Maximum number of steps set at 10
+ [2025-09-09 07:47:08,722] [INFO] [axolotl.loaders.tokenizer.load_tokenizer:300] [PID:37] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
+ [2025-09-09 07:47:08,917] [INFO] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:87] [PID:37] [RANK:0] Patched Trainer.evaluation_loop with nanmean loss calculation
+ [2025-09-09 07:47:08,918] [INFO] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:138] [PID:37] [RANK:0] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation
+ `torch_dtype` is deprecated! Use `dtype` instead!
+ [2025-09-09 07:47:09,681] [INFO] [axolotl.loaders.model._configure_embedding_dtypes:351] [PID:37] [RANK:0] Converting modules to torch.float32
+ trainable params: 6,307,840 || all params: 1,106,356,224 || trainable%: 0.5701
+ [2025-09-09 07:47:10,932] [INFO] [axolotl.train.save_initial_configs:414] [PID:37] [RANK:0] Pre-saving adapter config to /app/checkpoints/instr-fast-052b/ares56-test-text...
+ [2025-09-09 07:47:10,932] [INFO] [axolotl.train.save_initial_configs:418] [PID:37] [RANK:0] Pre-saving tokenizer to /app/checkpoints/instr-fast-052b/ares56-test-text...
+ [2025-09-09 07:47:10,946] [INFO] [axolotl.train.save_initial_configs:423] [PID:37] [RANK:0] Pre-saving model config to /app/checkpoints/instr-fast-052b/ares56-test-text...
+ [2025-09-09 07:47:10,947] [INFO] [axolotl.train.execute_training:203] [PID:37] [RANK:0] Starting trainer...
+
+   0%|          | 0/10 [00:00<?, ?it/s]
+  10%|█         | 1/10 [00:01<00:13, 1.45s/it]
+  20%|██        | 2/10 [00:02<00:09, 1.22s/it]
+  30%|███       | 3/10 [00:03<00:08, 1.18s/it]
+  40%|████      | 4/10 [00:04<00:06, 1.15s/it]
+  50%|█████     | 5/10 [00:05<00:05, 1.15s/it]
+  60%|██████    | 6/10 [00:06<00:04, 1.08s/it]
+  70%|███████   | 7/10 [00:07<00:03, 1.03s/it]
+  80%|████████  | 8/10 [00:08<00:02, 1.01s/it]
+  90%|█████████ | 9/10 [00:09<00:00, 1.02it/s]
+
+ [2025-09-09 07:47:22,404] [INFO] [axolotl.core.trainers.base._save:681] [PID:37] [RANK:0] Saving Trainer.data_collator.tokenizer by default as Trainer.processing_class is `None`
+
+ [2025-09-09 07:47:22,504] [INFO] [axolotl.train.save_trained_model:228] [PID:37] [RANK:0] Training completed! Saving trained model to /app/checkpoints/instr-fast-052b/ares56-test-text.
+ [2025-09-09 07:47:22,841] [INFO] [axolotl.train.save_trained_model:352] [PID:37] [RANK:0] Model successfully saved to /app/checkpoints/instr-fast-052b/ares56-test-text
train_instr-fast-052b.yml ADDED
@@ -0,0 +1,60 @@
+ prompt_style: alpaca
+ base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+ adapter: lora
+
+ # CPU only, for the smoke run
+ load_in_8bit: false
+ load_in_4bit: false
+ bf16: false
+ fp16: false
+ tf32: false
+ flash_attn: false
+ torch_dtype: torch.float32
+ attn_implementation: eager
+
+ datasets:
+   - path: /app/axolotl/data/mini_instruct_50.jsonl
+     type: alpaca
+     field_instruction: instruction
+     field_input: input
+     field_output: output
+     prompt_style: alpaca
+ output_dir: /app/checkpoints/instr-fast-052b/ares56-test-text
+
+ sequence_len: 256
+ sample_packing: false
+ val_set_size: 0
+ micro_batch_size: 1
+ gradient_accumulation_steps: 1
+ num_epochs: 1
+ max_steps: 10
+ save_steps: 10
+ logging_steps: 1
+ eval_steps: 0
+
+ optimizer: adamw_torch
+ learning_rate: 2e-4
+ warmup_steps: 0
+ weight_decay: 0.0
+
+ # ==== LoRA ====
+ lora_r: 8
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_modules:
+   - q_proj
+   - k_proj
+   - v_proj
+   - o_proj
+   - gate_proj
+   - up_proj
+   - down_proj
+
+ # ==== Save adapter only ====
+ save_safetensors: true
+ save_16bit: false
+
+
+ save_strategy: steps
+ save_total_limit: 1
+ save_only_adapter: true
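
The whole smoke run can be reproduced from this config. A sketch of a launcher, assuming the axolotl CLI is installed and follows the `axolotl <subcommand> config.yml` pattern that the warning in train.log shows for `preprocess`:

```python
# Sketch: launch the smoke run from Python via the axolotl CLI.
# Assumes axolotl is on PATH; the config file name is the one above.
import subprocess

CONFIG = "train_instr-fast-052b.yml"

# Pre-processing is optional but recommended by the warning in train.log.
subprocess.run(["axolotl", "preprocess", CONFIG], check=True)
subprocess.run(["axolotl", "train", CONFIG], check=True)
```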