raniero committed on Sep 9, 2025

Commit

2fac470

verified

1 Parent(s): a9ded78

upload ares56-test-text @ 2025-09-09T07:55:07.531105Z

Files changed (20)

README.md +6 -7
chat_template.jinja +15 -0
checkpoint-10/README.md +208 -0
checkpoint-10/adapter_config.json +42 -0
checkpoint-10/adapter_model.safetensors +3 -0
checkpoint-10/chat_template.jinja +15 -0
checkpoint-10/optimizer.pt +3 -0
checkpoint-10/rng_state.pth +3 -0
checkpoint-10/scheduler.pt +3 -0
checkpoint-10/special_tokens_map.json +30 -0
checkpoint-10/tokenizer.json +0 -0
checkpoint-10/tokenizer.model +3 -0
checkpoint-10/tokenizer_config.json +43 -0
checkpoint-10/trainer_state.json +134 -0
checkpoint-10/training_args.bin +3 -0
config.json +29 -0
tokenizer.json +0 -0
tokenizer.model +3 -0
train.log +174 -0
train_instr-fast-052b.yml +60 -0

README.md CHANGED

@@ -7,12 +7,11 @@ tags: [lora, bittensor, subnet-56, gradients]
 base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
 ---
-# ARES56 — Instruction (LoRA)
-LoRA adapter (r=8, alpha=16, dropout=0.05) for TinyLlama-1.1B-Chat-v1.0.
 Included files:
-- adapter_model.safetensors
-- adapter_config.json
-- tokenizer_config.json
-- special_tokens_map.json
-Output generated via Axolotl (CPU, quick smoke test). No full checkpoint included.

+# ARES56 — LoRA adapter
 Included files:
+- `adapter_model.safetensors` — SHA256: `6e4a69d350d932f2bf2292a0f6aadad42fcc0ddd7b8a378405faac7f5b7133b3`
+- `adapter_config.json` — SHA256: `40e1d19b1b1393d8102640ef527ae2ef0184b1d1592066b623af5f890e2c88ad`
+- `tokenizer_config.json` — SHA256: `27c5ddd03dd5e605959d3a0f6d4dcfc238e5475bbde941e8c358f3776ac1221b`
+- `special_tokens_map.json` — SHA256: `82d96d7a9e6ced037f12394b7ea6a5b02e6ca87e0d11edaa8d60d9be857ce7db`
+Output generated via Axolotl (CPU / smoke test). No full checkpoint included.
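
Reviewer note: the new README lists a SHA256 digest for each file. A minimal sketch of how a downloaded file could be checked against such a digest, using only the standard library. The helper names `sha256_of` and `verify` are illustrative and do not appear in the repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a digest string such as those in the README."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify(Path("adapter_model.safetensors"), "6e4a69d3...")` with the full digest from the README would return `True` only if the local copy matches. The same digest appears as the `oid sha256:` value in the Git LFS pointer for `checkpoint-10/adapter_model.safetensors`.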

chat_template.jinja ADDED

	@@ -0,0 +1,15 @@

+{% for message in messages %}
+{% if message['role'] == 'user' %}
+{{ '<|user|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'system' %}
+{{ '<|system|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'assistant' %}
+{{ '<|assistant|>
+'  + message['content'] + eos_token }}
+{% endif %}
+{% if loop.last and add_generation_prompt %}
+{{ '<|assistant|>' }}
+{% endif %}
+{% endfor %}
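
Reviewer note: a plain-Python sketch of the role-tag logic in the template above, for readers who want to see the resulting prompt string without a Jinja runtime. It assumes the template is rendered with block-tag newlines trimmed (as `trim_blocks`/`lstrip_blocks` would do), so each `{{ ... }}` expression is followed by exactly one newline. Other Jinja settings would produce different whitespace. The function name `render_chat` is illustrative.

```python
ROLE_TAGS = {
    "user": "<|user|>",
    "system": "<|system|>",
    "assistant": "<|assistant|>",
}


def render_chat(messages, eos_token="</s>", add_generation_prompt=False):
    """Approximate the template: tag, newline, content, EOS for each known role."""
    out = []
    for index, message in enumerate(messages):
        tag = ROLE_TAGS.get(message["role"])
        if tag is not None:  # roles without a branch in the template emit nothing
            out.append(f"{tag}\n{message['content']}{eos_token}\n")
        # Mirrors `loop.last and add_generation_prompt` in the template.
        if index == len(messages) - 1 and add_generation_prompt:
            out.append("<|assistant|>\n")
    return "".join(out)


prompt = render_chat(
    [{"role": "user", "content": "Hi"}],
    add_generation_prompt=True,
)
print(repr(prompt))
```

The default `eos_token` of `</s>` is taken from `special_tokens_map.json` in this commit.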

checkpoint-10/README.md ADDED

	@@ -0,0 +1,208 @@

+---
+base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- axolotl
+- base_model:adapter:TinyLlama/TinyLlama-1.1B-Chat-v1.0
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

checkpoint-10/adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "gate_proj",
+    "up_proj",
+    "v_proj",
+    "k_proj",
+    "o_proj",
+    "down_proj"
+  ],
+  "target_parameters": [],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoint-10/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e4a69d350d932f2bf2292a0f6aadad42fcc0ddd7b8a378405faac7f5b7133b3
+size 25271744
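
Reviewer note: the three added lines are a Git LFS pointer, not the weights themselves. A small sketch of how such a pointer could be parsed to recover the hash algorithm, digest, and byte size. `parse_lfs_pointer` is a hypothetical helper, not part of any LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6e4a69d350d932f2bf2292a0f6aadad42fcc0ddd7b8a378405faac7f5b7133b3
size 25271744
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```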

checkpoint-10/chat_template.jinja ADDED

	@@ -0,0 +1,15 @@

+{% for message in messages %}
+{% if message['role'] == 'user' %}
+{{ '<|user|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'system' %}
+{{ '<|system|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'assistant' %}
+{{ '<|assistant|>
+'  + message['content'] + eos_token }}
+{% endif %}
+{% if loop.last and add_generation_prompt %}
+{{ '<|assistant|>' }}
+{% endif %}
+{% endfor %}

checkpoint-10/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:366ce750c8c7fcc0ac3f637571b484ed8fb777c9ffc5cf76031dd11968108591
+size 50712506

checkpoint-10/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:542b676257de67eccbb5ecdff64ef4b633bda6598cafadf9a431a0a32a1d692f
+size 13990

checkpoint-10/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:92895d149359090ddf3ae4a97578a9224915364b8880d14de783a868db38bfc8
+size 1064

checkpoint-10/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-10/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-10/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723

checkpoint-10/tokenizer_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "legacy": false,
+  "model_max_length": 2048,
+  "pad_token": "</s>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}

checkpoint-10/trainer_state.json ADDED

	@@ -0,0 +1,134 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.2,
+  "eval_steps": 500,
+  "global_step": 10,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.02,
+      "grad_norm": 5.485438823699951,
+      "learning_rate": 0.0002,
+      "loss": 4.5061,
+      "memory/device_reserved (GiB)": 0.0,
+      "memory/max_active (GiB)": 0.0,
+      "memory/max_allocated (GiB)": 0.0,
+      "step": 1
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 4.593176364898682,
+      "learning_rate": 0.00019510565162951537,
+      "loss": 3.7913,
+      "memory/device_reserved (GiB)": 0.0,
+      "memory/max_active (GiB)": 0.0,
+      "memory/max_allocated (GiB)": 0.0,
+      "step": 2
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 4.607494354248047,
+      "learning_rate": 0.00018090169943749476,
+      "loss": 3.0368,
+      "memory/device_reserved (GiB)": 0.0,
+      "memory/max_active (GiB)": 0.0,
+      "memory/max_allocated (GiB)": 0.0,
+      "step": 3
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 4.247849464416504,
+      "learning_rate": 0.00015877852522924732,
+      "loss": 2.4057,
+      "memory/device_reserved (GiB)": 0.0,
+      "memory/max_active (GiB)": 0.0,
+      "memory/max_allocated (GiB)": 0.0,
+      "step": 4
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 3.5455574989318848,
+      "learning_rate": 0.00013090169943749476,
+      "loss": 1.9879,
+      "memory/device_reserved (GiB)": 0.0,
+      "memory/max_active (GiB)": 0.0,
+      "memory/max_allocated (GiB)": 0.0,
+      "step": 5
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 3.5534489154815674,
+      "learning_rate": 0.0001,
+      "loss": 1.6576,
+      "memory/device_reserved (GiB)": 0.0,
+      "memory/max_active (GiB)": 0.0,
+      "memory/max_allocated (GiB)": 0.0,
+      "step": 6
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 3.670276403427124,
+      "learning_rate": 6.909830056250527e-05,
+      "loss": 1.4126,
+      "memory/device_reserved (GiB)": 0.0,
+      "memory/max_active (GiB)": 0.0,
+      "memory/max_allocated (GiB)": 0.0,
+      "step": 7
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 4.0369062423706055,
+      "learning_rate": 4.12214747707527e-05,
+      "loss": 1.2206,
+      "memory/device_reserved (GiB)": 0.0,
+      "memory/max_active (GiB)": 0.0,
+      "memory/max_allocated (GiB)": 0.0,
+      "step": 8
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 4.194610595703125,
+      "learning_rate": 1.9098300562505266e-05,
+      "loss": 1.0935,
+      "memory/device_reserved (GiB)": 0.0,
+      "memory/max_active (GiB)": 0.0,
+      "memory/max_allocated (GiB)": 0.0,
+      "step": 9
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 4.174754619598389,
+      "learning_rate": 4.8943483704846475e-06,
+      "loss": 1.0354,
+      "memory/device_reserved (GiB)": 0.0,
+      "memory/max_active (GiB)": 0.0,
+      "memory/max_allocated (GiB)": 0.0,
+      "step": 10
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 10,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 10,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7993499320320.0,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
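
Reviewer note: the `learning_rate` values in `log_history` are consistent with a cosine decay from 2e-4 toward zero over 10 steps with no warmup. The sketch below reproduces them under that assumption. The formula was fitted to the logged numbers and is not copied from the trainer's source.

```python
import math


def cosine_lr(step: int, base_lr: float = 2e-4, max_steps: int = 10) -> float:
    """Learning rate logged at 1-indexed `step`, assuming zero warmup and decay to 0."""
    # Step 1 logs the full base rate, so progress starts at 0 rather than 1/max_steps.
    progress = (step - 1) / max_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (1, 2, 6, 10):
    print(step, cosine_lr(step))
```

Comparing with the log: step 2 gives about 1.9511e-4, step 6 gives 1e-4, and step 10 gives about 4.894e-6, matching the entries above to floating-point precision.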

checkpoint-10/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:459c729d5fdea1794088448648c4a7b2bcf30f0c9225aee3b632430c3ba5fb87
+size 6840

config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "dtype": "float32",
+  "eos_token_id": 2,
+  "head_dim": 64,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 5632,
+  "max_position_embeddings": 2048,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 22,
+  "num_key_value_heads": 4,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 10000.0,
+  "tie_word_embeddings": false,
+  "transformers_version": "4.56.1",
+  "use_cache": false,
+  "vocab_size": 32000
+}
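
Reviewer note: the dimensions in this `config.json`, combined with `r = 8` and the seven target modules from `adapter_config.json`, are enough to recompute the trainable-parameter count reported in `train.log`. The sketch assumes each targeted linear layer of shape `d_in x d_out` gains two LoRA matrices with `r * (d_in + d_out)` parameters in total, and that no biases or other modules are trained (`"bias": "none"`, `"modules_to_save": null`).

```python
def lora_trainable_params(
    hidden=2048,        # hidden_size
    intermediate=5632,  # intermediate_size
    kv_heads=4,         # num_key_value_heads
    head_dim=64,        # head_dim
    layers=22,          # num_hidden_layers
    r=8,                # LoRA rank
):
    """Count LoRA parameters for the attention and MLP projections of every layer."""
    kv_dim = kv_heads * head_dim  # grouped-query attention shrinks k/v to 256
    shapes = {
        "q_proj": (hidden, hidden),
        "k_proj": (hidden, kv_dim),
        "v_proj": (hidden, kv_dim),
        "o_proj": (hidden, hidden),
        "gate_proj": (hidden, intermediate),
        "up_proj": (hidden, intermediate),
        "down_proj": (intermediate, hidden),
    }
    # Each adapted layer adds A (r x d_in) and B (d_out x r).
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * layers


print(lora_trainable_params())  # 6307840
```

This agrees with the `trainable params: 6,307,840` line in the training log.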

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723

train.log ADDED

@@ -0,0 +1,174 @@
+[2025-09-09 07:47:05,190] [INFO] [axolotl.cli.config.load_cfg:245] [PID:37] [RANK:0] config:
+{
+  "activation_offloading": false,
+  "adapter": "lora",
+  "attn_implementation": "eager",
+  "axolotl_config_path": "/app/checkpoints/instr-fast-052b/ares56-test-text/train_instr-fast-052b.yml",
+  "base_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+  "base_model_config": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+  "batch_size": 1,
+  "bf16": false,
+  "capabilities": {
+    "bf16": false,
+    "fp8": false,
+    "n_gpu": 1,
+    "n_node": 1
+  },
+  "context_parallel_size": 1,
+  "dataloader_num_workers": 1,
+  "dataloader_pin_memory": true,
+  "dataloader_prefetch_factor": 256,
+  "dataset_processes": 32,
+  "datasets": [
+    {
+      "message_property_mappings": {
+        "content": "content",
+        "role": "role"
+      },
+      "path": "/app/axolotl/data/mini_instruct_50.jsonl",
+      "trust_remote_code": false,
+      "type": "alpaca"
+    }
+  ],
+  "ddp": false,
+  "device": "cpu",
+  "device_map": "auto",
+  "dion_rank_fraction": 1.0,
+  "dion_rank_multiple_of": 1,
+  "env_capabilities": {
+    "torch_version": "2.6.0"
+  },
+  "eval_batch_size": 1,
+  "eval_causal_lm_metrics": [
+    "sacrebleu",
+    "comet",
+    "ter",
+    "chrf"
+  ],
+  "eval_max_new_tokens": 128,
+  "eval_steps": 0,
+  "eval_table_size": 0,
+  "experimental_skip_move_to_device": true,
+  "fp16": false,
+  "gradient_accumulation_steps": 1,
+  "gradient_checkpointing": false,
+  "is_llama_derived_model": true,
+  "learning_rate": 0.0002,
+  "lisa_layers_attribute": "model.layers",
+  "load_best_model_at_end": false,
+  "load_in_4bit": false,
+  "load_in_8bit": false,
+  "local_rank": 0,
+  "logging_steps": 1,
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "lora_r": 8,
+  "lora_target_modules": [
+    "q_proj",
+    "k_proj",
+    "v_proj",
+    "o_proj",
+    "gate_proj",
+    "up_proj",
+    "down_proj"
+  ],
+  "loraplus_lr_embedding": 1e-06,
+  "lr_scheduler": "cosine",
+  "max_prompt_len": 512,
+  "max_steps": 10,
+  "mean_resizing_embeddings": false,
+  "micro_batch_size": 1,
+  "model_config_type": "llama",
+  "num_epochs": 1.0,
+  "optimizer": "adamw_torch",
+  "output_dir": "/app/checkpoints/instr-fast-052b/ares56-test-text",
+  "pretrain_multipack_attn": true,
+  "profiler_steps_start": 0,
+  "qlora_sharded_model_loading": false,
+  "ray_num_workers": 1,
+  "resources_per_worker": {
+    "GPU": 1
+  },
+  "sample_packing": false,
+  "sample_packing_bin_size": 200,
+  "sample_packing_group_size": 100000,
+  "save_only_model": false,
+  "save_safetensors": true,
+  "save_steps": 10,
+  "save_strategy": "steps",
+  "save_total_limit": 1,
+  "sequence_len": 256,
+  "shuffle_before_merging_datasets": false,
+  "shuffle_merged_datasets": true,
+  "skip_prepare_dataset": false,
+  "streaming_multipack_buffer_size": 10000,
+  "strict": false,
+  "tensor_parallel_size": 1,
+  "tf32": false,
+  "tiled_mlp_use_original_mlp": true,
+  "tokenizer_config": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+  "tokenizer_save_jinja_files": true,
+  "torch_dtype": "torch.float32",
+  "train_on_inputs": false,
+  "trl": {
+    "log_completions": false,
+    "mask_truncated_completions": false,
+    "ref_model_mixup_alpha": 0.9,
+    "ref_model_sync_steps": 64,
+    "scale_rewards": true,
+    "sync_ref_model": false,
+    "use_vllm": false,
+    "vllm_server_host": "0.0.0.0",
+    "vllm_server_port": 8000
+  },
+  "use_ray": false,
+  "val_set_size": 0.0,
+  "vllm": {
+    "device": "auto",
+    "dtype": "auto",
+    "gpu_memory_utilization": 0.9,
+    "host": "0.0.0.0",
+    "port": 8000
+  },
+  "warmup_steps": 0,
+  "weight_decay": 0.0,
+  "world_size": 1
+}
+[2025-09-09 07:47:05,871] [INFO] [axolotl.loaders.tokenizer.load_tokenizer:300] [PID:37] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
+[2025-09-09 07:47:05,871] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:476] [PID:37] [RANK:0] Unable to find prepared dataset in last_run_prepared/103416ae75fe35cf3a7cdd59f8415c5e
+[2025-09-09 07:47:05,871] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:37] [RANK:0] Loading raw datasets...
+[2025-09-09 07:47:05,871] [WARNING] [axolotl.utils.data.sft._load_raw_datasets:322] [PID:37] [RANK:0] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset using `axolotl preprocess path/to/config.yml`.
+[2025-09-09 07:47:06,858] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:37] [RANK:0] Loading dataset: /app/axolotl/data/mini_instruct_50.jsonl with base_type: alpaca and prompt_style: None
+[2025-09-09 07:47:07,731] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:218] [PID:37] [RANK:0] min_input_len: 69
+[2025-09-09 07:47:07,731] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:220] [PID:37] [RANK:0] max_input_len: 71
+[2025-09-09 07:47:08,152] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:121] [PID:37] [RANK:0] Maximum number of steps set at 10
+[2025-09-09 07:47:08,722] [INFO] [axolotl.loaders.tokenizer.load_tokenizer:300] [PID:37] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
+[2025-09-09 07:47:08,917] [INFO] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:87] [PID:37] [RANK:0] Patched Trainer.evaluation_loop with nanmean loss calculation
+[2025-09-09 07:47:08,918] [INFO] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:138] [PID:37] [RANK:0] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation
+`torch_dtype` is deprecated! Use `dtype` instead!
+[2025-09-09 07:47:09,681] [INFO] [axolotl.loaders.model._configure_embedding_dtypes:351] [PID:37] [RANK:0] Converting modules to torch.float32
+trainable params: 6,307,840 || all params: 1,106,356,224 || trainable%: 0.5701
+[2025-09-09 07:47:10,932] [INFO] [axolotl.train.save_initial_configs:414] [PID:37] [RANK:0] Pre-saving adapter config to /app/checkpoints/instr-fast-052b/ares56-test-text...
+[2025-09-09 07:47:10,932] [INFO] [axolotl.train.save_initial_configs:418] [PID:37] [RANK:0] Pre-saving tokenizer to /app/checkpoints/instr-fast-052b/ares56-test-text...
+[2025-09-09 07:47:10,946] [INFO] [axolotl.train.save_initial_configs:423] [PID:37] [RANK:0] Pre-saving model config to /app/checkpoints/instr-fast-052b/ares56-test-text...
+[2025-09-09 07:47:10,947] [INFO] [axolotl.train.execute_training:203] [PID:37] [RANK:0] Starting trainer...
  0%|          | 0/10 [00:00<?, ?it/s]
 10%|█         | 1/10 [00:01<00:13,  1.45s/it]
 20%|██        | 2/10 [00:02<00:09,  1.22s/it]
 30%|███       | 3/10 [00:03<00:08,  1.18s/it]
 40%|████      | 4/10 [00:04<00:06,  1.15s/it]
 50%|█████     | 5/10 [00:05<00:05,  1.15s/it]
 60%|██████    | 6/10 [00:06<00:04,  1.08s/it]
 70%|███████   | 7/10 [00:07<00:03,  1.03s/it]
 80%|████████  | 8/10 [00:08<00:02,  1.01s/it]
 90%|█████████ | 9/10 [00:09<00:00,  1.02it/s]
+[2025-09-09 07:47:22,404] [INFO] [axolotl.core.trainers.base._save:681] [PID:37] [RANK:0] Saving Trainer.data_collator.tokenizer by default as Trainer.processing_class is `None`
+[2025-09-09 07:47:22,504] [INFO] [axolotl.train.save_trained_model:228] [PID:37] [RANK:0] Training completed! Saving trained model to /app/checkpoints/instr-fast-052b/ares56-test-text.
+[2025-09-09 07:47:22,841] [INFO] [axolotl.train.save_trained_model:352] [PID:37] [RANK:0] Model successfully saved to /app/checkpoints/instr-fast-052b/ares56-test-text

train_instr-fast-052b.yml ADDED

	@@ -0,0 +1,60 @@

+prompt_style: alpaca
+base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+adapter: lora
+# CPU only, for the smoke test
+load_in_8bit: false
+load_in_4bit: false
+bf16: false
+fp16: false
+tf32: false
+flash_attn: false
+torch_dtype: torch.float32
+attn_implementation: eager
+datasets:
+- path: /app/axolotl/data/mini_instruct_50.jsonl
+  type: alpaca
+  field_instruction: instruction
+  field_input: input
+  field_output: output
+  prompt_style: alpaca
+output_dir: /app/checkpoints/instr-fast-052b/ares56-test-text
+sequence_len: 256
+sample_packing: false
+val_set_size: 0
+micro_batch_size: 1
+gradient_accumulation_steps: 1
+num_epochs: 1
+max_steps: 10
+save_steps: 10
+logging_steps: 1
+eval_steps: 0
+optimizer: adamw_torch
+learning_rate: 2e-4
+warmup_steps: 0
+weight_decay: 0.0
+# ==== LoRA ====
+lora_r: 8
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_modules:
+- q_proj
+- k_proj
+- v_proj
+- o_proj
+- gate_proj
+- up_proj
+- down_proj
+# ==== Save adapter only ====
+save_safetensors: true
+save_16bit: false
+save_strategy: steps
+save_total_limit: 1
+save_only_adapter: true
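
Reviewer note: with `micro_batch_size: 1`, `gradient_accumulation_steps: 1`, and `max_steps: 10`, the run stops well before one epoch. The sketch below relates steps to the `epoch` field in `trainer_state.json`. The dataset size of 50 is an assumption inferred from the file name `mini_instruct_50.jsonl` and from the logged 0.02 epoch per step. It is not stated anywhere in the config.

```python
def epoch_at(step, dataset_size=50, micro_batch_size=1, gradient_accumulation_steps=1):
    """Fraction of an epoch completed after `step` optimizer steps on one device."""
    samples_seen = step * micro_batch_size * gradient_accumulation_steps
    return samples_seen / dataset_size  # dataset_size=50 is an assumed value


print(epoch_at(1), epoch_at(10))
```

Under that assumption, `max_steps: 10` covers 10 of the 50 examples, which matches the final `"epoch": 0.2` in the trainer state.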