Instructions to use ConicCat/Nemo-super-wip-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

PEFT

How to use ConicCat/Nemo-super-wip-lora with PEFT:

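from peft import PeftModel
from transformers import AutoModelForCausalLM

# The base model ships custom modeling code (modeling_decilm.py), so it needs trust_remote_code=True; review that code first.
base_model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",
    trust_remote_code=True,
)
# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base_model, "ConicCat/Nemo-super-wip-lora")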
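
To make concrete what attaching the adapter means numerically, here is a minimal sketch of the usual LoRA update, `W + scale * (B @ A)`, in plain Python on toy matrices. The values `alpha=64` and `r=32` come from the `lora_alpha` and `lora_r` entries in the debug.log further down this page, which also reports `LoRA scale factor: 2.0 (rslora=False)`. The helper names (`lora_scale`, `matmul`, `merge_lora`) and the toy sizes are illustrative assumptions, not PEFT or Axolotl functions.

```python
import math


def lora_scale(alpha: float, r: int, use_rslora: bool = False) -> float:
    """Scaling for the low-rank update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, scale):
    """Return weight + scale * (lora_b @ lora_a) without mutating the inputs."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy sizes (hypothetical): out_features=2, in_features=3, rank=1.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]  # shape (rank, in_features)
lora_b = [[0.5], [0.0]]     # shape (out_features, rank)

scale = lora_scale(alpha=64, r=32)  # 64 / 32
merged = merge_lora(weight, lora_a, lora_b, scale)
print(scale, merged)
```

With rank 1 the update only changes the first output row here, because the second row of `lora_b` is zero. Real adapters apply the same formula per targeted linear layer at much larger sizes.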

Transformers

How to use ConicCat/Nemo-super-wip-lora with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ConicCat/Nemo-super-wip-lora", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ConicCat/Nemo-super-wip-lora", trust_remote_code=True, dtype="auto")
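
The `chat_template_jinja` recorded in the debug.log further down this page switches reasoning on or off from a `/think` or `/no_think` marker in the system message. The sketch below mirrors that logic in plain Python so the behaviour is easier to follow. The function names are hypothetical, and it is an assumption that the tokenizer you load uses the same template as the one logged during training.

```python
def resolve_thinking(system_content: str, default: bool = False):
    """Mirror the template's handling of /think and /no_think in the system prompt.

    Returns the cleaned system text and whether thinking is enabled.
    """
    enable_thinking = default
    if "/no_think" in system_content:
        system_content = system_content.replace("/no_think", "").strip()
        enable_thinking = False
    elif "/think" in system_content:
        system_content = system_content.replace("/think", "").strip()
        enable_thinking = True
    return system_content, enable_thinking


def generation_prefix(enable_thinking: bool) -> str:
    """Text the template appends when add_generation_prompt is set."""
    header = "<|start_header_id|>assistant<|end_header_id|>\n\n"
    # With thinking disabled, the template pre-fills an empty think block.
    return header if enable_thinking else header + "<think>\n\n</think>\n\n"


cleaned, thinking = resolve_thinking("You are a helpful writer. /think")
print(repr(cleaned), thinking)
print(repr(generation_prefix(thinking)))
```

Because the logged template sets `enable_thinking = false` at the top, a system message without either marker leaves reasoning off and the empty `<think>` block is pre-filled.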

Local Apps

vLLM

How to use ConicCat/Nemo-super-wip-lora with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ConicCat/Nemo-super-wip-lora"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ConicCat/Nemo-super-wip-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
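
The same request can be built from Python using only the standard library. This is a minimal sketch that assumes the server started above is listening on `localhost:8000`. `build_chat_request` is a hypothetical helper name, and the network call is left commented out so the snippet runs without a server.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000",
    "ConicCat/Nemo-super-wip-lora",
    "What is the capital of France?",
)

# Uncomment once the server is running:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

For the SGLang server described later on this page, the same helper should work with the base URL changed to port 30000.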

Use Docker

docker model run hf.co/ConicCat/Nemo-super-wip-lora

SGLang

How to use ConicCat/Nemo-super-wip-lora with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ConicCat/Nemo-super-wip-lora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ConicCat/Nemo-super-wip-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ConicCat/Nemo-super-wip-lora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ConicCat/Nemo-super-wip-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ConicCat/Nemo-super-wip-lora with Docker Model Runner:
```
docker model run hf.co/ConicCat/Nemo-super-wip-lora
```

Nemo-super-wip-lora / debug.log

ConicCat

Upload folder using huggingface_hub

0776dca verified about 2 months ago

75.6 kB

	[2026-03-31 02:46:14,052] [DEBUG] [axolotl.utils.config.log_gpu_memory_usage:127] [PID:10906] baseline 0.000GB ()
	[2026-03-31 02:46:14,053] [INFO] [axolotl.cli.config.load_cfg:341] [PID:10906] config:
	{
	"activation_offloading": false,
	"adapter": "lora",
	"axolotl_config_path": "writer.yaml",
	"base_model": "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",
	"base_model_config": "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",
	"batch_size": 16,
	"bf16": true,
	"capabilities": {
	"bf16": true,
	"compute_capability": "sm_90",
	"fp8": true,
	"n_gpu": 1,
	"n_node": 1,
	"tf32": true
	},
	"chat_template": "jinja",
	"chat_template_jinja": "{% set bos = \"<\|begin_of_text\|>\" %}{%- set enable_thinking = false -%}{% set system_start_header = \"<\|start_header_id\|>\" %}{% set system_end_header = \"<\|end_header_id\|>\n\n\" %}{% set start_header = \"<\|start_header_id\|>\" %}{% set end_header = \"<\|end_header_id\|>\n\n\" %}{% set eot = \"<\|eot_id\|>\" %}{% set system_token = \"system\" %}{% set user_token = \"user\" %}{% set assistant_token = \"assistant\" %}{% set tool_token = \"tool\" %}{{- bos ~ system_start_header ~ system_token ~ system_end_header -}}{%- if messages[0].role == 'system' and messages[0].content != '' -%}{%- set system_content = messages[0].content -%}{%- if '/no_think' in system_content -%}{%- set system_content = system_content.replace('/no_think', '')\|trim -%}{%- set enable_thinking = false -%}{%- elif '/think' in system_content -%}{%- set system_content = system_content.replace('/think', '')\|trim -%}{%- set enable_thinking = true -%}{%- endif -%}{{- system_content + '\n\n' -}}{%- endif -%}{%- if tools -%}{{- 'You can use the following tools to assist the user if required:\n<AVAILABLE_TOOLS>[' -}}{%- for tool in tools -%}{{- (tool.function if tool.function is defined else tool) \| tojson -}}{{- ', ' if not loop.last else '' -}}{%- endfor -%}{{- ']</AVAILABLE_TOOLS>\n\nIf you decide to call any tool(s), use the following format:\n<TOOLCALL>[{{\"name\": \"tool_name1\", \"arguments\": \"tool_args1\"}}, {{\"name\": \"tool_name2\", \"arguments\": \"tool_args2\"}}]</TOOLCALL>\n\nResponse from tool(s) will be returned in this format:\n<TOOL_RESPONSE>[{{\"response\": \"tool_response1\"}}, {{\"response\": \"tool_response2\"}}]</TOOL_RESPONSE>\n\nBased on the results returned by the tool(s), you can call additional tools if needed, correct tool calls if any errors are found, or just respond with the answer to the user.' 
-}}{%- endif -%}{{- eot -}}{%- for message in messages -%}{%- if message.role == user_token -%}{{- start_header ~ user_token ~ end_header -}}{{ message.content -}}{{ eot -}}{%- elif message.role == assistant_token -%}{%- if '</think>' in message.content -%}{%- set content = message.content.split('</think>')[-1].lstrip() -%}{%- else -%}{%- set content = message.content -%}{%- endif -%}{{- start_header ~ assistant_token ~ end_header -}}{{ content -}}{%- if message.tool_calls -%}{{- '<TOOLCALL>[' -}}{%- for call in message.tool_calls -%}{%- set fn = call.function if call.function is defined else call -%}{{- '{\"name\": \"' + fn.name + '\", \"arguments\": ' -}}{%- if fn.arguments is string -%}{{- fn.arguments -}}{%- else -%}{{- fn.arguments \| tojson -}}{%- endif -%}{{- '}' + (', ' if not loop.last else '') -}}{%- endfor -%}{{- ']</TOOLCALL>' -}}{%- endif -%}{{- eot -}}{%- elif message.role == tool_token -%}{%- if loop.first or (messages[loop.index0 - 1].role != tool_token) -%}{{- start_header ~ tool_token ~ end_header -}}{{ '<TOOL_RESPONSE>[' -}}{%- endif -%}{{- message.content -}}{{- ', ' if not loop.last and (messages[loop.index0 + 1].role == tool_token) else '' -}}{%- if loop.last or (messages[loop.index0 + 1].role != tool_token) -%}{{- ']</TOOL_RESPONSE>' -}}{{ eot -}}{%- endif -%}{%- endif -%}{%- endfor -%}{%- if add_generation_prompt -%}{{- start_header ~ assistant_token ~ end_header -}}{%- if not enable_thinking -%}{{- '<think>\n\n</think>\n\n' -}}{%- endif -%}{%- endif -%}",
	"context_parallel_size": 1,
	"dataloader_num_workers": 1,
	"dataloader_pin_memory": true,
	"dataloader_prefetch_factor": 256,
	"dataset_num_proc": 8,
	"datasets": [
	{
	"chat_template": "tokenizer_default",
	"message_field_training": "train",
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "ConicCat/GLiMA_Thinking",
	"roles_to_train": [],
	"train_on_eos": "turn",
	"trust_remote_code": false,
	"type": "chat_template"
	},
	{
	"chat_template": "tokenizer_default",
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "ConicCat/Gutenberg-SFT",
	"trust_remote_code": false,
	"type": "chat_template"
	},
	{
	"chat_template": "tokenizer_default",
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "ConicCat/Condor-SFT-Filtered",
	"split": "train[:250]",
	"trust_remote_code": false,
	"type": "chat_template"
	},
	{
	"chat_template": "tokenizer_default",
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "ConicCat/Ao3_Soft_Refusal",
	"trust_remote_code": false,
	"type": "chat_template"
	},
	{
	"chat_template": "tokenizer_default",
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "ConicCat/VSF",
	"trust_remote_code": false,
	"type": "chat_template"
	}
	],
	"ddp": false,
	"device": "cuda:0",
	"device_map": "auto",
	"dion_rank_fraction": 1.0,
	"dion_rank_multiple_of": 1,
	"eaft_alpha": 1.0,
	"eaft_k": 20,
	"env_capabilities": {
	"torch_version": "2.9.1"
	},
	"eval_batch_size": 1,
	"eval_causal_lm_metrics": [
	"sacrebleu",
	"comet",
	"ter",
	"chrf"
	],
	"eval_max_new_tokens": 128,
	"eval_sample_packing": true,
	"eval_table_size": 0,
	"experimental_skip_move_to_device": true,
	"flash_attention": true,
	"fp16": false,
	"generate_samples": false,
	"generation_do_sample": true,
	"generation_max_new_tokens": 50,
	"generation_prompt_ratio": 0.5,
	"generation_temperature": 0.7,
	"gradient_accumulation_steps": 16,
	"gradient_checkpointing": true,
	"gradient_checkpointing_kwargs": {
	"use_reentrant": true
	},
	"include_tkps": true,
	"is_llama_derived_model": true,
	"layer_offloading": false,
	"learning_rate": 1.25e-05,
	"lisa_layers_attribute": "model.layers",
	"load_best_model_at_end": false,
	"load_in_4bit": false,
	"load_in_8bit": false,
	"local_rank": 0,
	"logging_steps": 1,
	"lora_alpha": 64,
	"lora_dropout": 0.0,
	"lora_mlp_kernel": false,
	"lora_o_kernel": false,
	"lora_qkv_kernel": false,
	"lora_r": 32,
	"lora_target_linear": true,
	"loraplus_lr_embedding": 1e-06,
	"loraplus_lr_ratio": 16.0,
	"lr_scheduler": "constant_with_warmup",
	"max_grad_norm": 1.0,
	"mean_resizing_embeddings": false,
	"merge_method": "memory_efficient",
	"micro_batch_size": 1,
	"model_config_type": "nemotron-nas",
	"num_epochs": 3.0,
	"num_generation_samples": 3,
	"optimizer": "paged_adamw_8bit",
	"otel_metrics_host": "localhost",
	"otel_metrics_port": 8000,
	"output_dir": "./Writer-Stage-1",
	"pad_to_sequence_len": true,
	"pretrain_multipack_attn": true,
	"profiler_steps_start": 0,
	"qlora_sharded_model_loading": false,
	"quantize_moe_experts": false,
	"ray_num_workers": 1,
	"resources_per_worker": {
	"GPU": 1
	},
	"sample_packing": true,
	"sample_packing_bin_size": 200,
	"sample_packing_group_size": 100000,
	"save_only_model": false,
	"save_safetensors": true,
	"save_strategy": "no",
	"seed": 42,
	"sequence_len": 5120,
	"shuffle_before_merging_datasets": false,
	"shuffle_merged_datasets": true,
	"skip_prepare_dataset": false,
	"streaming_multipack_buffer_size": 10000,
	"strict": false,
	"tensor_parallel_size": 1,
	"tf32": true,
	"tiled_mlp_use_original_mlp": true,
	"tokenizer_config": "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",
	"tokenizer_save_jinja_files": true,
	"torch_dtype": "torch.bfloat16",
	"train_on_inputs": false,
	"trl": {
	"async_prefetch": false,
	"log_completions": false,
	"mask_truncated_completions": false,
	"ref_model_mixup_alpha": 0.9,
	"ref_model_sync_steps": 64,
	"replay_buffer_size": 0,
	"replay_recompute_logps": true,
	"reroll_max_groups": 1,
	"reroll_start_fraction": 1.0,
	"reward_num_workers": 1,
	"scale_rewards": true,
	"skip_zero_advantage_batches": true,
	"sync_ref_model": false,
	"use_data_producer": false,
	"use_vllm": false,
	"vllm_lora_sync": false,
	"vllm_server_host": "0.0.0.0",
	"vllm_server_port": 8000
	},
	"trust_remote_code": true,
	"use_otel_metrics": false,
	"use_ray": false,
	"use_tensorboard": true,
	"val_set_size": 0.0,
	"vllm": {
	"device": "auto",
	"dtype": "auto",
	"gpu_memory_utilization": 0.9,
	"host": "0.0.0.0",
	"port": 8000
	},
	"warmup_ratio": 0.05,
	"weight_decay": 0.0,
	"world_size": 1
	}
	[2026-03-31 02:46:14,057] [INFO] [axolotl.utils.schemas.validation.check_eval_packing:129] [PID:10906] explicitly setting `eval_sample_packing` to match `sample_packing`
	[2026-03-31 02:46:14,057] [WARNING] [axolotl.utils.schemas.validation.check_sample_packing_without_attention:190] [PID:10906] sample_packing without flash, sdp, xformers, sage, or flex attention does not handle cross sample decontamination.
	[2026-03-31 02:46:14,057] [INFO] [axolotl.utils.schemas.validation.hint_sample_packing_padding:239] [PID:10906] Setting `pad_to_sequence_len: true` to prevent memory leaks when sample_packing
	[2026-03-31 02:46:14,057] [WARNING] [axolotl.utils.schemas.model.hint_trust_remote_code:103] [PID:10906] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
	[2026-03-31 02:46:14,759] [DEBUG] [axolotl.utils.config.log_gpu_memory_usage:127] [PID:10906] baseline 0.000GB ()
	[2026-03-31 02:46:14,760] [INFO] [axolotl.cli.config.load_cfg:341] [PID:10906] config:
	{
	"activation_offloading": false,
	"adapter": "lora",
	"axolotl_config_path": "writer.yaml",
	"base_model": "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",
	"base_model_config": "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",
	"batch_size": 16,
	"bf16": true,
	"capabilities": {
	"bf16": true,
	"compute_capability": "sm_90",
	"fp8": true,
	"n_gpu": 1,
	"n_node": 1,
	"tf32": true
	},
	"chat_template": "jinja",
	"chat_template_jinja": "{% set bos = \"<\|begin_of_text\|>\" %}{%- set enable_thinking = false -%}{% set system_start_header = \"<\|start_header_id\|>\" %}{% set system_end_header = \"<\|end_header_id\|>\n\n\" %}{% set start_header = \"<\|start_header_id\|>\" %}{% set end_header = \"<\|end_header_id\|>\n\n\" %}{% set eot = \"<\|eot_id\|>\" %}{% set system_token = \"system\" %}{% set user_token = \"user\" %}{% set assistant_token = \"assistant\" %}{% set tool_token = \"tool\" %}{{- bos ~ system_start_header ~ system_token ~ system_end_header -}}{%- if messages[0].role == 'system' and messages[0].content != '' -%}{%- set system_content = messages[0].content -%}{%- if '/no_think' in system_content -%}{%- set system_content = system_content.replace('/no_think', '')\|trim -%}{%- set enable_thinking = false -%}{%- elif '/think' in system_content -%}{%- set system_content = system_content.replace('/think', '')\|trim -%}{%- set enable_thinking = true -%}{%- endif -%}{{- system_content + '\n\n' -}}{%- endif -%}{%- if tools -%}{{- 'You can use the following tools to assist the user if required:\n<AVAILABLE_TOOLS>[' -}}{%- for tool in tools -%}{{- (tool.function if tool.function is defined else tool) \| tojson -}}{{- ', ' if not loop.last else '' -}}{%- endfor -%}{{- ']</AVAILABLE_TOOLS>\n\nIf you decide to call any tool(s), use the following format:\n<TOOLCALL>[{{\"name\": \"tool_name1\", \"arguments\": \"tool_args1\"}}, {{\"name\": \"tool_name2\", \"arguments\": \"tool_args2\"}}]</TOOLCALL>\n\nResponse from tool(s) will be returned in this format:\n<TOOL_RESPONSE>[{{\"response\": \"tool_response1\"}}, {{\"response\": \"tool_response2\"}}]</TOOL_RESPONSE>\n\nBased on the results returned by the tool(s), you can call additional tools if needed, correct tool calls if any errors are found, or just respond with the answer to the user.' 
-}}{%- endif -%}{{- eot -}}{%- for message in messages -%}{%- if message.role == user_token -%}{{- start_header ~ user_token ~ end_header -}}{{ message.content -}}{{ eot -}}{%- elif message.role == assistant_token -%}{%- if '</think>' in message.content -%}{%- set content = message.content.split('</think>')[-1].lstrip() -%}{%- else -%}{%- set content = message.content -%}{%- endif -%}{{- start_header ~ assistant_token ~ end_header -}}{{ content -}}{%- if message.tool_calls -%}{{- '<TOOLCALL>[' -}}{%- for call in message.tool_calls -%}{%- set fn = call.function if call.function is defined else call -%}{{- '{\"name\": \"' + fn.name + '\", \"arguments\": ' -}}{%- if fn.arguments is string -%}{{- fn.arguments -}}{%- else -%}{{- fn.arguments \| tojson -}}{%- endif -%}{{- '}' + (', ' if not loop.last else '') -}}{%- endfor -%}{{- ']</TOOLCALL>' -}}{%- endif -%}{{- eot -}}{%- elif message.role == tool_token -%}{%- if loop.first or (messages[loop.index0 - 1].role != tool_token) -%}{{- start_header ~ tool_token ~ end_header -}}{{ '<TOOL_RESPONSE>[' -}}{%- endif -%}{{- message.content -}}{{- ', ' if not loop.last and (messages[loop.index0 + 1].role == tool_token) else '' -}}{%- if loop.last or (messages[loop.index0 + 1].role != tool_token) -%}{{- ']</TOOL_RESPONSE>' -}}{{ eot -}}{%- endif -%}{%- endif -%}{%- endfor -%}{%- if add_generation_prompt -%}{{- start_header ~ assistant_token ~ end_header -}}{%- if not enable_thinking -%}{{- '<think>\n\n</think>\n\n' -}}{%- endif -%}{%- endif -%}",
	"context_parallel_size": 1,
	"dataloader_num_workers": 1,
	"dataloader_pin_memory": true,
	"dataloader_prefetch_factor": 256,
	"dataset_num_proc": 8,
	"datasets": [
	{
	"chat_template": "tokenizer_default",
	"message_field_training": "train",
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "ConicCat/GLiMA_Thinking",
	"roles_to_train": [],
	"train_on_eos": "turn",
	"trust_remote_code": false,
	"type": "chat_template"
	},
	{
	"chat_template": "tokenizer_default",
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "ConicCat/Gutenberg-SFT",
	"trust_remote_code": false,
	"type": "chat_template"
	},
	{
	"chat_template": "tokenizer_default",
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "ConicCat/Condor-SFT-Filtered",
	"split": "train[:250]",
	"trust_remote_code": false,
	"type": "chat_template"
	},
	{
	"chat_template": "tokenizer_default",
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "ConicCat/Ao3_Soft_Refusal",
	"trust_remote_code": false,
	"type": "chat_template"
	},
	{
	"chat_template": "tokenizer_default",
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "ConicCat/VSF",
	"trust_remote_code": false,
	"type": "chat_template"
	}
	],
	"ddp": false,
	"device": "cuda:0",
	"device_map": "auto",
	"dion_rank_fraction": 1.0,
	"dion_rank_multiple_of": 1,
	"eaft_alpha": 1.0,
	"eaft_k": 20,
	"env_capabilities": {
	"torch_version": "2.9.1"
	},
	"eval_batch_size": 1,
	"eval_causal_lm_metrics": [
	"sacrebleu",
	"comet",
	"ter",
	"chrf"
	],
	"eval_max_new_tokens": 128,
	"eval_sample_packing": true,
	"eval_table_size": 0,
	"experimental_skip_move_to_device": true,
	"flash_attention": false,
	"fp16": false,
	"generate_samples": false,
	"generation_do_sample": true,
	"generation_max_new_tokens": 50,
	"generation_prompt_ratio": 0.5,
	"generation_temperature": 0.7,
	"gradient_accumulation_steps": 16,
	"gradient_checkpointing": true,
	"gradient_checkpointing_kwargs": {
	"use_reentrant": true
	},
	"include_tkps": true,
	"is_llama_derived_model": true,
	"layer_offloading": false,
	"learning_rate": 1.25e-05,
	"lisa_layers_attribute": "model.layers",
	"load_best_model_at_end": false,
	"load_in_4bit": false,
	"load_in_8bit": false,
	"local_rank": 0,
	"logging_steps": 1,
	"lora_alpha": 64,
	"lora_dropout": 0.0,
	"lora_mlp_kernel": false,
	"lora_o_kernel": false,
	"lora_qkv_kernel": false,
	"lora_r": 32,
	"lora_target_linear": true,
	"loraplus_lr_embedding": 1e-06,
	"loraplus_lr_ratio": 16.0,
	"lr_scheduler": "constant_with_warmup",
	"max_grad_norm": 1.0,
	"mean_resizing_embeddings": false,
	"merge_lora": true,
	"merge_method": "memory_efficient",
	"micro_batch_size": 1,
	"model_config_type": "nemotron-nas",
	"num_epochs": 3.0,
	"num_generation_samples": 3,
	"optimizer": "paged_adamw_8bit",
	"otel_metrics_host": "localhost",
	"otel_metrics_port": 8000,
	"output_dir": "./Writer-Stage-1",
	"pad_to_sequence_len": true,
	"pretrain_multipack_attn": true,
	"profiler_steps_start": 0,
	"qlora_sharded_model_loading": false,
	"quantize_moe_experts": false,
	"ray_num_workers": 1,
	"resources_per_worker": {
	"GPU": 1
	},
	"sample_packing": true,
	"sample_packing_bin_size": 200,
	"sample_packing_group_size": 100000,
	"save_only_model": false,
	"save_safetensors": true,
	"save_strategy": "no",
	"seed": 42,
	"sequence_len": 5120,
	"shuffle_before_merging_datasets": false,
	"shuffle_merged_datasets": true,
	"skip_prepare_dataset": false,
	"streaming_multipack_buffer_size": 10000,
	"strict": false,
	"tensor_parallel_size": 1,
	"tf32": true,
	"tiled_mlp_use_original_mlp": true,
	"tokenizer_config": "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",
	"tokenizer_save_jinja_files": true,
	"torch_dtype": "torch.bfloat16",
	"train_on_inputs": false,
	"trl": {
	"async_prefetch": false,
	"log_completions": false,
	"mask_truncated_completions": false,
	"ref_model_mixup_alpha": 0.9,
	"ref_model_sync_steps": 64,
	"replay_buffer_size": 0,
	"replay_recompute_logps": true,
	"reroll_max_groups": 1,
	"reroll_start_fraction": 1.0,
	"reward_num_workers": 1,
	"scale_rewards": true,
	"skip_zero_advantage_batches": true,
	"sync_ref_model": false,
	"use_data_producer": false,
	"use_vllm": false,
	"vllm_lora_sync": false,
	"vllm_server_host": "0.0.0.0",
	"vllm_server_port": 8000
	},
	"trust_remote_code": true,
	"use_otel_metrics": false,
	"use_ray": false,
	"use_tensorboard": true,
	"val_set_size": 0.0,
	"vllm": {
	"device": "auto",
	"dtype": "auto",
	"gpu_memory_utilization": 0.9,
	"host": "0.0.0.0",
	"port": 8000
	},
	"warmup_ratio": 0.05,
	"weight_decay": 0.0,
	"world_size": 1
	}
	[2026-03-31 02:46:14,760] [DEBUG] [axolotl.cli.merge_lora.do_merge_lora:32] [PID:10906] Using memory-efficient LoRA merging method...
	[2026-03-31 02:46:14,760] [DEBUG] [axolotl.cli.merge_lora._do_merge_lora_efficient:79] [PID:10906] Using memory-efficient LoRA merging method...
	Fetching 47 files: 100%|███████████████████████████████████████████████████████████████████████████████████████| 47/47 [00:04<00:00, 10.04it/s]
	Download complete: : 229kB [00:04, 170kB/s]
	[2026-03-31 02:46:19,620] [DEBUG] [axolotl.cli.utils.lora_merge.merge_lora_sharded_efficient:838] [PID:10906] LoRA scale factor: 2.0 (rslora=False)
	[2026-03-31 02:46:19,620] [DEBUG] [axolotl.cli.utils.lora_merge.merge_lora_sharded_efficient:854] [PID:10906] Loading LoRA weights from Writer-Stage-1/adapter_model.safetensors
	[2026-03-31 02:46:19,633] [DEBUG] [axolotl.cli.utils.lora_merge.merge_lora_sharded_efficient:860] [PID:10906] Keeping LoRA weights on CPU; will move per-tensor during merge
	[2026-03-31 02:46:19,633] [DEBUG] [axolotl.cli.utils.lora_merge.merge_lora_sharded_efficient:866] [PID:10906] Found 21 model shards in /workspace/data/huggingface-cache/hub/models--nvidia--Llama-3_3-Nemotron-Super-49B-v1_5/snapshots/420ba7d28211abf116b8b103ab700d92619daf98
	[2026-03-31 02:46:19,633] [INFO] [axolotl.cli.utils.lora_merge.copy_non_model_files:303] [PID:10906] Copying non-model files to output directory...
	[2026-03-31 02:46:19,633] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying config.json to output
	[2026-03-31 02:46:19,633] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying configuration_decilm.py to output
	[2026-03-31 02:46:19,633] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying transformers_4_44_2__configuration_llama.py to output
	[2026-03-31 02:46:19,634] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying transformers_4_44_2__modeling_rope_utils.py to output
	[2026-03-31 02:46:19,634] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying block_config.py to output
	[2026-03-31 02:46:19,634] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying tokenizer_config.json to output
	[2026-03-31 02:46:19,634] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying tokenizer.json to output
	[2026-03-31 02:46:19,638] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying special_tokens_map.json to output
	[2026-03-31 02:46:19,639] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying modeling_decilm.py to output
	[2026-03-31 02:46:19,639] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying transformers_4_44_2__modeling_outputs.py to output
	[2026-03-31 02:46:19,639] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying transformers_4_44_2__cache_utils.py to output
	[2026-03-31 02:46:19,639] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying transformers_4_44_2__pytorch_utils.py to output
	[2026-03-31 02:46:19,639] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying transformers_4_44_2__activations.py to output
	[2026-03-31 02:46:19,639] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying variable_cache.py to output
	[2026-03-31 02:46:19,639] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py to output
	[2026-03-31 02:46:19,639] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying transformers_4_44_2__modeling_attn_mask_utils.py to output
	[2026-03-31 02:46:19,640] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying generation_config.json to output
	[2026-03-31 02:46:19,640] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying llama_nemotron_toolcall_parser_no_streaming.py to output
	[2026-03-31 02:46:19,640] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying README.md to output
	[2026-03-31 02:46:19,640] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying PRIVACY.md to output
	[2026-03-31 02:46:19,640] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying BIAS.md to output
	[2026-03-31 02:46:19,640] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying .gitattributes to output
	[2026-03-31 02:46:19,640] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying accuracy_chart.png to output
	[2026-03-31 02:46:19,640] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying SAFETY&SECURITY.md to output
	[2026-03-31 02:46:19,640] [DEBUG] [axolotl.cli.utils.lora_merge.copy_non_model_files:324] [PID:10906] Copying EXPLAINABILITY.md to output

	Merging shards: 0%| | 0/21 [00:00<?, ?it/s]
	[2026-03-31 02:46:19,642] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.0.mlp.down_proj.weight: torch.Size([32, 14336]), torch.Size([8192, 32])
	Download complete: : 229kB [00:04, 46.6kB/s]
	[2026-03-31 02:46:20,696] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.0.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([14336, 32])
	[2026-03-31 02:46:21,426] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.0.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([14336, 32])
	[2026-03-31 02:46:22,225] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.0.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:46:22,280] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.0.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:22,820] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.0.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:23,341] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.0.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:46:23,394] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.1.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:46:24,838] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.1.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:46:26,250] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.1.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:46:27,647] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.1.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:46:27,699] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.1.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:28,154] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.1.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:28,618] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.1.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:46:28,670] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.2.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:46:28,722] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.2.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:29,202] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.2.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])

	Merging shards: 5%|████▎ | 1/21 [00:13<04:33, 13.68s/it]
	[2026-03-31 02:46:33,348] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.2.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:46:34,816] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.2.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:46:36,246] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.2.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:46:37,651] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.2.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:38,131] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.3.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:46:39,614] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.3.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:46:41,043] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.3.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:46:42,447] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.3.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:46:42,497] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.3.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:42,956] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.3.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:43,452] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.3.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:46:43,505] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.4.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:46:44,942] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.4.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:46:46,427] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.4.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:46:47,935] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.4.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:46:47,987] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.4.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:48,458] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.4.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:48,915] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.4.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])

	Merging shards: 10%\|████████▋ \| 2/21 [00:33<05:29, 17.32s/it]
	[2026-03-31 02:46:53,184] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.5.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:46:54,686] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.5.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:46:56,164] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.5.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:46:57,648] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.5.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:46:57,687] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.5.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:58,223] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.5.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:46:58,718] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.5.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:46:58,764] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.6.mlp.down_proj.weight: torch.Size([32, 14336]), torch.Size([8192, 32])
	[2026-03-31 02:46:59,519] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.6.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([14336, 32])
	[2026-03-31 02:47:00,309] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.6.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([14336, 32])
	[2026-03-31 02:47:01,112] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.7.mlp.down_proj.weight: torch.Size([32, 14336]), torch.Size([8192, 32])
	[2026-03-31 02:47:01,813] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.7.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([14336, 32])
	[2026-03-31 02:47:02,515] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.7.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([14336, 32])
	[2026-03-31 02:47:03,290] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.8.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:47:04,711] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.8.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:06,150] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.8.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:07,654] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.8.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:07,706] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.8.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:08,159] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.8.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:08,635] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.8.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:08,687] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.9.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:08,782] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.9.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:09,296] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.9.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])

	Merging shards: 14%\|█████████████ \| 3/21 [00:53<05:36, 18.69s/it]
	[2026-03-31 02:47:13,528] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.10.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:47:14,969] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.10.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:16,429] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.10.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:17,853] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.10.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:17,888] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.10.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:18,394] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.10.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:18,937] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.10.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:18,982] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.11.mlp.down_proj.weight: torch.Size([32, 17920]), torch.Size([8192, 32])
	[2026-03-31 02:47:19,927] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.11.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([17920, 32])
	[2026-03-31 02:47:20,862] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.11.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([17920, 32])
	[2026-03-31 02:47:21,841] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.12.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:23,267] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.12.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:23,318] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.12.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:23,760] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.12.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:24,253] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.12.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:24,310] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.9.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:47:25,835] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.9.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:27,317] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.9.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:28,733] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.9.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])

	Merging shards: 19%\|█████████████████▎ \| 4/21 [01:13<05:25, 19.14s/it]
	[2026-03-31 02:47:33,357] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.12.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:47:34,842] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.12.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:36,252] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.13.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:47:37,651] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.13.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:39,044] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.13.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:40,529] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.13.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:40,576] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.13.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:41,056] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.13.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:41,533] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.13.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:41,585] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.14.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:47:43,009] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.14.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:44,442] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.14.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:45,912] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.14.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:45,965] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.14.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:46,456] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.14.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:46,904] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.14.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:46,961] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.15.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:47:47,017] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.15.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:47,457] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.15.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:47:47,951] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.15.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])

	Merging shards: 24%\|█████████████████████▋ \| 5/21 [01:32<05:04, 19.04s/it]
	[2026-03-31 02:47:52,218] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.15.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:47:53,756] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.15.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:55,227] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.15.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:56,651] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.16.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:47:58,125] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.16.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:47:59,605] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.16.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:01,046] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.16.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:01,081] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.16.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:01,556] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.16.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:02,015] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.16.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:02,051] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.17.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:48:03,496] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.17.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:04,865] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.17.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:06,337] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.17.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:06,373] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.17.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:06,859] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.17.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:07,346] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.17.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:07,382] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.18.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:07,482] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.18.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:08,034] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.18.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])

	Merging shards: 29%\|██████████████████████████ \| 6/21 [01:52<04:52, 19.50s/it]
	[2026-03-31 02:48:12,606] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.18.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:48:14,139] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.18.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:15,617] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.18.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:17,054] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.18.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:17,529] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.19.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:48:18,954] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.19.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:20,434] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.19.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:21,897] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.19.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:21,935] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.19.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:22,358] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.19.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:22,851] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.19.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:22,904] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.20.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:48:24,335] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.20.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:25,757] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.20.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:27,225] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.20.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:27,282] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.20.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:27,819] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.20.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:28,336] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.20.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])

	Merging shards: 33%\|██████████████████████████████▎ \| 7/21 [02:13<04:36, 19.73s/it]
	[2026-03-31 02:48:32,791] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.21.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:48:34,285] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.21.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:35,749] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.21.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:37,242] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.21.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:37,284] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.21.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:37,791] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.21.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:38,257] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.21.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:38,294] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.22.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:48:39,730] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.22.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:41,243] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.22.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:42,655] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.22.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:42,694] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.22.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:43,157] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.22.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:43,653] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.22.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:43,706] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.23.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:45,154] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.23.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:46,559] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.23.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:46,612] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.23.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:47,125] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.23.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:47,652] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.23.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])

	Merging shards: 38%\|██████████████████████████████████▋ \| 8/21 [02:32<04:13, 19.49s/it]
	[2026-03-31 02:48:51,769] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.23.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:48:53,245] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.24.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:48:54,731] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.24.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:56,186] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.24.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:48:57,635] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.24.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:57,671] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.24.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:58,217] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.24.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:48:58,732] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.24.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:48:58,785] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.25.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:49:00,244] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.25.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:01,692] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.25.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:03,112] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.25.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:03,169] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.25.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:03,656] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.25.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:04,140] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.25.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:04,195] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.26.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:05,648] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.26.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:05,706] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.26.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:06,253] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.26.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:06,752] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.26.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])

	Merging shards: 43%\|███████████████████████████████████████ \| 9/21 [02:50<03:51, 19.29s/it]
	[2026-03-31 02:49:10,644] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.26.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:49:12,102] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.26.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:13,563] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.27.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:49:15,043] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.27.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:16,455] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.27.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:17,848] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.27.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:17,894] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.27.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:18,356] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.27.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:18,837] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.27.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:18,887] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.28.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:49:20,329] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.28.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:21,814] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.28.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:23,243] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.28.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:23,295] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.28.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:23,830] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.28.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:24,322] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.28.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:24,376] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.29.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:24,429] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.29.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:24,936] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.29.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:25,455] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.29.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])

	Merging shards: 48%\|██████████████████████████████████████████▊ \| 10/21 [03:10<03:31, 19.24s/it]
	[2026-03-31 02:49:29,772] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.29.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:49:31,229] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.29.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:32,646] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.29.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:34,113] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.30.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:49:35,536] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.30.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:36,945] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.30.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:38,428] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.30.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:38,474] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.30.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:38,953] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.30.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:39,444] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.30.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:39,496] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.31.mlp.down_proj.weight: torch.Size([32, 28672]), torch.Size([8192, 32])
	[2026-03-31 02:49:40,842] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.31.mlp.gate_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:42,246] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.31.mlp.up_proj.weight: torch.Size([32, 8192]), torch.Size([28672, 32])
	[2026-03-31 02:49:43,648] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.31.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:43,706] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.31.self_attn.o_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:44,155] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.31.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:44,646] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.31.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:44,701] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.32.self_attn.k_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	[2026-03-31 02:49:44,781] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.32.self_attn.q_proj.weight: torch.Size([32, 8192]), torch.Size([8192, 32])
	[2026-03-31 02:49:45,256] [DEBUG] [axolotl.cli.utils.lora_merge._merge_tensor_with_lora:411] [PID:10906] Merging LoRA for model.layers.32.self_attn.v_proj.weight: torch.Size([32, 8192]), torch.Size([1024, 32])
	Merging shards: 48%\|██████████████████████████████████████████▊ \| 10/21 [03:30<03:51, 21.01s/it]
	[2026-03-31 02:49:49,768] [ERROR] [axolotl.telemetry.errors.wrapper:158] [PID:10906] Error captured in telemetry. Run ID: 77193302-fa43-4dfd-ab04-45c91b8c4748
	Traceback (most recent call last):
	File "/root/miniconda3/envs/py3.11/bin/axolotl", line 6, in <module>
	sys.exit(main())
	^^^^^^
	File "/workspace/axolotl/src/axolotl/cli/main.py", line 347, in main
	cli()
	File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/click/core.py", line 1485, in __call__
	return self.main(*args, **kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/click/core.py", line 1406, in main
	rv = self.invoke(ctx)
	^^^^^^^^^^^^^^^^
	File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/click/core.py", line 1873, in invoke
	return _process_result(sub_ctx.command.invoke(sub_ctx))
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/click/core.py", line 1269, in invoke
	return ctx.invoke(self.callback, **ctx.params)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/click/core.py", line 824, in invoke
	return callback(*args, **kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/workspace/axolotl/src/axolotl/cli/utils/args.py", line 48, in wrapper
	return func(*args, **filtered_kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/workspace/axolotl/src/axolotl/cli/main.py", line 293, in merge_lora
	do_cli(config=config, **kwargs)
	File "/workspace/axolotl/src/axolotl/cli/merge_lora.py", line 169, in do_cli
	do_merge_lora(cfg=parsed_cfg)
	File "/workspace/axolotl/src/axolotl/telemetry/errors.py", line 127, in wrapper
	return func(*args, **kwargs)
	^^^^^^^^^^^^^^^^^^^^^
	File "/workspace/axolotl/src/axolotl/cli/merge_lora.py", line 33, in do_merge_lora
	_do_merge_lora_efficient(cfg=cfg)
	File "/workspace/axolotl/src/axolotl/cli/merge_lora.py", line 108, in _do_merge_lora_efficient
	merge_lora_sharded_efficient(
	File "/workspace/axolotl/src/axolotl/cli/utils/lora_merge.py", line 940, in merge_lora_sharded_efficient
	safetensors.torch.save_file(
	File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/safetensors/torch.py", line 307, in save_file
	serialize_file(_flatten(tensors), filename, metadata=metadata)
	safetensors_rust.SafetensorError: Error while serializing: I/O error: No space left on device (os error 28)