Instructions to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

PEFT

How to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
model = PeftModel.from_pretrained(base_model, "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2")

Transformers

How to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2")
# The training config reports "type_of_model": "Qwen2ForCausalLM", a text-only causal LM.
model = AutoModelForCausalLM.from_pretrained("felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
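
The same request can be issued from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply as a string. Passing `base_url="http://localhost:30000"` targets a server on another port instead.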

Use Docker

docker model run hf.co/felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2

SGLang

How to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with Docker Model Runner:
```
docker model run hf.co/felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2
```

Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 / debug.log

felixwangg

Upload folder using huggingface_hub

c500ace verified about 2 months ago

135 kB

	[2026-04-13 02:17:40,735] [DEBUG] [axolotl.utils.config.log_gpu_memory_usage:127] [PID:406990] baseline 0.000GB ()
	[2026-04-13 02:17:40,736] [INFO] [axolotl.cli.config.load_cfg:259] [PID:406990] config:
	{
	"activation_offloading": false,
	"adapter": "lora",
	"axolotl_config_path": "./axolotl_configs/Qwen2.5-coder-7b-instruct/primevul/lora-primevul-plus-token-diff-mask-alpha-2-ctx3.yaml",
	"base_model": "Qwen/Qwen2.5-Coder-7B-Instruct",
	"base_model_config": "Qwen/Qwen2.5-Coder-7B-Instruct",
	"batch_size": 64,
	"bf16": true,
	"capabilities": {
	"bf16": true,
	"compute_capability": "sm_120",
	"fp8": true,
	"n_gpu": 2,
	"n_node": 1
	},
	"context_parallel_size": 1,
	"dataloader_num_workers": 2,
	"dataloader_pin_memory": true,
	"dataloader_prefetch_factor": 256,
	"dataset_num_proc": 60,
	"dataset_prepared_path": "/u901/t577wang/SecSteer/axolotl-datasets/lora/Qwen2.5-Coder-7B/prime_vul_plus_splitted_token_diff_mask_skip_indent_ctx3_chat_v2_alpha2",
	"datasets": [
	{
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "felixwangg/prime_vul_plus_splitted_token_diff_mask_skip_indent_ctx3_chat_v2",
	"split": "train",
	"trust_remote_code": false,
	"type": "pretokenized"
	}
	],
	"ddp": true,
	"device": "cuda:0",
	"device_map": {
	"": 0
	},
	"dion_rank_fraction": 1.0,
	"dion_rank_multiple_of": 1,
	"early_stopping_patience": 1000,
	"env_capabilities": {
	"torch_version": "2.10.0"
	},
	"eval_batch_size": 4,
	"eval_causal_lm_metrics": [
	"sacrebleu",
	"comet",
	"ter",
	"chrf"
	],
	"eval_max_new_tokens": 128,
	"eval_sample_packing": false,
	"eval_steps": 15,
	"eval_table_size": 0,
	"experimental_skip_move_to_device": true,
	"flash_attention": true,
	"fp16": false,
	"gradient_accumulation_steps": 8,
	"gradient_checkpointing": true,
	"gradient_checkpointing_kwargs": {
	"use_reentrant": true
	},
	"include_tkps": true,
	"is_falcon_derived_model": false,
	"is_llama_derived_model": false,
	"is_mistral_derived_model": false,
	"learning_rate": 4e-05,
	"lisa_layers_attribute": "model.layers",
	"load_best_model_at_end": true,
	"load_in_4bit": false,
	"load_in_8bit": false,
	"local_rank": 0,
	"logging_steps": 1,
	"lora_alpha": 16,
	"lora_dropout": 0.05,
	"lora_r": 16,
	"lora_target_linear": true,
	"loraplus_lr_embedding": 1e-06,
	"lr_scheduler": "cosine",
	"mean_resizing_embeddings": false,
	"merge_lora": true,
	"micro_batch_size": 4,
	"model_config_type": "qwen2",
	"num_epochs": 1.0,
	"optimizer": "adamw_torch",
	"otel_metrics_host": "localhost",
	"otel_metrics_port": 8000,
	"output_dir": "/u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2",
	"pad_to_sequence_len": true,
	"plugins": [
	"diff_mask_trainer.plugin.DiffMaskPlugin"
	],
	"pretrain_multipack_attn": true,
	"profiler_steps_start": 0,
	"qlora_sharded_model_loading": false,
	"ray_num_workers": 1,
	"resources_per_worker": {
	"GPU": 1
	},
	"sample_packing": false,
	"sample_packing_bin_size": 200,
	"sample_packing_group_size": 100000,
	"save_only_model": false,
	"save_safetensors": true,
	"save_steps": 15,
	"save_total_limit": 1000,
	"sequence_len": 4096,
	"shuffle_before_merging_datasets": false,
	"shuffle_merged_datasets": true,
	"skip_prepare_dataset": false,
	"streaming_multipack_buffer_size": 10000,
	"strict": false,
	"tensor_parallel_size": 1,
	"test_datasets": [
	{
	"message_property_mappings": {
	"content": "content",
	"role": "role"
	},
	"path": "felixwangg/prime_vul_plus_splitted_token_diff_mask_skip_indent_ctx3_chat_v2",
	"split": "validation",
	"trust_remote_code": false,
	"type": "pretokenized"
	}
	],
	"tf32": false,
	"tiled_mlp_use_original_mlp": true,
	"tokenizer_config": "Qwen/Qwen2.5-Coder-7B-Instruct",
	"tokenizer_save_jinja_files": true,
	"tokenizer_type": "AutoTokenizer",
	"torch_dtype": "torch.bfloat16",
	"train_on_inputs": false,
	"trl": {
	"log_completions": false,
	"mask_truncated_completions": false,
	"ref_model_mixup_alpha": 0.9,
	"ref_model_sync_steps": 64,
	"scale_rewards": true,
	"sync_ref_model": false,
	"use_vllm": false,
	"vllm_server_host": "0.0.0.0",
	"vllm_server_port": 8000
	},
	"type_of_model": "Qwen2ForCausalLM",
	"use_otel_metrics": false,
	"use_ray": false,
	"use_wandb": true,
	"val_set_size": 0.0,
	"vllm": {
	"device": "auto",
	"dtype": "auto",
	"gpu_memory_utilization": 0.9,
	"host": "0.0.0.0",
	"port": 8000
	},
	"wandb_entity": "wtkuan",
	"wandb_log_model": "false",
	"wandb_name": "Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2",
	"wandb_project": "diff-mask-sft-primevul-ctx-3",
	"wandb_watch": "false",
	"warmup_ratio": 0.1,
	"weight_decay": 0.02,
	"world_size": 2
	}
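
The batch-related values in the config dump above are consistent with each other. A minimal arithmetic sketch follows, assuming the reported `batch_size` is the product of the per-device micro batch, the gradient accumulation steps, and the number of processes; the variable names are illustrative.

```python
# Values copied from the config dump above.
micro_batch_size = 4
gradient_accumulation_steps = 8
world_size = 2
sequence_len = 4096

# Sequences that contribute to one optimizer step across both GPUs.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * world_size
print(effective_batch_size)  # 64, the reported "batch_size"

# Upper bound on tokens per optimizer step when every sequence is padded
# to sequence_len ("pad_to_sequence_len": true).
max_tokens_per_step = effective_batch_size * sequence_len
print(max_tokens_per_step)  # 262144
```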
	[2026-04-13 02:17:41,352] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:285] [PID:406990] EOS: 151645 / <|im_end|>
	[2026-04-13 02:17:41,353] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:286] [PID:406990] BOS: None / None
	[2026-04-13 02:17:41,353] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:287] [PID:406990] PAD: 151643 / <|endoftext|>
	[2026-04-13 02:17:41,353] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:288] [PID:406990] UNK: None / None
	[2026-04-13 02:17:41,355] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:475] [PID:406990] Loading prepared dataset from disk at /u901/t577wang/SecSteer/axolotl-datasets/lora/Qwen2.5-Coder-7B/prime_vul_plus_splitted_token_diff_mask_skip_indent_ctx3_chat_v2_alpha2/8754561ab695e953202410ad43299050...
	[2026-04-13 02:17:41,376] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:475] [PID:406990] Loading prepared dataset from disk at /u901/t577wang/SecSteer/axolotl-datasets/lora/Qwen2.5-Coder-7B/prime_vul_plus_splitted_token_diff_mask_skip_indent_ctx3_chat_v2_alpha2/7a3c1ae28a530da625e3f3b6a296b5d3...
	[2026-04-13 02:17:41,400] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:417] [PID:406990] total_num_tokens: 3_735_150
	[2026-04-13 02:17:41,445] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:435] [PID:406990] `total_supervised_tokens: 2_946_329`
	[2026-04-13 02:17:41,445] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:533] [PID:406990] total_num_steps: 57
	[2026-04-13 02:17:41,445] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:121] [PID:406990] Maximum number of steps set at 57
	[2026-04-13 02:17:41,473] [DEBUG] [axolotl.train.setup_model_and_tokenizer:70] [PID:406990] loading tokenizer... Qwen/Qwen2.5-Coder-7B-Instruct
	[2026-04-13 02:17:42,082] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:285] [PID:406990] EOS: 151645 / <|im_end|>
	[2026-04-13 02:17:42,082] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:286] [PID:406990] BOS: None / None
	[2026-04-13 02:17:42,082] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:287] [PID:406990] PAD: 151643 / <|endoftext|>
	[2026-04-13 02:17:42,082] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:288] [PID:406990] UNK: None / None
	[2026-04-13 02:17:42,082] [DEBUG] [axolotl.train.setup_model_and_tokenizer:82] [PID:406990] Loading model
	[2026-04-13 02:17:42,140] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:87] [PID:406990] Patched Trainer.evaluation_loop with nanmean loss calculation
	[2026-04-13 02:17:42,142] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:138] [PID:406990] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation
	Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.09it/s]
	[2026-04-13 02:17:44,689] [INFO] [axolotl.loaders.model._configure_embedding_dtypes:347] [PID:406990] Converting modules to torch.bfloat16
	[2026-04-13 02:17:44,691] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:406990] Memory usage after model load 17.233GB (+17.233GB allocated, +18.252GB reserved)
	[2026-04-13 02:17:44,691] [INFO] [axolotl.loaders.adapter.load_lora:81] [PID:406990] found linear modules: ['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']
	trainable params: 40,370,176 || all params: 7,655,986,688 || trainable%: 0.5273
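
The trainable-parameter count above can be reproduced from the LoRA rank and the shapes of the targeted linear modules. The sketch below assumes the usual Qwen2.5-7B dimensions (hidden size 3584, MLP size 18944, 4 key/value heads of dimension 128, 28 layers); none of these are printed in this log, so treat them as assumptions.

```python
# Assumed Qwen2.5-7B dimensions (not printed in this log).
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128  # 4 key/value heads x head dimension 128
NUM_LAYERS = 28
LORA_R = 16  # "lora_r" from the config

# (in_features, out_features) of each targeted linear module in one layer.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    # LoRA adds A with shape (r, in_features) and B with shape (out_features, r).
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in MODULE_SHAPES.values())
trainable = per_layer * NUM_LAYERS
print(trainable)  # 40370176
print(round(100 * trainable / 7_655_986_688, 4))  # 0.5273
```

Under these assumptions the result matches the logged 40,370,176 trainable parameters and the 0.5273% share.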
	[2026-04-13 02:17:44,878] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:406990] after adapters 14.337GB (+14.337GB allocated, +18.330GB reserved)
	[2026-04-13 02:17:47,019] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/trl/extras/vllm_client.py:37: UserWarning: TRL currently supports vLLM versions: 0.10.2, 0.11.0, 0.11.1, 0.11.2, 0.12.0. You have version 0.19.0 installed. We recommend installing a supported version to avoid compatibility issues.
	if is_vllm_available():

	[2026-04-13 02:17:47,116] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py:105: UserWarning: TRL currently supports vLLM versions: 0.10.2, 0.11.0, 0.11.1, 0.11.2, 0.12.0. You have version 0.19.0 installed. We recommend installing a supported version to avoid compatibility issues.
	if is_vllm_available():

	DiffMaskPlugin: patching trainer with alpha=2.0
	DiffMaskPlugin: compute_loss and prediction_step patched
	[2026-04-13 02:17:51,606] [INFO] [axolotl.train.save_initial_configs:413] [PID:406990] Pre-saving adapter config to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2...
	[2026-04-13 02:17:51,607] [INFO] [axolotl.train.save_initial_configs:417] [PID:406990] Pre-saving tokenizer to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2...
	[2026-04-13 02:17:51,745] [INFO] [axolotl.train.save_initial_configs:422] [PID:406990] Pre-saving model config to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2...
	[2026-04-13 02:17:51,749] [INFO] [axolotl.train.execute_training:212] [PID:406990] Starting trainer...
	wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from /u901/t577wang/.netrc.
	wandb: Currently logged in as: wtkuan to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
	wandb: Waiting for wandb.init()...
	wandb: Tracking run with wandb version 0.24.0
	wandb: Run data is saved locally in /u901/t577wang/SecSteer/wandb/run-20260413_021753-v6k1ysas
	wandb: Run `wandb offline` to turn off syncing.
	wandb: Syncing run Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2
	wandb: ⭐️ View project at https://wandb.ai/wtkuan/diff-mask-sft-primevul-ctx-3
	wandb: 🚀 View run at https://wandb.ai/wtkuan/diff-mask-sft-primevul-ctx-3/runs/v6k1ysas
	wandb: Detected [huggingface_hub.inference, mcp, openai] in use.
	wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
	wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
	wandb: WARNING Saving files without folders. If you want to preserve subdirectories pass base_path to wandb.save, i.e. wandb.save("/mnt/folder/file.h5", base_path="/mnt")
	wandb: WARNING Symlinked 1 file into the W&B run directory; call wandb.save again to sync new files.
	[2026-04-13 02:17:54,858] [INFO] [axolotl.utils.callbacks.on_train_begin:757] [PID:406990] The Axolotl config has been saved to the WandB run under files.
	0%| | 0/57 [00:00<?, ?it/s]
	[2026-04-13 02:17:54,861] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:406990] Running evaluation step...

	98%|███████████████████████████████████████████████████████████████████████████████████████▏ | 49/50 [01:11<00:01, 1.52s/it]
	Traceback (most recent call last):
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
	finalizer()
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__
	res = self._callback(*self._args, **self._kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir
	rmtree(tempdir, onerror=onerror)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree
	_rmtree_safe_fd(stack, onexc)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
	onexc(func, path, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
	onexc(os.unlink, fullname, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
	os.unlink(entry.name, dir_fd=topfd)
	OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-wpexx418'
	Traceback (most recent call last):
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
	finalizer()
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__
	res = self._callback(*self._args, **self._kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir
	rmtree(tempdir, onerror=onerror)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree
	_rmtree_safe_fd(stack, onexc)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
	onexc(func, path, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
	onexc(os.unlink, fullname, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
	os.unlink(entry.name, dir_fd=topfd)
	OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-145k1as8'

	100%|█████████████████████████████████████████████████████████████████████████████████████████| 50/50 [01:13<00:00, 1.54s/it]
	{'eval_loss': 0.8691890239715576, 'eval_runtime': 75.3604, 'eval_samples_per_second': 5.308, 'eval_steps_per_second': 0.663, 'eval_ppl': 2.38498, 'memory/max_active (GiB)': 42.34, 'memory/max_allocated (GiB)': 42.34, 'memory/device_reserved (GiB)': 51.15, 'epoch': 0}
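
The perplexity fields in these metric dictionaries appear to be the exponential of the corresponding loss. The sketch below checks that relationship against two logged pairs: the step-0 evaluation and the first training step. The tolerance reflects that the training loss is printed to only four decimals.

```python
import math

# (loss, logged perplexity) pairs copied from the log.
eval_loss, eval_ppl = 0.8691890239715576, 2.38498   # step-0 evaluation
train_loss, train_ppl = 7.0251, 1124.507            # first training step

# Perplexity is exp(mean cross-entropy loss).
eval_matches = math.isclose(math.exp(eval_loss), eval_ppl, rel_tol=1e-4)
train_matches = math.isclose(math.exp(train_loss), train_ppl, rel_tol=1e-4)
print(eval_matches, train_matches)
```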
	2%|█▌ | 1/57 [01:56<1:48:32, 116.29s/it] {'loss': 7.0251, 'grad_norm': 0.5795251131057739, 'learning_rate': 0.0, 'ppl': 1124.507, 'memory/max_active (GiB)': 45.79, 'memory/max_allocated (GiB)': 45.79, 'memory/device_reserved (GiB)': 60.59, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.02}
	4%|███ | 2/57 [02:35<1:04:57, 70.87s/it] {'loss': 7.4605, 'grad_norm': 0.5027868151664734, 'learning_rate': 8.000000000000001e-06, 'ppl': 1738.01685, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.04}
	5%|████▋ | 3/57 [03:14<50:49, 56.48s/it] {'loss': 7.0151, 'grad_norm': 0.5032415986061096, 'learning_rate': 1.6000000000000003e-05, 'ppl': 1113.31797, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.05}
	7%|██████▎ | 4/57 [03:54<43:58, 49.78s/it] {'loss': 6.8756, 'grad_norm': 0.5377703309059143, 'learning_rate': 2.4e-05, 'ppl': 968.35621, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.07}
	9%|███████▉ | 5/57 [04:33<39:52, 46.01s/it] {'loss': 7.184, 'grad_norm': 0.5669687390327454, 'learning_rate': 3.2000000000000005e-05, 'ppl': 1318.17041, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.09}
	11%|█████████▍ | 6/57 [05:12<37:11, 43.76s/it] {'loss': 7.214, 'grad_norm': 0.6096590161323547, 'learning_rate': 4e-05, 'ppl': 1358.31467, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.11}
	12%|███████████ | 7/57 [05:52<35:17, 42.35s/it] {'loss': 6.9833, 'grad_norm': 0.5615111589431763, 'learning_rate': 3.996351108446635e-05, 'ppl': 1078.47146, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.12}
	14%|████████████▋ | 8/57 [06:31<33:52, 41.47s/it] {'loss': 6.7278, 'grad_norm': 0.5981941223144531, 'learning_rate': 3.985417748196108e-05, 'ppl': 835.30757, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.14}
	16%|██████████████▏ | 9/57 [07:11<32:39, 40.82s/it] {'loss': 7.0391, 'grad_norm': 0.6475759148597717, 'learning_rate': 3.967239813894288e-05, 'ppl': 1140.36082, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.16}
	18%|███████████████▌ | 10/57 [07:50<31:38, 40.38s/it] {'loss': 6.9283, 'grad_norm': 0.6419268846511841, 'learning_rate': 3.9418836348521045e-05, 'ppl': 1020.75722, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.18}
	19%|█████████████████▏ | 11/57 [08:30<30:41, 40.04s/it] {'loss': 6.3315, 'grad_norm': 0.6758131980895996, 'learning_rate': 3.909441733017092e-05, 'ppl': 561.99896, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.19}
	21%|██████████████████▋ | 12/57 [09:09<29:54, 39.88s/it] {'loss': 7.2004, 'grad_norm': 0.6126517653465271, 'learning_rate': 3.8700324853708304e-05, 'ppl': 1339.96664, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.21}
	23%|████████████████████▎ | 13/57 [09:49<29:09, 39.77s/it] {'loss': 6.6677, 'grad_norm': 0.654209554195404, 'learning_rate': 3.82379969198418e-05, 'ppl': 786.58438, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.23}
	25%|█████████████████████▊ | 14/57 [10:28<28:24, 39.64s/it] {'loss': 6.9996, 'grad_norm': 0.6837294101715088, 'learning_rate': 3.7709120513064196e-05, 'ppl': 1096.19459, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.25}
	26%|███████████████████████▍ | 15/57 [11:07<27:39, 39.51s/it] {'loss': 6.6326, 'grad_norm': 0.8170902132987976, 'learning_rate': 3.711562544602895e-05, 'ppl': 759.45419, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.26}
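
The `learning_rate` values in the first 15 steps follow a linear warmup and then a cosine decay, in line with `"lr_scheduler": "cosine"` and `"warmup_ratio": 0.1`. The sketch below is a hand-written reconstruction rather than the trainer's own code. It assumes 5 warmup steps (the integer part of 0.1 × 57, inferred from the ramp reaching 4e-05 at the sixth logged step) and that the value logged at training step n reflects n − 1 scheduler updates.

```python
import math

PEAK_LR = 4e-05    # "learning_rate" from the config
TOTAL_STEPS = 57   # total_num_steps from the log
WARMUP_STEPS = 5   # assumption: int(0.1 * 57), inferred from the logged ramp


def lr_at(updates):
    """Learning rate after `updates` scheduler updates."""
    if updates < WARMUP_STEPS:
        # Linear warmup from 0 to PEAK_LR.
        return PEAK_LR * updates / WARMUP_STEPS
    # Cosine decay from PEAK_LR to 0 over the remaining steps.
    progress = (updates - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# 'learning_rate' values logged at training steps 1 through 8.
logged = [0.0, 8.000000000000001e-06, 1.6000000000000003e-05, 2.4e-05,
          3.2000000000000005e-05, 4e-05, 3.996351108446635e-05,
          3.985417748196108e-05]

for updates, value in enumerate(logged):
    print(updates + 1, lr_at(updates), value)
```

Each printed pair should agree to within floating-point rounding, which supports the assumed warmup length.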
	26%|███████████████████████▍ | 15/57 [11:07<27:39, 39.51s/it]
	[2026-04-13 02:29:02,473] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:406990] Running evaluation step...

	78%|█████████████████████████████████████████████████████████████████████▍ | 39/50 [00:57<00:16, 1.51s/it]
	80%\|███████████████████████████████████████████████████████████████████████▏ \| 40/50 [00:59<00:15, 1.51s/it][A
	82%\|████████████████████████████████████████████████████████████████████████▉ \| 41/50 [01:00<00:13, 1.52s/it][A
	84%\|██████████████████████████████████████████████████████████████████████████▊ \| 42/50 [01:02<00:12, 1.52s/it][A
	86%\|████████████████████████████████████████████████████████████████████████████▌ \| 43/50 [01:03<00:10, 1.52s/it][A
	88%\|██████████████████████████████████████████████████████████████████████████████▎ \| 44/50 [01:05<00:09, 1.52s/it][A
	90%\|████████████████████████████████████████████████████████████████████████████████ \| 45/50 [01:06<00:07, 1.51s/it][A
	92%\|█████████████████████████████████████████████████████████████████████████████████▉ \| 46/50 [01:08<00:06, 1.51s/it][A
	94%\|███████████████████████████████████████████████████████████████████████████████████▋ \| 47/50 [01:09<00:04, 1.51s/it][A
	96%\|█████████████████████████████████████████████████████████████████████████████████████▍ \| 48/50 [01:11<00:03, 1.52s/it][A
	98%\|███████████████████████████████████████████████████████████████████████████████████████▏ \| 49/50 [01:12<00:01, 1.53s/it][A
	Traceback (most recent call last):
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
	finalizer()
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__
	res = self._callback(*self._args, **self._kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir
	rmtree(tempdir, onerror=onerror)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree
	_rmtree_safe_fd(stack, onexc)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
	onexc(func, path, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
	onexc(os.unlink, fullname, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
	os.unlink(entry.name, dir_fd=topfd)
	OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-w4mb852g'
	Traceback (most recent call last):
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
	finalizer()
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__
	res = self._callback(*self._args, **self._kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir
	rmtree(tempdir, onerror=onerror)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree
	_rmtree_safe_fd(stack, onexc)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
	onexc(func, path, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
	onexc(os.unlink, fullname, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
	os.unlink(entry.name, dir_fd=topfd)
	OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-5tq6sxwg'

	100%\|█████████████████████████████████████████████████████████████████████████████████████████\| 50/50 [01:14<00:00, 1.54s/it][A
	{'eval_loss': 0.8203887939453125, 'eval_runtime': 76.2715, 'eval_samples_per_second': 5.244, 'eval_steps_per_second': 0.656, 'eval_ppl': 2.27138, 'memory/max_active (GiB)': 42.65, 'memory/max_allocated (GiB)': 42.65, 'memory/device_reserved (GiB)': 60.61, 'epoch': 0.26, 'tokens/train_per_sec_per_gpu': 0.0}
	26%\|███████████████████████▍ \| 15/57 [12:23<27:39, 39.51s/it]
	[2026-04-13 02:30:18,756] [INFO] [axolotl.core.trainers.base._save:721] [PID:406990] Saving model checkpoint to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2/checkpoint-15
	[2026-04-13 02:30:19,318] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
	return func(*args, **kwargs)

	28%\|████████████████████████▉ \| 16/57 [13:04<42:50, 62.70s/it] {'loss': 6.134, 'grad_norm': 0.6509190797805786, 'learning_rate': 3.645967731787313e-05, 'ppl': 461.27759, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.28}
	28%\|████████████████████████▉ \| 16/57 [13:04<42:50, 62.70s/it] 30%\|██████████████████████████▌ \| 17/57 [13:43<37:08, 55.72s/it] {'loss': 7.0539, 'grad_norm': 0.6493154764175415, 'learning_rate': 3.5743669612181004e-05, 'ppl': 1157.36367, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.3}
	30%\|██████████████████████████▌ \| 17/57 [13:43<37:08, 55.72s/it] 32%\|████████████████████████████ \| 18/57 [14:22<32:59, 50.75s/it] {'loss': 6.1112, 'grad_norm': 0.5940355658531189, 'learning_rate': 3.497021496342203e-05, 'ppl': 450.87945, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.32}
	32%\|████████████████████████████ \| 18/57 [14:22<32:59, 50.75s/it] 33%\|█████████████████████████████▋ \| 19/57 [15:02<30:00, 47.37s/it] {'loss': 6.5471, 'grad_norm': 0.5027147531509399, 'learning_rate': 3.4142135623730954e-05, 'ppl': 697.2193, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.33}
	33%\|█████████████████████████████▋ \| 19/57 [15:02<30:00, 47.37s/it] 35%\|███████████████████████████████▏ \| 20/57 [15:42<27:47, 45.07s/it] {'loss': 6.3782, 'grad_norm': 0.5283752083778381, 'learning_rate': 3.326245316481591e-05, 'ppl': 588.86679, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.35}
	35%\|███████████████████████████████▏ \| 20/57 [15:42<27:47, 45.07s/it] 37%\|████████████████████████████████▊ \| 21/57 [16:21<26:00, 43.36s/it] {'loss': 6.7009, 'grad_norm': 0.5615221858024597, 'learning_rate': 3.2334377452570866e-05, 'ppl': 813.13732, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.37}
	37%\|████████████████████████████████▊ \| 21/57 [16:21<26:00, 43.36s/it] 39%\|██████████████████████████████████▎ \| 22/57 [17:00<24:35, 42.16s/it] {'loss': 5.7217, 'grad_norm': 0.537100613117218, 'learning_rate': 3.136129493462312e-05, 'ppl': 305.4237, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.39}
	39%\|██████████████████████████████████▎ \| 22/57 [17:00<24:35, 42.16s/it] 40%\|███████████████████████████████████▉ \| 23/57 [17:40<23:25, 41.33s/it] {'loss': 5.8115, 'grad_norm': 0.47874653339385986, 'learning_rate': 3.0346756283553138e-05, 'ppl': 334.11993, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.4}
	40%\|███████████████████████████████████▉ \| 23/57 [17:40<23:25, 41.33s/it] 42%\|█████████████████████████████████████▍ \| 24/57 [18:19<22:28, 40.85s/it] {'loss': 6.0961, 'grad_norm': 0.43611156940460205, 'learning_rate': 2.9294463440875375e-05, 'ppl': 444.12231, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.42}
	42%\|█████████████████████████████████████▍ \| 24/57 [18:19<22:28, 40.85s/it] 44%\|███████████████████████████████████████ \| 25/57 [18:59<21:33, 40.43s/it] {'loss': 6.2161, 'grad_norm': 0.43894726037979126, 'learning_rate': 2.820825610905514e-05, 'ppl': 500.74651, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.44}
	44%\|███████████████████████████████████████ \| 25/57 [18:59<21:33, 40.43s/it] 46%\|████████████████████████████████████████▌ \| 26/57 [19:38<20:43, 40.13s/it] {'loss': 6.3839, 'grad_norm': 0.46272194385528564, 'learning_rate': 2.7092097740850712e-05, 'ppl': 592.23292, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.46}
	46%\|████████████████████████████████████████▌ \| 26/57 [19:38<20:43, 40.13s/it] 47%\|██████████████████████████████████████████▏ \| 27/57 [20:18<19:56, 39.90s/it] {'loss': 5.8903, 'grad_norm': 0.4597170650959015, 'learning_rate': 2.595006107710406e-05, 'ppl': 361.51372, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.47}
	47%\|██████████████████████████████████████████▏ \| 27/57 [20:18<19:56, 39.90s/it] 49%\|███████████████████████████████████████████▋ \| 28/57 [20:57<19:15, 39.84s/it] {'loss': 6.4172, 'grad_norm': 0.4441160261631012, 'learning_rate': 2.4786313285751158e-05, 'ppl': 612.28631, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.49}
	49%\|███████████████████████████████████████████▋ \| 28/57 [20:57<19:15, 39.84s/it] 51%\|█████████████████████████████████████████████▎ \| 29/57 [21:37<18:34, 39.80s/it] {'loss': 7.06, 'grad_norm': 0.41745519638061523, 'learning_rate': 2.360510075627812e-05, 'ppl': 1164.44517, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.51}
	51%\|█████████████████████████████████████████████▎ \| 29/57 [21:37<18:34, 39.80s/it] 53%\|██████████████████████████████████████████████▊ \| 30/57 [22:17<17:51, 39.70s/it] {'loss': 6.5415, 'grad_norm': 0.4563206434249878, 'learning_rate': 2.2410733605106462e-05, 'ppl': 693.32579, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.53}
	53%\|██████████████████████████████████████████████▊ \| 30/57 [22:17<17:51, 39.70s/it]
	[2026-04-13 02:40:11,875] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:406990] Running evaluation step...

	0%\| \| 0/50 [00:00<?, ?it/s][A
	4%\|███▌ \| 2/50 [00:01<00:35, 1.33it/s][A
	6%\|█████▍ \| 3/50 [00:03<00:50, 1.06s/it][A
	8%\|███████▏ \| 4/50 [00:04<00:56, 1.23s/it][A
	10%\|█████████ \| 5/50 [00:06<00:59, 1.33s/it][A
	12%\|██████████▊ \| 6/50 [00:07<01:01, 1.39s/it][A
	14%\|████████████▌ \| 7/50 [00:09<01:01, 1.43s/it][A
	16%\|██████████████▍ \| 8/50 [00:10<01:00, 1.45s/it][A
	18%\|████████████████▏ \| 9/50 [00:12<01:01, 1.50s/it][A
	20%\|█████████████████▊ \| 10/50 [00:13<01:00, 1.50s/it][A
	22%\|███████████████████▌ \| 11/50 [00:15<00:58, 1.51s/it][A
	24%\|█████████████████████▎ \| 12/50 [00:16<00:57, 1.51s/it][A
	26%\|███████████████████████▏ \| 13/50 [00:18<00:55, 1.51s/it][A
	28%\|████████████████████████▉ \| 14/50 [00:19<00:54, 1.51s/it][A
	30%\|██████████████████████████▋ \| 15/50 [00:21<00:52, 1.51s/it][A
	32%\|████████████████████████████▍ \| 16/50 [00:22<00:51, 1.51s/it][A
	34%\|██████████████████████████████▎ \| 17/50 [00:24<00:50, 1.52s/it][A
	36%\|████████████████████████████████ \| 18/50 [00:25<00:48, 1.52s/it][A
	38%\|█████████████████████████████████▊ \| 19/50 [00:27<00:46, 1.51s/it][A
	40%\|███████████████████████████████████▌ \| 20/50 [00:28<00:45, 1.52s/it][A
	42%\|█████████████████████████████████████▍ \| 21/50 [00:30<00:44, 1.52s/it][A
	44%\|███████████████████████████████████████▏ \| 22/50 [00:31<00:42, 1.52s/it][A
	46%\|████████████████████████████████████████▉ \| 23/50 [00:33<00:40, 1.51s/it][A
	48%\|██████████████████████████████████████████▋ \| 24/50 [00:34<00:39, 1.51s/it][A
	50%\|████████████████████████████████████████████▌ \| 25/50 [00:36<00:38, 1.52s/it][A
	52%\|██████████████████████████████████████████████▎ \| 26/50 [00:37<00:36, 1.53s/it][A
	54%\|████████████████████████████████████████████████ \| 27/50 [00:39<00:34, 1.52s/it][A
	56%\|█████████████████████████████████████████████████▊ \| 28/50 [00:40<00:33, 1.52s/it][A
	58%\|███████████████████████████████████████████████████▌ \| 29/50 [00:42<00:31, 1.52s/it][A
	60%\|█████████████████████████████████████████████████████▍ \| 30/50 [00:43<00:30, 1.52s/it][A
	62%\|███████████████████████████████████████████████████████▏ \| 31/50 [00:45<00:28, 1.51s/it][A
	64%\|████████████████████████████████████████████████████████▉ \| 32/50 [00:47<00:27, 1.51s/it][A
	66%\|██████████████████████████████████████████████████████████▋ \| 33/50 [00:48<00:25, 1.53s/it][A
	68%\|████████████████████████████████████████████████████████████▌ \| 34/50 [00:50<00:24, 1.52s/it][A
	70%\|██████████████████████████████████████████████████████████████▎ \| 35/50 [00:51<00:22, 1.52s/it][A
	72%\|████████████████████████████████████████████████████████████████ \| 36/50 [00:53<00:21, 1.52s/it][A
	74%\|█████████████████████████████████████████████████████████████████▊ \| 37/50 [00:54<00:19, 1.52s/it][A
	76%\|███████████████████████████████████████████████████████████████████▋ \| 38/50 [00:56<00:18, 1.52s/it][A
	78%\|█████████████████████████████████████████████████████████████████████▍ \| 39/50 [00:57<00:16, 1.52s/it][A
	80%\|███████████████████████████████████████████████████████████████████████▏ \| 40/50 [00:59<00:15, 1.51s/it][A
	82%\|████████████████████████████████████████████████████████████████████████▉ \| 41/50 [01:00<00:13, 1.52s/it][A
	84%\|██████████████████████████████████████████████████████████████████████████▊ \| 42/50 [01:02<00:12, 1.52s/it][A
	86%\|████████████████████████████████████████████████████████████████████████████▌ \| 43/50 [01:03<00:10, 1.52s/it][A
	88%\|██████████████████████████████████████████████████████████████████████████████▎ \| 44/50 [01:05<00:09, 1.52s/it][A
	90%\|████████████████████████████████████████████████████████████████████████████████ \| 45/50 [01:06<00:07, 1.52s/it][A
	92%\|█████████████████████████████████████████████████████████████████████████████████▉ \| 46/50 [01:08<00:06, 1.52s/it][A
	94%\|███████████████████████████████████████████████████████████████████████████████████▋ \| 47/50 [01:09<00:04, 1.51s/it][A
	96%\|█████████████████████████████████████████████████████████████████████████████████████▍ \| 48/50 [01:11<00:03, 1.52s/it][A
	98%\|███████████████████████████████████████████████████████████████████████████████████████▏ \| 49/50 [01:12<00:01, 1.53s/it][A
	Traceback (most recent call last):
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
	finalizer()
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__
	res = self._callback(*self._args, **self._kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir
	rmtree(tempdir, onerror=onerror)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree
	_rmtree_safe_fd(stack, onexc)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
	onexc(func, path, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
	onexc(os.unlink, fullname, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
	os.unlink(entry.name, dir_fd=topfd)
	OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-sbrhfmc2'
	Traceback (most recent call last):
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
	finalizer()
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__
	res = self._callback(*self._args, **self._kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir
	rmtree(tempdir, onerror=onerror)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree
	_rmtree_safe_fd(stack, onexc)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
	onexc(func, path, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
	onexc(os.unlink, fullname, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
	os.unlink(entry.name, dir_fd=topfd)
	OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-2ddl4foh'

	100%\|█████████████████████████████████████████████████████████████████████████████████████████\| 50/50 [01:14<00:00, 1.55s/it][A
	{'eval_loss': 0.7725631594657898, 'eval_runtime': 76.3564, 'eval_samples_per_second': 5.239, 'eval_steps_per_second': 0.655, 'eval_ppl': 2.16531, 'memory/max_active (GiB)': 42.65, 'memory/max_allocated (GiB)': 42.65, 'memory/device_reserved (GiB)': 62.34, 'epoch': 0.53, 'tokens/train_per_sec_per_gpu': 0.0}
	53%\|██████████████████████████████████████████████▊ \| 30/57 [23:33<17:51, 39.70s/it]
	[2026-04-13 02:41:28,242] [INFO] [axolotl.core.trainers.base._save:721] [PID:406990] Saving model checkpoint to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2/checkpoint-30
	[2026-04-13 02:41:28,776] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
	return func(*args, **kwargs)

	54%\|████████████████████████████████████████████████▍ \| 31/57 [24:13<27:12, 62.80s/it] {'loss': 5.4361, 'grad_norm': 0.3696332275867462, 'learning_rate': 2.1207569948445724e-05, 'ppl': 229.54521, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.54}
	54%\|████████████████████████████████████████████████▍ \| 31/57 [24:13<27:12, 62.80s/it] 56%\|█████████████████████████████████████████████████▉ \| 32/57 [24:53<23:16, 55.86s/it] {'loss': 6.3518, 'grad_norm': 0.3885038495063782, 'learning_rate': 2e-05, 'ppl': 573.52412, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.56}
	56%\|█████████████████████████████████████████████████▉ \| 32/57 [24:53<23:16, 55.86s/it] 58%\|███████████████████████████████████████████████████▌ \| 33/57 [25:32<20:23, 50.98s/it] {'loss': 5.9664, 'grad_norm': 0.35446420311927795, 'learning_rate': 1.879243005155428e-05, 'ppl': 390.09878, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.58}
	58%\|███████████████████████████████████████████████████▌ \| 33/57 [25:32<20:23, 50.98s/it] 60%\|█████████████████████████████████████████████████████ \| 34/57 [26:12<18:14, 47.58s/it] {'loss': 6.0508, 'grad_norm': 0.3991977274417877, 'learning_rate': 1.758926639489354e-05, 'ppl': 424.45246, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.6}
	60%\|█████████████████████████████████████████████████████ \| 34/57 [26:12<18:14, 47.58s/it] 61%\|██████████████████████████████████████████████████████▋ \| 35/57 [26:51<16:32, 45.12s/it] {'loss': 6.5703, 'grad_norm': 0.3999829590320587, 'learning_rate': 1.6394899243721887e-05, 'ppl': 713.58389, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.61}
	61%\|██████████████████████████████████████████████████████▋ \| 35/57 [26:51<16:32, 45.12s/it] 63%\|████████████████████████████████████████████████████████▏ \| 36/57 [27:31<15:10, 43.37s/it] {'loss': 5.9417, 'grad_norm': 0.3948379158973694, 'learning_rate': 1.5213686714248852e-05, 'ppl': 380.58137, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.63}
	63%\|████████████████████████████████████████████████████████▏ \| 36/57 [27:31<15:10, 43.37s/it] 65%\|█████████████████████████████████████████████████████████▊ \| 37/57 [28:10<14:03, 42.19s/it] {'loss': 6.4769, 'grad_norm': 0.3924844264984131, 'learning_rate': 1.4049938922895945e-05, 'ppl': 649.95297, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.65}
	65%\|█████████████████████████████████████████████████████████▊ \| 37/57 [28:10<14:03, 42.19s/it] 67%\|███████████████████████████████████████████████████████████▎ \| 38/57 [28:50<13:06, 41.41s/it] {'loss': 6.6282, 'grad_norm': 0.4368893504142761, 'learning_rate': 1.2907902259149287e-05, 'ppl': 756.11993, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.67}
	67%\|███████████████████████████████████████████████████████████▎ \| 38/57 [28:50<13:06, 41.41s/it] 68%\|████████████████████████████████████████████████████████████▉ \| 39/57 [29:29<12:15, 40.87s/it] {'loss': 5.3714, 'grad_norm': 0.36707866191864014, 'learning_rate': 1.1791743890944869e-05, 'ppl': 215.16389, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.68}
	68%\|████████████████████████████████████████████████████████████▉ \| 39/57 [29:29<12:15, 40.87s/it] 70%\|██████████████████████████████████████████████████████████████▍ \| 40/57 [30:09<11:26, 40.39s/it] {'loss': 5.8953, 'grad_norm': 0.41011226177215576, 'learning_rate': 1.070553655912463e-05, 'ppl': 363.32582, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.7}
	70%\|██████████████████████████████████████████████████████████████▍ \| 40/57 [30:09<11:26, 40.39s/it] 72%\|████████████████████████████████████████████████████████████████ \| 41/57 [30:48<10:42, 40.17s/it] {'loss': 6.8336, 'grad_norm': 0.3589271903038025, 'learning_rate': 9.653243716446862e-06, 'ppl': 928.5275, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.72}
	72%\|████████████████████████████████████████████████████████████████ \| 41/57 [30:48<10:42, 40.17s/it] 74%\|█████████████████████████████████████████████████████████████████▌ \| 42/57 [31:28<09:59, 39.97s/it] {'loss': 6.0634, 'grad_norm': 0.383444607257843, 'learning_rate': 8.638705065376887e-06, 'ppl': 429.83439, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.74}
	74%\|█████████████████████████████████████████████████████████████████▌ \| 42/57 [31:28<09:59, 39.97s/it] 75%\|███████████████████████████████████████████████████████████████████▏ \| 43/57 [32:07<09:17, 39.84s/it] {'loss': 6.6285, 'grad_norm': 0.3992387652397156, 'learning_rate': 7.665622547429139e-06, 'ppl': 756.3468, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.75}
	75%\|███████████████████████████████████████████████████████████████████▏ \| 43/57 [32:07<09:17, 39.84s/it] 77%\|████████████████████████████████████████████████████████████████████▋ \| 44/57 [32:47<08:36, 39.72s/it] {'loss': 5.9547, 'grad_norm': 0.42180073261260986, 'learning_rate': 6.737546835184101e-06, 'ppl': 385.56122, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.77}
	77%\|████████████████████████████████████████████████████████████████████▋ \| 44/57 [32:47<08:36, 39.72s/it] 79%\|██████████████████████████████████████████████████████████████████████▎ \| 45/57 [33:26<07:56, 39.70s/it] {'loss': 6.0193, 'grad_norm': 0.4660789668560028, 'learning_rate': 5.857864376269051e-06, 'ppl': 411.29059, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.79}
	79%\|██████████████████████████████████████████████████████████████████████▎ \| 45/57 [33:26<07:56, 39.70s/it]
	[2026-04-13 02:51:21,837] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:406990] Running evaluation step...

	98%\|███████████████████████████████████████████████████████████████████████████████████████▏ \| 49/50 [01:13<00:01, 1.53s/it]
	Traceback (most recent call last):
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
	finalizer()
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__
	res = self._callback(*self._args, **self._kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir
	rmtree(tempdir, onerror=onerror)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree
	_rmtree_safe_fd(stack, onexc)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
	onexc(func, path, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
	onexc(os.unlink, fullname, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
	os.unlink(entry.name, dir_fd=topfd)
	OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-ci_q2hzt'
	Traceback (most recent call last):
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
	finalizer()
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__
	res = self._callback(*self._args, **self._kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir
	rmtree(tempdir, onerror=onerror)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree
	_rmtree_safe_fd(stack, onexc)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
	onexc(func, path, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
	onexc(os.unlink, fullname, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
	os.unlink(entry.name, dir_fd=topfd)
	OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-4chbls79'

	100%\|█████████████████████████████████████████████████████████████████████████████████████████\| 50/50 [01:14<00:00, 1.56s/it]
	{'eval_loss': 0.7636247277259827, 'eval_runtime': 76.556, 'eval_samples_per_second': 5.225, 'eval_steps_per_second': 0.653, 'eval_ppl': 2.14604, 'memory/max_active (GiB)': 42.65, 'memory/max_allocated (GiB)': 42.65, 'memory/device_reserved (GiB)': 62.34, 'epoch': 0.79, 'tokens/train_per_sec_per_gpu': 0.0}
	79%\|██████████████████████████████████████████████████████████████████████▎ \| 45/57 [34:43<07:56, 39.70s/it]
	[2026-04-13 02:52:38,402] [INFO] [axolotl.core.trainers.base._save:721] [PID:406990] Saving model checkpoint to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2/checkpoint-45
	[2026-04-13 02:52:39,006] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
	return func(*args, **kwargs)

	81%\|███████████████████████████████████████████████████████████████████████▊ \| 46/57 [35:23<11:31, 62.84s/it] {'loss': 6.1743, 'grad_norm': 0.419697105884552, 'learning_rate': 5.029785036577976e-06, 'ppl': 480.24673, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.81}
	82%\|█████████████████████████████████████████████████████████████████████████▍ \| 47/57 [36:03<09:18, 55.81s/it] {'loss': 6.3119, 'grad_norm': 0.3812112510204315, 'learning_rate': 4.256330387818999e-06, 'ppl': 551.09103, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.82}
	84%\|██████████████████████████████████████████████████████████████████████████▉ \| 48/57 [36:42<07:37, 50.88s/it] {'loss': 6.1627, 'grad_norm': 0.45529961585998535, 'learning_rate': 3.5403226821268734e-06, 'ppl': 474.70806, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.84}
	86%\|████████████████████████████████████████████████████████████████████████████▌ \| 49/57 [37:22<06:19, 47.46s/it] {'loss': 6.3451, 'grad_norm': 0.3951869010925293, 'learning_rate': 2.8843745539710523e-06, 'ppl': 569.69436, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.86}
	88%\|██████████████████████████████████████████████████████████████████████████████ \| 50/57 [38:01<05:15, 45.05s/it] {'loss': 5.8601, 'grad_norm': 0.40012675523757935, 'learning_rate': 2.2908794869358044e-06, 'ppl': 350.75922, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.88}
	89%\|███████████████████████████████████████████████████████████████████████████████▋ \| 51/57 [38:40<04:20, 43.35s/it] {'loss': 6.1964, 'grad_norm': 0.3871590495109558, 'learning_rate': 1.7620030801581988e-06, 'ppl': 490.97833, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.89}
	91%\|█████████████████████████████████████████████████████████████████████████████████▏ \| 52/57 [39:20<03:30, 42.19s/it] {'loss': 5.8941, 'grad_norm': 0.33590954542160034, 'learning_rate': 1.2996751462917057e-06, 'ppl': 362.89009, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.91}
	93%\|██████████████████████████████████████████████████████████████████████████████████▊ \| 53/57 [40:00<02:45, 41.44s/it] {'loss': 6.1674, 'grad_norm': 0.3889252543449402, 'learning_rate': 9.055826698290881e-07, 'ppl': 476.94444, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.93}
	95%\|████████████████████████████████████████████████████████████████████████████████████▎ \| 54/57 [40:39<02:02, 40.85s/it] {'loss': 5.5078, 'grad_norm': 0.3783589005470276, 'learning_rate': 5.811636514789598e-07, 'ppl': 246.60799, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.95}
	96%\|█████████████████████████████████████████████████████████████████████████████████████▉ \| 55/57 [41:18<01:20, 40.35s/it] {'loss': 5.9907, 'grad_norm': 0.496366024017334, 'learning_rate': 3.2760186105712964e-07, 'ppl': 399.6943, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.96}
	98%\|███████████████████████████████████████████████████████████████████████████████████████▍ \| 56/57 [41:58<00:40, 40.07s/it] {'loss': 5.8207, 'grad_norm': 0.40123143792152405, 'learning_rate': 1.4582251803892055e-07, 'ppl': 337.20802, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.98}
	Traceback (most recent call last):
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
	finalizer()
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__
	res = self._callback(*self._args, **self._kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir
	rmtree(tempdir, onerror=onerror)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree
	_rmtree_safe_fd(stack, onexc)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
	onexc(func, path, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
	onexc(os.unlink, fullname, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
	os.unlink(entry.name, dir_fd=topfd)
	OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-xjzy2ttf'
	Traceback (most recent call last):
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers
	finalizer()
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__
	res = self._callback(*self._args, **self._kwargs)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir
	rmtree(tempdir, onerror=onerror)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree
	_rmtree_safe_fd(stack, onexc)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
	onexc(func, path, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
	onexc(os.unlink, fullname, err)
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc
	return onerror(func, path, exc_info)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
	os.unlink(entry.name, dir_fd=topfd)
	OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-zeou0am4'
	100%\|█████████████████████████████████████████████████████████████████████████████████████████\| 57/57 [42:37<00:00, 39.92s/it] {'loss': 6.1692, 'grad_norm': 0.4002166986465454, 'learning_rate': 3.648891553365008e-08, 'ppl': 477.80371, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 1.0}
	[2026-04-13 03:00:32,592] [INFO] [axolotl.core.trainers.base._save:721] [PID:406990] Saving model checkpoint to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2/checkpoint-57
	[2026-04-13 03:00:33,231] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
	return func(*args, **kwargs)

	{'train_runtime': 2560.8721, 'train_samples_per_second': 1.425, 'train_steps_per_second': 0.022, 'train_loss': 6.388372555113675, 'memory/max_active (GiB)': 14.96, 'memory/max_allocated (GiB)': 14.96, 'memory/device_reserved (GiB)': 62.34, 'epoch': 1.0, 'tokens/train_per_sec_per_gpu': 0.0}
	100%\|█████████████████████████████████████████████████████████████████████████████████████████\| 57/57 [42:38<00:00, 39.92s/it] 100%\|█████████████████████████████████████████████████████████████████████████████████████████\| 57/57 [42:38<00:00, 44.89s/it]
	[2026-04-13 03:00:33,786] [INFO] [axolotl.train.save_trained_model:233] [PID:406990] Training completed! Saving trained model to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2.
	[2026-04-13 03:00:34,238] [INFO] [axolotl.train.save_trained_model:351] [PID:406990] Model successfully saved to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2
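
The `ppl` values in the log above appear to equal `exp(loss)`: for the evaluation step, exp(0.7636) is roughly 2.146, which matches `eval_ppl`. The following is a minimal sketch for pulling the metric dicts out of log lines like these and checking that relationship. It assumes the flat, single-level dict format shown in this log; `extract_metrics` and `ppl_matches_loss` are hypothetical helper names, not part of Axolotl or Transformers, and the sample line is an abbreviated copy of the step-43 entry.

```python
import ast
import math
import re

# Matches a flat (non-nested) dict literal such as {'loss': 6.6285, ...}.
# Assumption: metric dicts in the log never contain nested braces.
METRIC_DICT = re.compile(r"\{'[^{}]*\}")


def extract_metrics(line):
    """Return every metric dict found on one log line (possibly none)."""
    found = []
    for match in METRIC_DICT.finditer(line):
        try:
            found.append(ast.literal_eval(match.group(0)))
        except (ValueError, SyntaxError):
            # Skip anything that only looks like a dict literal.
            continue
    return found


def ppl_matches_loss(row, rel_tol=1e-3):
    """Check ppl == exp(loss) for a train or eval row; None if keys are absent."""
    for loss_key, ppl_key in (("loss", "ppl"), ("eval_loss", "eval_ppl")):
        if loss_key in row and ppl_key in row:
            return math.isclose(math.exp(row[loss_key]), row[ppl_key], rel_tol=rel_tol)
    return None


# Abbreviated copy of the step-43 line (progress bar shortened, some keys dropped).
sample = (
    "75%| 43/57 [32:07<09:17, 39.84s/it] "
    "{'loss': 6.6285, 'grad_norm': 0.3992387652397156, 'ppl': 756.3468, 'epoch': 0.75}"
)

for row in extract_metrics(sample):
    print(row["epoch"], row["loss"], ppl_matches_loss(row))
```

Using `ast.literal_eval` rather than `eval` keeps the parser limited to Python literals, which is the safer choice when reading log text of unknown origin.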