Tags: Text Generation, PEFT, Safetensors, Transformers, qwen2, axolotl, lora, conversational, text-generation-inference
Instructions for using felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with libraries, notebooks, and local apps:
- Libraries
- PEFT
How to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with PEFT:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter weights from this repository
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
model = PeftModel.from_pretrained(base_model, "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2")
```
- Transformers
How to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly (Qwen2 is a text-only causal LM, so use AutoModelForCausalLM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2")
model = AutoModelForCausalLM.from_pretrained("felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with vLLM:
Install from pip and serve the model:
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker:
```bash
docker model run hf.co/felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2
```
- SGLang
How to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with SGLang:
Install from pip and serve the model:
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images:
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 with Docker Model Runner:
```bash
docker model run hf.co/felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2
```
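The curl calls for vLLM and SGLang send a standard OpenAI-style chat-completions request. The sketch below sends the same request from Python using only the standard library. It assumes a server is already running as started above, on port 8000 for vLLM or port 30000 for SGLang. `build_chat_request` and `send_chat_request` are hypothetical helper names and are not part of either project.

```python
import json
import urllib.request

MODEL_ID = "felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2"


def build_chat_request(prompt: str, model: str = MODEL_ID) -> dict:
    """Build the same JSON body that the curl examples send (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat_request(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to an OpenAI-compatible server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the generated text in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

To use it, call `send_chat_request(build_chat_request("What is the capital of France?"))`. For the SGLang server, pass `base_url="http://localhost:30000"`.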
| [2026-04-13 02:17:40,735] [DEBUG] [axolotl.utils.config.log_gpu_memory_usage:127] [PID:406990] baseline 0.000GB () | |
| [2026-04-13 02:17:40,736] [INFO] [axolotl.cli.config.load_cfg:259] [PID:406990] config: | |
| { | |
| "activation_offloading": false, | |
| "adapter": "lora", | |
| "axolotl_config_path": "./axolotl_configs/Qwen2.5-coder-7b-instruct/primevul/lora-primevul-plus-token-diff-mask-alpha-2-ctx3.yaml", | |
| "base_model": "Qwen/Qwen2.5-Coder-7B-Instruct", | |
| "base_model_config": "Qwen/Qwen2.5-Coder-7B-Instruct", | |
| "batch_size": 64, | |
| "bf16": true, | |
| "capabilities": { | |
| "bf16": true, | |
| "compute_capability": "sm_120", | |
| "fp8": true, | |
| "n_gpu": 2, | |
| "n_node": 1 | |
| }, | |
| "context_parallel_size": 1, | |
| "dataloader_num_workers": 2, | |
| "dataloader_pin_memory": true, | |
| "dataloader_prefetch_factor": 256, | |
| "dataset_num_proc": 60, | |
| "dataset_prepared_path": "/u901/t577wang/SecSteer/axolotl-datasets/lora/Qwen2.5-Coder-7B/prime_vul_plus_splitted_token_diff_mask_skip_indent_ctx3_chat_v2_alpha2", | |
| "datasets": [ | |
| { | |
| "message_property_mappings": { | |
| "content": "content", | |
| "role": "role" | |
| }, | |
| "path": "felixwangg/prime_vul_plus_splitted_token_diff_mask_skip_indent_ctx3_chat_v2", | |
| "split": "train", | |
| "trust_remote_code": false, | |
| "type": "pretokenized" | |
| } | |
| ], | |
| "ddp": true, | |
| "device": "cuda:0", | |
| "device_map": { | |
| "": 0 | |
| }, | |
| "dion_rank_fraction": 1.0, | |
| "dion_rank_multiple_of": 1, | |
| "early_stopping_patience": 1000, | |
| "env_capabilities": { | |
| "torch_version": "2.10.0" | |
| }, | |
| "eval_batch_size": 4, | |
| "eval_causal_lm_metrics": [ | |
| "sacrebleu", | |
| "comet", | |
| "ter", | |
| "chrf" | |
| ], | |
| "eval_max_new_tokens": 128, | |
| "eval_sample_packing": false, | |
| "eval_steps": 15, | |
| "eval_table_size": 0, | |
| "experimental_skip_move_to_device": true, | |
| "flash_attention": true, | |
| "fp16": false, | |
| "gradient_accumulation_steps": 8, | |
| "gradient_checkpointing": true, | |
| "gradient_checkpointing_kwargs": { | |
| "use_reentrant": true | |
| }, | |
| "include_tkps": true, | |
| "is_falcon_derived_model": false, | |
| "is_llama_derived_model": false, | |
| "is_mistral_derived_model": false, | |
| "learning_rate": 4e-05, | |
| "lisa_layers_attribute": "model.layers", | |
| "load_best_model_at_end": true, | |
| "load_in_4bit": false, | |
| "load_in_8bit": false, | |
| "local_rank": 0, | |
| "logging_steps": 1, | |
| "lora_alpha": 16, | |
| "lora_dropout": 0.05, | |
| "lora_r": 16, | |
| "lora_target_linear": true, | |
| "loraplus_lr_embedding": 1e-06, | |
| "lr_scheduler": "cosine", | |
| "mean_resizing_embeddings": false, | |
| "merge_lora": true, | |
| "micro_batch_size": 4, | |
| "model_config_type": "qwen2", | |
| "num_epochs": 1.0, | |
| "optimizer": "adamw_torch", | |
| "otel_metrics_host": "localhost", | |
| "otel_metrics_port": 8000, | |
| "output_dir": "/u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2", | |
| "pad_to_sequence_len": true, | |
| "plugins": [ | |
| "diff_mask_trainer.plugin.DiffMaskPlugin" | |
| ], | |
| "pretrain_multipack_attn": true, | |
| "profiler_steps_start": 0, | |
| "qlora_sharded_model_loading": false, | |
| "ray_num_workers": 1, | |
| "resources_per_worker": { | |
| "GPU": 1 | |
| }, | |
| "sample_packing": false, | |
| "sample_packing_bin_size": 200, | |
| "sample_packing_group_size": 100000, | |
| "save_only_model": false, | |
| "save_safetensors": true, | |
| "save_steps": 15, | |
| "save_total_limit": 1000, | |
| "sequence_len": 4096, | |
| "shuffle_before_merging_datasets": false, | |
| "shuffle_merged_datasets": true, | |
| "skip_prepare_dataset": false, | |
| "streaming_multipack_buffer_size": 10000, | |
| "strict": false, | |
| "tensor_parallel_size": 1, | |
| "test_datasets": [ | |
| { | |
| "message_property_mappings": { | |
| "content": "content", | |
| "role": "role" | |
| }, | |
| "path": "felixwangg/prime_vul_plus_splitted_token_diff_mask_skip_indent_ctx3_chat_v2", | |
| "split": "validation", | |
| "trust_remote_code": false, | |
| "type": "pretokenized" | |
| } | |
| ], | |
| "tf32": false, | |
| "tiled_mlp_use_original_mlp": true, | |
| "tokenizer_config": "Qwen/Qwen2.5-Coder-7B-Instruct", | |
| "tokenizer_save_jinja_files": true, | |
| "tokenizer_type": "AutoTokenizer", | |
| "torch_dtype": "torch.bfloat16", | |
| "train_on_inputs": false, | |
| "trl": { | |
| "log_completions": false, | |
| "mask_truncated_completions": false, | |
| "ref_model_mixup_alpha": 0.9, | |
| "ref_model_sync_steps": 64, | |
| "scale_rewards": true, | |
| "sync_ref_model": false, | |
| "use_vllm": false, | |
| "vllm_server_host": "0.0.0.0", | |
| "vllm_server_port": 8000 | |
| }, | |
| "type_of_model": "Qwen2ForCausalLM", | |
| "use_otel_metrics": false, | |
| "use_ray": false, | |
| "use_wandb": true, | |
| "val_set_size": 0.0, | |
| "vllm": { | |
| "device": "auto", | |
| "dtype": "auto", | |
| "gpu_memory_utilization": 0.9, | |
| "host": "0.0.0.0", | |
| "port": 8000 | |
| }, | |
| "wandb_entity": "wtkuan", | |
| "wandb_log_model": "false", | |
| "wandb_name": "Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2", | |
| "wandb_project": "diff-mask-sft-primevul-ctx-3", | |
| "wandb_watch": "false", | |
| "warmup_ratio": 0.1, | |
| "weight_decay": 0.02, | |
| "world_size": 2 | |
| } | |
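The batch settings in the logged config are internally consistent. The `batch_size` of 64 equals `micro_batch_size` × `gradient_accumulation_steps` × `world_size`. The small sketch below works through that arithmetic using values copied from the config above; the helper name is illustrative only. Because `pad_to_sequence_len` is true, it also gives the number of padded token positions per optimizer step.

```python
def effective_batch_size(micro_batch_size: int, grad_accum_steps: int, world_size: int) -> int:
    """Sequences contributing to one optimizer step across all data-parallel ranks."""
    return micro_batch_size * grad_accum_steps * world_size


# Values from the logged axolotl config
MICRO_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
WORLD_SIZE = 2
SEQUENCE_LEN = 4096

batch = effective_batch_size(MICRO_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, WORLD_SIZE)
# With pad_to_sequence_len: true, every sequence is padded to SEQUENCE_LEN tokens
tokens_per_step = batch * SEQUENCE_LEN
print(batch, tokens_per_step)
```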
| [2026-04-13 02:17:41,352] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:285] [PID:406990] EOS: 151645 / <|im_end|> | |
| [2026-04-13 02:17:41,353] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:286] [PID:406990] BOS: None / None | |
| [2026-04-13 02:17:41,353] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:287] [PID:406990] PAD: 151643 / <|endoftext|> | |
| [2026-04-13 02:17:41,353] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:288] [PID:406990] UNK: None / None | |
| [2026-04-13 02:17:41,355] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:475] [PID:406990] Loading prepared dataset from disk at /u901/t577wang/SecSteer/axolotl-datasets/lora/Qwen2.5-Coder-7B/prime_vul_plus_splitted_token_diff_mask_skip_indent_ctx3_chat_v2_alpha2/8754561ab695e953202410ad43299050... | |
| [2026-04-13 02:17:41,376] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:475] [PID:406990] Loading prepared dataset from disk at /u901/t577wang/SecSteer/axolotl-datasets/lora/Qwen2.5-Coder-7B/prime_vul_plus_splitted_token_diff_mask_skip_indent_ctx3_chat_v2_alpha2/7a3c1ae28a530da625e3f3b6a296b5d3... | |
| [2026-04-13 02:17:41,400] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:417] [PID:406990] total_num_tokens: 3_735_150 | |
| [2026-04-13 02:17:41,445] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:435] [PID:406990] `total_supervised_tokens: 2_946_329` | |
| [2026-04-13 02:17:41,445] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:533] [PID:406990] total_num_steps: 57 | |
| [2026-04-13 02:17:41,445] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:121] [PID:406990] Maximum number of steps set at 57 | |
| [2026-04-13 02:17:41,473] [DEBUG] [axolotl.train.setup_model_and_tokenizer:70] [PID:406990] loading tokenizer... Qwen/Qwen2.5-Coder-7B-Instruct | |
| [2026-04-13 02:17:42,082] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:285] [PID:406990] EOS: 151645 / <|im_end|> | |
| [2026-04-13 02:17:42,082] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:286] [PID:406990] BOS: None / None | |
| [2026-04-13 02:17:42,082] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:287] [PID:406990] PAD: 151643 / <|endoftext|> | |
| [2026-04-13 02:17:42,082] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:288] [PID:406990] UNK: None / None | |
| [2026-04-13 02:17:42,082] [DEBUG] [axolotl.train.setup_model_and_tokenizer:82] [PID:406990] Loading model | |
| [2026-04-13 02:17:42,140] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:87] [PID:406990] Patched Trainer.evaluation_loop with nanmean loss calculation | |
| [2026-04-13 02:17:42,142] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:138] [PID:406990] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation | |
| Loading checkpoint shards: 100% 4/4 [00:01<00:00, 2.09it/s] | |
| [2026-04-13 02:17:44,689] [INFO] [axolotl.loaders.model._configure_embedding_dtypes:347] [PID:406990] Converting modules to torch.bfloat16 | |
| [2026-04-13 02:17:44,691] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:406990] Memory usage after model load 17.233GB (+17.233GB allocated, +18.252GB reserved) | |
| [2026-04-13 02:17:44,691] [INFO] [axolotl.loaders.adapter.load_lora:81] [PID:406990] found linear modules: ['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj'] | |
| trainable params: 40,370,176 || all params: 7,655,986,688 || trainable%: 0.5273 | |
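The trainable-parameter count can be reproduced by hand. A rank-r LoRA adapter on a linear layer of shape d_in → d_out adds r·(d_in + d_out) parameters, one matrix A of shape r×d_in and one matrix B of shape d_out×r. The sketch below applies this to the seven modules listed above with `lora_r = 16`. It assumes the published Qwen2.5-7B architecture values: hidden size 3584, MLP size 18944, 28 layers, and 4 key/value heads of dimension 128. Those values come from the public model config, not from this log. Because `lora_alpha` equals `lora_r`, the LoRA scaling factor alpha/r is 1.0.

```python
# Assumed Qwen2.5-7B architecture values (public model config, not from this log)
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
NUM_KV_HEADS = 4
HEAD_DIM = 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # 512: k_proj/v_proj output width under grouped-query attention

# LoRA hyperparameters from the logged config
LORA_R = 16
LORA_ALPHA = 16

# (in_features, out_features) for each module in "found linear modules"
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r); biases are not adapted."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, LORA_R) for d_in, d_out in MODULE_SHAPES.values())
trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R  # the low-rank update B @ A is multiplied by alpha / r

all_params = 7_655_986_688  # "all params" from the log line above (base + adapter)
print(f"{trainable:,} trainable, {100 * trainable / all_params:.4f}%, scaling={scaling}")
```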
| [2026-04-13 02:17:44,878] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:406990] after adapters 14.337GB (+14.337GB allocated, +18.330GB reserved) | |
| [2026-04-13 02:17:47,019] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/trl/extras/vllm_client.py:37: UserWarning: TRL currently supports vLLM versions: 0.10.2, 0.11.0, 0.11.1, 0.11.2, 0.12.0. You have version 0.19.0 installed. We recommend installing a supported version to avoid compatibility issues. | |
| if is_vllm_available(): | |
| [2026-04-13 02:17:47,116] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py:105: UserWarning: TRL currently supports vLLM versions: 0.10.2, 0.11.0, 0.11.1, 0.11.2, 0.12.0. You have version 0.19.0 installed. We recommend installing a supported version to avoid compatibility issues. | |
| if is_vllm_available(): | |
| DiffMaskPlugin: patching trainer with alpha=2.0 | |
| DiffMaskPlugin: compute_loss and prediction_step patched | |
| [2026-04-13 02:17:51,606] [INFO] [axolotl.train.save_initial_configs:413] [PID:406990] Pre-saving adapter config to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2... | |
| [2026-04-13 02:17:51,607] [INFO] [axolotl.train.save_initial_configs:417] [PID:406990] Pre-saving tokenizer to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2... | |
| [2026-04-13 02:17:51,745] [INFO] [axolotl.train.save_initial_configs:422] [PID:406990] Pre-saving model config to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2... | |
| [2026-04-13 02:17:51,749] [INFO] [axolotl.train.execute_training:212] [PID:406990] Starting trainer... | |
| wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from /u901/t577wang/.netrc. | |
| wandb: Currently logged in as: wtkuan to https://api.wandb.ai. Use `wandb login --relogin` to force relogin | |
| wandb: Tracking run with wandb version 0.24.0 | |
| wandb: Run data is saved locally in /u901/t577wang/SecSteer/wandb/run-20260413_021753-v6k1ysas | |
| wandb: Run `wandb offline` to turn off syncing. | |
| wandb: Syncing run Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 | |
| wandb: View project at https://wandb.ai/wtkuan/diff-mask-sft-primevul-ctx-3 | |
| wandb: View run at https://wandb.ai/wtkuan/diff-mask-sft-primevul-ctx-3/runs/v6k1ysas | |
| wandb: Detected [huggingface_hub.inference, mcp, openai] in use. | |
| wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script. | |
| wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/ | |
| wandb: WARNING Saving files without folders. If you want to preserve subdirectories pass base_path to wandb.save, i.e. wandb.save("/mnt/folder/file.h5", base_path="/mnt") | |
| wandb: WARNING Symlinked 1 file into the W&B run directory; call wandb.save again to sync new files. | |
| [2026-04-13 02:17:54,858] [INFO] [axolotl.utils.callbacks.on_train_begin:757] [PID:406990] The Axolotl config has been saved to the WandB run under files. | |
| [2026-04-13 02:17:54,861] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:406990] Running evaluation step... | |
| Traceback (most recent call last): | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers | |
| finalizer() | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__ | |
| res = self._callback(*self._args, **self._kwargs) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir | |
| rmtree(tempdir, onerror=onerror) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree | |
| _rmtree_safe_fd(stack, onexc) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd | |
| onexc(func, path, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd | |
| onexc(os.unlink, fullname, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd | |
| os.unlink(entry.name, dir_fd=topfd) | |
| OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-wpexx418' | |
| Traceback (most recent call last): | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers | |
| finalizer() | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__ | |
| res = self._callback(*self._args, **self._kwargs) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir | |
| rmtree(tempdir, onerror=onerror) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree | |
| _rmtree_safe_fd(stack, onexc) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd | |
| onexc(func, path, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd | |
| onexc(os.unlink, fullname, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd | |
| os.unlink(entry.name, dir_fd=topfd) | |
| OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-145k1as8' | |
| 100% 50/50 [01:13<00:00, 1.54s/it] | |
| {'eval_loss': 0.8691890239715576, 'eval_runtime': 75.3604, 'eval_samples_per_second': 5.308, 'eval_steps_per_second': 0.663, 'eval_ppl': 2.38498, 'memory/max_active (GiB)': 42.34, 'memory/max_allocated (GiB)': 42.34, 'memory/device_reserved (GiB)': 51.15, 'epoch': 0} | |
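Two consistency checks apply to the evaluation metrics. First, `eval_ppl` is the exponential of `eval_loss`, and the training `ppl` field follows the same relation with the rounded `loss`. Second, the throughput fields imply the size of the validation pass: 50 evaluation steps at `eval_batch_size` 4 on 2 GPUs is 400 samples, which is consistent with runtime × samples-per-second. A quick check, using numbers copied from the log:

```python
import math

# Values copied from the first evaluation log line
eval_loss = 0.8691890239715576
eval_runtime = 75.3604
eval_samples_per_second = 5.308

# Perplexity is exp(mean cross-entropy loss)
eval_ppl = math.exp(eval_loss)

# Samples implied by the throughput fields vs. 50 steps x eval_batch_size 4 x 2 GPUs
implied_samples = eval_runtime * eval_samples_per_second
expected_samples = 50 * 4 * 2

print(round(eval_ppl, 5), round(implied_samples), expected_samples)
```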
| 1/57 [01:56<1:48:32, 116.29s/it] {'loss': 7.0251, 'grad_norm': 0.5795251131057739, 'learning_rate': 0.0, 'ppl': 1124.507, 'memory/max_active (GiB)': 45.79, 'memory/max_allocated (GiB)': 45.79, 'memory/device_reserved (GiB)': 60.59, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.02} | |
| 2/57 [02:35<1:04:57, 70.87s/it] {'loss': 7.4605, 'grad_norm': 0.5027868151664734, 'learning_rate': 8.000000000000001e-06, 'ppl': 1738.01685, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.04} | |
| 3/57 [03:14<50:49, 56.48s/it] {'loss': 7.0151, 'grad_norm': 0.5032415986061096, 'learning_rate': 1.6000000000000003e-05, 'ppl': 1113.31797, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.05} | |
| 4/57 [03:54<43:58, 49.78s/it] {'loss': 6.8756, 'grad_norm': 0.5377703309059143, 'learning_rate': 2.4e-05, 'ppl': 968.35621, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.07} | |
| 5/57 [04:33<39:52, 46.01s/it] {'loss': 7.184, 'grad_norm': 0.5669687390327454, 'learning_rate': 3.2000000000000005e-05, 'ppl': 1318.17041, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.09} | |
| 6/57 [05:12<37:11, 43.76s/it] {'loss': 7.214, 'grad_norm': 0.6096590161323547, 'learning_rate': 4e-05, 'ppl': 1358.31467, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.11} | |
| 7/57 [05:52<35:17, 42.35s/it] {'loss': 6.9833, 'grad_norm': 0.5615111589431763, 'learning_rate': 3.996351108446635e-05, 'ppl': 1078.47146, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.12} | |
| 8/57 [06:31<33:52, 41.47s/it] {'loss': 6.7278, 'grad_norm': 0.5981941223144531, 'learning_rate': 3.985417748196108e-05, 'ppl': 835.30757, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.14} | |
| 9/57 [07:11<32:39, 40.82s/it] {'loss': 7.0391, 'grad_norm': 0.6475759148597717, 'learning_rate': 3.967239813894288e-05, 'ppl': 1140.36082, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.16} | |
| 10/57 [07:50<31:38, 40.38s/it] {'loss': 6.9283, 'grad_norm': 0.6419268846511841, 'learning_rate': 3.9418836348521045e-05, 'ppl': 1020.75722, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.18} | |
| 11/57 [08:30<30:41, 40.04s/it] {'loss': 6.3315, 'grad_norm': 0.6758131980895996, 'learning_rate': 3.909441733017092e-05, 'ppl': 561.99896, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.19} | |
| 12/57 [09:09<29:54, 39.88s/it] {'loss': 7.2004, 'grad_norm': 0.6126517653465271, 'learning_rate': 3.8700324853708304e-05, 'ppl': 1339.96664, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.21} | |
| 13/57 [09:49<29:09, 39.77s/it] {'loss': 6.6677, 'grad_norm': 0.654209554195404, 'learning_rate': 3.82379969198418e-05, 'ppl': 786.58438, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.23} | |
| 14/57 [10:28<28:24, 39.64s/it] {'loss': 6.9996, 'grad_norm': 0.6837294101715088, 'learning_rate': 3.7709120513064196e-05, 'ppl': 1096.19459, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.25} | |
| 15/57 [11:07<27:39, 39.51s/it] {'loss': 6.6326, 'grad_norm': 0.8170902132987976, 'learning_rate': 3.711562544602895e-05, 'ppl': 759.45419, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 60.61, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.26} | |
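The logged `learning_rate` values follow a linear warmup to 4e-05 followed by cosine decay (`lr_scheduler: cosine`, `warmup_ratio: 0.1`). The values are consistent with 5 warmup steps, i.e. int(0.1 × 57), over 57 total steps. The value logged at step k matches the schedule evaluated at k − 1. The sketch below mirrors the warmup-then-half-cosine formula of Hugging Face's `get_cosine_schedule_with_warmup`. The 5-step warmup and the off-by-one alignment are inferences from this log, not documented settings.

```python
import math

PEAK_LR = 4e-05                        # learning_rate in the config
TOTAL_STEPS = 57                       # "Maximum number of steps set at 57"
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # warmup_ratio 0.1 -> 5 (inferred from the logged ramp)


def cosine_with_warmup(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step / max(1, WARMUP_STEPS))
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Logged learning_rate at training steps 1-15 (copied from the log above)
LOGGED = [
    0.0, 8.000000000000001e-06, 1.6000000000000003e-05, 2.4e-05,
    3.2000000000000005e-05, 4e-05, 3.996351108446635e-05, 3.985417748196108e-05,
    3.967239813894288e-05, 3.9418836348521045e-05, 3.909441733017092e-05,
    3.8700324853708304e-05, 3.82379969198418e-05, 3.7709120513064196e-05,
    3.711562544602895e-05,
]

# The value logged at step k corresponds to the schedule evaluated at k - 1
max_abs_diff = max(abs(cosine_with_warmup(k - 1) - lr) for k, lr in enumerate(LOGGED, start=1))
print(f"warmup={WARMUP_STEPS}, max |predicted - logged| = {max_abs_diff:.1e}")
```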
| 26%|ββββββββββββββββββββββββ | 15/57 [11:07<27:39, 39.51s/it][2026-04-13 02:29:02,473] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:406990] Running evaluation step... | |
| Traceback (most recent call last): | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers | |
| finalizer() | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__ | |
| res = self._callback(*self._args, **self._kwargs) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir | |
| rmtree(tempdir, onerror=onerror) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree | |
| _rmtree_safe_fd(stack, onexc) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd | |
| onexc(func, path, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd | |
| onexc(os.unlink, fullname, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd | |
| os.unlink(entry.name, dir_fd=topfd) | |
| OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-w4mb852g' | |
| Traceback (most recent call last): | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers | |
| finalizer() | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__ | |
| res = self._callback(*self._args, **self._kwargs) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir | |
| rmtree(tempdir, onerror=onerror) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree | |
| _rmtree_safe_fd(stack, onexc) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd | |
| onexc(func, path, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd | |
| onexc(os.unlink, fullname, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd | |
| os.unlink(entry.name, dir_fd=topfd) | |
| OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-5tq6sxwg' | |
| 100%|█████████████████████████████████████████████████████████████████████████████████████████| 50/50 [01:14<00:00, 1.54s/it] | |
| {'eval_loss': 0.8203887939453125, 'eval_runtime': 76.2715, 'eval_samples_per_second': 5.244, 'eval_steps_per_second': 0.656, 'eval_ppl': 2.27138, 'memory/max_active (GiB)': 42.65, 'memory/max_allocated (GiB)': 42.65, 'memory/device_reserved (GiB)': 60.61, 'epoch': 0.26, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 26%|████████████████████████ | 15/57 [12:23<27:39, 39.51s/it] | |
| [2026-04-13 02:30:18,756] [INFO] [axolotl.core.trainers.base._save:721] [PID:406990] Saving model checkpoint to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2/checkpoint-15 | |
| [2026-04-13 02:30:19,318] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. | |
| return func(*args, **kwargs) | |
| 28%|█████████████████████████ | 16/57 [13:04<42:50, 62.70s/it] {'loss': 6.134, 'grad_norm': 0.6509190797805786, 'learning_rate': 3.645967731787313e-05, 'ppl': 461.27759, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.28} | |
| 30%|███████████████████████████ | 17/57 [13:43<37:08, 55.72s/it] {'loss': 7.0539, 'grad_norm': 0.6493154764175415, 'learning_rate': 3.5743669612181004e-05, 'ppl': 1157.36367, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.3} | |
| 32%|████████████████████████████ | 18/57 [14:22<32:59, 50.75s/it] {'loss': 6.1112, 'grad_norm': 0.5940355658531189, 'learning_rate': 3.497021496342203e-05, 'ppl': 450.87945, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.32} | |
| 33%|██████████████████████████████ | 19/57 [15:02<30:00, 47.37s/it] {'loss': 6.5471, 'grad_norm': 0.5027147531509399, 'learning_rate': 3.4142135623730954e-05, 'ppl': 697.2193, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.33} | |
| 35%|████████████████████████████████ | 20/57 [15:42<27:47, 45.07s/it] {'loss': 6.3782, 'grad_norm': 0.5283752083778381, 'learning_rate': 3.326245316481591e-05, 'ppl': 588.86679, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.35} | |
| 37%|█████████████████████████████████ | 21/57 [16:21<26:00, 43.36s/it] {'loss': 6.7009, 'grad_norm': 0.5615221858024597, 'learning_rate': 3.2334377452570866e-05, 'ppl': 813.13732, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.37} | |
| 39%|███████████████████████████████████ | 22/57 [17:00<24:35, 42.16s/it] {'loss': 5.7217, 'grad_norm': 0.537100613117218, 'learning_rate': 3.136129493462312e-05, 'ppl': 305.4237, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.39} | |
| 40%|████████████████████████████████████ | 23/57 [17:40<23:25, 41.33s/it] {'loss': 5.8115, 'grad_norm': 0.47874653339385986, 'learning_rate': 3.0346756283553138e-05, 'ppl': 334.11993, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.4} | |
| 42%|██████████████████████████████████████ | 24/57 [18:19<22:28, 40.85s/it] {'loss': 6.0961, 'grad_norm': 0.43611156940460205, 'learning_rate': 2.9294463440875375e-05, 'ppl': 444.12231, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.42} | |
| 44%|███████████████████████████████████████ | 25/57 [18:59<21:33, 40.43s/it] {'loss': 6.2161, 'grad_norm': 0.43894726037979126, 'learning_rate': 2.820825610905514e-05, 'ppl': 500.74651, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.44} | |
| 46%|█████████████████████████████████████████ | 26/57 [19:38<20:43, 40.13s/it] {'loss': 6.3839, 'grad_norm': 0.46272194385528564, 'learning_rate': 2.7092097740850712e-05, 'ppl': 592.23292, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.46} | |
| 47%|███████████████████████████████████████████ | 27/57 [20:18<19:56, 39.90s/it] {'loss': 5.8903, 'grad_norm': 0.4597170650959015, 'learning_rate': 2.595006107710406e-05, 'ppl': 361.51372, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.47} | |
| 49%|████████████████████████████████████████████ | 28/57 [20:57<19:15, 39.84s/it] {'loss': 6.4172, 'grad_norm': 0.4441160261631012, 'learning_rate': 2.4786313285751158e-05, 'ppl': 612.28631, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.49} | |
| 51%|██████████████████████████████████████████████ | 29/57 [21:37<18:34, 39.80s/it] {'loss': 7.06, 'grad_norm': 0.41745519638061523, 'learning_rate': 2.360510075627812e-05, 'ppl': 1164.44517, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.51} | |
| 53%|███████████████████████████████████████████████ | 30/57 [22:17<17:51, 39.70s/it] {'loss': 6.5415, 'grad_norm': 0.4563206434249878, 'learning_rate': 2.2410733605106462e-05, 'ppl': 693.32579, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.53} | |
| [2026-04-13 02:40:11,875] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:406990] Running evaluation step... | |
| Traceback (most recent call last): | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers | |
| finalizer() | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__ | |
| res = self._callback(*self._args, **self._kwargs) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir | |
| rmtree(tempdir, onerror=onerror) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree | |
| _rmtree_safe_fd(stack, onexc) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd | |
| onexc(func, path, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd | |
| onexc(os.unlink, fullname, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd | |
| os.unlink(entry.name, dir_fd=topfd) | |
| OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-sbrhfmc2' | |
| Traceback (most recent call last): | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers | |
| finalizer() | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__ | |
| res = self._callback(*self._args, **self._kwargs) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir | |
| rmtree(tempdir, onerror=onerror) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree | |
| _rmtree_safe_fd(stack, onexc) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd | |
| onexc(func, path, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd | |
| onexc(os.unlink, fullname, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd | |
| os.unlink(entry.name, dir_fd=topfd) | |
| OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-2ddl4foh' | |
| 100%|█████████████████████████████████████████████████████████████████████████████████████████| 50/50 [01:14<00:00, 1.55s/it] | |
| {'eval_loss': 0.7725631594657898, 'eval_runtime': 76.3564, 'eval_samples_per_second': 5.239, 'eval_steps_per_second': 0.655, 'eval_ppl': 2.16531, 'memory/max_active (GiB)': 42.65, 'memory/max_allocated (GiB)': 42.65, 'memory/device_reserved (GiB)': 62.34, 'epoch': 0.53, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 53%|███████████████████████████████████████████████ | 30/57 [23:33<17:51, 39.70s/it] | |
| [2026-04-13 02:41:28,242] [INFO] [axolotl.core.trainers.base._save:721] [PID:406990] Saving model checkpoint to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2/checkpoint-30 | |
| [2026-04-13 02:41:28,776] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. | |
| return func(*args, **kwargs) | |
| 54%|█████████████████████████████████████████████████ | 31/57 [24:13<27:12, 62.80s/it] {'loss': 5.4361, 'grad_norm': 0.3696332275867462, 'learning_rate': 2.1207569948445724e-05, 'ppl': 229.54521, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.54} | |
| 56%|██████████████████████████████████████████████████ | 32/57 [24:53<23:16, 55.86s/it] {'loss': 6.3518, 'grad_norm': 0.3885038495063782, 'learning_rate': 2e-05, 'ppl': 573.52412, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.56} | |
| 58%|████████████████████████████████████████████████████ | 33/57 [25:32<20:23, 50.98s/it] {'loss': 5.9664, 'grad_norm': 0.35446420311927795, 'learning_rate': 1.879243005155428e-05, 'ppl': 390.09878, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.58} | |
| 60%|█████████████████████████████████████████████████████ | 34/57 [26:12<18:14, 47.58s/it] {'loss': 6.0508, 'grad_norm': 0.3991977274417877, 'learning_rate': 1.758926639489354e-05, 'ppl': 424.45246, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.6} | |
| 61%|███████████████████████████████████████████████████████ | 35/57 [26:51<16:32, 45.12s/it] {'loss': 6.5703, 'grad_norm': 0.3999829590320587, 'learning_rate': 1.6394899243721887e-05, 'ppl': 713.58389, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.61} | |
| 63%|█████████████████████████████████████████████████████████ | 36/57 [27:31<15:10, 43.37s/it] {'loss': 5.9417, 'grad_norm': 0.3948379158973694, 'learning_rate': 1.5213686714248852e-05, 'ppl': 380.58137, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.63} | |
| 65%|██████████████████████████████████████████████████████████ | 37/57 [28:10<14:03, 42.19s/it] {'loss': 6.4769, 'grad_norm': 0.3924844264984131, 'learning_rate': 1.4049938922895945e-05, 'ppl': 649.95297, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.65} | |
| 67%|████████████████████████████████████████████████████████████ | 38/57 [28:50<13:06, 41.41s/it] {'loss': 6.6282, 'grad_norm': 0.4368893504142761, 'learning_rate': 1.2907902259149287e-05, 'ppl': 756.11993, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.67} | |
| 68%|█████████████████████████████████████████████████████████████ | 39/57 [29:29<12:15, 40.87s/it] {'loss': 5.3714, 'grad_norm': 0.36707866191864014, 'learning_rate': 1.1791743890944869e-05, 'ppl': 215.16389, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.68} | |
| 70%|███████████████████████████████████████████████████████████████ | 40/57 [30:09<11:26, 40.39s/it] {'loss': 5.8953, 'grad_norm': 0.41011226177215576, 'learning_rate': 1.070553655912463e-05, 'ppl': 363.32582, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.7} | |
| 72%|████████████████████████████████████████████████████████████████ | 41/57 [30:48<10:42, 40.17s/it] {'loss': 6.8336, 'grad_norm': 0.3589271903038025, 'learning_rate': 9.653243716446862e-06, 'ppl': 928.5275, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.72} | |
| 74%|██████████████████████████████████████████████████████████████████ | 42/57 [31:28<09:59, 39.97s/it] {'loss': 6.0634, 'grad_norm': 0.383444607257843, 'learning_rate': 8.638705065376887e-06, 'ppl': 429.83439, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.74} | |
| 75%|████████████████████████████████████████████████████████████████████ | 43/57 [32:07<09:17, 39.84s/it] {'loss': 6.6285, 'grad_norm': 0.3992387652397156, 'learning_rate': 7.665622547429139e-06, 'ppl': 756.3468, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.75} | |
| 77%|█████████████████████████████████████████████████████████████████████ | 44/57 [32:47<08:36, 39.72s/it] {'loss': 5.9547, 'grad_norm': 0.42180073261260986, 'learning_rate': 6.737546835184101e-06, 'ppl': 385.56122, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.77} | |
| 79%|███████████████████████████████████████████████████████████████████████ | 45/57 [33:26<07:56, 39.70s/it] {'loss': 6.0193, 'grad_norm': 0.4660789668560028, 'learning_rate': 5.857864376269051e-06, 'ppl': 411.29059, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.79} | |
| [2026-04-13 02:51:21,837] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:406990] Running evaluation step... | |
| Traceback (most recent call last): | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers | |
| finalizer() | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__ | |
| res = self._callback(*self._args, **self._kwargs) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir | |
| rmtree(tempdir, onerror=onerror) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree | |
| _rmtree_safe_fd(stack, onexc) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd | |
| onexc(func, path, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd | |
| onexc(os.unlink, fullname, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd | |
| os.unlink(entry.name, dir_fd=topfd) | |
| OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-ci_q2hzt' | |
| Traceback (most recent call last): | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers | |
| finalizer() | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__ | |
| res = self._callback(*self._args, **self._kwargs) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir | |
| rmtree(tempdir, onerror=onerror) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree | |
| _rmtree_safe_fd(stack, onexc) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd | |
| onexc(func, path, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd | |
| onexc(os.unlink, fullname, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd | |
| os.unlink(entry.name, dir_fd=topfd) | |
| OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-4chbls79' | |
| 100% 50/50 [01:14<00:00, 1.56s/it] | |
| {'eval_loss': 0.7636247277259827, 'eval_runtime': 76.556, 'eval_samples_per_second': 5.225, 'eval_steps_per_second': 0.653, 'eval_ppl': 2.14604, 'memory/max_active (GiB)': 42.65, 'memory/max_allocated (GiB)': 42.65, 'memory/device_reserved (GiB)': 62.34, 'epoch': 0.79, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 79% 45/57 [34:43<07:56, 39.70s/it] | |
| [2026-04-13 02:52:38,402] [INFO] [axolotl.core.trainers.base._save:721] [PID:406990] Saving model checkpoint to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2/checkpoint-45 | |
| [2026-04-13 02:52:39,006] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. | |
| return func(*args, **kwargs) | |
| 81% 46/57 [35:23<11:31, 62.84s/it] {'loss': 6.1743, 'grad_norm': 0.419697105884552, 'learning_rate': 5.029785036577976e-06, 'ppl': 480.24673, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.81} | |
| 82% 47/57 [36:03<09:18, 55.81s/it] {'loss': 6.3119, 'grad_norm': 0.3812112510204315, 'learning_rate': 4.256330387818999e-06, 'ppl': 551.09103, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.82} | |
| 84% 48/57 [36:42<07:37, 50.88s/it] {'loss': 6.1627, 'grad_norm': 0.45529961585998535, 'learning_rate': 3.5403226821268734e-06, 'ppl': 474.70806, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.84} | |
| 86% 49/57 [37:22<06:19, 47.46s/it] {'loss': 6.3451, 'grad_norm': 0.3951869010925293, 'learning_rate': 2.8843745539710523e-06, 'ppl': 569.69436, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.86} | |
| 88% 50/57 [38:01<05:15, 45.05s/it] {'loss': 5.8601, 'grad_norm': 0.40012675523757935, 'learning_rate': 2.2908794869358044e-06, 'ppl': 350.75922, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.88} | |
| 89% 51/57 [38:40<04:20, 43.35s/it] {'loss': 6.1964, 'grad_norm': 0.3871590495109558, 'learning_rate': 1.7620030801581988e-06, 'ppl': 490.97833, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.89} | |
| 91% 52/57 [39:20<03:30, 42.19s/it] {'loss': 5.8941, 'grad_norm': 0.33590954542160034, 'learning_rate': 1.2996751462917057e-06, 'ppl': 362.89009, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.91} | |
| 93% 53/57 [40:00<02:45, 41.44s/it] {'loss': 6.1674, 'grad_norm': 0.3889252543449402, 'learning_rate': 9.055826698290881e-07, 'ppl': 476.94444, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.93} | |
| 95% 54/57 [40:39<02:02, 40.85s/it] {'loss': 5.5078, 'grad_norm': 0.3783589005470276, 'learning_rate': 5.811636514789598e-07, 'ppl': 246.60799, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.95} | |
| 96% 55/57 [41:18<01:20, 40.35s/it] {'loss': 5.9907, 'grad_norm': 0.496366024017334, 'learning_rate': 3.2760186105712964e-07, 'ppl': 399.6943, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.96} | |
| 98% 56/57 [41:58<00:40, 40.07s/it] {'loss': 5.8207, 'grad_norm': 0.40123143792152405, 'learning_rate': 1.4582251803892055e-07, 'ppl': 337.20802, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 0.98} | |
| Traceback (most recent call last): | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers | |
| finalizer() | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__ | |
| res = self._callback(*self._args, **self._kwargs) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir | |
| rmtree(tempdir, onerror=onerror) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree | |
| _rmtree_safe_fd(stack, onexc) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd | |
| onexc(func, path, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd | |
| onexc(os.unlink, fullname, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd | |
| os.unlink(entry.name, dir_fd=topfd) | |
| OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-xjzy2ttf' | |
| Traceback (most recent call last): | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 303, in _run_finalizers | |
| finalizer() | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 227, in __call__ | |
| res = self._callback(*self._args, **self._kwargs) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/util.py", line 136, in _remove_temp_dir | |
| rmtree(tempdir, onerror=onerror) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 759, in rmtree | |
| _rmtree_safe_fd(stack, onexc) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd | |
| onexc(func, path, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd | |
| onexc(os.unlink, fullname, err) | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 750, in onexc | |
| return onerror(func, path, exc_info) | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| File "/u901/t577wang/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd | |
| os.unlink(entry.name, dir_fd=topfd) | |
| OSError: [Errno 16] Device or resource busy: '/u901/t577wang/.cache/tmp/pymp-zeou0am4' | |
| 100% 57/57 [42:37<00:00, 39.92s/it] {'loss': 6.1692, 'grad_norm': 0.4002166986465454, 'learning_rate': 3.648891553365008e-08, 'ppl': 477.80371, 'memory/max_active (GiB)': 46.09, 'memory/max_allocated (GiB)': 46.09, 'memory/device_reserved (GiB)': 62.34, 'tokens/train_per_sec_per_gpu': 0.0, 'tokens/total': 0, 'tokens/trainable': 0, 'epoch': 1.0} | |
| [2026-04-13 03:00:32,592] [INFO] [axolotl.core.trainers.base._save:721] [PID:406990] Saving model checkpoint to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2/checkpoint-57 | |
| [2026-04-13 03:00:33,231] [WARNING] [py.warnings._showwarnmsg:112] [PID:406990] /u901/t577wang/SecSteer/.venv/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. | |
| return func(*args, **kwargs) | |
| {'train_runtime': 2560.8721, 'train_samples_per_second': 1.425, 'train_steps_per_second': 0.022, 'train_loss': 6.388372555113675, 'memory/max_active (GiB)': 14.96, 'memory/max_allocated (GiB)': 14.96, 'memory/device_reserved (GiB)': 62.34, 'epoch': 1.0, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 100% 57/57 [42:38<00:00, 44.89s/it] | |
| [2026-04-13 03:00:33,786] [INFO] [axolotl.train.save_trained_model:233] [PID:406990] Training completed! Saving trained model to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2. | |
| [2026-04-13 03:00:34,238] [INFO] [axolotl.train.save_trained_model:351] [PID:406990] Model successfully saved to /u901/t577wang/SecSteer/axolotl-outputs/lora/Qwen2.5-Coder-7B-sft-plus-alpha-2-token-diff-ctx3-v2 | |
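The `ppl` and `eval_ppl` values in the log above appear to be the exponential of the corresponding `loss` and `eval_loss`. Here the loss is taken to be the mean token-level cross-entropy in nats. That reading is an inference from the numbers; the log itself does not state it.

The short Python check below recomputes perplexity from a few metric dicts copied from the log. The helpers `ppl_from_loss` and `check` are hypothetical names used only for this illustration.

```python
import ast
import math

# A few metric dicts copied from the log above (only the keys needed here).
LOGGED = [
    "{'loss': 5.8953, 'ppl': 363.32582, 'epoch': 0.7}",
    "{'loss': 6.1692, 'ppl': 477.80371, 'epoch': 1.0}",
    "{'eval_loss': 0.7636247277259827, 'eval_ppl': 2.14604, 'epoch': 0.79}",
]


def ppl_from_loss(loss: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(loss)


def check(record: str) -> tuple[float, float]:
    """Return (recomputed ppl, logged ppl) for one logged metrics dict."""
    metrics = ast.literal_eval(record)  # safe parse of a Python-literal dict
    prefix = "eval_" if "eval_loss" in metrics else ""
    return ppl_from_loss(metrics[prefix + "loss"]), metrics[prefix + "ppl"]


for record in LOGGED:
    recomputed, logged = check(record)
    # Training losses are printed with 4 decimals, so allow ~1e-4 relative error.
    match = math.isclose(recomputed, logged, rel_tol=1e-4)
    print(f"recomputed {recomputed:.5f} vs logged {logged} -> match: {match}")
```

The same relation explains why the training lines (loss around 6) show perplexities in the hundreds, while the evaluation line (loss around 0.76) shows a perplexity near 2.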
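The `learning_rate` values in the log fit a linear-warmup, half-cosine-decay schedule. Fitting the logged numbers suggests three parameters:

- a peak rate of 4e-5,
- 5 warmup steps,
- 57 total steps.

The fit also assumes that the rate logged at global step s is the one applied during that step, which is schedule index s - 1. These values and that offset are assumptions inferred from the log; they were not read from the run's axolotl config, which this excerpt does not show.

The sketch below mirrors the lambda used by `transformers.get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. It then compares the schedule against a few logged rates.

```python
import math

# Inferred from the logged learning rates; hypothetical, not taken from the run's config.
PEAK_LR = 4e-5
WARMUP_STEPS = 5
TOTAL_STEPS = 57


def cosine_with_warmup(step_index: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at TOTAL_STEPS.

    Same formula as the lambda in transformers.get_cosine_schedule_with_warmup
    (num_cycles=0.5), multiplied by the peak rate.
    """
    if step_index < WARMUP_STEPS:
        return PEAK_LR * step_index / max(1, WARMUP_STEPS)
    progress = (step_index - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# (global step -> learning_rate) pairs copied from the log above.
LOGGED_LR = {
    40: 1.070553655912463e-05,
    45: 5.857864376269051e-06,
    46: 5.029785036577976e-06,
    50: 2.2908794869358044e-06,
    57: 3.648891553365008e-08,
}

for step, lr in LOGGED_LR.items():
    # Assumption: the rate logged at global step s is the one applied during
    # that step, i.e. schedule index s - 1.
    predicted = cosine_with_warmup(step - 1)
    print(f"step {step}: logged {lr:.6e}  predicted {predicted:.6e}")
```

Under these assumptions the schedule reaches exactly zero only at index 57. That is why the last logged rate (step 57, index 56) is small but non-zero.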