============================================================
SecureReview SFT Training
Model : unsloth/Qwen2.5-7B-Instruct-bnb-4bit
Task : migration_review
Epochs: 3
============================================================
[1/6] Checking environment connection...
Health: {'status': 'healthy'}
[2/6] Loading model...
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Your Flash Attention 2 installation seems to be broken. Using Xformers instead. No performance changes will be seen.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2026.4.8: Fast Qwen2 patching. Transformers: 5.5.0.
   \\   /|    NVIDIA L40S. Num GPUs = 1. Max memory: 44.392 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
model.safetensors: 0%| | 0.00/5.55G [00:00<?, ?B/s]
model.safetensors: 0%| | 0.00/5.55G [00:01<?, ?B/s]
model.safetensors: 20%|█▉ | 1.09G/5.55G [00:02<00:04, 942MB/s]
model.safetensors: 44%|████▍ | 2.46G/5.55G [00:03<00:02, 1.06GB/s]
model.safetensors: 66%|██████▌ | 3.67G/5.55G [00:04<00:01, 964MB/s] 
model.safetensors: 93%|█████████▎| 5.14G/5.55G [00:06<00:00, 942MB/s]
model.safetensors: 100%|██████████| 5.55G/5.55G [00:07<00:00, 790MB/s]
Loading weights: 0%| | 0/339 [00:00<?, ?it/s]
Loading weights: 100%|██████████| 339/339 [00:00<00:00, 523.44it/s]
generation_config.json: 0%| | 0.00/271 [00:00<?, ?B/s]
generation_config.json: 100%|██████████| 271/271 [00:00<00:00, 1.42MB/s]
tokenizer_config.json: 0%| | 0.00/7.36k [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 7.36k/7.36k [00:00<00:00, 40.9MB/s]
vocab.json: 0%| | 0.00/2.78M [00:00<?, ?B/s]
vocab.json: 100%|██████████| 2.78M/2.78M [00:00<00:00, 75.9MB/s]
merges.txt: 0%| | 0.00/1.67M [00:00<?, ?B/s]
merges.txt: 100%|██████████| 1.67M/1.67M [00:00<00:00, 69.0MB/s]
tokenizer.json: 0%| | 0.00/11.4M [00:00<?, ?B/s]
tokenizer.json: 100%|██████████| 11.4M/11.4M [00:00<00:00, 54.5MB/s]
added_tokens.json: 0%| | 0.00/605 [00:00<?, ?B/s]
added_tokens.json: 100%|██████████| 605/605 [00:00<00:00, 3.87MB/s]
special_tokens_map.json: 0%| | 0.00/614 [00:00<?, ?B/s]
special_tokens_map.json: 100%|██████████| 614/614 [00:00<00:00, 4.47MB/s]
unsloth/Qwen2.5-7B-Instruct-bnb-4bit does not have a padding token! Will use pad_token = <|PAD_TOKEN|>.
Unsloth 2026.4.8 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
trainable params: 40,370,176 || all params: 7,655,986,688 || trainable%: 0.5273
[3/6] Building SFT dataset from ground-truth findings...
ground_truth.json: 0%| | 0.00/2.82k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 2.82k/2.82k [00:00<00:00, 14.2MB/s]
Loaded migration_002 (4 findings)
ground_truth.json: 0%| | 0.00/3.64k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 3.64k/3.64k [00:00<00:00, 26.0MB/s]
Loaded migration_006 (6 findings)
ground_truth.json: 0%| | 0.00/3.17k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 3.17k/3.17k [00:00<00:00, 22.2MB/s]
Loaded migration_007 (5 findings)
ground_truth.json: 0%| | 0.00/3.39k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 3.39k/3.39k [00:00<00:00, 25.9MB/s]
Loaded migration_009 (5 findings)
ground_truth.json: 0%| | 0.00/3.76k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 3.76k/3.76k [00:00<00:00, 29.5MB/s]
Loaded migration_012 (6 findings)
ground_truth.json: 0%| | 0.00/4.35k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 4.35k/4.35k [00:00<00:00, 28.9MB/s]
Loaded migration_017 (6 findings)
ground_truth.json: 0%| | 0.00/4.44k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 4.44k/4.44k [00:00<00:00, 28.1MB/s]
Loaded migration_018 (6 findings)
ground_truth.json: 0%| | 0.00/3.98k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 3.98k/3.98k [00:00<00:00, 28.0MB/s]
Loaded migration_022 (6 findings)
ground_truth.json: 0%| | 0.00/4.05k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 4.05k/4.05k [00:00<00:00, 33.1MB/s]
Loaded migration_023 (6 findings)
ground_truth.json: 0%| | 0.00/4.49k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 4.49k/4.49k [00:00<00:00, 29.5MB/s]
Loaded migration_024 (6 findings)
ground_truth.json: 0%| | 0.00/4.20k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 4.20k/4.20k [00:00<00:00, 29.4MB/s]
Loaded migration_025 (6 findings)
ground_truth.json: 0%| | 0.00/4.00k [00:00<?, ?B/s]
ground_truth.json: 100%|██████████| 4.00k/4.00k [00:00<00:00, 28.7MB/s]
Loaded migration_028 (6 findings)
Dataset: 12 examples
[4/6] Baseline evaluation (before SFT)...
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
/root/.pyenv/versions/3.13.13/lib/python3.13/site-packages/transformers/modeling_attn_mask_utils.py:71: FutureWarning: The attention mask API under `transformers.modeling_attn_mask_utils` (`AttentionMaskConverter`) is deprecated and will be removed in Transformers v5.10. Please use the new API in `transformers.masking_utils`.
warnings.warn(DEPRECATION_MESSAGE, FutureWarning)
/root/.pyenv/versions/3.13.13/lib/python3.13/site-packages/transformers/modeling_attn_mask_utils.py:281: FutureWarning: The attention mask API under `transformers.modeling_attn_mask_utils` (`AttentionMaskConverter`) is deprecated and will be removed in Transformers v5.10. Please use the new API in `transformers.masking_utils`.
warnings.warn(DEPRECATION_MESSAGE, FutureWarning)
/root/.pyenv/versions/3.13.13/lib/python3.13/site-packages/transformers/modeling_attn_mask_utils.py:71: FutureWarning: The attention mask API under `transformers.modeling_attn_mask_utils` (`AttentionMaskConverter`) is deprecated and will be removed in Transformers v5.10. Please use the new API in `transformers.masking_utils`.
warnings.warn(DEPRECATION_MESSAGE, FutureWarning)
/root/.pyenv/versions/3.13.13/lib/python3.13/site-packages/transformers/modeling_attn_mask_utils.py:281: FutureWarning: The attention mask API under `transformers.modeling_attn_mask_utils` (`AttentionMaskConverter`) is deprecated and will be removed in Transformers v5.10. Please use the new API in `transformers.masking_utils`.
warnings.warn(DEPRECATION_MESSAGE, FutureWarning)
[before] migration_002: 0.300
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
/root/.pyenv/versions/3.13.13/lib/python3.13/site-packages/transformers/modeling_attn_mask_utils.py:71: FutureWarning: The attention mask API under `transformers.modeling_attn_mask_utils` (`AttentionMaskConverter`) is deprecated and will be removed in Transformers v5.10. Please use the new API in `transformers.masking_utils`.
warnings.warn(DEPRECATION_MESSAGE, FutureWarning)
/root/.pyenv/versions/3.13.13/lib/python3.13/site-packages/transformers/modeling_attn_mask_utils.py:281: FutureWarning: The attention mask API under `transformers.modeling_attn_mask_utils` (`AttentionMaskConverter`) is deprecated and will be removed in Transformers v5.10. Please use the new API in `transformers.masking_utils`.
warnings.warn(DEPRECATION_MESSAGE, FutureWarning)
[before] migration_006: 0.520
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[before] migration_007: 0.060
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[before] migration_009: 0.260
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[before] migration_012: 0.060
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[before] migration_017: 0.060
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[before] migration_018: 0.290
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[before] migration_022: 0.060
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[before] migration_023: 0.060
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[before] migration_024: 0.280
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[before] migration_025: 0.060
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[before] migration_028: 0.030
Baseline mean: 0.170
[5/6] SFT training...
warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
/app/unsloth_compiled_cache/UnslothSFTTrainer.py:915: UserWarning: Padding-free training is enabled, but the attention implementation is not set to 'flash_attention_2'. Padding-free training flattens batches into a single sequence, and 'flash_attention_2' is the only known attention mechanism that reliably supports this. Using other implementations may lead to unexpected behavior. To ensure compatibility, set `attn_implementation='flash_attention_2'` in the model configuration, or verify that your attention mechanism can handle flattened sequences.
warnings.warn(
Unsloth: Tokenizing ["text"] (num_proc=12): 0%| | 0/12 [00:00<?, ? examples/s]
Unsloth: Tokenizing ["text"] (num_proc=12): 8%|▊ | 1/12 [00:02<00:23, 2.17s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 17%|█▋ | 2/12 [00:03<00:16, 1.69s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 25%|██▌ | 3/12 [00:04<00:13, 1.52s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 33%|███▎ | 4/12 [00:06<00:11, 1.46s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 42%|████▏ | 5/12 [00:07<00:09, 1.40s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 50%|█████ | 6/12 [00:08<00:08, 1.38s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 58%|█████▊ | 7/12 [00:10<00:06, 1.37s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 67%|██████▋ | 8/12 [00:11<00:05, 1.37s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 75%|███████▌ | 9/12 [00:12<00:04, 1.36s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 83%|████████▎ | 10/12 [00:14<00:02, 1.35s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 92%|█████████▏| 11/12 [00:15<00:01, 1.34s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 100%|██████████| 12/12 [00:16<00:00, 1.33s/ examples]
Unsloth: Tokenizing ["text"] (num_proc=12): 100%|██████████| 12/12 [00:17<00:00, 1.43s/ examples]
🦥 Unsloth: Padding-free auto-enabled, enabling faster training.
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None}.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 12 | Num Epochs = 3 | Total steps = 18
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 2 x 1) = 2
 "-____-"     Trainable parameters = 40,370,176 of 7,655,986,688 (0.53% trained)
0%| | 0/18 [00:00<?, ?it/s]
`use_return_dict` is deprecated! Use `return_dict` instead!
6%|▌ | 1/18 [00:03<01:07, 3.98s/it]
{'loss': '1.994', 'grad_norm': '0.5622', 'learning_rate': '5e-05', 'epoch': '0.3333'}
11%|█ | 2/18 [00:04<01:03, 3.98s/it]
17%|█▋ | 3/18 [00:05<00:26, 1.75s/it]
22%|██▏ | 4/18 [00:07<00:21, 1.54s/it]
{'loss': '1.965', 'grad_norm': '0.4948', 'learning_rate': '4.831e-05', 'epoch': '0.6667'}
28%|██▊ | 5/18 [00:08<00:17, 1.36s/it]
{'loss': '1.788', 'grad_norm': '0.469', 'learning_rate': '4.348e-05', 'epoch': '1'}
33%|███▎ | 6/18 [00:09<00:16, 1.36s/it]
39%|███▉ | 7/18 [00:10<00:13, 1.20s/it]
{'loss': '1.795', 'grad_norm': '0.4098', 'learning_rate': '3.614e-05', 'epoch': '1.333'}
44%|████▍ | 8/18 [00:11<00:11, 1.20s/it]
50%|█████ | 9/18 [00:11<00:09, 1.07s/it]
56%|█████▌ | 10/18 [00:12<00:08, 1.07s/it]
{'loss': '1.72', 'grad_norm': '0.37', 'learning_rate': '2.731e-05', 'epoch': '1.667'}
67%|██████▋ | 12/18 [00:14<00:06, 1.04s/it]
{'loss': '1.659', 'grad_norm': '0.3249', 'learning_rate': '1.816e-05', 'epoch': '2'}
72%|███████▏ | 13/18 [00:16<00:05, 1.04s/it]
78%|███████▊ | 14/18 [00:17<00:04, 1.03s/it]
{'loss': '1.68', 'grad_norm': '0.3444', 'learning_rate': '9.934e-06', 'epoch': '2.333'}
89%|████████▉ | 16/18 [00:18<00:02, 1.02s/it]
{'loss': '1.66', 'grad_norm': '0.2995', 'learning_rate': '3.745e-06', 'epoch': '2.667'}
100%|██████████| 18/18 [00:20<00:00, 1.02it/s]
{'loss': '1.625', 'grad_norm': '0.3541', 'learning_rate': '4.257e-07', 'epoch': '3'}
100%|██████████| 18/18 [00:20<00:00, 1.02it/s]
Unsloth: Restored added_tokens_decoder metadata in ./securereview-sft/checkpoint-18/tokenizer_config.json.
{'train_runtime': '21.32', 'train_samples_per_second': '1.688', 'train_steps_per_second': '0.844', 'train_loss': '1.765', 'epoch': '3'}
100%|██████████| 18/18 [00:21<00:00, 1.02it/s]
100%|██████████| 18/18 [00:21<00:00, 1.18s/it]
[6/6] Post-SFT evaluation...
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
/root/.pyenv/versions/3.13.13/lib/python3.13/site-packages/transformers/modeling_attn_mask_utils.py:71: FutureWarning: The attention mask API under `transformers.modeling_attn_mask_utils` (`AttentionMaskConverter`) is deprecated and will be removed in Transformers v5.10. Please use the new API in `transformers.masking_utils`.
warnings.warn(DEPRECATION_MESSAGE, FutureWarning)
/root/.pyenv/versions/3.13.13/lib/python3.13/site-packages/transformers/modeling_attn_mask_utils.py:281: FutureWarning: The attention mask API under `transformers.modeling_attn_mask_utils` (`AttentionMaskConverter`) is deprecated and will be removed in Transformers v5.10. Please use the new API in `transformers.masking_utils`.
warnings.warn(DEPRECATION_MESSAGE, FutureWarning)
[after] migration_002: 0.300
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_006: 0.640
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_007: 0.610
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_009: 0.200
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_012: 0.470
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_017: 0.520
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_018: 0.520
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_022: 0.440
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_023: 0.440
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_024: 0.330
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_025: 0.640
Both `max_new_tokens` (=600) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
[after] migration_028: 0.470
Trained mean: 0.465
=== Improvement Summary ===
migration_002: 0.300 0.300 +0.000
migration_006: 0.520 0.640 +0.120
migration_007: 0.060 0.610 +0.550
migration_009: 0.260 0.200 -0.060
migration_012: 0.060 0.470 +0.410
migration_017: 0.060 0.520 +0.460
migration_018: 0.290 0.520 +0.230
migration_022: 0.060 0.440 +0.380
migration_023: 0.060 0.440 +0.380
migration_024: 0.280 0.330 +0.050
migration_025: 0.060 0.640 +0.580
migration_028: 0.030 0.470 +0.440
Saved ./plots/reward_curve.png
Saved ./plots/before_after.png
============================================================
DONE Mean 0.170 0.465
============================================================