| Welcome to your vast.ai container! This session is running in `tmux`. |
| To disconnect without closing your processes, press ctrl+b, release, then d. |
| To disable auto-tmux, run `touch ~/.no_auto_tmux` and reconnect. See also https://tmuxcheatsheet.com/ |
| Activated conda/uv virtual environment at /venv/main |
| (main) root@C.31890549:/workspace$ cd p-vector-LFM2.5/ |
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip freeze | flash |
| -bash: flash: command not found |
| Using Python 3.12.12 environment at: /venv/main |
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip freeze | grep flash |
| Using Python 3.12.12 environment at: /venv/main |
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip freeze | grep tor |
| Using Python 3.12.12 environment at: /venv/main |
| decorator==5.2.1 |
| torch==2.10.0+cu130 |
| torchaudio==2.10.0+cu130 |
| torchcodec==0.10.0 |
| torchdata==0.10.0 |
| torchtext==0.6.0 |
| torchvision==0.25.0+cu130 |
| tornado==6.5.4 |
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip uninstall torch |
| Using Python 3.12.12 environment at: /venv/main |
| Uninstalled 1 package in 1.02s |
| - torch==2.10.0+cu130 |
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip install torch==2.8 |
| Using Python 3.12.12 environment at: /venv/main |
| Resolved 25 packages in 232ms |
| Prepared 16 packages in 41.44s |
| Uninstalled 1 package in 77ms |
| Installed 16 packages in 2.61s |
| + nvidia-cublas-cu12==12.8.4.1 |
| + nvidia-cuda-cupti-cu12==12.8.90 |
| + nvidia-cuda-nvrtc-cu12==12.8.93 |
| + nvidia-cuda-runtime-cu12==12.8.90 |
| + nvidia-cudnn-cu12==9.10.2.21 |
| + nvidia-cufft-cu12==11.3.3.83 |
| + nvidia-cufile-cu12==1.13.1.3 |
| + nvidia-curand-cu12==10.3.9.90 |
| + nvidia-cusolver-cu12==11.7.3.90 |
| + nvidia-cusparse-cu12==12.5.8.93 |
| + nvidia-cusparselt-cu12==0.7.1 |
| + nvidia-nccl-cu12==2.27.3 |
| + nvidia-nvjitlink-cu12==12.8.93 |
| + nvidia-nvtx-cu12==12.8.90 |
| + torch==2.8.0 |
| - triton==3.6.0 |
| + triton==3.4.0 |
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip install flash-attn |
| Using Python 3.12.12 environment at: /venv/main |
| Resolved 27 packages in 1.84s |
| × Failed to build `flash-attn==2.8.3` |
| ├─▶ The build backend returned an error |
| ╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit status: 1) |
|
| [stderr] |
| /tmp/.tmpP08CEc/builds-v0/.tmpwiMDMU/lib/python3.12/site-packages/setuptools/_vendor/wheel/bdist_wheel.py:4: FutureWarning: The 'wheel' package is no longer the canonical location of the 'bdist_wheel' command, and will be removed in a future release. Please update to setuptools v70.1 or later which contains an integrated version of this command. |
| warn( |
| Traceback (most recent call last): |
| File "<string>", line 14, in <module> |
| File "/tmp/.tmpP08CEc/builds-v0/.tmpwiMDMU/lib/python3.12/site-packages/setuptools/build_meta.py", line 333, in get_requires_for_build_wheel |
| return self._get_build_requires(config_settings, requirements=[]) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/tmp/.tmpP08CEc/builds-v0/.tmpwiMDMU/lib/python3.12/site-packages/setuptools/build_meta.py", line 301, in _get_build_requires |
| self.run_setup() |
| File "/tmp/.tmpP08CEc/builds-v0/.tmpwiMDMU/lib/python3.12/site-packages/setuptools/build_meta.py", line 520, in run_setup |
| super().run_setup(setup_script=setup_script) |
| File "/tmp/.tmpP08CEc/builds-v0/.tmpwiMDMU/lib/python3.12/site-packages/setuptools/build_meta.py", line 317, in run_setup |
| exec(code, locals()) |
| File "<string>", line 22, in <module> |
| ModuleNotFoundError: No module named 'torch' |
|
| hint: This error likely indicates that `flash-attn@2.8.3` depends on `torch`, but doesn't declare it as a build dependency. If `flash-attn` is a |
| first-party package, consider adding `torch` to its `build-system.requires`. Otherwise, either add it to your `pyproject.toml` under: |
|
| [tool.uv.extra-build-dependencies] |
| flash-attn = ["torch"] |
|
| or `uv pip install torch` into the environment and re-run with `--no-build-isolation`. |
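The `ModuleNotFoundError` above is the classic build-isolation failure: uv builds each sdist in a throwaway environment containing only the package's declared build dependencies, and flash-attn's `setup.py` imports `torch` at build time without declaring it. A toy sketch of the mechanism (illustrative function names, not uv internals):

```python
# Toy model of an isolated sdist build: only declared build requirements
# exist in the build env; any other import in setup.py fails exactly like
# the torch import above. `--no-build-isolation` skips this sandbox and
# builds against the current environment instead.
def run_isolated_setup(declared_build_deps: set, setup_py_imports: set) -> str:
    missing = setup_py_imports - declared_build_deps
    if missing:
        # What happened above: setup.py did `import torch` with no torch
        # present in the isolated build environment.
        raise ModuleNotFoundError(f"No module named {sorted(missing)[0]!r}")
    return "wheel built"

# flash-attn declares setuptools but also imports torch in setup.py:
try:
    run_isolated_setup({"setuptools", "wheel"}, {"setuptools", "torch"})
except ModuleNotFoundError as e:
    print(e)  # No module named 'torch'
```

With torch already installed in `/venv/main`, the `--no-build-isolation` retry below succeeds because the build sees the real environment.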
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip install flash-attn --no-build-isolation |
| Using Python 3.12.12 environment at: /venv/main |
| Resolved 27 packages in 1.89s |
| Built flash-attn==2.8.3 |
| Prepared 2 packages in 45.17s |
| Installed 2 packages in 824ms |
| + einops==0.8.2 |
| + flash-attn==2.8.3 |
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ nano scripts/run_dpo_train.sh |
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ ./scripts/run_dpo_train.sh --batch_size 4 --grad_accum 4 |
| ======================================== |
| DPO Full Fine-Tuning |
| ======================================== |
| Model : LiquidAI/LFM2.5-1.2B-Instruct |
| Dataset : argilla/distilabel-math-preference-dpo |
| Epochs : 3 |
| Batch size : 1 (grad_accum=16, eff=16) |
| Learning rate : 5e-7 |
| DPO beta : 0.1 |
| Reference : NF4 4-bit (pass --no_ref_4bit for bfloat16) |
| Output dir : models |
| ======================================== |
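Note the banner reports `Batch size : 1 (grad_accum=16, eff=16)` even though the script was invoked with `--batch_size 4 --grad_accum 4`; either way the effective batch is 16, since it is just the product of per-device batch size, accumulation steps, and device count. A minimal sketch of that arithmetic (hypothetical helper, not from this repo):

```python
def effective_batch(per_device: int, grad_accum: int, world_size: int = 1) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device * grad_accum * world_size

# The CLI flags and the banner settings give the same effective batch:
print(effective_batch(4, 4))   # 16
print(effective_batch(1, 16))  # 16
```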
|
| [dpo_train] Run : dpo_fft_LFM2.5-1.2B-Instruct_argilla__distilabel-math-preference-dpo_20260222_195126 |
| [dpo_train] Output : models/dpo_fft_LFM2.5-1.2B-Instruct_argilla__distilabel-math-preference-dpo_20260222_195126 |
| [dpo_train] Loading dataset: argilla/distilabel-math-preference-dpo split=train |
| README.md: 100%|██████████| 815/815 [00:00<00:00, 1.66MB/s] |
| data/train-00000-of-00001-f59ecdcaca8c1d(…): 100%|██████████| 2.86M/2.86M [00:01<00:00, 1.72MB/s] |
| Generating train split: 100%|██████████| 2418/2418 [00:00<00:00, 62863.86 examples/s] |
| [dpo_train] Full size : 2,418 rows | columns: ['metadata', 'instruction', 'chosen_response', 'chosen_rating', 'rejected_response', 'rejected_rating'] |
| [dpo_train] Columns : instruction='instruction' chosen='chosen_response' rejected='rejected_response' |
| Map: 100%|██████████| 2418/2418 [00:00<00:00, 6660.14 examples/s] |
| Filter: 100%|██████████| 2418/2418 [00:00<00:00, 42641.74 examples/s] |
| [dpo_train] After cleaning: 2,418 rows |
| [dpo_train] Train: 2,297 Eval: 121 |
| config.json: 1.22kB [00:00, 3.98MB/s] |
| tokenizer_config.json: 92.2kB [00:00, 121MB/s] |
| tokenizer.json: 4.73MB [00:00, 24.8MB/s] |
| special_tokens_map.json: 100%|██████████| 434/434 [00:00<00:00, 2.07MB/s] |
| chat_template.jinja: 1.78kB [00:00, 5.01MB/s] |
| [dpo_train] Loading policy model (bfloat16, trainable) … |
| `torch_dtype` is deprecated! Use `dtype` instead! |
| model.safetensors: 100%|██████████| 2.34G/2.34G [00:36<00:00, 64.2MB/s] |
| Loading weights: 100%|██████████| 148/148 [00:00<00:00, 243.39it/s, Materializing param=model.layers.15.operator_norm.weight] |
| generation_config.json: 100%|██████████| 132/132 [00:00<00:00, 395kB/s] |
| [dpo_train] Loading reference model (bfloat16, frozen) … |
| Loading weights: 100%|██████████| 148/148 [00:00<00:00, 220.72it/s, Materializing param=model.layers.15.operator_norm.weight] |
| [dpo_train] Policy params : 1170M (all trainable) |
| /workspace/p-vector-LFM2.5/src/dpo_train.py:241: FutureWarning: `max_prompt_length` is deprecated and will be removed in version 0.29.0. We recommend filtering out overlong prompts from your dataset before passing it to the trainer instead of using this parameter. |
| dpo_config = DPOConfig( |
| warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead. |
| Extracting prompt in train dataset: 100%|██████████| 2297/2297 [00:00<00:00, 5125.14 examples/s] |
| Applying chat template to train dataset: 100%|██████████| 2297/2297 [00:00<00:00, 3022.28 examples/s] |
| Tokenizing train dataset: 100%|██████████| 2297/2297 [00:03<00:00, 737.08 examples/s] |
| Extracting prompt in eval dataset: 100%|██████████| 121/121 [00:00<00:00, 4787.21 examples/s] |
| Applying chat template to eval dataset: 100%|██████████| 121/121 [00:00<00:00, 2860.94 examples/s] |
| Tokenizing eval dataset: 100%|██████████| 121/121 [00:00<00:00, 737.29 examples/s] |
|
| [dpo_train] Starting DPO full fine-tuning (epochs=3 eff_batch=16) … |
|
| {'loss': '0.6927', 'grad_norm': '44.25', 'learning_rate': '1.023e-07', 'rewards/chosen': '0.01433', 'rewards/rejected': '0.01006', 'rewards/accuracies': '0.4187' |
| , 'rewards/margins': '0.004265', 'logps/chosen': '-332.1', 'logps/rejected': '-332.9', 'logits/chosen': '-1.06', 'logits/rejected': '-1.042', 'epoch': '0.06957'} |
| {'loss': '0.7004', 'grad_norm': '50.5', 'learning_rate': '2.159e-07', 'rewards/chosen': '0.004466', 'rewards/rejected': '0.01571', 'rewards/accuracies': '0.45', |
| 'rewards/margins': '-0.01124', 'logps/chosen': '-329.2', 'logps/rejected': '-312.3', 'logits/chosen': '-1.098', 'logits/rejected': '-1.097', 'epoch': '0.1391'} |
| {'loss': '0.7032', 'grad_norm': '54.75', 'learning_rate': '3.295e-07', 'rewards/chosen': '0.03125', 'rewards/rejected': '0.04735', 'rewards/accuracies': '0.4313' |
| , 'rewards/margins': '-0.0161', 'logps/chosen': '-328.1', 'logps/rejected': '-305.9', 'logits/chosen': '-1.137', 'logits/rejected': '-1.171', 'epoch': '0.2087'} |
| {'loss': '0.6893', 'grad_norm': '50.5', 'learning_rate': '4.432e-07', 'rewards/chosen': '0.09773', 'rewards/rejected': '0.08618', 'rewards/accuracies': '0.4938', |
| 'rewards/margins': '0.01155', 'logps/chosen': '-338.8', 'logps/rejected': '-328.1', 'logits/chosen': '-1.089', 'logits/rejected': '-1.102', 'epoch': '0.2783'} |
| {'loss': '0.6982', 'grad_norm': '53.5', 'learning_rate': '4.998e-07', 'rewards/chosen': '0.1568', 'rewards/rejected': '0.161', 'rewards/accuracies': '0.45', 'rew |
| ards/margins': '-0.004219', 'logps/chosen': '-329.1', 'logps/rejected': '-323.8', 'logits/chosen': '-1.137', 'logits/rejected': '-1.147', 'epoch': '0.3478'} |
| {'loss': '0.6905', 'grad_norm': '64', 'learning_rate': '4.982e-07', 'rewards/chosen': '0.1544', 'rewards/rejected': '0.1441', 'rewards/accuracies': '0.5312', 're |
| wards/margins': '0.01032', 'logps/chosen': '-335.6', 'logps/rejected': '-329', 'logits/chosen': '-1.09', 'logits/rejected': '-1.07', 'epoch': '0.4174'} |
| {'loss': '0.7009', 'grad_norm': '72.5', 'learning_rate': '4.949e-07', 'rewards/chosen': '0.184', 'rewards/rejected': '0.194', 'rewards/accuracies': '0.475', 'rew |
| ards/margins': '-0.01001', 'logps/chosen': '-319.4', 'logps/rejected': '-317.8', 'logits/chosen': '-1.122', 'logits/rejected': '-1.114', 'epoch': '0.487'} |
| {'loss': '0.6878', 'grad_norm': '52.5', 'learning_rate': '4.9e-07', 'rewards/chosen': '0.1901', 'rewards/rejected': '0.1733', 'rewards/accuracies': '0.475', 'rew |
| ards/margins': '0.01679', 'logps/chosen': '-314.5', 'logps/rejected': '-311.9', 'logits/chosen': '-1.116', 'logits/rejected': '-1.108', 'epoch': '0.5565'} |
| {'loss': '0.6957', 'grad_norm': '47.75', 'learning_rate': '4.836e-07', 'rewards/chosen': '0.2123', 'rewards/rejected': '0.2113', 'rewards/accuracies': '0.4375', |
| 'rewards/margins': '0.0009234', 'logps/chosen': '-336.4', 'logps/rejected': '-332', 'logits/chosen': '-1.074', 'logits/rejected': '-1.058', 'epoch': '0.6261'} |
| {'loss': '0.6817', 'grad_norm': '49', 'learning_rate': '4.756e-07', 'rewards/chosen': '0.2641', 'rewards/rejected': '0.2347', 'rewards/accuracies': '0.5813', 're |
| wards/margins': '0.02945', 'logps/chosen': '-330.8', 'logps/rejected': '-323.5', 'logits/chosen': '-1.046', 'logits/rejected': '-1.038', 'epoch': '0.6957'} |
| {'eval_loss': '0.6925', 'eval_runtime': '11.31', 'eval_samples_per_second': '10.7', 'eval_steps_per_second': '2.741', 'eval_rewards/chosen': '0.2332', 'eval_rewa |
| rds/rejected': '0.2294', 'eval_rewards/accuracies': '0.4839', 'eval_rewards/margins': '0.003864', 'eval_logps/chosen': '-317.1', 'eval_logps/rejected': '-318.1', |
| 'eval_logits/chosen': '-1.071', 'eval_logits/rejected': '-1.092', 'epoch': '0.6957'} |
| Writing model shards: 100%|██████████| 1/1 [00:04<00:00, 4.36s/it] |
| {'loss': '0.6985', 'grad_norm': '47', 'learning_rate': '4.662e-07', 'rewards/chosen': '0.2238', 'rewards/rejected': '0.2288', 'rewards/accuracies': '0.4875', 're |
| wards/margins': '-0.005009', 'logps/chosen': '-315', 'logps/rejected': '-308.4', 'logits/chosen': '-1.074', 'logits/rejected': '-1.078', 'epoch': '0.7652'} |
| {'loss': '0.681', 'grad_norm': '51.25', 'learning_rate': '4.553e-07', 'rewards/chosen': '0.3184', 'rewards/rejected': '0.288', 'rewards/accuracies': '0.5938', 'r |
| ewards/margins': '0.03039', 'logps/chosen': '-332.7', 'logps/rejected': '-320', 'logits/chosen': '-1.049', 'logits/rejected': '-1.044', 'epoch': '0.8348'} |
| {'loss': '0.6885', 'grad_norm': '57.5', 'learning_rate': '4.431e-07', 'rewards/chosen': '0.3165', 'rewards/rejected': '0.3008', 'rewards/accuracies': '0.5188', ' |
| rewards/margins': '0.01573', 'logps/chosen': '-333.5', 'logps/rejected': '-321.6', 'logits/chosen': '-1.093', 'logits/rejected': '-1.091', 'epoch': '0.9043'} |
| {'loss': '0.693', 'grad_norm': '50.75', 'learning_rate': '4.296e-07', 'rewards/chosen': '0.3366', 'rewards/rejected': '0.328', 'rewards/accuracies': '0.5', 'rewa |
| rds/margins': '0.008592', 'logps/chosen': '-336.9', 'logps/rejected': '-312.6', 'logits/chosen': '-1.07', 'logits/rejected': '-1.086', 'epoch': '0.9739'} |
| {'loss': '0.6797', 'grad_norm': '58.25', 'learning_rate': '4.15e-07', 'rewards/chosen': '0.3648', 'rewards/rejected': '0.3353', 'rewards/accuracies': '0.5321', ' |
| rewards/margins': '0.02947', 'logps/chosen': '-297.7', 'logps/rejected': '-304.8', 'logits/chosen': '-1.138', 'logits/rejected': '-1.139', 'epoch': '1.042'} |
| {'loss': '0.6664', 'grad_norm': '62', 'learning_rate': '3.992e-07', 'rewards/chosen': '0.3742', 'rewards/rejected': '0.3119', 'rewards/accuracies': '0.6313', 're |
| wards/margins': '0.06225', 'logps/chosen': '-325.7', 'logps/rejected': '-314.6', 'logits/chosen': '-1.076', 'logits/rejected': '-1.074', 'epoch': '1.111'} |
| {'loss': '0.6656', 'grad_norm': '48.25', 'learning_rate': '3.825e-07', 'rewards/chosen': '0.3955', 'rewards/rejected': '0.3318', 'rewards/accuracies': '0.6438', |
| 'rewards/margins': '0.06372', 'logps/chosen': '-324.8', 'logps/rejected': '-314.8', 'logits/chosen': '-1.103', 'logits/rejected': '-1.101', 'epoch': '1.181'} |
| {'loss': '0.6683', 'grad_norm': '73.5', 'learning_rate': '3.649e-07', 'rewards/chosen': '0.4114', 'rewards/rejected': '0.3468', 'rewards/accuracies': '0.6187', ' |
| rewards/margins': '0.06465', 'logps/chosen': '-333', 'logps/rejected': '-317.7', 'logits/chosen': '-1.109', 'logits/rejected': '-1.081', 'epoch': '1.25'} |
| {'loss': '0.6815', 'grad_norm': '51.75', 'learning_rate': '3.466e-07', 'rewards/chosen': '0.4475', 'rewards/rejected': '0.4136', 'rewards/accuracies': '0.525', ' |
| rewards/margins': '0.03387', 'logps/chosen': '-342.1', 'logps/rejected': '-329.1', 'logits/chosen': '-1.028', 'logits/rejected': '-1.028', 'epoch': '1.32'} |
| {'loss': '0.6729', 'grad_norm': '43.25', 'learning_rate': '3.276e-07', 'rewards/chosen': '0.4052', 'rewards/rejected': '0.3567', 'rewards/accuracies': '0.5875', |
| 'rewards/margins': '0.04842', 'logps/chosen': '-323.9', 'logps/rejected': '-300.8', 'logits/chosen': '-1.055', 'logits/rejected': '-1.044', 'epoch': '1.39'} |
| {'eval_loss': '0.6919', 'eval_runtime': '11.27', 'eval_samples_per_second': '10.73', 'eval_steps_per_second': '2.75', 'eval_rewards/chosen': '0.4347', 'eval_rewa |
| rds/rejected': '0.423', 'eval_rewards/accuracies': '0.5484', 'eval_rewards/margins': '0.01169', 'eval_logps/chosen': '-315.1', 'eval_logps/rejected': '-316.2', ' |
| eval_logits/chosen': '-1.06', 'eval_logits/rejected': '-1.08', 'epoch': '1.39'} |
| Writing model shards: 100%|██████████| 1/1 [00:04<00:00, 4.63s/it] |
| {'loss': '0.6761', 'grad_norm': '44.5', 'learning_rate': '3.082e-07', 'rewards/chosen': '0.4774', 'rewards/rejected': '0.4275', 'rewards/accuracies': '0.5938', ' |
| rewards/margins': '0.04995', 'logps/chosen': '-326.5', 'logps/rejected': '-321.4', 'logits/chosen': '-1.048', 'logits/rejected': '-1.049', 'epoch': '1.459'} |
| {'loss': '0.6859', 'grad_norm': '69', 'learning_rate': '2.883e-07', 'rewards/chosen': '0.4648', 'rewards/rejected': '0.4392', 'rewards/accuracies': '0.5375', 're |
| wards/margins': '0.0256', 'logps/chosen': '-323.2', 'logps/rejected': '-316.2', 'logits/chosen': '-1.064', 'logits/rejected': '-1.086', 'epoch': '1.529'} |
| {'loss': '0.6923', 'grad_norm': '51.25', 'learning_rate': '2.682e-07', 'rewards/chosen': '0.4878', 'rewards/rejected': '0.4737', 'rewards/accuracies': '0.525', ' |
| rewards/margins': '0.01413', 'logps/chosen': '-331.1', 'logps/rejected': '-318.2', 'logits/chosen': '-1.058', 'logits/rejected': '-1.07', 'epoch': '1.598'} |
| {'loss': '0.676', 'grad_norm': '44', 'learning_rate': '2.48e-07', 'rewards/chosen': '0.4827', 'rewards/rejected': '0.4337', 'rewards/accuracies': '0.5562', 'rewa |
| rds/margins': '0.04903', 'logps/chosen': '-329.4', 'logps/rejected': '-324.9', 'logits/chosen': '-1.059', 'logits/rejected': '-1.051', 'epoch': '1.668'} |
| {'loss': '0.6791', 'grad_norm': '47', 'learning_rate': '2.278e-07', 'rewards/chosen': '0.4604', 'rewards/rejected': '0.4184', 'rewards/accuracies': '0.5938', 're |
| wards/margins': '0.04207', 'logps/chosen': '-313.2', 'logps/rejected': '-302.2', 'logits/chosen': '-1.113', 'logits/rejected': '-1.119', 'epoch': '1.737'} |
| {'loss': '0.6898', 'grad_norm': '74.5', 'learning_rate': '2.077e-07', 'rewards/chosen': '0.4732', 'rewards/rejected': '0.4554', 'rewards/accuracies': '0.55', 're |
| wards/margins': '0.01779', 'logps/chosen': '-326.7', 'logps/rejected': '-328.2', 'logits/chosen': '-1.096', 'logits/rejected': '-1.085', 'epoch': '1.807'} |
| {'loss': '0.6795', 'grad_norm': '54.25', 'learning_rate': '1.879e-07', 'rewards/chosen': '0.4701', 'rewards/rejected': '0.4328', 'rewards/accuracies': '0.5375', |
| 'rewards/margins': '0.03733', 'logps/chosen': '-331.4', 'logps/rejected': '-313.8', 'logits/chosen': '-1.082', 'logits/rejected': '-1.084', 'epoch': '1.877'} |
| {'loss': '0.6856', 'grad_norm': '50.5', 'learning_rate': '1.685e-07', 'rewards/chosen': '0.4802', 'rewards/rejected': '0.4444', 'rewards/accuracies': '0.5437', ' |
| rewards/margins': '0.0358', 'logps/chosen': '-331.5', 'logps/rejected': '-321.4', 'logits/chosen': '-1.079', 'logits/rejected': '-1.094', 'epoch': '1.946'} |
| {'loss': '0.6907', 'grad_norm': '45.25', 'learning_rate': '1.497e-07', 'rewards/chosen': '0.4531', 'rewards/rejected': '0.4403', 'rewards/accuracies': '0.4872', |
| 'rewards/margins': '0.01279', 'logps/chosen': '-322.7', 'logps/rejected': '-319.2', 'logits/chosen': '-1.074', 'logits/rejected': '-1.077', 'epoch': '2.014'} |
| {'loss': '0.6699', 'grad_norm': '45.5', 'learning_rate': '1.315e-07', 'rewards/chosen': '0.4449', 'rewards/rejected': '0.3841', 'rewards/accuracies': '0.6125', ' |
| rewards/margins': '0.0608', 'logps/chosen': '-331.7', 'logps/rejected': '-313.5', 'logits/chosen': '-1.096', 'logits/rejected': '-1.12', 'epoch': '2.083'} |
| {'eval_loss': '0.6872', 'eval_runtime': '11.29', 'eval_samples_per_second': '10.72', 'eval_steps_per_second': '2.746', 'eval_rewards/chosen': '0.4224', 'eval_rew |
| ards/rejected': '0.4068', 'eval_rewards/accuracies': '0.5403', 'eval_rewards/margins': '0.01561', 'eval_logps/chosen': '-315.2', 'eval_logps/rejected': '-316.3', |
| 'eval_logits/chosen': '-1.06', 'eval_logits/rejected': '-1.081', 'epoch': '2.083'} |
| Writing model shards: 100%|██████████| 1/1 [00:04<00:00, 4.49s/it] |
| {'loss': '0.6787', 'grad_norm': '47.5', 'learning_rate': '1.141e-07', 'rewards/chosen': '0.4379', 'rewards/rejected': '0.3979', 'rewards/accuracies': '0.5875', ' |
| rewards/margins': '0.04007', 'logps/chosen': '-317.6', 'logps/rejected': '-313.9', 'logits/chosen': '-1.095', 'logits/rejected': '-1.064', 'epoch': '2.153'} |
| {'loss': '0.6964', 'grad_norm': '75.5', 'learning_rate': '9.754e-08', 'rewards/chosen': '0.4462', 'rewards/rejected': '0.4411', 'rewards/accuracies': '0.5', 'rew |
| ards/margins': '0.005142', 'logps/chosen': '-322.7', 'logps/rejected': '-325', 'logits/chosen': '-1.037', 'logits/rejected': '-1.03', 'epoch': '2.223'} |
| {'loss': '0.6791', 'grad_norm': '50', 'learning_rate': '8.202e-08', 'rewards/chosen': '0.431', 'rewards/rejected': '0.3913', 'rewards/accuracies': '0.5813', 'rew |
| ards/margins': '0.03968', 'logps/chosen': '-314.9', 'logps/rejected': '-309.6', 'logits/chosen': '-1.077', 'logits/rejected': '-1.065', 'epoch': '2.292'} |
| {'loss': '0.6749', 'grad_norm': '48', 'learning_rate': '6.759e-08', 'rewards/chosen': '0.5079', 'rewards/rejected': '0.4567', 'rewards/accuracies': '0.625', 'rew |
| ards/margins': '0.05118', 'logps/chosen': '-337.2', 'logps/rejected': '-322.6', 'logits/chosen': '-1.071', 'logits/rejected': '-1.091', 'epoch': '2.362'} |
| {'loss': '0.6916', 'grad_norm': '60.5', 'learning_rate': '5.436e-08', 'rewards/chosen': '0.4605', 'rewards/rejected': '0.4482', 'rewards/accuracies': '0.5688', ' |
| rewards/margins': '0.01234', 'logps/chosen': '-330', 'logps/rejected': '-326', 'logits/chosen': '-1.065', 'logits/rejected': '-1.052', 'epoch': '2.431'} |
| {'loss': '0.6638', 'grad_norm': '51.75', 'learning_rate': '4.241e-08', 'rewards/chosen': '0.4647', 'rewards/rejected': '0.3956', 'rewards/accuracies': '0.6625', |
| 'rewards/margins': '0.06903', 'logps/chosen': '-336.8', 'logps/rejected': '-316', 'logits/chosen': '-1.052', 'logits/rejected': '-1.045', 'epoch': '2.501'} |
| {'loss': '0.669', 'grad_norm': '49', 'learning_rate': '3.183e-08', 'rewards/chosen': '0.4633', 'rewards/rejected': '0.4018', 'rewards/accuracies': '0.5688', 'rew |
| ards/margins': '0.06144', 'logps/chosen': '-329.5', 'logps/rejected': '-319.5', 'logits/chosen': '-1.077', 'logits/rejected': '-1.078', 'epoch': '2.57'} |
| {'loss': '0.6458', 'grad_norm': '56.5', 'learning_rate': '2.267e-08', 'rewards/chosen': '0.4771', 'rewards/rejected': '0.3655', 'rewards/accuracies': '0.675', 'r |
| ewards/margins': '0.1116', 'logps/chosen': '-328.8', 'logps/rejected': '-311.1', 'logits/chosen': '-1.109', 'logits/rejected': '-1.082', 'epoch': '2.64'} |
| {'loss': '0.6756', 'grad_norm': '51', 'learning_rate': '1.5e-08', 'rewards/chosen': '0.4508', 'rewards/rejected': '0.4025', 'rewards/accuracies': '0.5813', 'rewa |
| rds/margins': '0.04833', 'logps/chosen': '-316.7', 'logps/rejected': '-307', 'logits/chosen': '-1.132', 'logits/rejected': '-1.112', 'epoch': '2.71'} |
| {'loss': '0.6875', 'grad_norm': '77', 'learning_rate': '8.871e-09', 'rewards/chosen': '0.4579', 'rewards/rejected': '0.4322', 'rewards/accuracies': '0.5562', 're |
| wards/margins': '0.02568', 'logps/chosen': '-324', 'logps/rejected': '-318.5', 'logits/chosen': '-1.084', 'logits/rejected': '-1.093', 'epoch': '2.779'} |
| {'eval_loss': '0.6914', 'eval_runtime': '11.25', 'eval_samples_per_second': '10.76', 'eval_steps_per_second': '2.756', 'eval_rewards/chosen': '0.4316', 'eval_rew |
| ards/rejected': '0.42', 'eval_rewards/accuracies': '0.5323', 'eval_rewards/margins': '0.0116', 'eval_logps/chosen': '-315.2', 'eval_logps/rejected': '-316.2', 'e |
| val_logits/chosen': '-1.06', 'eval_logits/rejected': '-1.08', 'epoch': '2.779'} |
| Writing model shards: 100%|██████████| 1/1 [00:04<00:00, 4.67s/it] |
| {'loss': '0.6733', 'grad_norm': '41.75', 'learning_rate': '4.323e-09', 'rewards/chosen': '0.4507', 'rewards/rejected': '0.4019', 'rewards/accuracies': '0.575', ' |
| rewards/margins': '0.04884', 'logps/chosen': '-314', 'logps/rejected': '-312', 'logits/chosen': '-1.078', 'logits/rejected': '-1.076', 'epoch': '2.849'} |
| {'loss': '0.6755', 'grad_norm': '47.25', 'learning_rate': '1.384e-09', 'rewards/chosen': '0.4595', 'rewards/rejected': '0.4095', 'rewards/accuracies': '0.575', ' |
| rewards/margins': '0.04998', 'logps/chosen': '-332.6', 'logps/rejected': '-328', 'logits/chosen': '-1.062', 'logits/rejected': '-1.051', 'epoch': '2.918'} |
| {'loss': '0.6853', 'grad_norm': '52.25', 'learning_rate': '7.375e-11', 'rewards/chosen': '0.4191', 'rewards/rejected': '0.3941', 'rewards/accuracies': '0.5625', |
| 'rewards/margins': '0.02507', 'logps/chosen': '-325.4', 'logps/rejected': '-314.9', 'logits/chosen': '-1.044', 'logits/rejected': '-1.07', 'epoch': '2.988'} |
| Writing model shards: 100%|██████████| 1/1 [00:04<00:00, 4.44s/it] |
| There were missing keys in the checkpoint model loaded: ['lm_head.weight']. |
| {'train_runtime': '1609', 'train_samples_per_second': '4.284', 'train_steps_per_second': '0.269', 'train_loss': '0.6827', 'epoch': '3'} |
| 100%|██████████| 432/432 [26:48<00:00, 3.72s/it] |
| Writing model shards: 100%|██████████| 1/1 [00:05<00:00, 5.06s/it] |
| [dpo_train] Final model saved → models/dpo_fft_LFM2.5-1.2B-Instruct_argilla__distilabel-math-preference-dpo_20260222_195126/final_model |
| [dpo_train] Run metadata → models/dpo_fft_LFM2.5-1.2B-Instruct_argilla__distilabel-math-preference-dpo_20260222_195126/run_meta.json |
|
| [dpo_train] Done. |
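The `rewards/*` columns in the log above follow the standard DPO definitions: each reward is `beta * (log pi(y|x) - log pi_ref(y|x))`, the margin is chosen minus rejected, and the loss is `-log sigmoid(margin)`. With a zero margin that gives ln 2 ≈ 0.693, which is exactly where the run starts (loss 0.6927 at the first log line, since the policy still matches the reference). A minimal sketch assuming that standard formulation with the banner's beta = 0.1 (not the repo's actual code):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from policy/reference log-probs of each response."""
    reward_chosen = beta * (pi_chosen - ref_chosen)        # rewards/chosen
    reward_rejected = beta * (pi_rejected - ref_rejected)  # rewards/rejected
    margin = reward_chosen - reward_rejected               # rewards/margins
    return -math.log(1.0 / (1.0 + math.exp(-margin)))      # -log sigmoid(margin)

# Policy == reference (step 0): margin is 0, loss is ln 2 ≈ 0.6931.
print(round(dpo_loss(-332.1, -332.9, -332.1, -332.9), 4))  # 0.6931
```

The small positive `rewards/margins` and the loss drifting below 0.693 later in the run are the same relationship seen from the training side.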
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ ls |
| LICENSE README.md configs models pyproject.toml results scripts src uv.lock |
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ ls |
| LICENSE README.md configs models pyproject.toml results scripts src uv.lock |
| (main) root@C.31890549:/workspace/p-vector-LFM2.5$ tmux capture-pane -pS - > training_evals.txt |
|