p-vector / training_evals.txt
Welcome to your vast.ai container! This session is running in `tmux`.
To disconnect without closing your processes, press ctrl+b, release, then d.
To disable auto-tmux, run `touch ~/.no_auto_tmux` and reconnect. See also https://tmuxcheatsheet.com/
Activated conda/uv virtual environment at /venv/main
(main) root@C.31890549:/workspace$ cd p-vector-LFM2.5/
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip freeze | flash
-bash: flash: command not found
Using Python 3.12.12 environment at: /venv/main
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip freeze | grep flash
Using Python 3.12.12 environment at: /venv/main
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip freeze | grep tor
Using Python 3.12.12 environment at: /venv/main
decorator==5.2.1
torch==2.10.0+cu130
torchaudio==2.10.0+cu130
torchcodec==0.10.0
torchdata==0.10.0
torchtext==0.6.0
torchvision==0.25.0+cu130
tornado==6.5.4
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip uninstall torch
Using Python 3.12.12 environment at: /venv/main
Uninstalled 1 package in 1.02s
- torch==2.10.0+cu130
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip install torch==2.8
Using Python 3.12.12 environment at: /venv/main
Resolved 25 packages in 232ms
Prepared 16 packages in 41.44s
Uninstalled 1 package in 77ms
Installed 16 packages in 2.61s
+ nvidia-cublas-cu12==12.8.4.1
+ nvidia-cuda-cupti-cu12==12.8.90
+ nvidia-cuda-nvrtc-cu12==12.8.93
+ nvidia-cuda-runtime-cu12==12.8.90
+ nvidia-cudnn-cu12==9.10.2.21
+ nvidia-cufft-cu12==11.3.3.83
+ nvidia-cufile-cu12==1.13.1.3
+ nvidia-curand-cu12==10.3.9.90
+ nvidia-cusolver-cu12==11.7.3.90
+ nvidia-cusparse-cu12==12.5.8.93
+ nvidia-cusparselt-cu12==0.7.1
+ nvidia-nccl-cu12==2.27.3
+ nvidia-nvjitlink-cu12==12.8.93
+ nvidia-nvtx-cu12==12.8.90
+ torch==2.8.0
- triton==3.6.0
+ triton==3.4.0
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip install flash-attn
Using Python 3.12.12 environment at: /venv/main
Resolved 27 packages in 1.84s
× Failed to build `flash-attn==2.8.3`
├─▶ The build backend returned an error
╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit status: 1)
[stderr]
/tmp/.tmpP08CEc/builds-v0/.tmpwiMDMU/lib/python3.12/site-packages/setuptools/_vendor/wheel/bdist_wheel.py:4: FutureWarning: The 'wheel' package is no longer the canonical location of the 'bdist_wheel' command, and will be removed in a future release. Please update to setuptools v70.1 or later which contains an integrated version of this command.
warn(
Traceback (most recent call last):
File "<string>", line 14, in <module>
File "/tmp/.tmpP08CEc/builds-v0/.tmpwiMDMU/lib/python3.12/site-packages/setuptools/build_meta.py", line 333, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/.tmpP08CEc/builds-v0/.tmpwiMDMU/lib/python3.12/site-packages/setuptools/build_meta.py", line 301, in _get_build_requires
self.run_setup()
File "/tmp/.tmpP08CEc/builds-v0/.tmpwiMDMU/lib/python3.12/site-packages/setuptools/build_meta.py", line 520, in run_setup
super().run_setup(setup_script=setup_script)
File "/tmp/.tmpP08CEc/builds-v0/.tmpwiMDMU/lib/python3.12/site-packages/setuptools/build_meta.py", line 317, in run_setup
exec(code, locals())
File "<string>", line 22, in <module>
ModuleNotFoundError: No module named 'torch'
hint: This error likely indicates that `flash-attn@2.8.3` depends on `torch`, but doesn't declare it as a build dependency. If `flash-attn` is a first-party package, consider adding `torch` to its `build-system.requires`. Otherwise, either add it to your `pyproject.toml` under:
[tool.uv.extra-build-dependencies]
flash-attn = ["torch"]
or `uv pip install torch` into the environment and re-run with `--no-build-isolation`.
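For future runs, the persistent fix uv suggests above can live in the project's pyproject.toml instead of passing --no-build-isolation each time; a sketch (assuming this repo's pyproject.toml is the one uv reads):

```toml
# Sketch: let uv inject torch into flash-attn's isolated build environment,
# so `uv pip install flash-attn` works without --no-build-isolation.
[tool.uv.extra-build-dependencies]
flash-attn = ["torch"]
```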
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ uv pip install flash-attn --no-build-isolation
Using Python 3.12.12 environment at: /venv/main
Resolved 27 packages in 1.89s
Built flash-attn==2.8.3
Prepared 2 packages in 45.17s
Installed 2 packages in 824ms
+ einops==0.8.2
+ flash-attn==2.8.3
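A quick sanity check after the build (a sketch; `has_module` is a hypothetical helper, not part of this repo) to confirm the freshly built wheel and its dependencies resolve in the active environment:

```python
import importlib.util

def has_module(name: str) -> bool:
    """True if the module is importable from the current environment."""
    return importlib.util.find_spec(name) is not None

# flash_attn links against the torch it was built with, so both must resolve.
for mod in ("torch", "flash_attn", "einops"):
    print(f"{mod}: {'found' if has_module(mod) else 'MISSING'}")
```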
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ nano scripts/run_dpo_train.sh
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ ./scripts/run_dpo_train.sh --batch_size 4 --grad_accum 4
========================================
DPO Full Fine-Tuning
========================================
Model : LiquidAI/LFM2.5-1.2B-Instruct
Dataset : argilla/distilabel-math-preference-dpo
Epochs : 3
Batch size : 1 (grad_accum=16, eff=16)
Learning rate : 5e-7
DPO beta : 0.1
Reference : NF4 4-bit (pass --no_ref_4bit for bfloat16)
Output dir : models
========================================
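Note the banner reports batch_size=1 with grad_accum=16 even though the script was invoked with --batch_size 4 --grad_accum 4; both settings land on the same effective batch, which is just the product of the two (sketch, single GPU assumed):

```python
def effective_batch(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Effective (per optimizer step) batch size under gradient accumulation."""
    return per_device * grad_accum * num_gpus

print(effective_batch(4, 4))   # CLI flags: --batch_size 4 --grad_accum 4 -> 16
print(effective_batch(1, 16))  # banner: Batch size : 1 (grad_accum=16)    -> 16
```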
[dpo_train] Run : dpo_fft_LFM2.5-1.2B-Instruct_argilla__distilabel-math-preference-dpo_20260222_195126
[dpo_train] Output : models/dpo_fft_LFM2.5-1.2B-Instruct_argilla__distilabel-math-preference-dpo_20260222_195126
[dpo_train] Loading dataset: argilla/distilabel-math-preference-dpo split=train
README.md: 100%|██████████| 815/815 [00:00<00:00, 1.66MB/s]
data/train-00000-of-00001-f59ecdcaca8c1d(…): 100%|██████████| 2.86M/2.86M [00:01<00:00, 1.72MB/s]
Generating train split: 100%|██████████| 2418/2418 [00:00<00:00, 62863.86 examples/s]
[dpo_train] Full size : 2,418 rows | columns: ['metadata', 'instruction', 'chosen_response', 'chosen_rating', 'rejected_response', 'rejected_rating']
[dpo_train] Columns : instruction='instruction' chosen='chosen_response' rejected='rejected_response'
Map: 100%|██████████| 2418/2418 [00:00<00:00, 6660.14 examples/s]
Filter: 100%|██████████| 2418/2418 [00:00<00:00, 42641.74 examples/s]
[dpo_train] After cleaning: 2,418 rows
[dpo_train] Train: 2,297 Eval: 121
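The 2,297/121 split is consistent with a ~5% hold-out of the 2,418 cleaned rows; a sketch of that arithmetic (the actual split logic in src/dpo_train.py is not shown in this log, so the 5% fraction is an assumption):

```python
def split_sizes(n_rows: int, eval_frac: float = 0.05) -> tuple[int, int]:
    """Sizes of a fractional hold-out split, eval count rounded to nearest row."""
    n_eval = round(n_rows * eval_frac)  # eval_frac=0.05 is assumed, not confirmed
    return n_rows - n_eval, n_eval

print(split_sizes(2418))  # (2297, 121) -- matches the counts logged above
```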
config.json: 1.22kB [00:00, 3.98MB/s]
tokenizer_config.json: 92.2kB [00:00, 121MB/s]
tokenizer.json: 4.73MB [00:00, 24.8MB/s]
special_tokens_map.json: 100%|██████████| 434/434 [00:00<00:00, 2.07MB/s]
chat_template.jinja: 1.78kB [00:00, 5.01MB/s]
[dpo_train] Loading policy model (bfloat16, trainable) …
`torch_dtype` is deprecated! Use `dtype` instead!
model.safetensors: 100%|██████████| 2.34G/2.34G [00:36<00:00, 64.2MB/s]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 243.39it/s, Materializing param=model.layers.15.operator_norm.weight]
generation_config.json: 100%|██████████| 132/132 [00:00<00:00, 395kB/s]
[dpo_train] Loading reference model (bfloat16, frozen) …
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 220.72it/s, Materializing param=model.layers.15.operator_norm.weight]
[dpo_train] Policy params : 1170M (all trainable)
/workspace/p-vector-LFM2.5/src/dpo_train.py:241: FutureWarning: `max_prompt_length` is deprecated and will be removed in version 0.29.0. We recommend filtering out overlong prompts from your dataset before passing it to the trainer instead of using this parameter.
dpo_config = DPOConfig(
warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
Extracting prompt in train dataset: 100%|██████████| 2297/2297 [00:00<00:00, 5125.14 examples/s]
Applying chat template to train dataset: 100%|██████████| 2297/2297 [00:00<00:00, 3022.28 examples/s]
Tokenizing train dataset: 100%|██████████| 2297/2297 [00:03<00:00, 737.08 examples/s]
Extracting prompt in eval dataset: 100%|██████████| 121/121 [00:00<00:00, 4787.21 examples/s]
Applying chat template to eval dataset: 100%|██████████| 121/121 [00:00<00:00, 2860.94 examples/s]
Tokenizing eval dataset: 100%|██████████| 121/121 [00:00<00:00, 737.29 examples/s]
[dpo_train] Starting DPO full fine-tuning (epochs=3 eff_batch=16) …
{'loss': '0.6927', 'grad_norm': '44.25', 'learning_rate': '1.023e-07', 'rewards/chosen': '0.01433', 'rewards/rejected': '0.01006', 'rewards/accuracies': '0.4187', 'rewards/margins': '0.004265', 'logps/chosen': '-332.1', 'logps/rejected': '-332.9', 'logits/chosen': '-1.06', 'logits/rejected': '-1.042', 'epoch': '0.06957'}
{'loss': '0.7004', 'grad_norm': '50.5', 'learning_rate': '2.159e-07', 'rewards/chosen': '0.004466', 'rewards/rejected': '0.01571', 'rewards/accuracies': '0.45', 'rewards/margins': '-0.01124', 'logps/chosen': '-329.2', 'logps/rejected': '-312.3', 'logits/chosen': '-1.098', 'logits/rejected': '-1.097', 'epoch': '0.1391'}
{'loss': '0.7032', 'grad_norm': '54.75', 'learning_rate': '3.295e-07', 'rewards/chosen': '0.03125', 'rewards/rejected': '0.04735', 'rewards/accuracies': '0.4313', 'rewards/margins': '-0.0161', 'logps/chosen': '-328.1', 'logps/rejected': '-305.9', 'logits/chosen': '-1.137', 'logits/rejected': '-1.171', 'epoch': '0.2087'}
{'loss': '0.6893', 'grad_norm': '50.5', 'learning_rate': '4.432e-07', 'rewards/chosen': '0.09773', 'rewards/rejected': '0.08618', 'rewards/accuracies': '0.4938', 'rewards/margins': '0.01155', 'logps/chosen': '-338.8', 'logps/rejected': '-328.1', 'logits/chosen': '-1.089', 'logits/rejected': '-1.102', 'epoch': '0.2783'}
{'loss': '0.6982', 'grad_norm': '53.5', 'learning_rate': '4.998e-07', 'rewards/chosen': '0.1568', 'rewards/rejected': '0.161', 'rewards/accuracies': '0.45', 'rewards/margins': '-0.004219', 'logps/chosen': '-329.1', 'logps/rejected': '-323.8', 'logits/chosen': '-1.137', 'logits/rejected': '-1.147', 'epoch': '0.3478'}
{'loss': '0.6905', 'grad_norm': '64', 'learning_rate': '4.982e-07', 'rewards/chosen': '0.1544', 'rewards/rejected': '0.1441', 'rewards/accuracies': '0.5312', 'rewards/margins': '0.01032', 'logps/chosen': '-335.6', 'logps/rejected': '-329', 'logits/chosen': '-1.09', 'logits/rejected': '-1.07', 'epoch': '0.4174'}
{'loss': '0.7009', 'grad_norm': '72.5', 'learning_rate': '4.949e-07', 'rewards/chosen': '0.184', 'rewards/rejected': '0.194', 'rewards/accuracies': '0.475', 'rewards/margins': '-0.01001', 'logps/chosen': '-319.4', 'logps/rejected': '-317.8', 'logits/chosen': '-1.122', 'logits/rejected': '-1.114', 'epoch': '0.487'}
{'loss': '0.6878', 'grad_norm': '52.5', 'learning_rate': '4.9e-07', 'rewards/chosen': '0.1901', 'rewards/rejected': '0.1733', 'rewards/accuracies': '0.475', 'rewards/margins': '0.01679', 'logps/chosen': '-314.5', 'logps/rejected': '-311.9', 'logits/chosen': '-1.116', 'logits/rejected': '-1.108', 'epoch': '0.5565'}
{'loss': '0.6957', 'grad_norm': '47.75', 'learning_rate': '4.836e-07', 'rewards/chosen': '0.2123', 'rewards/rejected': '0.2113', 'rewards/accuracies': '0.4375', 'rewards/margins': '0.0009234', 'logps/chosen': '-336.4', 'logps/rejected': '-332', 'logits/chosen': '-1.074', 'logits/rejected': '-1.058', 'epoch': '0.6261'}
{'loss': '0.6817', 'grad_norm': '49', 'learning_rate': '4.756e-07', 'rewards/chosen': '0.2641', 'rewards/rejected': '0.2347', 'rewards/accuracies': '0.5813', 'rewards/margins': '0.02945', 'logps/chosen': '-330.8', 'logps/rejected': '-323.5', 'logits/chosen': '-1.046', 'logits/rejected': '-1.038', 'epoch': '0.6957'}
{'eval_loss': '0.6925', 'eval_runtime': '11.31', 'eval_samples_per_second': '10.7', 'eval_steps_per_second': '2.741', 'eval_rewards/chosen': '0.2332', 'eval_rewards/rejected': '0.2294', 'eval_rewards/accuracies': '0.4839', 'eval_rewards/margins': '0.003864', 'eval_logps/chosen': '-317.1', 'eval_logps/rejected': '-318.1', 'eval_logits/chosen': '-1.071', 'eval_logits/rejected': '-1.092', 'epoch': '0.6957'}
Writing model shards: 100%|██████████| 1/1 [00:04<00:00, 4.36s/it]
{'loss': '0.6985', 'grad_norm': '47', 'learning_rate': '4.662e-07', 'rewards/chosen': '0.2238', 'rewards/rejected': '0.2288', 'rewards/accuracies': '0.4875', 'rewards/margins': '-0.005009', 'logps/chosen': '-315', 'logps/rejected': '-308.4', 'logits/chosen': '-1.074', 'logits/rejected': '-1.078', 'epoch': '0.7652'}
{'loss': '0.681', 'grad_norm': '51.25', 'learning_rate': '4.553e-07', 'rewards/chosen': '0.3184', 'rewards/rejected': '0.288', 'rewards/accuracies': '0.5938', 'rewards/margins': '0.03039', 'logps/chosen': '-332.7', 'logps/rejected': '-320', 'logits/chosen': '-1.049', 'logits/rejected': '-1.044', 'epoch': '0.8348'}
{'loss': '0.6885', 'grad_norm': '57.5', 'learning_rate': '4.431e-07', 'rewards/chosen': '0.3165', 'rewards/rejected': '0.3008', 'rewards/accuracies': '0.5188', 'rewards/margins': '0.01573', 'logps/chosen': '-333.5', 'logps/rejected': '-321.6', 'logits/chosen': '-1.093', 'logits/rejected': '-1.091', 'epoch': '0.9043'}
{'loss': '0.693', 'grad_norm': '50.75', 'learning_rate': '4.296e-07', 'rewards/chosen': '0.3366', 'rewards/rejected': '0.328', 'rewards/accuracies': '0.5', 'rewards/margins': '0.008592', 'logps/chosen': '-336.9', 'logps/rejected': '-312.6', 'logits/chosen': '-1.07', 'logits/rejected': '-1.086', 'epoch': '0.9739'}
{'loss': '0.6797', 'grad_norm': '58.25', 'learning_rate': '4.15e-07', 'rewards/chosen': '0.3648', 'rewards/rejected': '0.3353', 'rewards/accuracies': '0.5321', 'rewards/margins': '0.02947', 'logps/chosen': '-297.7', 'logps/rejected': '-304.8', 'logits/chosen': '-1.138', 'logits/rejected': '-1.139', 'epoch': '1.042'}
{'loss': '0.6664', 'grad_norm': '62', 'learning_rate': '3.992e-07', 'rewards/chosen': '0.3742', 'rewards/rejected': '0.3119', 'rewards/accuracies': '0.6313', 'rewards/margins': '0.06225', 'logps/chosen': '-325.7', 'logps/rejected': '-314.6', 'logits/chosen': '-1.076', 'logits/rejected': '-1.074', 'epoch': '1.111'}
{'loss': '0.6656', 'grad_norm': '48.25', 'learning_rate': '3.825e-07', 'rewards/chosen': '0.3955', 'rewards/rejected': '0.3318', 'rewards/accuracies': '0.6438', 'rewards/margins': '0.06372', 'logps/chosen': '-324.8', 'logps/rejected': '-314.8', 'logits/chosen': '-1.103', 'logits/rejected': '-1.101', 'epoch': '1.181'}
{'loss': '0.6683', 'grad_norm': '73.5', 'learning_rate': '3.649e-07', 'rewards/chosen': '0.4114', 'rewards/rejected': '0.3468', 'rewards/accuracies': '0.6187', 'rewards/margins': '0.06465', 'logps/chosen': '-333', 'logps/rejected': '-317.7', 'logits/chosen': '-1.109', 'logits/rejected': '-1.081', 'epoch': '1.25'}
{'loss': '0.6815', 'grad_norm': '51.75', 'learning_rate': '3.466e-07', 'rewards/chosen': '0.4475', 'rewards/rejected': '0.4136', 'rewards/accuracies': '0.525', 'rewards/margins': '0.03387', 'logps/chosen': '-342.1', 'logps/rejected': '-329.1', 'logits/chosen': '-1.028', 'logits/rejected': '-1.028', 'epoch': '1.32'}
{'loss': '0.6729', 'grad_norm': '43.25', 'learning_rate': '3.276e-07', 'rewards/chosen': '0.4052', 'rewards/rejected': '0.3567', 'rewards/accuracies': '0.5875', 'rewards/margins': '0.04842', 'logps/chosen': '-323.9', 'logps/rejected': '-300.8', 'logits/chosen': '-1.055', 'logits/rejected': '-1.044', 'epoch': '1.39'}
{'eval_loss': '0.6919', 'eval_runtime': '11.27', 'eval_samples_per_second': '10.73', 'eval_steps_per_second': '2.75', 'eval_rewards/chosen': '0.4347', 'eval_rewards/rejected': '0.423', 'eval_rewards/accuracies': '0.5484', 'eval_rewards/margins': '0.01169', 'eval_logps/chosen': '-315.1', 'eval_logps/rejected': '-316.2', 'eval_logits/chosen': '-1.06', 'eval_logits/rejected': '-1.08', 'epoch': '1.39'}
Writing model shards: 100%|██████████| 1/1 [00:04<00:00, 4.63s/it]
{'loss': '0.6761', 'grad_norm': '44.5', 'learning_rate': '3.082e-07', 'rewards/chosen': '0.4774', 'rewards/rejected': '0.4275', 'rewards/accuracies': '0.5938', 'rewards/margins': '0.04995', 'logps/chosen': '-326.5', 'logps/rejected': '-321.4', 'logits/chosen': '-1.048', 'logits/rejected': '-1.049', 'epoch': '1.459'}
{'loss': '0.6859', 'grad_norm': '69', 'learning_rate': '2.883e-07', 'rewards/chosen': '0.4648', 'rewards/rejected': '0.4392', 'rewards/accuracies': '0.5375', 'rewards/margins': '0.0256', 'logps/chosen': '-323.2', 'logps/rejected': '-316.2', 'logits/chosen': '-1.064', 'logits/rejected': '-1.086', 'epoch': '1.529'}
{'loss': '0.6923', 'grad_norm': '51.25', 'learning_rate': '2.682e-07', 'rewards/chosen': '0.4878', 'rewards/rejected': '0.4737', 'rewards/accuracies': '0.525', 'rewards/margins': '0.01413', 'logps/chosen': '-331.1', 'logps/rejected': '-318.2', 'logits/chosen': '-1.058', 'logits/rejected': '-1.07', 'epoch': '1.598'}
{'loss': '0.676', 'grad_norm': '44', 'learning_rate': '2.48e-07', 'rewards/chosen': '0.4827', 'rewards/rejected': '0.4337', 'rewards/accuracies': '0.5562', 'rewards/margins': '0.04903', 'logps/chosen': '-329.4', 'logps/rejected': '-324.9', 'logits/chosen': '-1.059', 'logits/rejected': '-1.051', 'epoch': '1.668'}
{'loss': '0.6791', 'grad_norm': '47', 'learning_rate': '2.278e-07', 'rewards/chosen': '0.4604', 'rewards/rejected': '0.4184', 'rewards/accuracies': '0.5938', 'rewards/margins': '0.04207', 'logps/chosen': '-313.2', 'logps/rejected': '-302.2', 'logits/chosen': '-1.113', 'logits/rejected': '-1.119', 'epoch': '1.737'}
{'loss': '0.6898', 'grad_norm': '74.5', 'learning_rate': '2.077e-07', 'rewards/chosen': '0.4732', 'rewards/rejected': '0.4554', 'rewards/accuracies': '0.55', 'rewards/margins': '0.01779', 'logps/chosen': '-326.7', 'logps/rejected': '-328.2', 'logits/chosen': '-1.096', 'logits/rejected': '-1.085', 'epoch': '1.807'}
{'loss': '0.6795', 'grad_norm': '54.25', 'learning_rate': '1.879e-07', 'rewards/chosen': '0.4701', 'rewards/rejected': '0.4328', 'rewards/accuracies': '0.5375', 'rewards/margins': '0.03733', 'logps/chosen': '-331.4', 'logps/rejected': '-313.8', 'logits/chosen': '-1.082', 'logits/rejected': '-1.084', 'epoch': '1.877'}
{'loss': '0.6856', 'grad_norm': '50.5', 'learning_rate': '1.685e-07', 'rewards/chosen': '0.4802', 'rewards/rejected': '0.4444', 'rewards/accuracies': '0.5437', 'rewards/margins': '0.0358', 'logps/chosen': '-331.5', 'logps/rejected': '-321.4', 'logits/chosen': '-1.079', 'logits/rejected': '-1.094', 'epoch': '1.946'}
{'loss': '0.6907', 'grad_norm': '45.25', 'learning_rate': '1.497e-07', 'rewards/chosen': '0.4531', 'rewards/rejected': '0.4403', 'rewards/accuracies': '0.4872', 'rewards/margins': '0.01279', 'logps/chosen': '-322.7', 'logps/rejected': '-319.2', 'logits/chosen': '-1.074', 'logits/rejected': '-1.077', 'epoch': '2.014'}
{'loss': '0.6699', 'grad_norm': '45.5', 'learning_rate': '1.315e-07', 'rewards/chosen': '0.4449', 'rewards/rejected': '0.3841', 'rewards/accuracies': '0.6125', 'rewards/margins': '0.0608', 'logps/chosen': '-331.7', 'logps/rejected': '-313.5', 'logits/chosen': '-1.096', 'logits/rejected': '-1.12', 'epoch': '2.083'}
{'eval_loss': '0.6872', 'eval_runtime': '11.29', 'eval_samples_per_second': '10.72', 'eval_steps_per_second': '2.746', 'eval_rewards/chosen': '0.4224', 'eval_rewards/rejected': '0.4068', 'eval_rewards/accuracies': '0.5403', 'eval_rewards/margins': '0.01561', 'eval_logps/chosen': '-315.2', 'eval_logps/rejected': '-316.3', 'eval_logits/chosen': '-1.06', 'eval_logits/rejected': '-1.081', 'epoch': '2.083'}
Writing model shards: 100%|██████████| 1/1 [00:04<00:00, 4.49s/it]
{'loss': '0.6787', 'grad_norm': '47.5', 'learning_rate': '1.141e-07', 'rewards/chosen': '0.4379', 'rewards/rejected': '0.3979', 'rewards/accuracies': '0.5875', 'rewards/margins': '0.04007', 'logps/chosen': '-317.6', 'logps/rejected': '-313.9', 'logits/chosen': '-1.095', 'logits/rejected': '-1.064', 'epoch': '2.153'}
{'loss': '0.6964', 'grad_norm': '75.5', 'learning_rate': '9.754e-08', 'rewards/chosen': '0.4462', 'rewards/rejected': '0.4411', 'rewards/accuracies': '0.5', 'rewards/margins': '0.005142', 'logps/chosen': '-322.7', 'logps/rejected': '-325', 'logits/chosen': '-1.037', 'logits/rejected': '-1.03', 'epoch': '2.223'}
{'loss': '0.6791', 'grad_norm': '50', 'learning_rate': '8.202e-08', 'rewards/chosen': '0.431', 'rewards/rejected': '0.3913', 'rewards/accuracies': '0.5813', 'rewards/margins': '0.03968', 'logps/chosen': '-314.9', 'logps/rejected': '-309.6', 'logits/chosen': '-1.077', 'logits/rejected': '-1.065', 'epoch': '2.292'}
{'loss': '0.6749', 'grad_norm': '48', 'learning_rate': '6.759e-08', 'rewards/chosen': '0.5079', 'rewards/rejected': '0.4567', 'rewards/accuracies': '0.625', 'rewards/margins': '0.05118', 'logps/chosen': '-337.2', 'logps/rejected': '-322.6', 'logits/chosen': '-1.071', 'logits/rejected': '-1.091', 'epoch': '2.362'}
{'loss': '0.6916', 'grad_norm': '60.5', 'learning_rate': '5.436e-08', 'rewards/chosen': '0.4605', 'rewards/rejected': '0.4482', 'rewards/accuracies': '0.5688', 'rewards/margins': '0.01234', 'logps/chosen': '-330', 'logps/rejected': '-326', 'logits/chosen': '-1.065', 'logits/rejected': '-1.052', 'epoch': '2.431'}
{'loss': '0.6638', 'grad_norm': '51.75', 'learning_rate': '4.241e-08', 'rewards/chosen': '0.4647', 'rewards/rejected': '0.3956', 'rewards/accuracies': '0.6625', 'rewards/margins': '0.06903', 'logps/chosen': '-336.8', 'logps/rejected': '-316', 'logits/chosen': '-1.052', 'logits/rejected': '-1.045', 'epoch': '2.501'}
{'loss': '0.669', 'grad_norm': '49', 'learning_rate': '3.183e-08', 'rewards/chosen': '0.4633', 'rewards/rejected': '0.4018', 'rewards/accuracies': '0.5688', 'rewards/margins': '0.06144', 'logps/chosen': '-329.5', 'logps/rejected': '-319.5', 'logits/chosen': '-1.077', 'logits/rejected': '-1.078', 'epoch': '2.57'}
{'loss': '0.6458', 'grad_norm': '56.5', 'learning_rate': '2.267e-08', 'rewards/chosen': '0.4771', 'rewards/rejected': '0.3655', 'rewards/accuracies': '0.675', 'rewards/margins': '0.1116', 'logps/chosen': '-328.8', 'logps/rejected': '-311.1', 'logits/chosen': '-1.109', 'logits/rejected': '-1.082', 'epoch': '2.64'}
{'loss': '0.6756', 'grad_norm': '51', 'learning_rate': '1.5e-08', 'rewards/chosen': '0.4508', 'rewards/rejected': '0.4025', 'rewards/accuracies': '0.5813', 'rewards/margins': '0.04833', 'logps/chosen': '-316.7', 'logps/rejected': '-307', 'logits/chosen': '-1.132', 'logits/rejected': '-1.112', 'epoch': '2.71'}
{'loss': '0.6875', 'grad_norm': '77', 'learning_rate': '8.871e-09', 'rewards/chosen': '0.4579', 'rewards/rejected': '0.4322', 'rewards/accuracies': '0.5562', 'rewards/margins': '0.02568', 'logps/chosen': '-324', 'logps/rejected': '-318.5', 'logits/chosen': '-1.084', 'logits/rejected': '-1.093', 'epoch': '2.779'}
{'eval_loss': '0.6914', 'eval_runtime': '11.25', 'eval_samples_per_second': '10.76', 'eval_steps_per_second': '2.756', 'eval_rewards/chosen': '0.4316', 'eval_rewards/rejected': '0.42', 'eval_rewards/accuracies': '0.5323', 'eval_rewards/margins': '0.0116', 'eval_logps/chosen': '-315.2', 'eval_logps/rejected': '-316.2', 'eval_logits/chosen': '-1.06', 'eval_logits/rejected': '-1.08', 'epoch': '2.779'}
Writing model shards: 100%|██████████| 1/1 [00:04<00:00, 4.67s/it]
{'loss': '0.6733', 'grad_norm': '41.75', 'learning_rate': '4.323e-09', 'rewards/chosen': '0.4507', 'rewards/rejected': '0.4019', 'rewards/accuracies': '0.575', 'rewards/margins': '0.04884', 'logps/chosen': '-314', 'logps/rejected': '-312', 'logits/chosen': '-1.078', 'logits/rejected': '-1.076', 'epoch': '2.849'}
{'loss': '0.6755', 'grad_norm': '47.25', 'learning_rate': '1.384e-09', 'rewards/chosen': '0.4595', 'rewards/rejected': '0.4095', 'rewards/accuracies': '0.575', 'rewards/margins': '0.04998', 'logps/chosen': '-332.6', 'logps/rejected': '-328', 'logits/chosen': '-1.062', 'logits/rejected': '-1.051', 'epoch': '2.918'}
{'loss': '0.6853', 'grad_norm': '52.25', 'learning_rate': '7.375e-11', 'rewards/chosen': '0.4191', 'rewards/rejected': '0.3941', 'rewards/accuracies': '0.5625', 'rewards/margins': '0.02507', 'logps/chosen': '-325.4', 'logps/rejected': '-314.9', 'logits/chosen': '-1.044', 'logits/rejected': '-1.07', 'epoch': '2.988'}
Writing model shards: 100%|██████████| 1/1 [00:04<00:00, 4.44s/it]
There were missing keys in the checkpoint model loaded: ['lm_head.weight'].
{'train_runtime': '1609', 'train_samples_per_second': '4.284', 'train_steps_per_second': '0.269', 'train_loss': '0.6827', 'epoch': '3'}
100%|██████████| 432/432 [26:48<00:00, 3.72s/it]
Writing model shards: 100%|██████████| 1/1 [00:05<00:00, 5.06s/it]
[dpo_train] Final model saved → models/dpo_fft_LFM2.5-1.2B-Instruct_argilla__distilabel-math-preference-dpo_20260222_195126/final_model
[dpo_train] Run metadata → models/dpo_fft_LFM2.5-1.2B-Instruct_argilla__distilabel-math-preference-dpo_20260222_195126/run_meta.json
[dpo_train] Done.
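For downstream use, the final weights land at a predictable path built from the run name; a sketch (directory layout taken from the log lines above; `final_model_dir` is a hypothetical helper, not part of the repo):

```python
from pathlib import Path

def final_model_dir(output_root: str, run_name: str) -> Path:
    """<output_root>/<run_name>/final_model, as printed by dpo_train above."""
    return Path(output_root) / run_name / "final_model"

run = "dpo_fft_LFM2.5-1.2B-Instruct_argilla__distilabel-math-preference-dpo_20260222_195126"
print(final_model_dir("models", run))
```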
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ ls
LICENSE README.md configs models pyproject.toml results scripts src uv.lock
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ ls
LICENSE README.md configs models pyproject.toml results scripts src uv.lock
(main) root@C.31890549:/workspace/p-vector-LFM2.5$ tmux capture-pane -pS - > training_evals.txt