# Reference logprobs cache (DPO)

This repo stores precomputed **reference-model** log-probability scalars for DPO training with [open-instruct](https://github.com/allenai/open-instruct) `dpo_tune_cache.py` / `dpo_utils.build_reference_logprobs_cache`.

## Files

- `62b8d956d9260cf9.pt` — a `TensorCache` on disk: a dict-like payload with `chosen_logps` and `rejected_logps`, each a `float32` tensor of shape `(N,)` for `N = 259922` examples. The stem (`62b8d956d9260cf9`) is the first 16 hex characters of `SHA256(config_json)`, where `config_json` is built in `dpo_utils.compute_reference_cache_hash` from the dataset hash, base model id, `max_seq_length` / transforms (via `dataset_config_hash`), `loss_type`, `concatenated_forward`, `use_lora`, etc.

## Usage

1. Download the `.pt` file into your reference cache directory (keeping the same basename).
2. Point the trainer at that directory:

```bash
export REFERENCE_LOGPROBS_CACHE_PATH=/path/to/dir/containing/cache
# Ensure /path/to/dir/containing/cache/62b8d956d9260cf9.pt exists
```

The training run must use the **same** tokenizer, dataset, `max_seq_length`, model name/revision, loss type, and LoRA flags as when the cache was built; otherwise the hash will not match and the cache will be ignored.

## Source

Built for the **allenai/Olmo-3-7B-Instruct-SFT** reference model on Dolci pretraining-continuation DPO JSONL (`259922` examples), with `dpo_norm` loss, `max_seq_length=16384`, `concatenated_forward=false`, and LoRA enabled in the training config (the reference cache hash includes `use_lora`).
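For a quick sanity check of which cache file a given configuration maps to, the stem derivation can be sketched as below. This is a minimal illustration only: the exact keys and serialization order of `config_json` are determined by `dpo_utils.compute_reference_cache_hash` in open-instruct, so the field names here are assumptions and the resulting stem will only match the real one if the serialized config matches byte-for-byte.

```python
import hashlib
import json

# Illustrative config dict -- the real one is assembled by
# dpo_utils.compute_reference_cache_hash; keys and ordering here are guesses.
config = {
    "dataset_hash": "<dataset-config-hash>",  # placeholder, not the real hash
    "model_name_or_path": "allenai/Olmo-3-7B-Instruct-SFT",
    "max_seq_length": 16384,
    "loss_type": "dpo_norm",
    "concatenated_forward": False,
    "use_lora": True,
}

# Stem = first 16 hex chars of SHA256 over the serialized config JSON.
config_json = json.dumps(config, sort_keys=True)
stem = hashlib.sha256(config_json.encode("utf-8")).hexdigest()[:16]
print(f"expected cache file: {stem}.pt")
```

Once the stem matches, the payload can be inspected with `torch.load(f"{stem}.pt", map_location="cpu")` and checked that `payload["chosen_logps"]` and `payload["rejected_logps"]` both have shape `(259922,)` and dtype `float32`.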