# Reference logprobs cache (DPO)
This repo stores precomputed reference-model log-probability scalars for DPO training with open-instruct's `dpo_tune_cache.py` / `dpo_utils.build_reference_logprobs_cache`.
## Files
- `62b8d956d9260cf9.pt` — `TensorCache` on disk: dict-like payload with `chosen_logps` and `rejected_logps`, each a `float32` tensor of shape `(N,)` for N = 259922 examples.
The stem (`62b8d956d9260cf9`) is the first 16 hex chars of `SHA256(config_json)`, where `config_json` is built in `dpo_utils.compute_reference_cache_hash` from the dataset hash, base model id, `max_seq_length` / transforms (via `dataset_config_hash`), `loss_type`, `concatenated_forward`, `use_lora`, etc.
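The hashing scheme above can be sketched as follows. This is an illustrative reconstruction, not the exact implementation: the real field names, ordering, and serialization live in `dpo_utils.compute_reference_cache_hash`, and the config values below are placeholders.

```python
import hashlib
import json

# Hypothetical config payload; the real one is assembled inside
# dpo_utils.compute_reference_cache_hash with its own field names.
config = {
    "dataset_config_hash": "d41d8cd98f00b204",  # example value
    "model_name_or_path": "allenai/Olmo-3-7B-Instruct-SFT",
    "max_seq_length": 16384,
    "loss_type": "dpo_norm",
    "concatenated_forward": False,
    "use_lora": True,
}

# Canonical JSON, then SHA-256, truncated to the first 16 hex chars:
# that truncated digest is the .pt filename stem.
config_json = json.dumps(config, sort_keys=True)
stem = hashlib.sha256(config_json.encode("utf-8")).hexdigest()[:16]
print(stem)  # 16 lowercase hex characters
```

Because the digest is truncated to 16 hex characters, any change to a hashed field produces a different stem and therefore a cache miss rather than a silently stale cache.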
## Usage
- Download the `.pt` file into your reference cache directory (same basename).
- Point the trainer at that directory:
```shell
export REFERENCE_LOGPROBS_CACHE_PATH=/path/to/dir/containing/cache
# Ensure /path/to/dir/containing/cache/62b8d956d9260cf9.pt exists
```
The training run must use the same tokenizer, dataset, `max_seq_length`, model name/revision, loss type, and LoRA flags as when the cache was built; otherwise the hash will not match and the cache will be ignored.
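After downloading, a quick sanity check of the payload can look like the sketch below. The key names (`chosen_logps`, `rejected_logps`), dtype, and shape come from the file description above; everything else is illustrative, and a tiny synthetic file with N = 4 stands in for the real cache with N = 259922.

```python
import os
import tempfile

import torch

# Synthetic payload mirroring the documented cache layout:
# two float32 tensors of shape (N,), one log-prob per example.
payload = {
    "chosen_logps": torch.randn(4, dtype=torch.float32),
    "rejected_logps": torch.randn(4, dtype=torch.float32),
}
path = os.path.join(tempfile.mkdtemp(), "62b8d956d9260cf9.pt")
torch.save(payload, path)

# Load it back and verify the invariants you would expect of the
# real file: matching lengths, float32 dtype, 1-D tensors.
cache = torch.load(path)
assert cache["chosen_logps"].shape == cache["rejected_logps"].shape == (4,)
assert cache["chosen_logps"].dtype == torch.float32
```

Against the real file, the same checks apply with shape `(259922,)`; a shape mismatch usually means the cache was built against a different dataset or `max_seq_length`.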
## Source
Built for the `allenai/Olmo-3-7B-Instruct-SFT` reference model on the Dolci pretraining-continuation DPO JSONL (259922 examples), with `dpo_norm` loss, `max_seq_length=16384`, `concatenated_forward=false`, and LoRA enabled in the training config (the reference cache hash includes `use_lora`).