# Reference logprobs cache (DPO)
This repo stores precomputed reference-model log-probability scalars for DPO training with open-instruct's `dpo_tune_cache.py` / `dpo_utils.build_reference_logprobs_cache`.
## Files
- `62b8d956d9260cf9.pt` — `TensorCache` on disk: dict-like payload with `chosen_logps` and `rejected_logps`, each a `float32` tensor of shape `(N,)` for N = 259922 examples.
The stem (`62b8d956d9260cf9`) is the first 16 hex chars of `SHA256(config_json)`, where `config_json` is built in `dpo_utils.compute_reference_cache_hash` from the dataset hash, base model id, `max_seq_length` / transforms (via `dataset_config_hash`), `loss_type`, `concatenated_forward`, `use_lora`, etc.
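The hashing scheme above can be sketched as follows. This is an illustrative reconstruction, not the exact implementation: the real field names, ordering, and serialization live in `dpo_utils.compute_reference_cache_hash`, and the config values below are placeholders.

```python
import hashlib
import json

# Hypothetical config payload; the real one is assembled inside
# dpo_utils.compute_reference_cache_hash with its own field names.
config = {
    "dataset_config_hash": "d41d8cd98f00b204",  # example value
    "model_name_or_path": "allenai/Olmo-3-7B-Instruct-SFT",
    "max_seq_length": 16384,
    "loss_type": "dpo_norm",
    "concatenated_forward": False,
    "use_lora": True,
}

# Canonical JSON, then SHA-256, truncated to the first 16 hex chars:
# that truncated digest is the .pt filename stem.
config_json = json.dumps(config, sort_keys=True)
stem = hashlib.sha256(config_json.encode("utf-8")).hexdigest()[:16]
print(stem)  # 16 lowercase hex characters
```

Because the digest is truncated to 16 hex characters, any change to a hashed field produces a different stem and therefore a cache miss rather than a silently stale cache.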
## Usage
- Download the `.pt` file into your reference cache directory (same basename).
- Point the trainer at that directory:
```shell
export REFERENCE_LOGPROBS_CACHE_PATH=/path/to/dir/containing/cache
# Ensure /path/to/dir/containing/cache/62b8d956d9260cf9.pt exists
```
The training run must use the same tokenizer, dataset, `max_seq_length`, model name/revision, loss type, and LoRA flags as when the cache was built; otherwise the hash will not match and the cache will be ignored.
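After downloading, a quick sanity check of the payload can look like the sketch below. The key names (`chosen_logps`, `rejected_logps`), dtype, and shape come from the file description above; everything else is illustrative, and a tiny synthetic file with N = 4 stands in for the real cache with N = 259922.

```python
import os
import tempfile

import torch

# Synthetic payload mirroring the documented cache layout:
# two float32 tensors of shape (N,), one log-prob per example.
payload = {
    "chosen_logps": torch.randn(4, dtype=torch.float32),
    "rejected_logps": torch.randn(4, dtype=torch.float32),
}
path = os.path.join(tempfile.mkdtemp(), "62b8d956d9260cf9.pt")
torch.save(payload, path)

# Load it back and verify the invariants you would expect of the
# real file: matching lengths, float32 dtype, 1-D tensors.
cache = torch.load(path)
assert cache["chosen_logps"].shape == cache["rejected_logps"].shape == (4,)
assert cache["chosen_logps"].dtype == torch.float32
```

Against the real file, the same checks apply with shape `(259922,)`; a shape mismatch usually means the cache was built against a different dataset or `max_seq_length`.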
## Source
Built for the `allenai/Olmo-3-7B-Instruct-SFT` reference model on the Dolci pretraining-continuation DPO JSONL (259922 examples), with `dpo_norm` loss, `max_seq_length=16384`, `concatenated_forward=false`, and LoRA enabled in the training config (the reference cache hash includes `use_lora`).