# Reference logprobs cache (DPO)

This repo stores precomputed **reference-model** log-probability scalars for DPO training with [open-instruct](https://github.com/allenai/open-instruct) `dpo_tune_cache.py` / `dpo_utils.build_reference_logprobs_cache`.

## Files

- `62b8d956d9260cf9.pt` — a `TensorCache` on disk: a dict-like payload with `chosen_logps` and `rejected_logps`, each a `float32` tensor of shape `(N,)` for `N = 259922` examples. The stem (`62b8d956d9260cf9`) is the first 16 hex characters of `SHA256(config_json)`, where `config_json` is built in `dpo_utils.compute_reference_cache_hash` from the dataset hash, base model id, `max_seq_length` / transforms (via `dataset_config_hash`), `loss_type`, `concatenated_forward`, `use_lora`, etc.

## Usage

1. Download the `.pt` file into your reference cache directory (keeping the same basename).
2. Point the trainer at that directory:

```bash
export REFERENCE_LOGPROBS_CACHE_PATH=/path/to/dir/containing/cache
# Ensure /path/to/dir/containing/cache/62b8d956d9260cf9.pt exists
```

The training run must use the **same** tokenizer, dataset, `max_seq_length`, model name/revision, loss type, and LoRA flags as when the cache was built; otherwise the hash will not match and the cache will be ignored.

## Source

Built for the **allenai/Olmo-3-7B-Instruct-SFT** reference model on Dolci pretraining-continuation DPO JSONL (`259922` examples), with `dpo_norm` loss, `max_seq_length=16384`, `concatenated_forward=false`, and LoRA enabled in the training config (the reference cache hash includes `use_lora`).
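For a quick sanity check of which cache file a given configuration maps to, the stem derivation can be sketched as below. This is a minimal illustration only: the exact keys and serialization order of `config_json` are determined by `dpo_utils.compute_reference_cache_hash` in open-instruct, so the field names here are assumptions and the resulting stem will only match the real one if the serialized config matches byte-for-byte.

```python
import hashlib
import json

# Illustrative config dict -- the real one is assembled by
# dpo_utils.compute_reference_cache_hash; keys and ordering here are guesses.
config = {
    "dataset_hash": "<dataset-config-hash>",  # placeholder, not the real hash
    "model_name_or_path": "allenai/Olmo-3-7B-Instruct-SFT",
    "max_seq_length": 16384,
    "loss_type": "dpo_norm",
    "concatenated_forward": False,
    "use_lora": True,
}

# Stem = first 16 hex chars of SHA256 over the serialized config JSON.
config_json = json.dumps(config, sort_keys=True)
stem = hashlib.sha256(config_json.encode("utf-8")).hexdigest()[:16]
print(f"expected cache file: {stem}.pt")
```

Once the stem matches, the payload can be inspected with `torch.load(f"{stem}.pt", map_location="cpu")` and checked that `payload["chosen_logps"]` and `payload["rejected_logps"]` both have shape `(259922,)` and dtype `float32`.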