distill-m-6a3lnzvb-code / scripts / run_sweep_rerun.sh

Commit History

fix OOM: chunked KL with checkpointing + PYTORCH_CUDA_ALLOC_CONF expandable_segments; add kl_chunk_size config key
eb5278f (verified)

Delta-Vector committed on