Delta-Vector/distill-m-6a3lnzvb-code

distill-m-6a3lnzvb-code/scripts/run_sweep_rerun.sh

Commit History

fix OOM: chunked KL with checkpointing + PYTORCH_CUDA_ALLOC_CONF expandable_segments; add kl_chunk_size config key

eb5278f

Delta-Vector committed on Apr 7
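
The commit message names an allocator setting and a new config key but does not show the script itself. As a rough sketch only, the environment side of such a fix might look like the snippet below. `PYTORCH_CUDA_ALLOC_CONF` and its `expandable_segments:True` option are real PyTorch settings. The `KL_CHUNK_SIZE` shell variable and its default of 1024 are assumptions made for illustration, since the commit gives neither the actual value of `kl_chunk_size` nor how it is passed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Ask PyTorch's CUDA caching allocator to use expandable segments,
# which can reduce fragmentation-related out-of-memory errors.
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"

# Hypothetical: expose the chunk size as an overridable variable.
# 1024 is a placeholder, not a value taken from the commit.
KL_CHUNK_SIZE="${KL_CHUNK_SIZE:-1024}"

echo "PYTORCH_CUDA_ALLOC_CONF=${PYTORCH_CUDA_ALLOC_CONF}"
echo "kl_chunk_size=${KL_CHUNK_SIZE}"
```

The variable has to be exported before the training process starts, because the allocator reads it when CUDA is initialized.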