Delta-Vector/distill-m-6a3lnzvb-code
e9ce4f0
39.6 kB
1 contributor
History: 7 commits
Delta-Vector
grow40_winning: switch student to bf16 to fit in B200 memory + 40-layer Adam state
e9ce4f0
verified
about 2 months ago
configs
grow40_winning: switch student to bf16 to fit in B200 memory + 40-layer Adam state
about 2 months ago
scripts
fix OOM: chunked KL with checkpointing + PYTORCH_CUDA_ALLOC_CONF expandable_segments; add kl_chunk_size config key
about 2 months ago
.gitattributes
1.52 kB
initial commit
about 2 months ago
distill.py
25.6 kB
add retry loop around load_dataset for transient HF Hub 5xx
about 2 months ago
pyproject.toml
90 Bytes
initial scaffold: distill.py + base/zero_14_17 configs + accelerate yaml
about 2 months ago
requirements.lock.txt
1.95 kB
initial scaffold: distill.py + base/zero_14_17 configs + accelerate yaml
about 2 months ago