Delta-Vector/distill-m-6a3lnzvb-code

distill-m-6a3lnzvb-code
39.6 kB
  • 1 contributor
History: 7 commits
Delta-Vector
grow40_winning: switch student to bf16 to fit in B200 memory + 40-layer Adam state
e9ce4f0 verified about 2 months ago
  • configs
    grow40_winning: switch student to bf16 to fit in B200 memory + 40-layer Adam state about 2 months ago
  • scripts
    fix OOM: chunked KL with checkpointing + PYTORCH_CUDA_ALLOC_CONF expandable_segments; add kl_chunk_size config key about 2 months ago
  • .gitattributes
    1.52 kB
    initial commit about 2 months ago
  • distill.py
    25.6 kB
    add retry loop around load_dataset for transient HF Hub 5xx about 2 months ago
  • pyproject.toml
    90 Bytes
    initial scaffold: distill.py + base/zero_14_17 configs + accelerate yaml about 2 months ago
  • requirements.lock.txt
    1.95 kB
    initial scaffold: distill.py + base/zero_14_17 configs + accelerate yaml about 2 months ago