LisaMegaWatts/JuliaGPTDistill

Text Generation
English
flux
julia
flux-jl
distillation
knowledge-distillation
llama-style
gqa
rope
rmsnorm
swiglu
bpe
philosophy
Eval Results (legacy)
JuliaGPTDistill
125 MB
  • 1 contributor
History: 33 commits
LisaMegaWatts
Add 2000-vocab BPE tokenizer for inference serving
61a56f0 verified 4 days ago
  • .gitattributes
    1.68 kB
    Upload final_model.jld2 (39.7 MB) 6 days ago
  • README.md
    3.03 kB
    Add proper model card: 256d/4L/4H/2KV, vocab=2000, distilled from JuliaFluxGPT 4 days ago
  • best_model.jld2
    41.7 MB
    Upload best_model.jld2 (39.7 MB) 6 days ago
  • checkpoint_latest.jld2
    41.7 MB
    Upload checkpoint_latest.jld2 (39.7 MB) 6 days ago
  • final_model.jld2
    41.7 MB
    Upload final_model.jld2 (39.7 MB) 6 days ago
  • tokenizer.json
    130 kB
    Add 2000-vocab BPE tokenizer for inference serving 4 days ago