Hugging Face
LisaMegaWatts/JuliaGPTDistill
Task: Text Generation
Dataset: LisaMegaWatts/philosophy-corpus
Language: English
Tags: flux, julia, flux-jl, distillation, knowledge-distillation, llama-style, gqa, rope, rmsnorm, swiglu, bpe, philosophy
License: MIT
Branch: main
Repository size: 125 MB
1 contributor, 33 commits
Latest commit: 61a56f0 (verified) by LisaMegaWatts, 4 days ago: "Add 2000-vocab BPE tokenizer for inference serving"
Files:
.gitattributes (1.68 kB), last changed 6 days ago: "Upload final_model.jld2 (39.7 MB)"
README.md (3.03 kB), last changed 4 days ago: "Add proper model card: 256d/4L/4H/2KV, vocab=2000, distilled from JuliaFluxGPT"
best_model.jld2 (41.7 MB, stored with Xet), last changed 6 days ago: "Upload best_model.jld2 (39.7 MB)"
checkpoint_latest.jld2 (41.7 MB, stored with Xet), last changed 6 days ago: "Upload checkpoint_latest.jld2 (39.7 MB)"
final_model.jld2 (41.7 MB, stored with Xet), last changed 6 days ago: "Upload final_model.jld2 (39.7 MB)"
tokenizer.json (130 kB), last changed 4 days ago: "Add 2000-vocab BPE tokenizer for inference serving"
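The README commit message summarizes the architecture as 256d/4L/4H/2KV with vocab=2000, and the tags list gqa (grouped-query attention). As a rough illustration only, the Python sketch below works out what 4 query heads sharing 2 key/value heads implies for a 256-dimensional model. It assumes the common llama-style convention head_dim = d_model / n_heads; the function name `gqa_layout` is hypothetical and is not taken from this repository's Julia/Flux code.

```python
# Hypothetical sketch: derive per-head shapes for one GQA attention layer
# from the "256d/4H/2KV" summary. Assumes head_dim = d_model // n_heads,
# the usual llama-style convention; not the repository's actual code.

def gqa_layout(d_model: int, n_heads: int, n_kv_heads: int) -> dict:
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    head_dim = d_model // n_heads          # width of each attention head
    group_size = n_heads // n_kv_heads     # query heads sharing one K/V head
    # Query head h attends using the keys/values of KV head h // group_size.
    kv_for_query = [h // group_size for h in range(n_heads)]
    return {
        "head_dim": head_dim,
        "group_size": group_size,
        "q_proj_out": n_heads * head_dim,      # output width of the Q projection
        "kv_proj_out": n_kv_heads * head_dim,  # output width of each of K and V
        "kv_for_query": kv_for_query,
    }

layout = gqa_layout(d_model=256, n_heads=4, n_kv_heads=2)
print(layout)
```

Under these assumptions each pair of query heads shares one 64-dimensional K/V head, so the K and V projections are half the width of the Q projection, which is the memory saving GQA trades for.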