LisaMegaWatts/JuliaGPTDistill
Text Generation
LisaMegaWatts/philosophy-corpus
English
flux
julia
flux-jl
distillation
knowledge-distillation
llama-style
gqa
rope
rmsnorm
swiglu
bpe
philosophy
Eval Results (legacy)
License: mit
Repository size: 125 MB
1 contributor
History: 32 commits
LisaMegaWatts
Add proper model card: 256d/4L/4H/2KV, vocab=2000, distilled from JuliaFluxGPT
b2f06c9
verified
2 months ago
| File | Size | Flags | Last commit | Updated |
| --- | --- | --- | --- | --- |
| .gitattributes | 1.68 kB | Safe | Upload final_model.jld2 (39.7 MB) | 2 months ago |
| README.md | 3.03 kB | Safe | Add proper model card: 256d/4L/4H/2KV, vocab=2000, distilled from JuliaFluxGPT | 2 months ago |
| best_model.jld2 | 41.7 MB | xet | Upload best_model.jld2 (39.7 MB) | 2 months ago |
| checkpoint_latest.jld2 | 41.7 MB | xet | Upload checkpoint_latest.jld2 (39.7 MB) | 2 months ago |
| final_model.jld2 | 41.7 MB | xet | Upload final_model.jld2 (39.7 MB) | 2 months ago |