LisaMegaWatts/juliadistill-v2
Text Generation
LisaMegaWatts/philosophy-corpus
English
flux
julia
flux-jl
distillation
knowledge-distillation
llama-style
gqa
rope
rmsnorm
swiglu
bpe
philosophy
Eval Results (legacy)
License: mit
Branch: main
Size: 193 MB
Contributors: 3
History: 83 commits
Latest commit: dd923fa (verified), LisaMegaWatts, 4 days ago: Add 4000-vocab BPE tokenizer for inference serving
File                         Size     Storage  Updated     Last commit
.gitattributes               1.75 kB  -        5 days ago  Upload checkpoint_interrupted.jld2 (45.6 MB)
README.md                    3.78 kB  -        4 days ago  Fix model card: vocab=4000 (not 2000), correct architecture details from checkpoint
best_model.jld2              47.9 MB  xet      5 days ago  Upload best_model.jld2 (45.6 MB)
checkpoint_interrupted.jld2  47.8 MB  xet      5 days ago  Upload checkpoint_interrupted.jld2 (45.6 MB)
checkpoint_latest.jld2       48.2 MB  xet      5 days ago  Upload checkpoint_latest.jld2 (45.9 MB)
final_model.jld2             48.6 MB  xet      6 days ago  Upload final_model.jld2 (46.4 MB)
tokenizer.json               268 kB   -        4 days ago  Add 4000-vocab BPE tokenizer for inference serving