Add PyTorch weights (.pt) converted from JLD2 checkpoint a7ea2a7 verified LisaMegaWatts committed on 3 days ago
Fix model card: match actual HF checkpoint (d=512, 8L, 8Q/2KV, ~23M params, ctx=256, FFN=1344) afa692e verified LisaMegaWatts committed on 3 days ago
Fix model card: actual trained model is d=256, 4 layers, 4Q/2KV, ~4M params (was incorrectly listed as 10M) 287076b verified LisaMegaWatts committed on 3 days ago
Fix model card: context_length=256 (not 512), dropout=0.1 (not 0.0) per checkpoint 5907abe verified LisaMegaWatts committed on 4 days ago
Add model card with architecture details, provenance, and training metrics 9c956d0 verified LisaMegaWatts committed on 4 days ago
Delete julia-slm/5m-chinchilla/config.toml with huggingface_hub 91a8ddf verified LisaMegaWatts committed on 5 days ago
Delete julia-slm/5m-chinchilla/step_12000.jld2 with huggingface_hub 36bde8c verified LisaMegaWatts committed on 5 days ago
Delete julia-slm/5m-chinchilla/final.jld2 with huggingface_hub ac4afd0 verified LisaMegaWatts committed on 5 days ago
Upload julia-slm/5m-chinchilla/step_12000.jld2 with huggingface_hub c0014cd verified LisaMegaWatts committed on 5 days ago
Upload julia-slm/5m-chinchilla/final.jld2 with huggingface_hub 5667a2a verified LisaMegaWatts committed on 5 days ago
Upload julia-slm/5m-chinchilla/config.toml with huggingface_hub 01972e5 verified LisaMegaWatts committed on 5 days ago
Fix tokenizer: trim to 2000 vocab to match trained model db0e784 verified LisaMegaWatts committed on 5 days ago