LisaMegaWatts/JuliaFluxGPT

Text Generation

grouped-query-attention

1.19 GB

1 contributor

History: 267 commits

Add PyTorch weights (.pt) converted from JLD2 checkpoint

a7ea2a7 verified 3 months ago

.gitattributes

1.95 kB
Upload julia-slm/5m-chinchilla/step_12000.jld2 with huggingface_hub 3 months ago
README.md

6.71 kB
Fix model card: match actual HF checkpoint (d=512, 8L, 8Q/2KV, ~23M params, ctx=256, FFN=1344) 3 months ago
best_model.jld2

274 MB
Upload best_model.jld2 (261.1 MB) 3 months ago
checkpoint_interrupted.jld2

274 MB
Upload checkpoint_interrupted.jld2 (261.1 MB) 3 months ago
checkpoint_latest.jld2

274 MB
Upload checkpoint_latest.jld2 (261.2 MB) 3 months ago
final_model.jld2

274 MB
Upload final_model.jld2 (261.1 MB) 3 months ago
juliaflux_weights.pt
Detected Pickle imports (3):
- collections.OrderedDict
- torch._utils._rebuild_tensor_v2
- torch.FloatStorage
91.3 MB
Add PyTorch weights (.pt) converted from JLD2 checkpoint 3 months ago
tokenizer.json

59.5 kB
Fix tokenizer: trim to 2000 vocab to match trained model 3 months ago
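
The pickle imports reported for juliaflux_weights.pt are the module-level names a pickle stream asks the loader to import. As a rough illustration of how such a list can be derived, the sketch below walks the opcodes of a raw pickle byte string with the standard-library pickletools module and collects the names referenced by GLOBAL and STACK_GLOBAL. The function name list_pickle_imports is hypothetical, the STACK_GLOBAL handling is a simplification (it assumes the two most recent string arguments are the module and the name), and this is not claimed to be the scanner the hosting site uses. A .pt file written by recent PyTorch versions is typically a zip archive, so its pickle stream would have to be extracted first; that step is omitted here.

```python
import collections
import pickle
import pickletools
from typing import Set, Tuple


def list_pickle_imports(data: bytes) -> Set[Tuple[str, str]]:
    """Return the (module, name) pairs a pickle stream would import.

    Hypothetical helper for illustration. It only inspects opcodes and
    never unpickles the data, so no code from the stream is executed.
    """
    imports: Set[Tuple[str, str]] = set()
    strings = []  # string arguments seen so far, used for STACK_GLOBAL

    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is "module name" in one string.
            module, name = arg.split(" ", 1)
            imports.add((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name were pushed as strings earlier.
            # Simplifying assumption: they are the last two strings seen.
            if len(strings) >= 2:
                imports.add((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)

    return imports


if __name__ == "__main__":
    # Demo on a small in-memory pickle rather than a real checkpoint.
    payload = pickle.dumps(collections.OrderedDict(a=1), protocol=2)
    for module, name in sorted(list_pickle_imports(payload)):
        print(f"{module}.{name}")
```

Because the function only reads opcodes, it can be run on untrusted data before deciding whether to load it. A name that is not expected for a plain weights file would be a reason to stop and inspect the file further.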