LisaMegaWatts committed
Commit 98da2a6 · verified · 1 Parent(s): 98ca52c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +33 -45
README.md CHANGED
@@ -2,6 +2,7 @@
 language:
 - en
 library_name: julia
+license: mit
 pipeline_tag: text-generation
 tags:
 - philosophy
@@ -14,76 +15,63 @@ tags:
 - swiglu
 - small-language-model
 - openai-compatible
+- chinchilla
 datasets:
 - LisaMegaWatts/philosophy-corpus
 ---
 
-# JuliaSLM
-
-A ~5M parameter decoder-only transformer trained on classical philosophy and liberal arts texts. Built entirely in Julia with Lux.jl, featuring a modern architecture (RoPE, RMSNorm, SwiGLU, weight tying).
-
-## Architecture
+# JuliaSLM — Inference Artifacts
+
+Serving-ready artifacts for the 5M parameter JuliaSLM transformer, packaged for the [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM). This repo contains the checkpoint, tokenizer, and config needed by the OpenAI-compatible inference server.
+
+For full model documentation, training details, loss curves, and usage instructions, see the canonical model repo: **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)**.
+
+## Model Summary
 
 | Component | Detail |
 |---|---|
-| Parameters | ~4.7M |
+| Parameters | 5,037,312 |
+| Architecture | Decoder-only Transformer (RoPE, RMSNorm, SwiGLU) |
 | Embedding dim | 256 |
 | Layers | 6 |
-| Attention heads | 4 |
-| Head dim | 64 |
-| FFN multiplier | 4x (SwiGLU) |
+| Attention heads | 4 (head dim 64) |
 | Context length | 256 tokens |
-| Positional encoding | Rotary (RoPE) |
-| Normalization | RMSNorm (pre-norm) |
-| Feed-forward | SwiGLU |
-| Weight tying | Yes (embedding = output projection) |
 | Tokenizer | BPE, 2000 subword tokens |
+| Weight tying | Yes |
+| Training | Chinchilla-optimal (~100M tokens), AdamW, F16 mixed precision |
+| Final val loss | 3.54 (PPL 34.5) |
-
-## Training
-
-- **Framework:** Lux.jl (pure Julia, explicit parameter/state management)
-- **Optimizer:** AdamW (lr=6e-4, cosine decay to 6e-5, 500 warmup steps)
-- **Precision:** Mixed F16/F32
-- **Batch size:** 32
-- **Steps:** 12,305 (Chinchilla-optimal: ~100M tokens at 20 tokens/param)
-- **Gradient clipping:** max norm 1.0
-- **Hardware:** RTX 3060 12GB
-
-## Training Data
-
-Classical philosophy and liberal arts corpus (~2.4GB text) including:
-- **Trivium** (grammar, logic, rhetoric): Aristotle, Plato, Cicero, Seneca, Marcus Aurelius, Epictetus
-- **Quadrivium** (arithmetic, geometry, music, astronomy): Euclid, Ptolemy, Boethius
-- **Bridging texts**: interdisciplinary classical works
-- Supplementary WikiText data
-
-Processed via a custom text pipeline with sentence-boundary chunking, Unicode normalization, and deduplication.
 
 ## Files
 
-- `final.jld2` — Model checkpoint (parameters in JLD2 format)
-- `config.toml` — Model architecture configuration
-- `vocab.json` — BPE vocabulary (2000 tokens, dict format)
-- `merges.txt` — BPE merge rules
+| File | Description |
+|---|---|
+| `final.jld2` | Model parameters (JLD2 format, 58MB) |
+| `config.toml` | Architecture config (from 5m-chinchilla) |
+| `vocab.json` | BPE vocabulary (2000 tokens, dict format) |
+| `merges.txt` | BPE merge rules |
 
 ## Inference
 
-Served via an OpenAI-compatible API at [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM):
+The [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM) serves this model via an OpenAI-compatible API:
 
 ```bash
+# Streaming
 curl -X POST https://lisamegawatts-juliaslm.hf.space/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{"messages": [{"role": "user", "content": "the nature of"}], "stream": true, "temperature": 0.8, "top_k": 40}'
+
+# Non-streaming
+curl -X POST https://lisamegawatts-juliaslm.hf.space/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"messages": [{"role": "user", "content": "the nature of"}], "max_tokens": 200}'
 ```
 
-Supports streaming (SSE), temperature, top-k, and top-p sampling. CPU-only inference with no Lux dependency at runtime (pure NNlib).
+Supports streaming (SSE), temperature, top-k, and top-p sampling. CPU-only inference using pure NNlib (no Lux dependency at runtime).
 
-## Lineage
-
-Successor to [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) (~5K params, character-level, scalar autograd). JuliaSLM upgrades to BPE tokenization, modern transformer components, and Chinchilla-optimal training at 1000x scale.
-
-## Links
-
-- [Inference Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM)
-- [Training data](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus)
-- [Source code](https://github.com/DavinciDreams/JuliaGPT)
+## Related
+
+- **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)** — Canonical model repo with full training details, loss curves, architecture diagrams, and code examples
+- **[JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM)** — Live inference endpoint
+- **[LisaMegaWatts/philosophy-corpus](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus)** — Training dataset
+- **[LisaMegaWatts/JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT)** — Predecessor (~5K params, character-level)
+- **[Source code](https://github.com/DavinciDreams/JuliaGPT)** — GitHub repository
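
The `vocab.json` / `merges.txt` pair in the diff above is a standard BPE serialization: an ordered list of merge rules applied greedily, lowest rank first. A minimal sketch of that encoding loop, using a tiny hypothetical merge table for illustration (the model's real `merges.txt` has the rules behind its 2000-token vocabulary):

```python
def apply_bpe(word, merges):
    """Greedy BPE: repeatedly merge the adjacent symbol pair with the
    lowest merge rank until no listed pair remains in the word."""
    symbols = list(word)
    rank = {pair: i for i, pair in enumerate(merges)}
    while len(symbols) > 1:
        # Rank every adjacent pair; unlisted pairs get infinite rank.
        pairs = [(rank.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        best_rank, best_i = min(pairs)
        if best_rank == float("inf"):
            break  # no mergeable pair left
        symbols[best_i:best_i + 2] = [symbols[best_i] + symbols[best_i + 1]]
    return symbols

# Toy merge table (illustrative only, not the repo's real rules).
merges = [("t", "h"), ("th", "e"), ("n", "a"), ("na", "t"),
          ("u", "r"), ("nat", "ur"), ("natur", "e")]
print(apply_bpe("nature", merges))  # ['nature']
print(apply_bpe("the", merges))     # ['the']
```

Words covered by the merge table collapse to single subword tokens; anything else falls back to smaller pieces, which is what lets a 2000-token vocabulary cover arbitrary text.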
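
Because the Space speaks the OpenAI chat-completions protocol, any HTTP client can drive it, not just curl. A small illustrative helper (the function name and defaults are my own, not part of this repo) that assembles the same JSON body as the curl examples in the diff:

```python
import json

def chat_body(prompt, stream=False, **sampling):
    """Build the JSON body for POST /v1/chat/completions.

    Extra keyword arguments (temperature, top_k, top_p, max_tokens)
    pass through unchanged, mirroring the curl examples above.
    """
    body = {"messages": [{"role": "user", "content": prompt}],
            "stream": stream}
    body.update(sampling)
    return json.dumps(body)

# Same payload as the streaming curl example.
print(chat_body("the nature of", stream=True, temperature=0.8, top_k=40))
```

With `stream=True` the server replies with Server-Sent Events, so the client must read the response incrementally rather than parsing one JSON document.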