LisaMegaWatts committed
Commit a90b172 · verified · 1 Parent(s): 98da2a6

Update model card with full architecture, training details, and related links

Files changed (1): README.md +85 -14
README.md CHANGED
@@ -20,39 +20,98 @@ datasets:
20
  - LisaMegaWatts/philosophy-corpus
21
  ---
22
 
23
- # JuliaSLM — Inference Artifacts
24
 
25
- Serving-ready artifacts for the 5M parameter JuliaSLM transformer, packaged for the [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM). This repo contains the checkpoint, tokenizer, and config needed by the OpenAI-compatible inference server.
26
 
27
- For full model documentation, training details, loss curves, and usage instructions, see the canonical model repo: **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)**.
28
 
29
  ## Model Summary
30
 
31
  | Component | Detail |
32
  |---|---|
33
  | Parameters | 5,037,312 |
34
- | Architecture | Decoder-only Transformer (RoPE, RMSNorm, SwiGLU) |
35
  | Embedding dim | 256 |
36
  | Layers | 6 |
37
  | Attention heads | 4 (head dim 64) |
 
38
  | Context length | 256 tokens |
39
- | Tokenizer | BPE, 2000 subword tokens |
 
40
  | Weight tying | Yes |
41
- | Training | Chinchilla-optimal (~100M tokens), AdamW, F16 mixed precision |
42
- | Final val loss | 3.54 (PPL 34.5) |
43
 
44
  ## Files
45
 
46
  | File | Description |
47
  |---|---|
48
  | `final.jld2` | Model parameters (JLD2 format, 58MB) |
49
- | `config.toml` | Architecture config (from 5m-chinchilla) |
50
  | `vocab.json` | BPE vocabulary (2000 tokens, dict format) |
51
  | `merges.txt` | BPE merge rules |
52
 
53
- ## Inference
54
 
55
- The [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM) serves this model via an OpenAI-compatible API:
56
 
57
  ```bash
58
  # Streaming
@@ -66,12 +125,24 @@ curl -X POST https://lisamegawatts-juliaslm.hf.space/v1/chat/completions \
66
  -d '{"messages": [{"role": "user", "content": "the nature of"}], "max_tokens": 200}'
67
  ```
68
 
69
- Supports streaming (SSE), temperature, top-k, and top-p sampling. CPU-only inference using pure NNlib (no Lux dependency at runtime).
70
 
71
  ## Related
72
 
73
- - **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)** — Canonical model repo with full training details, loss curves, architecture diagrams, and code examples
74
  - **[JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM)** — Live inference endpoint
75
- - **[LisaMegaWatts/philosophy-corpus](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus)** — Training dataset
76
- - **[LisaMegaWatts/JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT)** — Predecessor (~5K params, character-level)
77
  - **[Source code](https://github.com/DavinciDreams/JuliaGPT)** — GitHub repository
 
20
  - LisaMegaWatts/philosophy-corpus
21
  ---
22
 
23
+ # JuliaSLM — Inference Server Artifacts
24
 
25
+ Serving-ready artifacts for the [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM), an OpenAI-compatible inference endpoint for the 5M parameter JuliaSLM transformer.
26
 
27
+ For full training details, loss curves, architecture diagrams, and code examples, see the canonical model repo: **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)**.
28
 
29
  ## Model Summary
30
 
31
+ A 5,037,312-parameter decoder-only transformer trained to a Chinchilla-optimal token budget (~100M tokens, 20 tokens/param) on classical philosophy and liberal arts texts.
32
+
33
+ ### Architecture
34
+
35
+ ```
36
+ JuliaGPTModel
37
+ ├── tok_emb: Embedding(2000 → 256)        # weight-tied with output head
38
+ ├── rope: RotaryPositionalEncoding(64)
39
+ ├── blocks × 6:
40
+ │   ├── ln1: RMSNorm(256)
41
+ │   ├── attn: MultiHeadAttention(4 heads, 64 dim each)
42
+ │   │   ├── wq, wk, wv: Dense(256 → 256)
43
+ │   │   └── wo: Dense(256 → 256)
44
+ │   ├── ln2: RMSNorm(256)
45
+ │   └── ffn: SwiGLU(256 → 1024 → 256)
46
+ │       ├── w1: Dense(256 → 1024)         # gate
47
+ │       ├── v: Dense(256 → 1024)          # value
48
+ │       └── w2: Dense(1024 → 256)         # down-project
49
+ ├── ln_f: RMSNorm(256)
50
+ └── head: TiedEmbeddingHead → (2000,)     # shares tok_emb weights
51
+ ```
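The per-block sub-layers above can be sketched numerically. A minimal NumPy illustration of the RMSNorm and SwiGLU components, using the dimensions from the table below (the function names and initialization scale are illustrative, not taken from the repo):

```python
import numpy as np

def rms_norm(x, g, eps=1e-6):
    # Normalize by root-mean-square over the feature dim, then scale by g.
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps) * g

def swiglu(x, w1, v, w2):
    # Gate path SiLU(x @ w1), multiplied elementwise by value path x @ v,
    # then down-projected by w2.
    gate = x @ w1
    gate = gate / (1 + np.exp(-gate))  # SiLU / swish
    return (gate * (x @ v)) @ w2

d, hidden = 256, 1024
rng = np.random.default_rng(0)
x  = rng.standard_normal((8, d))              # 8 token positions
w1 = rng.standard_normal((d, hidden)) * 0.02  # gate
v  = rng.standard_normal((d, hidden)) * 0.02  # value
w2 = rng.standard_normal((hidden, d)) * 0.02  # down-project

y = swiglu(rms_norm(x, np.ones(d)), w1, v, w2)
print(y.shape)  # (8, 256)
```

The pre-norm arrangement means `rms_norm` is applied to the block input before attention and before the FFN, with residual connections around each.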
52
+
53
  | Component | Detail |
54
  |---|---|
55
  | Parameters | 5,037,312 |
 
56
  | Embedding dim | 256 |
57
  | Layers | 6 |
58
  | Attention heads | 4 (head dim 64) |
59
+ | FFN multiplier | 4x (SwiGLU, hidden 1024) |
60
  | Context length | 256 tokens |
61
+ | Positional encoding | Rotary (RoPE) |
62
+ | Normalization | RMSNorm (pre-norm) |
63
  | Weight tying | Yes |
64
+ | Bias | None |
65
+
66
+ ### Training
67
+
68
+ | Metric | Value |
69
+ |---|---|
70
+ | Optimizer | AdamW (lr=6e-4, min_lr=6e-5, wd=0.1) |
71
+ | Schedule | Cosine decay with 500-step warmup |
72
+ | Precision | Mixed F16/F32 |
73
+ | Batch size | 32 |
74
+ | Training steps | 12,305 |
75
+ | Tokens processed | ~100M |
76
+ | Training time | 66 min on RTX 3060 12GB |
77
+ | Throughput | ~26K tok/s |
78
+ | Final val loss | 3.54 |
79
+ | Final val PPL | 34.5 |
80
+
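Two quick consistency checks on the training table: tokens processed should equal batch size × context length × steps, and validation perplexity is exp of the validation loss.

```python
import math

batch, ctx, steps = 32, 256, 12_305
tokens = batch * ctx * steps
print(f"{tokens:,}")   # 100,802,560 — matches the ~100M in the table

ppl = math.exp(3.54)   # perplexity from final val loss
print(round(ppl, 1))   # 34.5
```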
81
+ ### Loss Curve
82
+
83
+ | Step | Train Loss | Val Loss | Val PPL |
84
+ |------|-----------|----------|---------|
85
+ | 500 | 6.69 | 5.01 | 149.6 |
86
+ | 2,000 | 4.09 | 4.02 | 56.0 |
87
+ | 6,000 | 3.72 | 3.70 | 40.4 |
88
+ | 10,000 | 3.58 | 3.57 | 35.4 |
89
+ | 12,305 | 3.55 | 3.54 | 34.5 |
90
+
91
+ ### Tokenizer
92
+
93
+ ByteLevel BPE with 2,000 subword tokens, trained on the philosophy corpus. Tokenizer files (`vocab.json`, `merges.txt`) are sourced from the [philosophy-corpus](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus) dataset.
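As an illustration of how merge rules like those in `merges.txt` are applied at encode time, a minimal greedy BPE sketch (the toy merge table here is invented for the example, not taken from the actual file):

```python
def bpe_apply(tokens, merges):
    # Repeatedly merge the highest-priority adjacent pair
    # (lower index in the merge table = higher priority).
    ranks = {pair: i for i, pair in enumerate(merges)}
    while True:
        pairs = [(ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(tokens, tokens[1:]))]
        rank, i = min(pairs, default=(float("inf"), -1))
        if rank == float("inf"):
            return tokens  # no applicable merges left
        tokens = tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]

# Toy merge table in merges.txt order: one pair per line, best first.
merges = [("t", "h"), ("th", "e"), ("i", "n"), ("b", "e")]
print(bpe_apply(list("being"), merges))  # ['be', 'in', 'g']
print(bpe_apply(list("the"), merges))    # ['the']
```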
94
+
95
+ ### Training Data
96
+
97
+ [LisaMegaWatts/philosophy-corpus](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus) — 981 source texts (BookCorpus, WikiText-103, PG-19, classical philosophy) processed through a custom text pipeline with deduplication and quality scoring.
98
+
99
+ - **Train tokens**: 794.9M (pre-encoded as `train.bin`)
100
+ - **Val tokens**: 88.2M (pre-encoded as `val.bin`)
101
+ - **Sources**: Aristotle, Plato, Cicero, Seneca, Marcus Aurelius, Epictetus, Euclid, Kant, Spinoza, Nietzsche, and more
102
 
103
  ## Files
104
 
105
  | File | Description |
106
  |---|---|
107
  | `final.jld2` | Model parameters (JLD2 format, 58MB) |
108
+ | `config.toml` | Architecture config (5m-chinchilla) |
109
  | `vocab.json` | BPE vocabulary (2000 tokens, dict format) |
110
  | `merges.txt` | BPE merge rules |
111
 
112
+ ## Inference API
113
 
114
+ The [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM) serves this model via an OpenAI-compatible API with SSE streaming and temperature, top-k, and top-p sampling. Inference runs CPU-only on pure NNlib (no Lux dependency at runtime).
115
 
116
  ```bash
117
  # Streaming
 
125
  -d '{"messages": [{"role": "user", "content": "the nature of"}], "max_tokens": 200}'
126
  ```
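The temperature/top-k/top-p sampling exposed by the request parameters can be sketched as follows; a minimal NumPy version where the parameter names mirror the API fields but the implementation itself is illustrative:

```python
import numpy as np

def sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # most probable first
    if top_k > 0:
        order = order[:top_k]                # keep the k best tokens
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]  # smallest set covering top_p
    p = probs[keep] / probs[keep].sum()      # renormalize over kept tokens
    return int(rng.choice(keep, p=p))

logits = [2.0, 1.0, 0.5, -1.0]
tok = sample(logits, temperature=0.8, top_k=3, top_p=0.9)
print(tok)  # one of the three most probable token ids
```

With `top_k=1` this degenerates to greedy decoding; raising `temperature` flattens the distribution before the truncation steps apply.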
127
 
128
+ ### Endpoints
129
+
130
+ - `GET /` — Health check and model info
131
+ - `GET /v1/models` — List available models
132
+ - `POST /v1/chat/completions` — Generate text (streaming + non-streaming)
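A client consuming the streaming endpoint parses `data:` lines from the SSE response. A minimal parser sketch, assuming the standard OpenAI chat-completions chunk shape (the payload structure is assumed from that convention, not taken from this server's docs):

```python
import json

def parse_sse(lines):
    # Yield content deltas from an OpenAI-style SSE stream.
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

stream = [
    'data: {"choices": [{"delta": {"content": "the nature"}}]}',
    'data: {"choices": [{"delta": {"content": " of"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse(stream)))  # the nature of
```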
133
+
134
+ ## Framework
135
+
136
+ Built with:
137
+ - [Lux.jl](https://github.com/LuxDL/Lux.jl) — Explicit-parameter neural networks (training)
138
+ - [NNlib.jl](https://github.com/FluxML/NNlib.jl) — Softmax, activations (inference)
139
+ - [Zygote.jl](https://github.com/FluxML/Zygote.jl) — Automatic differentiation (training)
140
+ - [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) — GPU acceleration (training)
141
 
142
  ## Related
143
 
144
+ - **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)** — Canonical model repo (versioned checkpoints, full docs)
145
  - **[JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM)** — Live inference endpoint
146
+ - **[LisaMegaWatts/philosophy-corpus](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus)** — Training dataset + tokenizer
147
+ - **[LisaMegaWatts/JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT)** — Predecessor (~5K params, character-level, scalar autograd)
148
  - **[Source code](https://github.com/DavinciDreams/JuliaGPT)** — GitHub repository