LisaMegaWatts committed
Commit a90b172 · verified · 1 Parent(s): 98da2a6

Update model card with full architecture, training details, and related links

Files changed (1): README.md +85 -14
README.md CHANGED
@@ -20,39 +20,98 @@ datasets:
20
  - LisaMegaWatts/philosophy-corpus
21
  ---
22
 
23
- # JuliaSLM — Inference Artifacts
24
 
25
- Serving-ready artifacts for the 5M parameter JuliaSLM transformer, packaged for the [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM). This repo contains the checkpoint, tokenizer, and config needed by the OpenAI-compatible inference server.
26
 
27
- For full model documentation, training details, loss curves, and usage instructions, see the canonical model repo: **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)**.
28
 
29
  ## Model Summary
30
 
31
  | Component | Detail |
32
  |---|---|
33
  | Parameters | 5,037,312 |
34
- | Architecture | Decoder-only Transformer (RoPE, RMSNorm, SwiGLU) |
35
  | Embedding dim | 256 |
36
  | Layers | 6 |
37
  | Attention heads | 4 (head dim 64) |
 
38
  | Context length | 256 tokens |
39
- | Tokenizer | BPE, 2000 subword tokens |
 
40
  | Weight tying | Yes |
41
- | Training | Chinchilla-optimal (~100M tokens), AdamW, F16 mixed precision |
42
- | Final val loss | 3.54 (PPL 34.5) |
43
 
44
  ## Files
45
 
46
  | File | Description |
47
  |---|---|
48
  | `final.jld2` | Model parameters (JLD2 format, 58MB) |
49
- | `config.toml` | Architecture config (from 5m-chinchilla) |
50
  | `vocab.json` | BPE vocabulary (2000 tokens, dict format) |
51
  | `merges.txt` | BPE merge rules |
52
 
53
- ## Inference
54
 
55
- The [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM) serves this model via an OpenAI-compatible API:
56
 
57
  ```bash
58
  # Streaming
@@ -66,12 +125,24 @@ curl -X POST https://lisamegawatts-juliaslm.hf.space/v1/chat/completions \
66
  -d '{"messages": [{"role": "user", "content": "the nature of"}], "max_tokens": 200}'
67
  ```
68
 
69
- Supports streaming (SSE), temperature, top-k, and top-p sampling. CPU-only inference using pure NNlib (no Lux dependency at runtime).
70
 
71
  ## Related
72
 
73
- - **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)** — Canonical model repo with full training details, loss curves, architecture diagrams, and code examples
74
  - **[JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM)** — Live inference endpoint
75
- - **[LisaMegaWatts/philosophy-corpus](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus)** — Training dataset
76
- - **[LisaMegaWatts/JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT)** — Predecessor (~5K params, character-level)
77
  - **[Source code](https://github.com/DavinciDreams/JuliaGPT)** — GitHub repository
 
20
  - LisaMegaWatts/philosophy-corpus
21
  ---
22
 
23
+ # JuliaSLM — Inference Server Artifacts
24
 
25
+ Serving-ready artifacts for the [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM), an OpenAI-compatible inference endpoint for the 5M parameter JuliaSLM transformer.
26
 
27
+ For full training details, loss curves, architecture diagrams, and code examples, see the canonical model repo: **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)**.
28
 
29
  ## Model Summary
30
 
31
+ A 5,037,312-parameter decoder-only transformer trained to a Chinchilla-optimal token budget (~100M tokens, 20 tokens/param) on classical philosophy and liberal arts texts.
32
+
33
+ ### Architecture
34
+
35
+ ```
36
+ JuliaGPTModel
37
+ ├── tok_emb: Embedding(2000 → 256)        # weight-tied with output head
38
+ ├── rope: RotaryPositionalEncoding(64)
39
+ ├── blocks × 6:
40
+ │   ├── ln1: RMSNorm(256)
41
+ │   ├── attn: MultiHeadAttention(4 heads, 64 dim each)
42
+ │   │   ├── wq, wk, wv: Dense(256 → 256)
43
+ │   │   └── wo: Dense(256 → 256)
44
+ │   ├── ln2: RMSNorm(256)
45
+ │   └── ffn: SwiGLU(256 → 1024 → 256)
46
+ │       ├── w1: Dense(256 → 1024)         # gate
47
+ │       ├── v: Dense(256 → 1024)          # value
48
+ │       └── w2: Dense(1024 → 256)         # down-project
49
+ ├── ln_f: RMSNorm(256)
50
+ └── head: TiedEmbeddingHead → (2000,)     # shares tok_emb weights
51
+ ```
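The per-block sub-layers above can be sketched numerically. A minimal NumPy illustration of the RMSNorm and SwiGLU components, using the dimensions from the table below (the function names and initialization scale are illustrative, not taken from the repo):

```python
import numpy as np

def rms_norm(x, g, eps=1e-6):
    # Normalize by root-mean-square over the feature dim, then scale by g.
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps) * g

def swiglu(x, w1, v, w2):
    # Gate path SiLU(x @ w1), multiplied elementwise by value path x @ v,
    # then down-projected by w2.
    gate = x @ w1
    gate = gate / (1 + np.exp(-gate))  # SiLU / swish
    return (gate * (x @ v)) @ w2

d, hidden = 256, 1024
rng = np.random.default_rng(0)
x  = rng.standard_normal((8, d))              # 8 token positions
w1 = rng.standard_normal((d, hidden)) * 0.02  # gate
v  = rng.standard_normal((d, hidden)) * 0.02  # value
w2 = rng.standard_normal((hidden, d)) * 0.02  # down-project

y = swiglu(rms_norm(x, np.ones(d)), w1, v, w2)
print(y.shape)  # (8, 256)
```

The pre-norm arrangement means `rms_norm` is applied to the block input before attention and before the FFN, with residual connections around each.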
52
+
53
  | Component | Detail |
54
  |---|---|
55
  | Parameters | 5,037,312 |
 
56
  | Embedding dim | 256 |
57
  | Layers | 6 |
58
  | Attention heads | 4 (head dim 64) |
59
+ | FFN multiplier | 4x (SwiGLU, hidden 1024) |
60
  | Context length | 256 tokens |
61
+ | Positional encoding | Rotary (RoPE) |
62
+ | Normalization | RMSNorm (pre-norm) |
63
  | Weight tying | Yes |
64
+ | Bias | None |
65
+
66
+ ### Training
67
+
68
+ | Metric | Value |
69
+ |---|---|
70
+ | Optimizer | AdamW (lr=6e-4, min_lr=6e-5, wd=0.1) |
71
+ | Schedule | Cosine decay with 500-step warmup |
72
+ | Precision | Mixed F16/F32 |
73
+ | Batch size | 32 |
74
+ | Training steps | 12,305 |
75
+ | Tokens processed | ~100M |
76
+ | Training time | 66 min on RTX 3060 12GB |
77
+ | Throughput | ~26K tok/s |
78
+ | Final val loss | 3.54 |
79
+ | Final val PPL | 34.5 |
80
+
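Two quick consistency checks on the training table: tokens processed should equal batch size × context length × steps, and validation perplexity is exp of the validation loss.

```python
import math

batch, ctx, steps = 32, 256, 12_305
tokens = batch * ctx * steps
print(f"{tokens:,}")   # 100,802,560 — matches the ~100M in the table

ppl = math.exp(3.54)   # perplexity from final val loss
print(round(ppl, 1))   # 34.5
```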
81
+ ### Loss Curve
82
+
83
+ | Step | Train Loss | Val Loss | Val PPL |
84
+ |------|-----------|----------|---------|
85
+ | 500 | 6.69 | 5.01 | 149.6 |
86
+ | 2,000 | 4.09 | 4.02 | 56.0 |
87
+ | 6,000 | 3.72 | 3.70 | 40.4 |
88
+ | 10,000 | 3.58 | 3.57 | 35.4 |
89
+ | 12,305 | 3.55 | 3.54 | 34.5 |
90
+
91
+ ### Tokenizer
92
+
93
+ ByteLevel BPE with 2,000 subword tokens, trained on the philosophy corpus. Tokenizer files (`vocab.json`, `merges.txt`) are sourced from the [philosophy-corpus](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus) dataset.
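As an illustration of how merge rules like those in `merges.txt` are applied at encode time, a minimal greedy BPE sketch (the toy merge table here is invented for the example, not taken from the actual file):

```python
def bpe_apply(tokens, merges):
    # Repeatedly merge the highest-priority adjacent pair
    # (lower index in the merge table = higher priority).
    ranks = {pair: i for i, pair in enumerate(merges)}
    while True:
        pairs = [(ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(tokens, tokens[1:]))]
        rank, i = min(pairs, default=(float("inf"), -1))
        if rank == float("inf"):
            return tokens  # no applicable merges left
        tokens = tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]

# Toy merge table in merges.txt order: one pair per line, best first.
merges = [("t", "h"), ("th", "e"), ("i", "n"), ("b", "e")]
print(bpe_apply(list("being"), merges))  # ['be', 'in', 'g']
print(bpe_apply(list("the"), merges))    # ['the']
```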
94
+
95
+ ### Training Data
96
+
97
+ [LisaMegaWatts/philosophy-corpus](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus) — 981 source texts (BookCorpus, WikiText-103, PG-19, classical philosophy) processed through a custom text pipeline with deduplication and quality scoring.
98
+
99
+ - **Train tokens**: 794.9M (pre-encoded as `train.bin`)
100
+ - **Val tokens**: 88.2M (pre-encoded as `val.bin`)
101
+ - **Sources**: Aristotle, Plato, Cicero, Seneca, Marcus Aurelius, Epictetus, Euclid, Kant, Spinoza, Nietzsche, and more
102
 
103
  ## Files
104
 
105
  | File | Description |
106
  |---|---|
107
  | `final.jld2` | Model parameters (JLD2 format, 58MB) |
108
+ | `config.toml` | Architecture config (5m-chinchilla) |
109
  | `vocab.json` | BPE vocabulary (2000 tokens, dict format) |
110
  | `merges.txt` | BPE merge rules |
111
 
112
+ ## Inference API
113
 
114
+ The [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM) serves this model via an OpenAI-compatible API with SSE streaming and temperature, top-k, and top-p sampling. Inference runs CPU-only on pure NNlib (no Lux dependency at runtime).
115
 
116
  ```bash
117
  # Streaming
 
125
  -d '{"messages": [{"role": "user", "content": "the nature of"}], "max_tokens": 200}'
126
  ```
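The temperature/top-k/top-p sampling exposed by the request parameters can be sketched as follows; a minimal NumPy version where the parameter names mirror the API fields but the implementation itself is illustrative:

```python
import numpy as np

def sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # most probable first
    if top_k > 0:
        order = order[:top_k]                # keep the k best tokens
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]  # smallest set covering top_p
    p = probs[keep] / probs[keep].sum()      # renormalize over kept tokens
    return int(rng.choice(keep, p=p))

logits = [2.0, 1.0, 0.5, -1.0]
tok = sample(logits, temperature=0.8, top_k=3, top_p=0.9)
print(tok)  # one of the three most probable token ids
```

With `top_k=1` this degenerates to greedy decoding; raising `temperature` flattens the distribution before the truncation steps apply.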
127
 
128
+ ### Endpoints
129
+
130
+ - `GET /` — Health check and model info
131
+ - `GET /v1/models` — List available models
132
+ - `POST /v1/chat/completions` — Generate text (streaming + non-streaming)
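A client consuming the streaming endpoint parses `data:` lines from the SSE response. A minimal parser sketch, assuming the standard OpenAI chat-completions chunk shape (the payload structure is assumed from that convention, not taken from this server's docs):

```python
import json

def parse_sse(lines):
    # Yield content deltas from an OpenAI-style SSE stream.
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

stream = [
    'data: {"choices": [{"delta": {"content": "the nature"}}]}',
    'data: {"choices": [{"delta": {"content": " of"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse(stream)))  # the nature of
```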
133
+
134
+ ## Framework
135
+
136
+ Built with:
137
+ - [Lux.jl](https://github.com/LuxDL/Lux.jl) — Explicit-parameter neural networks (training)
138
+ - [NNlib.jl](https://github.com/FluxML/NNlib.jl) — Softmax, activations (inference)
139
+ - [Zygote.jl](https://github.com/FluxML/Zygote.jl) — Automatic differentiation (training)
140
+ - [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) — GPU acceleration (training)
141
 
142
  ## Related
143
 
144
+ - **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)** — Canonical model repo (versioned checkpoints, full docs)
145
  - **[JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM)** — Live inference endpoint
146
+ - **[LisaMegaWatts/philosophy-corpus](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus)** — Training dataset + tokenizer
147
+ - **[LisaMegaWatts/JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT)** — Predecessor (~5K params, character-level, scalar autograd)
148
  - **[Source code](https://github.com/DavinciDreams/JuliaGPT)** — GitHub repository