---
language:
- en
license: mit
library_name: flux
tags:
- julia
- flux-jl
- gpt-2
- character-level
- philosophy
- transformer
- text-generation
- layernorm
- gelu
- learned-position-embeddings
pipeline_tag: text-generation
---
# MicroJulia
A GPT-2 style character-level transformer trained on classical philosophy texts, implemented in Julia with Flux.jl. The **first model** in the Julia SLM lineage — a minimal proof-of-concept that established the training and serving infrastructure.
## Model Family Context
MicroJulia is the starting point of an architectural progression:
| Model | Generation | Architecture | Tokenizer | Framework |
|---|---|---|---|---|
| **MicroJulia** | **1st** | **GPT-2 (LayerNorm, GELU, learned pos)** | **Character-level** | **Flux.jl** |
| [JuliaFluxGPT](https://huggingface.co/LisaMegaWatts/JuliaFluxGPT) | 2nd | LLaMA-style (RMSNorm, SwiGLU, RoPE, GQA) | BPE 2000 | Flux.jl |
| [JuliaSLM](https://huggingface.co/LisaMegaWatts/JuliaSLM) | 3rd | Modern Transformer (RMSNorm, SwiGLU, RoPE) | BPE 2000 | Lux.jl |
| [MonarchSLM](https://huggingface.co/LisaMegaWatts/MonarchSLM) | 3rd | Monarch Mixer (sub-quadratic) | BPE 2000 | Lux.jl |
| [SymbioSLM](https://huggingface.co/LisaMegaWatts/SymbioSLM) | 3rd | Symbiogenesis (3 organelles) | BPE 2000 | Lux.jl |
## Architecture
Classic GPT-2 design — deliberately minimal:
```
GPT (GPT-2 style)
+-- wte: Embedding(vocab_size -> n_embd) [token embeddings]
+-- wpe: Embedding(block_size -> n_embd) [learned position embeddings]
+-- drop: Dropout
+-- blocks x N:
| +-- ln1: LayerNorm(n_embd)
| +-- attn: CausalSelfAttention
| | +-- qkv: Dense(n_embd -> 3*n_embd) [fused Q/K/V projection]
| | +-- proj: Dense(n_embd -> n_embd)
| +-- ln2: LayerNorm(n_embd)
| +-- ffwd: FeedForward
| +-- Dense(n_embd -> 4*n_embd)
| +-- GELU
| +-- Dense(4*n_embd -> n_embd)
+-- ln_f: LayerNorm(n_embd)
+-- lm_head: Dense(n_embd -> vocab_size)
```
### Key Design Choices (GPT-2 era)
| Component | MicroJulia (GPT-2) | Later Models (LLaMA-style) |
|---|---|---|
| Normalization | LayerNorm (with bias) | RMSNorm (no bias) |
| Activation | GELU | SwiGLU |
| Position encoding | Learned embeddings | RoPE |
| QKV projection | Fused single Dense | Separate Q, K, V |
| FFN | Standard 4x expansion | SwiGLU, hidden dim scaled by ~2/3 |
| Output head | Separate lm_head | Weight-tied with embedding |
| Tokenizer | Character-level (28 chars) | BPE (2000 tokens) |
### Character-Level Tokenization
Uses a minimal character vocabulary:
```
a-z, space, period (28 characters)
```
Each character maps directly to a token ID. No subword segmentation — the model must learn word boundaries, morphology, and syntax from individual characters.
**Trade-offs:**
- Simpler tokenizer implementation
- No OOV (out-of-vocabulary) issues
- Model must spend capacity on character-level patterns
- Less efficient than BPE for the same context window
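The mapping can be sketched as a tiny encode/decode pair (illustrative only; the character ordering in the shipped `vocab.json` may differ):

```julia
# Hypothetical character-level tokenizer; the real vocab.json may order
# characters differently, but the round-trip logic is the same.
const CHARS = collect("abcdefghijklmnopqrstuvwxyz .")      # a-z, space, period
const STOI  = Dict(c => i for (i, c) in enumerate(CHARS))  # char -> 1-based ID
const ITOS  = Dict(i => c for (i, c) in enumerate(CHARS))  # ID -> char

encode(s::AbstractString) = [STOI[c] for c in s]
decode(ids::AbstractVector{<:Integer}) = String([ITOS[i] for i in ids])
```

With 28 entries, any lowercase sentence round-trips losslessly: `decode(encode("hello world."))` returns the input unchanged.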
## Model Details
| Parameter | Value |
|---|---|
| Architecture | GPT-2 style (pre-norm Transformer) |
| Tokenizer | Character-level (28 characters) |
| Position encoding | Learned position embeddings |
| Normalization | LayerNorm |
| Activation | GELU |
| Output projection | Separate Dense (not weight-tied) |
| Framework | Julia + Flux.jl |
Exact dimensions (vocab_size, n_embd, n_layer, n_head, block_size) are stored in the checkpoint `hyperparams` dict and loaded dynamically.
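A sketch of reading those hyperparameters with JLD2.jl (key names follow the Files section of this card; this is illustrative, not the exact serving code):

```julia
using JLD2  # checkpoint.jld2 is a standard JLD2 archive

ckpt = JLD2.load("checkpoint.jld2")   # Dict of saved objects
hp   = ckpt["hyperparams"]            # vocab_size, n_embd, block_size, n_layer, n_head

# Model dimensions come from the checkpoint rather than being hard-coded:
vocab_size = hp["vocab_size"]
block_size = hp["block_size"]
```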
## Training
| | Value |
|---|---|
| Dataset | Classical philosophy texts |
| Tokenizer | Character-level mapping |
| Framework | Julia + Flux.jl |
| Hardware | Google Colab / NVIDIA GPU |
| Precision | Float32 |
## Implementation Notes
### Causal Masking
Uses a pre-computed additive upper-triangular mask (global constant):
```julia
using LinearAlgebra: triu

CAUSAL_MASK = triu(fill(-Inf32, block_size, block_size), 1)
```
Applied to attention scores before softmax.
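A minimal sketch of that step (illustrative shapes; the real code batches over heads): adding the mask sends future-position logits to `-Inf32`, so softmax assigns them exactly zero weight.

```julia
using LinearAlgebra: triu

T = 4
scores = randn(Float32, T, T)         # raw attention logits, query index along rows
mask   = triu(fill(-Inf32, T, T), 1)  # -Inf32 strictly above the diagonal
masked = scores .+ mask               # future positions become -Inf

# Row-wise softmax; exp(-Inf32) == 0, so masked entries get zero probability
rowsoftmax(v) = exp.(v .- maximum(v)) ./ sum(exp.(v .- maximum(v)))
probs = mapslices(rowsoftmax, masked; dims=2)
```

Each row still sums to 1, and the first query (row 1) puts all its weight on position 1.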
### Position Embeddings
Learned absolute position embeddings (not RoPE):
```julia
tok = wte(token_ids) # (C, T, B)
pos = wpe(1:T)          # (C, T); broadcasts across the batch dim
x = tok .+ pos
```
Limited to the trained block_size — no length extrapolation.
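In practice, generation therefore crops the running context to the last `block_size` tokens before each forward pass (a sketch; variable names are illustrative):

```julia
block_size = 8
ids = collect(1:12)                          # running token-ID buffer (12 tokens)
ctx = ids[max(1, end - block_size + 1):end]  # keep at most the last block_size
```

Here `ctx` is the last eight token IDs, so position indices never exceed the trained `block_size`.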
## Usage
### OpenAI-Compatible API
Served via [MicroJulia Space](https://huggingface.co/spaces/LisaMegaWatts/MicroJulia):
```bash
curl -X POST https://lisamegawatts-microjulia.hf.space/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "hello"}],
"stream": true
}'
```
## Files
| File | Description |
|---|---|
| `checkpoint.jld2` | Trained model weights + hyperparams (JLD2 format) |
| `vocab.json` | Character vocabulary mapping |
Checkpoint contains:
- `model_state` — Flux model weights
- `hyperparams` — Dict with vocab_size, n_embd, block_size, n_layer, n_head
- `step` — Training step
- `best_val_loss` — Best validation loss
## Provenance
- **Author**: LisaMegaWatts
- **Repository**: [DavinciDreams/micro-julia](https://github.com/DavinciDreams/micro-julia)
- **Training date**: February 2026
- **Architecture reference**: GPT-2 (Radford et al., 2019), nanoGPT (Karpathy, 2023)
- **Lineage**: Evolved into [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) (custom autograd) and the Lux.jl model family
## References
- Radford, A., et al. (2019). Language Models are Unsupervised Multitask Learners (GPT-2).
- Karpathy, A. (2023). nanoGPT. GitHub repository.
## Citation
```bibtex
@misc{microjulia2026,
title={MicroJulia: A Minimal Character-Level GPT in Julia},
author={LisaMegaWatts},
year={2026},
url={https://huggingface.co/LisaMegaWatts/MicroJulia}
}
```
## License
MIT