Fix model card: document original 8K-param model in best_model.json, note jld2 files are v2
README.md CHANGED
@@ -1,39 +1,98 @@
 ---
 language:
-- en
 library_name: julia
-pipeline_tag: text-generation
 tags:
 datasets:
-- LisaMegaWatts/juliagpt-data
 ---

 # JuliaGPT

-An experimental character-level GPT in pure Julia

 ## Architecture
-- 1 transformer layer, 4 attention heads
-- n_embd=16, block_size=256
-- RMSNorm, ReLU, KV cache for causal masking
-- Adam optimizer with linear LR decay
-- ~5K parameters

 ## Training
-- **Dataset:** Aristotle's Rhetoric + Euclid's Elements (8,461 chunks)
-- **Current checkpoint:** step 650, val_loss=2.3414

---
language:
- en
license: mit
library_name: julia
tags:
- julia
- character-level
- philosophy
- scalar-autograd
- pure-julia
- scriptio-continua
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/juliagpt-data
model-index:
- name: JuliaGPT
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/juliagpt-data
      name: juliagpt-data
    metrics:
    - type: loss
      value: 2.34
      name: Val Loss
      verified: false
---

# JuliaGPT

An experimental **8,096-parameter** character-level GPT written in pure Julia with scalar autograd, with no external ML framework dependencies. It explores minimal vocabularies inspired by ancient Greek *scriptio continua*.

## Model Lineage

| Model | Params | Vocab | Context | Val Loss | Notes |
|-------|--------|-------|---------|----------|-------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 27 chars | 64 | 2.43 | First proof-of-concept |
| **JuliaGPT** | **8,096** | **29 chars** | **256** | **2.34** | **Expanded context + vocab** |
| [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2) | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level |

(Val losses are measured over different vocabularies, so they are not directly comparable across rows.)

## Architecture

| Parameter | Value |
|-----------|-------|
| Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
| Parameters | 8,096 |
| Embedding dim | 16 |
| Layers | 1 |
| Attention heads | 4 |
| Head dim | 4 |
| FFN hidden dim | 64 |
| Context length | 256 characters |
| Vocabulary | 29 characters (a-z, space, period, + BOS) |

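The 8,096 total is consistent with the table above under a few assumptions not confirmed from the repo (learned positional embeddings, no bias terms, a scale-free RMSNorm, untied output head); a sketch of that bookkeeping:

```julia
# One consistent way the 8,096 figure decomposes; the component split
# is an assumption, only the dimensions come from the table above.
n_embd, n_vocab, n_ctx, n_ffn = 16, 29, 256, 64

tok_emb = n_vocab * n_embd      #   464
pos_emb = n_ctx * n_embd        # 4,096
attn    = 4 * n_embd * n_embd   # 1,024  (Wq, Wk, Wv, Wo)
ffn     = 2 * n_embd * n_ffn    # 2,048  (in + out projections)
lm_head = n_embd * n_vocab      #   464

tok_emb + pos_emb + attn + ffn + lm_head  # = 8,096
```

If the real checkpoint ties the head to the token embedding or carries norm scales, the split changes; read this as bookkeeping, not ground truth.
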
### Vocabulary

29 tokens: `` .abcdefghijklmnopqrstuvwxyz`` + BOS

Numerals are converted to words, and all punctuation except the period is removed.

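A minimal sketch of the codec this vocabulary implies; the id ordering and `BOS_ID` placement are assumptions, not the repo's actual mapping:

```julia
# 28 printable tokens (space, period, a-z) plus one BOS id; the exact
# ordering here is assumed, not taken from vocab.json.
const CHARS = collect(" .abcdefghijklmnopqrstuvwxyz")
const BOS_ID = 29
const CHAR_TO_ID = Dict(c => i for (i, c) in enumerate(CHARS))

# Lowercase the input, keep only known characters, then map to ids.
encode(text) = [CHAR_TO_ID[c] for c in lowercase(text) if haskey(CHAR_TO_ID, c)]
decode(ids)  = String(Char[CHARS[i] for i in ids if i != BOS_ID])
```

For example, `encode("Axioms, and Postulates.")` drops the comma and returns the ids for `axioms and postulates.`.
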
## Training

| | Value |
|---|---|
| Dataset | Aristotle's Rhetoric + Euclid's Elements (8,461 chunks) |
| Best val loss | 2.34 |
| Framework | Pure Julia (scalar autograd, no Flux/Lux) |

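Since the framework row names scalar autograd, here is a micrograd-style sketch of the idea, in which every scalar operation records its own pullback; this is illustrative only, not the repository's actual implementation:

```julia
# Each Value node holds its data, its accumulated gradient, and a
# closure that pushes its gradient back into its inputs.
mutable struct Value
    data::Float64
    grad::Float64
    backprop::Function   # accumulates this node's grad into its inputs
end
Value(x::Real) = Value(Float64(x), 0.0, () -> nothing)

function Base.:+(a::Value, b::Value)
    out = Value(a.data + b.data)
    out.backprop = () -> (a.grad += out.grad; b.grad += out.grad)
    out
end

function Base.:*(a::Value, b::Value)
    out = Value(a.data * b.data)
    out.backprop = () -> (a.grad += b.data * out.grad;
                          b.grad += a.data * out.grad)
    out
end
```

A full trainer also needs the remaining ops (ReLU, division, exp) and a topological sort so `backprop` fires from the loss backwards, but the shape of the idea is the same.
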
## Files

| File | Description |
|------|-------------|
| `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
| `vocab.json` | 38-character vocabulary array (matches the v2 `.jld2` checkpoints noted below, not this model's 29-character vocabulary) |
| `data/aristotle_rhetoric.txt` | Training data |

**Note:** The `.jld2` checkpoint files in this repo contain a different, larger model (384-dim, 6-layer, 38-char vocab). That model has been moved to [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2); the original JuliaGPT is preserved in `best_model.json`.

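A hedged sketch of opening the JSON checkpoint with JSON.jl; `JSON.parsefile` is a real JSON.jl function, but the key names in the comments are guesses about the file layout, not documented fields:

```julia
import JSON  # ] add JSON

ckpt = JSON.parsefile("best_model.json")
keys(ckpt)   # inspect the actual top-level layout first

# JSON stores arrays untyped, so convert before use; the names
# "params" and "tok_emb" below are hypothetical:
# W = Float32.(reduce(hcat, ckpt["params"]["tok_emb"]))
```
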
## Inference Settings

| Parameter | Value |
|-----------|-------|
| vocab_size | 29 |
| context_length | 256 |

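A sampling loop over the settings above; `next_token_logits` stands in for the repo's forward pass and is hypothetical, while everything else is standard softmax sampling:

```julia
function generate(model, ids::Vector{Int}; max_new::Int = 200,
                  context_length::Int = 256)
    out = copy(ids)
    for _ in 1:max_new
        ctx = out[max(1, end - context_length + 1):end]  # trailing 256-token window
        logits = next_token_logits(model, ctx)           # hypothetical forward pass
        p = exp.(logits .- maximum(logits))              # numerically stable softmax
        p ./= sum(p)
        next = something(findfirst(cumsum(p) .>= rand()), length(p))
        push!(out, next)                                 # sample the next token id
    end
    out
end
```
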
## Provenance

- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)
- **Training data**: [LisaMegaWatts/juliagpt-data](https://huggingface.co/datasets/LisaMegaWatts/juliagpt-data)

## License

MIT