---
language:
- en
license: mit
library_name: julia
tags:
- julia
- character-level
- philosophy
- scalar-autograd
- pure-julia
- scriptio-continua
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/juliagpt-data
model-index:
- name: JuliaGPT
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/juliagpt-data
      name: juliagpt-data
    metrics:
    - type: loss
      value: 2.34
      name: Val Loss
      verified: false
---
# JuliaGPT

An experimental **8,096-parameter** character-level GPT in pure Julia with scalar autograd. It explores minimal vocabularies inspired by ancient Greek *scriptio continua*, and has no external ML framework dependencies.
## Model Lineage

| Model | Params | Vocab | Context | Val Loss | Notes |
|-------|--------|-------|---------|----------|-------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 27 chars | 64 | 2.43 | First proof-of-concept |
| **JuliaGPT** | **8,096** | **29 chars** | **256** | **2.34** | **Expanded context + vocab** |
| [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2) | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level |
## Architecture

| Parameter | Value |
|-----------|-------|
| Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
| Parameters | 8,096 |
| Embedding dim | 16 |
| Layers | 1 |
| Attention heads | 4 |
| Head dim | 4 |
| FFN hidden dim | 64 |
| Context length | 256 characters |
| Vocabulary | 29 characters (a-z, space, period, + BOS) |
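The 8,096 figure can be checked by hand from the table above. The breakdown below is a sketch under my own assumptions (no bias or layer-norm parameters, untied output projection); the card doesn't document these details, but the arithmetic reproduces the total exactly:

```julia
# Back-of-envelope parameter count for the configuration above.
# Assumes no biases, no layer-norm parameters, and an untied output head.
vocab, d, ctx, ffn = 29, 16, 256, 64

tok_emb = vocab * d      # token embedding:        29 × 16  = 464
pos_emb = ctx * d        # position embedding:     256 × 16 = 4096
attn    = 4 * d * d      # Q, K, V, O projections:            1024
mlp     = 2 * d * ffn    # two FFN matrices:                  2048
head    = d * vocab      # output projection:      16 × 29  = 464

total = tok_emb + pos_emb + attn + mlp + head
println(total)  # 8096
```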
### Vocabulary

29 tokens: `` .abcdefghijklmnopqrstuvwxyz`` + BOS

Numerals are converted to words, and all punctuation except the period is removed.
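As a rough illustration, the vocabulary above (space, period, a-z, plus BOS) can be sketched as a tiny codec. The id ordering, BOS placement, and `encode`/`decode` helpers here are illustrative assumptions, not the repo's actual scheme, and the numeral-to-word step is assumed to happen in preprocessing before encoding:

```julia
# Sketch of the 29-token character vocabulary: space, period, a-z, plus BOS.
# Id assignment and BOS handling are illustrative assumptions.
const CHARS = " ." * join('a':'z')    # 28 printable tokens
const BOS   = length(CHARS) + 1       # token 29: beginning-of-sequence

const CHAR_TO_ID = Dict(c => i for (i, c) in enumerate(CHARS))

# Lowercase, drop anything outside the vocabulary, prepend BOS.
encode(text) = [BOS; [CHAR_TO_ID[c] for c in lowercase(text) if haskey(CHAR_TO_ID, c)]]

# Map ids back to characters, skipping the non-printable BOS token.
decode(ids) = join(CHARS[i] for i in ids if i <= length(CHARS))
```

Round-tripping `"Hello, world."` through this sketch yields `"hello world."`: the comma falls outside the vocabulary and is dropped.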
## Training

| Setting | Value |
|---------|-------|
| Dataset | Aristotle's Rhetoric + Euclid's Elements (8,461 chunks) |
| Best val loss | 2.34 |
| Framework | Pure Julia (scalar autograd, no Flux/Lux) |
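For intuition about the 8,461-chunk figure, a corpus like this might be cut into fixed-length windows as sketched below. The card doesn't document the actual chunking scheme (stride, overlap, padding), so the `chunk` helper is hypothetical:

```julia
# Hypothetical fixed-stride chunker; the repo's real scheme may overlap
# windows, pad short chunks, or differ in other ways.
chunk(text::AbstractString, n::Int) =
    [text[i:min(i + n - 1, end)] for i in 1:n:lastindex(text)]

chunks = chunk("a"^1000, 256)   # 4 chunks: 256 + 256 + 256 + 232 characters
```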
## Files

| File | Description |
|------|-------------|
| `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
| `vocab.json` | 38-character vocabulary array |
| `data/aristotle_rhetoric.txt` | Training data |

**Note:** The `.jld2` checkpoint files in this repo contain a different, larger model (384d / 6 layers / 38-char vocab). That model has been moved to [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2). The original JuliaGPT is preserved in `best_model.json`.
## Inference Settings

| Parameter | Value |
|-----------|-------|
| vocab_size | 29 |
| context_length | 256 |
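Operationally, `context_length` bounds the window the model sees at each generation step. A hedged sketch of such a sliding-window loop, with a dummy `sample_next` standing in for the real sampler (which this card doesn't show):

```julia
# Sliding-window generation sketch. `sample_next` is a placeholder stub, not
# this repo's sampler; a real model would compute logits over the 29 tokens.
sample_next(model, window) = mod1(sum(window), 29)

function generate(model, ids::Vector{Int}; max_new::Int = 16, context_length::Int = 256)
    ids = copy(ids)
    for _ in 1:max_new
        window = ids[max(1, end - context_length + 1):end]  # last ≤ 256 ids
        push!(ids, sample_next(model, window))
    end
    return ids
end

out = generate(nothing, [29]; max_new = 8)   # start from BOS (id 29)
```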
## Provenance

- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)
- **Training data**: [LisaMegaWatts/juliagpt-data](https://huggingface.co/datasets/LisaMegaWatts/juliagpt-data)

## License

MIT