---
language:
- en
license: mit
library_name: julia
tags:
- julia
- character-level
- philosophy
- scalar-autograd
- pure-julia
- scriptio-continua
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/juliagpt-data
model-index:
- name: JuliaGPT
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/juliagpt-data
      name: juliagpt-data
    metrics:
    - type: loss
      value: 2.34
      name: Val Loss
      verified: false
---

# JuliaGPT

An experimental **8,096-parameter** character-level GPT written in pure Julia with scalar autograd. It explores minimal vocabularies inspired by ancient Greek *scriptio continua* and has no external ML framework dependencies.

## Model Lineage

| Model | Params | Vocab | Context | Val Loss | Notes |
|-------|--------|-------|---------|----------|-------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 27 chars | 64 | 2.43 | First proof-of-concept |
| **JuliaGPT** | **8,096** | **29 chars** | **256** | **2.34** | **Expanded context + vocab** |
| [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2) | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level |

## Architecture

| Parameter | Value |
|-----------|-------|
| Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
| Parameters | 8,096 |
| Embedding dim | 16 |
| Layers | 1 |
| Attention heads | 4 |
| Head dim | 4 |
| FFN hidden dim | 64 |
| Context length | 256 characters |
| Vocabulary | 29 characters (a-z, space, period, + BOS) |

### Vocabulary

29 tokens: `` .abcdefghijklmnopqrstuvwxyz`` + BOS. Numerals are converted to words; all punctuation is removed except the period.
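As a sanity check, the 8,096 figure is consistent with the architecture table above under one set of assumptions (untied token/output embeddings, a learned position embedding, and no biases or LayerNorm parameters — the source code is authoritative):

```julia
# Parameter-count sketch for the dimensions listed in the Architecture
# table. The breakdown below is an assumption about how the parameters
# are allocated, not a description of the actual implementation.
vocab, ctx, d, ffn = 29, 256, 16, 64

tok_emb = vocab * d      # token embedding:            29 * 16 = 464
pos_emb = ctx * d        # learned position embedding: 256 * 16 = 4,096
attn    = 4 * d * d      # Q, K, V, O projections:     4 * 256 = 1,024
mlp     = 2 * d * ffn    # FFN up + down projections:  2 * 1,024 = 2,048
lm_head = d * vocab      # output projection:          16 * 29 = 464

total = tok_emb + pos_emb + attn + mlp + lm_head
println(total)  # → 8096
```

Under these assumptions the terms sum exactly to the stated parameter count, which suggests the model uses no bias or normalization parameters.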
## Training

| | Value |
|---|---|
| Dataset | Aristotle's Rhetoric + Euclid's Elements (8,461 chunks) |
| Best val loss | 2.34 |
| Framework | Pure Julia (scalar autograd, no Flux/Lux) |

## Files

| File | Description |
|------|-------------|
| `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
| `vocab.json` | 38-character vocabulary array |
| `data/aristotle_rhetoric.txt` | Training data |

**Note:** The `.jld2` checkpoint files in this repo contain a different, larger model (384d/6L/38vocab). That model has been moved to [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2). The original JuliaGPT is preserved in `best_model.json`.

## Inference Settings

| Parameter | Value |
|-----------|-------|
| vocab_size | 29 |
| context_length | 256 |

## Provenance

- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)
- **Training data**: [LisaMegaWatts/juliagpt-data](https://huggingface.co/datasets/LisaMegaWatts/juliagpt-data)

## License

MIT
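The 29-token vocabulary described earlier (space, period, a–z, plus BOS) can be sketched as a minimal encoder/decoder. This is an illustrative assumption about the token ordering; the `vocab.json` in the repo is authoritative:

```julia
# Minimal character tokenizer sketch for the 29-token vocabulary.
# Token ids 1-28 cover ` .abcdefghijklmnopqrstuvwxyz`; the last id
# is reserved for BOS (an assumed convention, not the repo's actual one).
const CHARS = vcat([' ', '.'], collect('a':'z'))  # 28 printable characters
const BOS_ID = length(CHARS) + 1                  # id 29

const CHAR_TO_ID = Dict(c => i for (i, c) in enumerate(CHARS))

encode(s::AbstractString) = [CHAR_TO_ID[c] for c in s]
decode(ids) = join(CHARS[i] for i in ids if i != BOS_ID)

println(encode("abc."))          # ids for a, b, c, .
println(decode(encode("abc.")))  # round-trips to "abc."
```

Text must already be lowercased, with numerals spelled out and punctuation other than the period stripped, before encoding.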