---
language:
- en
license: mit
library_name: julia
tags:
- julia
- character-level
- philosophy
- scalar-autograd
- pure-julia
- scriptio-continua
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/juliagpt-data
model-index:
- name: JuliaGPT
results:
- task:
type: text-generation
name: Text Generation
dataset:
type: LisaMegaWatts/juliagpt-data
name: juliagpt-data
metrics:
- type: loss
value: 2.34
name: Val Loss
verified: false
---
# JuliaGPT
An experimental **8,096-parameter** character-level GPT written in pure Julia with a hand-rolled scalar autograd engine and no external ML framework dependencies. It explores minimal vocabularies inspired by ancient Greek *scriptio continua* (continuous writing without word separation).
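Since the model relies on scalar autograd rather than a framework like Flux or Lux, a minimal sketch of the core idea may help. The names below (`Value`, `backprop!`) are hypothetical and not taken from the repo's actual code:

```julia
# Minimal scalar-autograd sketch (hypothetical names, not the repo's code).
# Each Value node stores its data, accumulated gradient, and a closure
# that propagates the gradient to its parents.
mutable struct Value
    data::Float64
    grad::Float64
    backward::Function
    parents::Vector{Value}
end

Value(x::Real) = Value(float(x), 0.0, () -> nothing, Value[])

function Base.:+(a::Value, b::Value)
    out = Value(a.data + b.data, 0.0, () -> nothing, [a, b])
    out.backward = () -> begin
        a.grad += out.grad          # d(a+b)/da = 1
        b.grad += out.grad          # d(a+b)/db = 1
    end
    out
end

function Base.:*(a::Value, b::Value)
    out = Value(a.data * b.data, 0.0, () -> nothing, [a, b])
    out.backward = () -> begin
        a.grad += b.data * out.grad # d(ab)/da = b
        b.grad += a.data * out.grad # d(ab)/db = a
    end
    out
end

# Reverse-mode pass: topologically order the graph, then run each
# node's backward closure from the output back to the leaves.
function backprop!(root::Value)
    topo = Value[]
    visited = Set{UInt}()
    function build(v::Value)
        id = objectid(v)
        if !(id in visited)
            push!(visited, id)
            foreach(build, v.parents)
            push!(topo, v)
        end
    end
    build(root)
    root.grad = 1.0
    for v in reverse(topo)
        v.backward()
    end
end
```

For example, with `z = x*y + x` where `x = Value(3.0)` and `y = Value(4.0)`, calling `backprop!(z)` yields `x.grad == 5.0` and `y.grad == 3.0`.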
## Model Lineage
| Model | Params | Vocab | Context | Val Loss | Notes |
|-------|--------|-------|---------|----------|-------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 27 chars | 64 | 2.43 | First proof-of-concept |
| **JuliaGPT** | **8,096** | **29 chars** | **256** | **2.34** | **Expanded context + vocab** |
| [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2) | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level |
## Architecture
| Parameter | Value |
|-----------|-------|
| Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
| Parameters | 8,096 |
| Embedding dim | 16 |
| Layers | 1 |
| Attention heads | 4 |
| Head dim | 4 |
| FFN hidden dim | 64 |
| Context length | 256 characters |
| Vocabulary | 29 characters (a-z, space, period, + BOS) |
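As a rough cross-check of the parameter count, the major weight matrices implied by the table can be tallied. Biases, LayerNorm parameters, and any weight tying are omitted here, so this is a ballpark estimate, not the exact 8,096 accounting:

```julia
# Ballpark parameter tally from the architecture table.
# Biases and LayerNorm terms are excluded, so the total lands a few
# hundred short of the reported 8,096.
vocab, d, ctx, ffn = 29, 16, 256, 64

tok_emb = vocab * d        # token embedding: 464
pos_emb = ctx * d          # positional embedding: 4096
attn    = 4 * d * d        # Q, K, V, O projections: 1024
ffn_w   = 2 * d * ffn      # FFN up + down projections: 2048

total = tok_emb + pos_emb + attn + ffn_w
println(total)  # prints 7632
```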
### Vocabulary
29 tokens: `` .abcdefghijklmnopqrstuvwxyz`` + BOS
Numerals are converted to words, and all punctuation except the period is removed.
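A sketch of this normalization step (a hypothetical helper, not the repo's pipeline; the numeral-to-word conversion happens beforehand and is elided here):

```julia
# Normalization sketch: lowercase the text and keep only the 28
# printable vocabulary characters (a-z, space, period).
# Numeral-to-word spelling is assumed to run before this step.
function normalize(text::AbstractString)
    buf = IOBuffer()
    for c in lowercase(text)
        if c in 'a':'z' || c == ' ' || c == '.'
            write(buf, c)
        end
    end
    String(take!(buf))
end
```

For example, `normalize("Hello, World!")` returns `"hello world"`.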
## Training
| | Value |
|---|---|
| Dataset | Aristotle's Rhetoric + Euclid's Elements (8,461 chunks) |
| Best val loss | 2.34 |
| Framework | Pure Julia (scalar autograd, no Flux/Lux) |
## Files
| File | Description |
|------|-------------|
| `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
| `vocab.json` | 38-character vocabulary array (matches the larger model described in the note below, not this model's 29-character vocabulary) |
| `data/aristotle_rhetoric.txt` | Training data |
**Note:** The `.jld2` checkpoint files in this repo contain a different, larger model (384d/6L/38vocab). That model has been moved to [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2). The original JuliaGPT is preserved in `best_model.json`.
## Inference Settings
| Parameter | Value |
|-----------|-------|
| vocab_size | 29 |
| context_length | 256 |
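Character-level generation with this configuration reduces to a short loop. The sketch below assumes `model` is any callable mapping a vector of token ids to next-token logits (a hypothetical API, not the repo's actual interface), and uses greedy decoding for simplicity:

```julia
# Greedy character-level generation sketch (hypothetical `model` API).
# `vocab` maps token ids back to characters; the id window is truncated
# to the model's context length before each forward pass.
function generate(model, vocab::Vector{Char}, prompt::Vector{Int};
                  steps::Int = 50, ctx::Int = 256)
    ids = copy(prompt)
    for _ in 1:steps
        window = ids[max(1, end - ctx + 1):end]  # keep at most `ctx` ids
        logits = model(window)
        push!(ids, argmax(logits))               # greedy; swap in sampling for variety
    end
    join(vocab[i] for i in ids)
end
```

With a temperature-based sampler in place of `argmax`, the same loop produces more varied text.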
## Provenance
- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)
- **Training data**: [LisaMegaWatts/juliagpt-data](https://huggingface.co/datasets/LisaMegaWatts/juliagpt-data)
## License
MIT