LisaMegaWatts committed
Commit a188fdb · verified · 1 Parent(s): 27163bf

Fix model card: document original 8K-param model in best_model.json, note jld2 files are v2

Files changed (1):
  1. README.md (+83 -24)

README.md CHANGED
@@ -1,39 +1,98 @@
  ---
  language:
- - en
  library_name: julia
- pipeline_tag: text-generation
  tags:
- - character-level
- - philosophy
- - mathematics
- - julia
- - scalar-autograd
- - pure-julia
- - scriptio-continua
- - reduced-vocab
  datasets:
- - LisaMegaWatts/juliagpt-data
  ---

  # JuliaGPT

- An experimental character-level GPT in pure Julia exploring minimal vocabularies inspired by ancient Greek *scriptio continua*. Built with scalar autograd, no external ML dependencies.

  ## Architecture
- - 1 transformer layer, 4 attention heads
- - n_embd=16, block_size=256
- - RMSNorm, ReLU, KV cache for causal masking
- - Adam optimizer with linear LR decay
- - ~5K parameters

- ## Vocabulary
- 28 characters (a-z + space + period) + BOS = 29 vocab. Numerals converted to words, all punctuation removed except period.

  ## Training
- - **Dataset:** Aristotle's Rhetoric + Euclid's Elements (8,461 chunks)
- - **Current checkpoint:** step 650, val_loss=2.3414

- ## Links
- - [Training data](https://huggingface.co/datasets/LisaMegaWatts/juliagpt-data)
- - [Source code](https://github.com/DavinciDreams/JuliaGPT)
  ---
  language:
+ - en
+ license: mit
  library_name: julia
  tags:
+ - julia
+ - character-level
+ - philosophy
+ - scalar-autograd
+ - pure-julia
+ - scriptio-continua
+ - text-generation
+ pipeline_tag: text-generation
  datasets:
+ - LisaMegaWatts/juliagpt-data
+ model-index:
+ - name: JuliaGPT
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       type: LisaMegaWatts/juliagpt-data
+       name: juliagpt-data
+     metrics:
+     - type: loss
+       value: 2.34
+       name: Val Loss
+       verified: false
  ---

  # JuliaGPT

+ An experimental **8,096-parameter** character-level GPT in pure Julia with scalar autograd. Explores minimal vocabularies inspired by ancient Greek *scriptio continua*. No external ML framework dependencies.
+
+ ## Model Lineage
+
+ | Model | Params | Vocab | Context | Val Loss | Notes |
+ |-------|--------|-------|---------|----------|-------|
+ | [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 27 chars | 64 | 2.43 | First proof of concept |
+ | **JuliaGPT** | **8,096** | **29 chars** | **256** | **2.34** | **Expanded context + vocab** |
+ | [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2) | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level |

  ## Architecture

+ | Parameter | Value |
+ |-----------|-------|
+ | Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
+ | Parameters | 8,096 |
+ | Embedding dim | 16 |
+ | Layers | 1 |
+ | Attention heads | 4 |
+ | Head dim | 4 |
+ | FFN hidden dim | 64 |
+ | Context length | 256 characters |
+ | Vocabulary | 29 characters (a-z, space, period, + BOS) |
+
+ ### Vocabulary
+
+ 29 tokens: `` .abcdefghijklmnopqrstuvwxyz`` + BOS
+
+ Numerals converted to words, all punctuation removed except period.
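The mapping above can be sketched in Julia; the names (`CHARS`, `encode`, `decode`) and the id layout (BOS at id 1, characters at ids 2–29) are illustrative assumptions, not the repo's actual tokenizer:

```julia
# Hypothetical layout of the 29-token vocabulary: space, period, a-z
# (28 printable characters) plus a BOS token at id 1.
const CHARS = [' ', '.', ('a':'z')...]
const BOS   = 1
const CH2ID = Dict(c => i + 1 for (i, c) in enumerate(CHARS))  # ids 2..29

encode(s) = [BOS; [CH2ID[c] for c in s]]                  # prepend BOS
decode(ids) = String([CHARS[i - 1] for i in ids if i != BOS])
```

Under this layout `encode("ab.")` yields `[1, 4, 5, 3]`, and `decode` round-trips any text already reduced to the 28 printable characters.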

  ## Training

+ | Setting | Value |
+ |---------|-------|
+ | Dataset | Aristotle's Rhetoric + Euclid's Elements (8,461 chunks) |
+ | Best val loss | 2.34 |
+ | Framework | Pure Julia (scalar autograd, no Flux/Lux) |
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
+ | `vocab.json` | 38-character vocabulary array (belongs to the larger v2 checkpoints) |
+ | `data/aristotle_rhetoric.txt` | Training data |
+
+ **Note:** The `.jld2` checkpoint files in this repo contain a different, larger model (384d/6L/38-char vocab). That model has been moved to [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2). The original JuliaGPT is preserved in `best_model.json`.
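Because the repo mixes artifacts from two models, vocabulary size is a quick way to tell them apart. A sketch using the JSON.jl package (the helper name is hypothetical; only the 29- vs 38-entry counts come from this card):

```julia
using JSON  # JSON.jl package

# The original JuliaGPT uses 29 tokens (28 chars + BOS); the v2 vocabulary
# shipped in this repo's vocab.json has 38 entries.
is_original_vocab(path) = length(JSON.parsefile(path)) == 29
```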
+
+ ## Inference Settings
+
+ | Parameter | Value |
+ |-----------|-------|
+ | vocab_size | 29 |
+ | context_length | 256 |
+
+ ## Provenance
+
+ - **Author**: LisaMegaWatts
+ - **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)
+ - **Training data**: [LisaMegaWatts/juliagpt-data](https://huggingface.co/datasets/LisaMegaWatts/juliagpt-data)
+
+ ## License
+
+ MIT