LisaMegaWatts committed 286fc72 (verified) · Parent: 2e3ffaf

Add model card for JuliaGPT-v2 (384d/6L char-level model)

Files changed (1): README.md added (+128 lines)
---
language:
- en
license: mit
library_name: flux
tags:
- julia
- flux-jl
- character-level
- philosophy
- transformer
- gpt-2
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: JuliaGPT-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/philosophy-corpus
      name: philosophy-corpus
    metrics:
    - type: loss
      value: 2.91
      name: Val Loss
      verified: false
---

# JuliaGPT-v2

A **~10M parameter** character-level GPT trained on classical philosophy texts. It is the scaled-up successor to the original [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) (8K params), expanding the vocabulary from 29 to 38 characters and growing the architecture by roughly three orders of magnitude.

## Model Lineage

| Model | Params | Architecture | Vocab | Val Loss |
|-------|--------|--------------|-------|----------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 1L/16d/4H, block=64 | 27 chars | 2.43 |
| [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) | 8,096 | 1L/16d/4H, block=256 | 29 chars | 2.34 |
| **JuliaGPT-v2** | **~10M** | **6L/384d/6H, block=256** | **38 chars** | **2.91** |

Val losses are not directly comparable across rows: each model uses a different vocabulary, and the per-character baseline shifts with vocabulary size (a uniform guess already scores ln 27 ≈ 3.30 on 27 characters vs. ln 38 ≈ 3.64 on 38).

## Architecture

```
GPT (GPT-2 style, scaled)
+-- wte: Embedding(38 -> 384)
+-- wpe: Embedding(256 -> 384) [learned position embeddings]
+-- blocks x 6:
|   +-- attn: CausalSelfAttention
|   |   +-- wq: Dense(384 -> 384) [6 heads, 64 dim each]
|   |   +-- wk: Dense(384 -> 384)
|   |   +-- wv: Dense(384 -> 384)
|   |   +-- wo: Dense(384 -> 384)
|   +-- ffwd: FeedForward
|       +-- Dense(384 -> 1536)
|       +-- Dense(1536 -> 384)
+-- lm_head: Dense(384 -> 38)
```
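
As a sanity check on the "~10M" figure, the sizes above pin down a back-of-the-envelope parameter count. A minimal sketch, assuming bias terms on every Dense layer and ignoring layer norms (not shown in the diagram; they would add only a few thousand parameters):

```julia
# Rough parameter count for JuliaGPT-v2 (384d, 6 layers, 38-char vocab).
n_embd, n_layer, vocab, block = 384, 6, 38, 256
ffwd_hidden = 4n_embd                      # 1536

dense(nin, nout) = nin * nout + nout       # weights + bias (bias assumed)

wte  = vocab * n_embd                      # token embeddings
wpe  = block * n_embd                      # learned position embeddings
attn = 4 * dense(n_embd, n_embd)           # wq, wk, wv, wo
ffwd = dense(n_embd, ffwd_hidden) + dense(ffwd_hidden, n_embd)
head = dense(n_embd, vocab)                # separate lm_head (no weight tying)

total = wte + wpe + n_layer * (attn + ffwd) + head
println(total)                             # 10_765_094, i.e. the card's "~10M"
```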

### Model Details

| Parameter | Value |
|-----------|-------|
| Architecture | GPT-2 style Transformer |
| Parameters | ~10M |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Head dim | 64 |
| Context length | 256 characters |
| Vocabulary | 38 characters (a-z, space, punctuation) |
| Dropout | 0.1 |
| Weight tying | No (separate lm_head) |
| Framework | Julia + Flux.jl |
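
The table rows map fairly directly onto stock Flux layers. A sketch of the non-attention pieces (the activation function is not stated in this card, so `gelu`, the GPT-2 default, is assumed; the custom `CausalSelfAttention` lives in the source repo and is omitted):

```julia
using Flux

# Sizes taken from the table above; only the attention block is custom.
wte     = Flux.Embedding(38 => 384)     # token embedding over the 38-char vocab
wpe     = Flux.Embedding(256 => 384)    # learned positions for context 256
ffwd    = Chain(Dense(384 => 1536, gelu), Dense(1536 => 384))  # gelu assumed
drop    = Dropout(0.1)
lm_head = Dense(384 => 38)              # separate from wte: no weight tying
```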

### Vocabulary

38 characters: `` !"'(),-.:;?abcdefghijklmnopqrstuvwxyz``

Character-level tokenization with no BPE: each character is one token.
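
A minimal encode/decode sketch for this alphabet (ids here are 1-based, which is idiomatic Julia; the actual id assignment lives in `vocab.json` and may differ):

```julia
# Hypothetical char-level tokenizer over the 38-character alphabet above.
const CHARS = collect(" !\"'(),-.:;?abcdefghijklmnopqrstuvwxyz")
const STOI  = Dict(c => i for (i, c) in enumerate(CHARS))

encode(s::AbstractString) = [STOI[c] for c in s]
decode(ids) = join(CHARS[i] for i in ids)

@assert length(CHARS) == 38
@assert decode(encode("what is justice?")) == "what is justice?"
```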

## Training

| Setting | Value |
|---------|-------|
| Dataset | Classical philosophy corpus |
| Training steps | 14,739 |
| Best val loss | 2.91 |
| Hardware | NVIDIA RTX 3060 (12 GB) |
| Precision | Float32 |

## Inference Settings

| Parameter | Value |
|-----------|-------|
| vocab_size | 38 |
| context_length | 256 |
| temperature | 0.8 |
| top_k | 40 |
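
The sampling code itself is not part of this card; below is a sketch of top-k sampling at temperature 0.8, assuming the model produces a length-38 vector of next-character logits (`sample_next` is an illustrative name, not from the repo):

```julia
# Draw the next token id from raw logits using the settings above.
function sample_next(logits::AbstractVector; temperature = 0.8, top_k = 40)
    scaled = logits ./ temperature
    k = min(top_k, length(scaled))                 # top_k = 40 > 38 chars
    idx = partialsortperm(scaled, 1:k; rev = true) # indices of the k best
    p = exp.(scaled[idx] .- maximum(scaled[idx]))  # stable softmax over top-k
    c = cumsum(p)
    return idx[searchsortedfirst(c, rand() * c[end])]  # inverse-CDF draw
end
```

Note that with only 38 characters, `top_k = 40` keeps the entire vocabulary; the cutoff would only bite if the vocabulary grew.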

## Checkpoint Format

JLD2 files containing:

- `model_state` — Flux model weights
- `hyperparams` — `Dict("n_embd"=>384, "n_layer"=>6, "n_head"=>6, "vocab_size"=>38, "block_size"=>256, "dropout"=>0.1)`
- `step` — 14,739
- `best_val_loss` — 2.91
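
Reading a checkpoint with JLD2.jl might look like the sketch below (the key names come from the list above; rebuilding the Flux model from `hyperparams` is done in the source repo and is not reproduced here):

```julia
using JLD2  # provides load/save for .jld2 files

ckpt = load("best_model.jld2")   # Dict keyed by the names listed above

hp = ckpt["hyperparams"]
@show hp["n_embd"] hp["n_layer"] hp["n_head"]  # 384, 6, 6
@show ckpt["step"] ckpt["best_val_loss"]       # 14739, 2.91

# Rebuild the model from `hp`, then restore weights, e.g. with
# Flux.loadmodel!(model, ckpt["model_state"]); see the source repo
# for the exact constructor.
```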

## Files

| File | Description |
|------|-------------|
| `final_model.jld2` | Final training checkpoint |
| `best_model.jld2` | Best validation loss checkpoint |
| `checkpoint_latest.jld2` | Latest periodic checkpoint |
| `vocab.json` | Character vocabulary (38 chars) |

## Provenance

- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)

## License

MIT