Add model card for JuliaGPT-v2 (384d/6L char-level model)
README.md (added)
---
language:
- en
license: mit
library_name: flux
tags:
- julia
- flux-jl
- character-level
- philosophy
- transformer
- gpt-2
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: JuliaGPT-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/philosophy-corpus
      name: philosophy-corpus
    metrics:
    - type: loss
      value: 2.91
      name: Val Loss
      verified: false
---

# JuliaGPT-v2

A **~10M parameter** character-level GPT trained on classical philosophy texts. It is the scaled-up successor to the original [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) (8K params), expanding the vocabulary from 29 to 38 characters and growing the architecture from a single 16-dimensional layer to six 384-dimensional layers.

## Model Lineage

| Model | Params | Architecture | Vocab | Val Loss |
|-------|--------|--------------|-------|----------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 1L/16d/4H, block=64 | 27 chars | 2.43 |
| [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) | 8,096 | 1L/16d/4H, block=256 | 29 chars | 2.34 |
| **JuliaGPT-v2** | **~10M** | **6L/384d/6H, block=256** | **38 chars** | **2.91** |

Note that the validation losses are not directly comparable across rows: each model uses a different vocabulary, so the cross-entropy is computed over a different label space.

## Architecture

```
GPT (GPT-2 style, scaled)
+-- wte: Embedding(38 -> 384)
+-- wpe: Embedding(256 -> 384)        [learned position embeddings]
+-- blocks x 6:
|   +-- attn: CausalSelfAttention
|   |   +-- wq: Dense(384 -> 384)     [6 heads, 64 dim each]
|   |   +-- wk: Dense(384 -> 384)
|   |   +-- wv: Dense(384 -> 384)
|   |   +-- wo: Dense(384 -> 384)
|   +-- ffwd: FeedForward
|       +-- Dense(384 -> 1536)
|       +-- Dense(1536 -> 384)
+-- lm_head: Dense(384 -> 38)
```

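As a sanity check on the quoted size, the shapes above pin down the weight count almost exactly. The sketch below tallies them in plain Julia; it assumes the Dense layers carry biases and ignores any LayerNorm parameters (the tree does not list them), so the true total may differ slightly.

```julia
# Back-of-the-envelope parameter count from the shapes above.
# Assumes Dense layers have biases; LayerNorm params are omitted.
n_embd, n_layer, vocab, block = 384, 6, 38, 256

emb  = vocab * n_embd + block * n_embd      # wte + wpe
attn = 4 * (n_embd^2 + n_embd)              # wq, wk, wv, wo (+ biases)
ffwd = (n_embd * 4n_embd + 4n_embd) +       # Dense(384 -> 1536)
       (4n_embd * n_embd + n_embd)          # Dense(1536 -> 384)
head = n_embd * vocab + vocab               # lm_head (no weight tying)

total = emb + n_layer * (attn + ffwd) + head
println(total)                              # 10_765_094, i.e. ~10M
```

Under those assumptions the six blocks account for roughly 10.6M of the total; the embeddings and head together add barely 1%.
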
### Model Details

| Parameter | Value |
|-----------|-------|
| Architecture | GPT-2 style Transformer |
| Parameters | ~10M |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Head dim | 64 |
| Context length | 256 characters |
| Vocabulary | 38 characters (a-z, space, punctuation) |
| Dropout | 0.1 |
| Weight tying | No (separate lm_head) |
| Framework | Julia + Flux.jl |

### Vocabulary

38 characters: `` !"'(),-.:;?abcdefghijklmnopqrstuvwxyz``

Character-level tokenization with no BPE: each character is one token, as sketched below.

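A minimal sketch of the implied encode/decode round trip, assuming `vocab.json` lists the 38 characters in the index order shown above (the exact file layout is not specified in this card):

```julia
# Character-level tokenizer sketch: one character <-> one token id.
# The 38-character alphabet is taken verbatim from the Vocabulary section.
vocab = collect(" !\"'(),-.:;?abcdefghijklmnopqrstuvwxyz")
@assert length(vocab) == 38

stoi = Dict(c => i for (i, c) in enumerate(vocab))  # 1-based ids, Julia-style
encode(s::AbstractString) = [stoi[c] for c in s]
decode(ids) = join(vocab[i] for i in ids)

ids = encode("know thyself.")
@assert decode(ids) == "know thyself."
```
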
## Training

| Setting | Value |
|---------|-------|
| Dataset | Classical philosophy corpus |
| Training steps | 14,739 |
| Best val loss | 2.91 |
| Hardware | NVIDIA RTX 3060 12GB |
| Precision | Float32 |

## Inference Settings

| Parameter | Value |
|-----------|-------|
| vocab_size | 38 |
| context_length | 256 |
| temperature | 0.8 |
| top_k | 40 |

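A self-contained sketch of sampling with these settings (plain Julia, no packages); `logits` stands for the model's output at the final position. Note that with only 38 vocabulary entries, `top_k = 40` keeps the full distribution, so the top-k filter is effectively a no-op for this model:

```julia
# Temperature + top-k sampling with the settings from the table above.
function sample_next(logits::AbstractVector; temperature = 0.8, top_k = 40)
    k   = min(top_k, length(logits))                # 38 here, so k == vocab
    idx = partialsortperm(logits, 1:k, rev = true)  # indices of top-k logits
    z   = logits[idx] ./ temperature                # temp < 1 sharpens
    p   = exp.(z .- maximum(z))                     # numerically stable
    p ./= sum(p)                                    # softmax over kept logits
    r, c = rand(), 0.0
    for (j, pj) in pairs(p)                         # inverse-CDF draw
        c += pj
        c >= r && return idx[j]                     # id in the full vocab
    end
    return idx[end]                                 # guard against rounding
end
```
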
## Checkpoint Format

JLD2 files containing:

- `model_state`: Flux model weights
- `hyperparams`: `Dict("n_embd"=>384, "n_layer"=>6, "n_head"=>6, "vocab_size"=>38, "block_size"=>256, "dropout"=>0.1)`
- `step`: 14,739
- `best_val_loss`: 2.91

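Reading such a checkpoint is standard JLD2. A sketch, assuming only the four top-level keys listed above; `build_gpt` in the final comment is a hypothetical constructor that would rebuild the model from the saved hyperparameters:

```julia
using JLD2   # and Flux, if restoring weights

ckpt = jldopen("best_model.jld2", "r") do f
    Dict(k => f[k] for k in keys(f))   # read all top-level entries
end

hp = ckpt["hyperparams"]               # Dict("n_embd" => 384, ...)
println("step ", ckpt["step"], ", best val loss ", ckpt["best_val_loss"])

# With a model rebuilt from `hp` (hypothetical constructor):
# model = build_gpt(hp)
# Flux.loadmodel!(model, ckpt["model_state"])
```
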
## Files

| File | Description |
|------|-------------|
| `final_model.jld2` | Final training checkpoint |
| `best_model.jld2` | Best validation loss checkpoint |
| `checkpoint_latest.jld2` | Latest periodic checkpoint |
| `vocab.json` | Character vocabulary (38 chars) |

## Provenance

- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)

## License

MIT