---
language:
- en
license: mit
library_name: flux
tags:
- julia
- flux-jl
- character-level
- philosophy
- transformer
- gpt-2
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: JuliaGPT-v2
results:
- task:
type: text-generation
name: Text Generation
dataset:
type: LisaMegaWatts/philosophy-corpus
name: philosophy-corpus
metrics:
- type: loss
value: 2.91
name: Val Loss
verified: false
---
# JuliaGPT-v2
A **~10M parameter** character-level GPT trained on classical philosophy texts. Scaled-up successor to the original [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) (8K params), using the same 38-character vocabulary but with a much larger architecture.
## Model Lineage
| Model | Params | Architecture | Vocab | Val Loss |
|-------|--------|-------------|-------|----------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 1L/16d/4H, block=64 | 27 chars | 2.43 |
| [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) | 8,096 | 1L/16d/4H, block=256 | 29 chars | 2.34 |
| **JuliaGPT-v2** | **~10M** | **6L/384d/6H, block=256** | **38 chars** | **2.91** |
## Architecture
```
GPT (GPT-2 style, scaled)
+-- wte: Embedding(38 -> 384)
+-- wpe: Embedding(256 -> 384) [learned position embeddings]
+-- blocks x 6:
| +-- attn: CausalSelfAttention
| | +-- wq: Dense(384 -> 384) [6 heads, 64 dim each]
| | +-- wk: Dense(384 -> 384)
| | +-- wv: Dense(384 -> 384)
| | +-- wo: Dense(384 -> 384)
| +-- ffwd: FeedForward
| +-- Dense(384 -> 1536)
| +-- Dense(1536 -> 384)
+-- lm_head: Dense(384 -> 38)
```
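The "~10M" figure can be sanity-checked from the diagram above. The sketch below assumes biased `Dense` layers and two LayerNorms per block (standard GPT-2 style); the real model's exact total may differ slightly, e.g. by a final LayerNorm.

```julia
# Back-of-the-envelope parameter count for the architecture above.
# Assumes biased Dense layers and two LayerNorms per block.
n_embd, n_layer = 384, 6
vocab, block_size, ffn = 38, 256, 4 * 384

emb   = vocab * n_embd + block_size * n_embd           # wte + wpe
attn  = 4 * (n_embd * n_embd + n_embd)                 # wq, wk, wv, wo
ffwd  = (n_embd * ffn + ffn) + (ffn * n_embd + n_embd) # two Dense layers
norms = 2 * 2 * n_embd                                 # 2 LayerNorms (scale + bias)
head  = n_embd * vocab + vocab                         # lm_head (untied)

total = emb + n_layer * (attn + ffwd + norms) + head
println(total)  # 10,774,310 under these assumptions, i.e. ~10.8M
```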
### Model Details
| Parameter | Value |
|-----------|-------|
| Architecture | GPT-2 style Transformer |
| Parameters | ~10M |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Head dim | 64 |
| Context length | 256 characters |
| Vocabulary | 38 characters (a-z, space, punctuation) |
| Dropout | 0.1 |
| Weight tying | No (separate lm_head) |
| Framework | Julia + Flux.jl |
### Vocabulary
38 characters: `` !"'(),-.:;?abcdefghijklmnopqrstuvwxyz``
Character-level tokenization with no BPE: each character is one token.
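A character-level encoder/decoder over this vocabulary is a few lines of Julia. This is a minimal sketch using 1-based indices in the order listed above; the actual index mapping lives in `vocab.json` and may differ.

```julia
# Minimal char-level tokenizer for the 38-character vocabulary above.
# Indices are 1-based and follow the listed character order (an assumption;
# the shipped vocab.json is authoritative).
chars = collect(" !\"'(),-.:;?abcdefghijklmnopqrstuvwxyz")
stoi  = Dict(c => i for (i, c) in enumerate(chars))
itos  = Dict(i => c for (i, c) in enumerate(chars))

encode(s) = [stoi[c] for c in s]
decode(ix) = join(itos[i] for i in ix)

decode(encode("know thyself."))  # round-trips to "know thyself."
```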
## Training
| | Value |
|---|---|
| Dataset | Classical philosophy corpus |
| Training steps | 14,739 |
| Best val loss | 2.91 |
| Hardware | NVIDIA RTX 3060 12GB |
| Precision | Float32 |
## Inference Settings
| Parameter | Value |
|-----------|-------|
| vocab_size | 38 |
| context_length | 256 |
| temperature | 0.8 |
| top_k | 40 |
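The two sampling knobs above combine in the usual way: divide logits by `temperature`, keep only the `top_k` largest, renormalize, and draw. A sketch (the `logits` here are random stand-in data; the real model emits a 38-vector per step — note that with a 38-character vocabulary, `top_k = 40` keeps every character):

```julia
# Temperature + top-k sampling over a vector of character logits.
function sample_char(logits; temperature = 0.8, top_k = 40)
    k = min(top_k, length(logits))                 # top_k=40 > 38 keeps all chars
    idx = partialsortperm(logits, 1:k; rev = true) # indices of the k largest logits
    scaled = logits[idx] ./ temperature
    probs = exp.(scaled .- maximum(scaled))        # stable softmax
    probs ./= sum(probs)
    r, c = rand(), 0.0                             # inverse-CDF draw
    for (j, p) in zip(idx, probs)
        c += p
        c >= r && return j
    end
    return idx[end]
end

sample_char(randn(38))  # index into the 38-character vocabulary
```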
## Checkpoint Format
JLD2 files containing:
- `model_state` — Flux model weights
- `hyperparams` — `Dict("n_embd"=>384, "n_layer"=>6, "n_head"=>6, "vocab_size"=>38, "block_size"=>256, "dropout"=>0.1)`
- `step` — 14,739
- `best_val_loss` — 2.91
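Restoring a checkpoint means rebuilding the model from `hyperparams` and then loading the stored weights. A hedged sketch of that workflow, assuming a constructor like the one in the source repo (`build_gpt` below is hypothetical, not verbatim from the code):

```julia
# Loading a checkpoint with JLD2, using the keys documented above.
using JLD2, Flux

ckpt = JLD2.load("best_model.jld2")   # returns a Dict of the stored keys
hp   = ckpt["hyperparams"]            # Dict("n_embd"=>384, "n_layer"=>6, ...)
@show ckpt["step"] ckpt["best_val_loss"]

model = build_gpt(hp)                 # hypothetical constructor from the repo
Flux.loadmodel!(model, ckpt["model_state"])
```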
## Files
| File | Description |
|------|-------------|
| `final_model.jld2` | Final training checkpoint |
| `best_model.jld2` | Best validation loss checkpoint |
| `checkpoint_latest.jld2` | Latest periodic checkpoint |
| `vocab.json` | Character vocabulary (38 chars) |
## Provenance
- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)
## License
MIT