---
language:
- en
license: mit
library_name: flux
tags:
- julia
- flux-jl
- character-level
- philosophy
- transformer
- gpt-2
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: JuliaGPT-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/philosophy-corpus
      name: philosophy-corpus
    metrics:
    - type: loss
      value: 2.91
      name: Val Loss
      verified: false
---
| | |
# JuliaGPT-v2

A **~10M parameter** character-level GPT trained on classical philosophy texts. It is the scaled-up successor to the original [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) (8K params), with a vocabulary expanded from 29 to 38 characters and a much larger architecture.
## Model Lineage

| Model | Params | Architecture | Vocab | Val Loss |
|-------|--------|--------------|-------|----------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 1L/16d/4H, block=64 | 27 chars | 2.43 |
| [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) | 8,096 | 1L/16d/4H, block=256 | 29 chars | 2.34 |
| **JuliaGPT-v2** | **~10M** | **6L/384d/6H, block=256** | **38 chars** | **2.91** |
## Architecture

```
GPT (GPT-2 style, scaled)
+-- wte: Embedding(38 -> 384)
+-- wpe: Embedding(256 -> 384) [learned position embeddings]
+-- blocks x 6:
|   +-- attn: CausalSelfAttention
|   |   +-- wq: Dense(384 -> 384) [6 heads, 64 dim each]
|   |   +-- wk: Dense(384 -> 384)
|   |   +-- wv: Dense(384 -> 384)
|   |   +-- wo: Dense(384 -> 384)
|   +-- ffwd: FeedForward
|       +-- Dense(384 -> 1536)
|       +-- Dense(1536 -> 384)
+-- lm_head: Dense(384 -> 38)
```
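
The ~10M figure can be sanity-checked from the dimensions in the tree above. The quick estimate below (illustrative Python, not the repository's Julia code) counts only the weight matrices, omitting biases and the layer-norm parameters that are standard in GPT-2 but not shown in the tree, so it slightly undercounts:

```python
n_embd, n_layer, n_ffwd = 384, 6, 1536
vocab_size, block_size = 38, 256

emb     = vocab_size * n_embd + block_size * n_embd  # wte + wpe
attn    = 4 * n_embd * n_embd                        # wq, wk, wv, wo
ffwd    = n_embd * n_ffwd + n_ffwd * n_embd          # two Dense layers
lm_head = n_embd * vocab_size                        # no weight tying

total = emb + n_layer * (attn + ffwd) + lm_head
print(f"{total:,} weights")  # 10,744,320 -> "~10M"
```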
### Model Details

| Parameter | Value |
|-----------|-------|
| Architecture | GPT-2 style Transformer |
| Parameters | ~10M |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Head dim | 64 |
| Context length | 256 characters |
| Vocabulary | 38 characters (a-z, space, punctuation) |
| Dropout | 0.1 |
| Weight tying | No (separate lm_head) |
| Framework | Julia + Flux.jl |
### Vocabulary

38 characters: `` !"'(),-.:;?abcdefghijklmnopqrstuvwxyz``

Character-level tokenization with no BPE — each character is one token.
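
A character-level codec is just a pair of index lookups. A minimal sketch in Python for illustration (the repository itself is Julia, and the assignment of index 0 to the first character is an assumption):

```python
# The 38-character vocabulary from above; triple quotes so the literal
# double quote can appear inside the string.
vocab = """ !"'(),-.:;?abcdefghijklmnopqrstuvwxyz"""
assert len(vocab) == 38

stoi = {ch: i for i, ch in enumerate(vocab)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char

def encode(text):
    return [stoi[c] for c in text]

def decode(ids):
    return "".join(itos[i] for i in ids)

print(decode(encode("know thyself.")))  # round-trips losslessly
```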
## Training

|  | Value |
|---|---|
| Dataset | Classical philosophy corpus |
| Training steps | 14,739 |
| Best val loss | 2.91 |
| Hardware | NVIDIA RTX 3060 12GB |
| Precision | Float32 |
## Inference Settings

| Parameter | Value |
|-----------|-------|
| vocab_size | 38 |
| context_length | 256 |
| temperature | 0.8 |
| top_k | 40 |
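
A generic sketch of how these two sampling knobs combine (illustrative Python/NumPy, not the repository's Julia implementation). Note that `top_k = 40` exceeds the 38-token vocabulary, so top-k filtering is effectively a no-op for this model and only the temperature matters:

```python
import numpy as np

def sample_next(logits, temperature=0.8, top_k=40, rng=None):
    """Sample one token id from logits with temperature + top-k."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k < logits.size:
        kth = np.sort(logits)[-top_k]               # k-th largest logit
        logits = np.where(logits < kth, -np.inf, logits)
    probs = np.exp(logits - logits.max())           # stable softmax
    probs /= probs.sum()
    return int(rng.choice(probs.size, p=probs))
```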
## Checkpoint Format

JLD2 files containing:

- `model_state` — Flux model weights
- `hyperparams` — `Dict("n_embd"=>384, "n_layer"=>6, "n_head"=>6, "vocab_size"=>38, "block_size"=>256, "dropout"=>0.1)`
- `step` — 14,739
- `best_val_loss` — 2.91
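
Since JLD2 stores data in (a subset of) the HDF5 format, the checkpoint's top-level entries can be enumerated from other languages with any HDF5 reader, though reconstructing the Flux model itself still requires Julia and JLD2.jl. A sketch using Python's `h5py`:

```python
import h5py  # JLD2 files use the HDF5 container format

def checkpoint_entries(path):
    """Return the sorted top-level entry names of a JLD2 checkpoint."""
    with h5py.File(path, "r") as f:
        return sorted(f.keys())

# e.g. checkpoint_entries("best_model.jld2")
# per the format above: best_val_loss, hyperparams, model_state, step
```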
## Files

| File | Description |
|------|-------------|
| `final_model.jld2` | Final training checkpoint |
| `best_model.jld2` | Best validation loss checkpoint |
| `checkpoint_latest.jld2` | Latest periodic checkpoint |
| `vocab.json` | Character vocabulary (38 chars) |
## Provenance

- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)
## License

MIT