---
language:
- en
license: mit
library_name: flux
tags:
- julia
- flux-jl
- character-level
- philosophy
- transformer
- gpt-2
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: JuliaGPT-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/philosophy-corpus
      name: philosophy-corpus
    metrics:
    - type: loss
      value: 2.91
      name: Val Loss
      verified: false
---

# JuliaGPT-v2

A **~10M parameter** character-level GPT trained on classical philosophy texts. It is the scaled-up successor to the original [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) (8K params), with an expanded 38-character vocabulary (up from 29) and a much larger architecture.

## Model Lineage

| Model | Params | Architecture | Vocab | Val Loss |
|-------|--------|--------------|-------|----------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 1L/16d/4H, block=64 | 27 chars | 2.43 |
| [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) | 8,096 | 1L/16d/4H, block=256 | 29 chars | 2.34 |
| **JuliaGPT-v2** | **~10M** | **6L/384d/6H, block=256** | **38 chars** | **2.91** |

Note: validation losses are character-level cross-entropy and are not directly comparable across rows, since the models use different vocabulary sizes.

## Architecture

```
GPT (GPT-2 style, scaled)
+-- wte: Embedding(38 -> 384)
+-- wpe: Embedding(256 -> 384)      [learned position embeddings]
+-- blocks x 6:
|   +-- attn: CausalSelfAttention
|   |   +-- wq: Dense(384 -> 384)   [6 heads, 64 dim each]
|   |   +-- wk: Dense(384 -> 384)
|   |   +-- wv: Dense(384 -> 384)
|   |   +-- wo: Dense(384 -> 384)
|   +-- ffwd: FeedForward
|       +-- Dense(384 -> 1536)
|       +-- Dense(1536 -> 384)
+-- lm_head: Dense(384 -> 38)
```

### Model Details

| Parameter | Value |
|-----------|-------|
| Architecture | GPT-2 style Transformer |
| Parameters | ~10M |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Head dim | 64 |
| Context length | 256 characters |
| Vocabulary | 38 characters (a-z, space, punctuation) |
| Dropout | 0.1 |
| Weight tying | No (separate lm_head) |
| Framework | Julia + Flux.jl |
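The layer layout above can be sketched in Flux.jl roughly as follows. This is an illustrative reconstruction from the diagram, not the repository's actual source: the helper names (`causal_attn`, `ffwd`) and the GELU activation are assumptions, since the card does not state them.

```julia
using Flux

# Hyperparameters from the model card.
n_embd, n_head, n_layer = 384, 6, 6
vocab_size, block_size  = 38, 256
head_dim = n_embd ÷ n_head            # 64, as in the table

# Separate q/k/v/output projections, matching the wq/wk/wv/wo layout.
causal_attn(d) = (wq = Dense(d => d), wk = Dense(d => d),
                  wv = Dense(d => d), wo = Dense(d => d))

# Position-wise feed-forward with the 4x expansion (384 -> 1536 -> 384).
ffwd(d) = Chain(Dense(d => 4d, gelu), Dense(4d => d), Dropout(0.1))

# Embedding tables and head as drawn; lm_head is a separate Dense
# (no weight tying with wte).
wte     = Flux.Embedding(vocab_size => n_embd)
wpe     = Flux.Embedding(block_size => n_embd)
lm_head = Dense(n_embd => vocab_size)
```

As a sanity check on the headline figure: the six blocks alone hold roughly 6 × (4·384² + 2·384·1536) ≈ 10.6M weights, consistent with the ~10M parameter count (embeddings and lm_head add only ~0.1M more).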
### Vocabulary

38 characters: `` !"'(),-.:;?abcdefghijklmnopqrstuvwxyz``

Character-level tokenization with no BPE: each character is one token.

## Training

| Setting | Value |
|---------|-------|
| Dataset | Classical philosophy corpus |
| Training steps | 14,739 |
| Best val loss | 2.91 |
| Hardware | NVIDIA RTX 3060 12GB |
| Precision | Float32 |

## Inference Settings

| Parameter | Value |
|-----------|-------|
| vocab_size | 38 |
| context_length | 256 |
| temperature | 0.8 |
| top_k | 40 |

## Checkpoint Format

JLD2 files containing:

- `model_state` — Flux model weights
- `hyperparams` — `Dict("n_embd"=>384, "n_layer"=>6, "n_head"=>6, "vocab_size"=>38, "block_size"=>256, "dropout"=>0.1)`
- `step` — 14,739
- `best_val_loss` — 2.91

## Files

| File | Description |
|------|-------------|
| `final_model.jld2` | Final training checkpoint |
| `best_model.jld2` | Best validation-loss checkpoint |
| `checkpoint_latest.jld2` | Latest periodic checkpoint |
| `vocab.json` | Character vocabulary (38 chars) |

## Provenance

- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)

## License

MIT
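The inference settings above amount to temperature-scaled top-k sampling over the 38-character vocabulary. A minimal sketch in plain Julia, assuming you already have a per-step logits vector from the model (`sample_next`, `encode`, and `decode` are my own illustrative names, and the vocabulary ordering is assumed, not read from `vocab.json`):

```julia
# The documented 38-character vocabulary (ordering assumed).
chars = collect(" !\"'(),-.:;?abcdefghijklmnopqrstuvwxyz")
stoi  = Dict(c => i for (i, c) in enumerate(chars))
encode(s)   = [stoi[c] for c in lowercase(s)]
decode(ids) = String(chars[ids])

# Temperature 0.8 + top-k 40 sampling over one logits vector.
function sample_next(logits; temperature = 0.8, topk = 40)
    z    = logits ./ temperature
    keep = partialsortperm(z, 1:min(topk, length(z)); rev = true)
    p    = exp.(z[keep] .- maximum(z[keep]))   # numerically stable softmax
    p  ./= sum(p)
    r, acc = rand(), 0.0                       # inverse-CDF draw
    for (j, pj) in zip(keep, p)
        acc += pj
        acc >= r && return j
    end
    return keep[end]
end
```

Note that with only 38 tokens, `top_k = 40` keeps the entire distribution, so the temperature is the only setting that actually reshapes sampling here.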
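Restoring one of the JLD2 checkpoints might look like the following. The dictionary keys come from the Checkpoint Format section above; `build_gpt` is a placeholder for whatever model constructor the source repository actually provides.

```julia
using JLD2, Flux

ckpt = JLD2.load("best_model.jld2")
hp   = ckpt["hyperparams"]    # Dict("n_embd"=>384, "n_layer"=>6, ...)

# `build_gpt` is hypothetical: rebuild the architecture from the stored
# hyperparameters, then restore the saved weights into it.
model = build_gpt(hp)
Flux.loadmodel!(model, ckpt["model_state"])

@info "loaded checkpoint" step = ckpt["step"] val_loss = ckpt["best_val_loss"]
```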