---
language:
  - en
license: mit
library_name: flux
tags:
  - julia
  - flux-jl
  - character-level
  - philosophy
  - transformer
  - gpt-2
  - text-generation
pipeline_tag: text-generation
datasets:
  - LisaMegaWatts/philosophy-corpus
model-index:
  - name: JuliaGPT-v2
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: LisaMegaWatts/philosophy-corpus
          name: philosophy-corpus
        metrics:
          - type: loss
            value: 2.91
            name: Val Loss
            verified: false
---

# JuliaGPT-v2

A ~10M parameter character-level GPT trained on classical philosophy texts. Scaled-up successor to the original JuliaGPT (8K params), using the same 38-character vocabulary but with a much larger architecture.

## Model Lineage

| Model | Params | Architecture | Vocab | Val Loss |
|---|---|---|---|---|
| MicroJulia | 4,992 | 1L/16d/4H, block=64 | 27 chars | 2.43 |
| JuliaGPT | 8,096 | 1L/16d/4H, block=256 | 29 chars | 2.34 |
| JuliaGPT-v2 | ~10M | 6L/384d/6H, block=256 | 38 chars | 2.91 |

## Architecture

```
GPT (GPT-2 style, scaled)
+-- wte: Embedding(38 -> 384)
+-- wpe: Embedding(256 -> 384)        [learned position embeddings]
+-- blocks x 6:
|   +-- attn: CausalSelfAttention
|   |   +-- wq: Dense(384 -> 384)     [6 heads, 64 dim each]
|   |   +-- wk: Dense(384 -> 384)
|   |   +-- wv: Dense(384 -> 384)
|   |   +-- wo: Dense(384 -> 384)
|   +-- ffwd: FeedForward
|       +-- Dense(384 -> 1536)
|       +-- Dense(1536 -> 384)
+-- lm_head: Dense(384 -> 38)
```
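The layer shapes in the diagram can be sketched directly in Flux.jl. This is a shape-level sketch only (hypothetical variable names; the card does not include the actual training code):

```julia
using Flux

n_embd, vocab_size, block_size = 384, 38, 256

wte = Flux.Embedding(vocab_size => n_embd)     # token embeddings, 38 -> 384
wpe = Flux.Embedding(block_size => n_embd)     # learned positions, 256 -> 384
attn_proj() = Dense(n_embd => n_embd)          # one of wq/wk/wv/wo, 384 -> 384
ffwd = Chain(Dense(n_embd => 4n_embd, gelu),   # 384 -> 1536
             Dense(4n_embd => n_embd))         # 1536 -> 384
lm_head = Dense(n_embd => vocab_size)          # 384 -> 38
```

Each attention projection keeps the 384-dim residual width; the 6 heads of 64 dims each come from reshaping inside the attention layer, not from separate `Dense` layers.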

## Model Details

| Parameter | Value |
|---|---|
| Architecture | GPT-2 style Transformer |
| Parameters | ~10M |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Head dim | 64 |
| Context length | 256 characters |
| Vocabulary | 38 characters (a-z, space, punctuation) |
| Dropout | 0.1 |
| Weight tying | No (separate lm_head) |
| Framework | Julia + Flux.jl |
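As a sanity check on the "~10M" figure, the shapes above can be totted up directly. This is a back-of-envelope count (LayerNorm parameters are omitted since the diagram does not list them):

```julia
n_embd, n_layer, vocab, ctx = 384, 6, 38, 256

emb   = vocab * n_embd + ctx * n_embd             # wte + wpe
attn  = 4 * (n_embd^2 + n_embd)                   # wq, wk, wv, wo (weights + biases)
ffwd  = (n_embd * 4n_embd + 4n_embd) +            # 384 -> 1536
        (4n_embd * n_embd + n_embd)               # 1536 -> 384
head  = n_embd * vocab + vocab                    # lm_head (no weight tying)
total = emb + n_layer * (attn + ffwd) + head      # 10_765_094, i.e. ~10.8M
```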

## Vocabulary

38 characters: the space character plus `!"'(),-.:;?abcdefghijklmnopqrstuvwxyz`

Character-level tokenization with no BPE — each character is one token.
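A minimal sketch of such a character-level codec (hypothetical helper names; the repo ships the actual mapping in `vocab.json`):

```julia
# 38-character vocabulary: space plus 11 punctuation marks plus a-z
const VOCAB = collect(" !\"'(),-.:;?abcdefghijklmnopqrstuvwxyz")
const STOI  = Dict(c => i for (i, c) in enumerate(VOCAB))

encode(s::AbstractString)              = [STOI[c] for c in s]
decode(ids::AbstractVector{<:Integer}) = String(VOCAB[ids])
```

Indices are 1-based, following Julia convention; a port to a 0-based language would shift every id by one.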

## Training

| Parameter | Value |
|---|---|
| Dataset | Classical philosophy corpus |
| Training steps | 14,739 |
| Best val loss | 2.91 |
| Hardware | NVIDIA RTX 3060 12GB |
| Precision | Float32 |

## Inference Settings

| Parameter | Value |
|---|---|
| vocab_size | 38 |
| context_length | 256 |
| temperature | 0.8 |
| top_k | 40 |
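A minimal sketch of how these settings are typically applied at sampling time (hypothetical helper, not the repo's actual code): scale the logits by the temperature, keep only the `top_k` highest, and draw from the renormalized softmax.

```julia
function sample_next(logits::Vector{Float32}; temperature=0.8f0, top_k=40)
    scaled = logits ./ temperature
    order  = sortperm(scaled; rev=true)            # indices, best first
    keep   = order[1:min(top_k, length(scaled))]   # restrict to top-k
    probs  = exp.(scaled[keep] .- maximum(scaled[keep]))
    probs ./= sum(probs)                           # softmax over kept logits
    r, acc = rand(), 0.0                           # inverse-CDF draw
    for (i, p) in pairs(probs)
        acc += p
        acc >= r && return keep[i]
    end
    return keep[end]
end
```

The returned index is a 1-based token id into the 38-character vocabulary.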

## Checkpoint Format

JLD2 files containing:

- `model_state`: Flux model weights
- `hyperparams`: `Dict("n_embd"=>384, "n_layer"=>6, "n_head"=>6, "vocab_size"=>38, "block_size"=>256, "dropout"=>0.1)`
- `step`: 14,739
- `best_val_loss`: 2.91
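A round-trip sketch of this layout with JLD2 (the demo values mirror the keys listed above; the real files additionally store trained Flux weights under `model_state`, restorable with `Flux.loadmodel!`):

```julia
using JLD2

hp = Dict("n_embd" => 384, "n_layer" => 6, "n_head" => 6,
          "vocab_size" => 38, "block_size" => 256, "dropout" => 0.1)

# Write a demo checkpoint with the same keys as the shipped files.
jldsave("demo_checkpoint.jld2"; hyperparams=hp, step=14_739, best_val_loss=2.91)

# Read it back; JLD2.load returns a Dict keyed by the saved names.
ckpt = JLD2.load("demo_checkpoint.jld2")
```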

## Files

| File | Description |
|---|---|
| final_model.jld2 | Final training checkpoint |
| best_model.jld2 | Best validation loss checkpoint |
| checkpoint_latest.jld2 | Latest periodic checkpoint |
| vocab.json | Character vocabulary (38 chars) |

## License

MIT