---
language:
  - en
license: mit
library_name: flux
tags:
  - julia
  - flux-jl
  - character-level
  - philosophy
  - transformer
  - gpt-2
  - text-generation
pipeline_tag: text-generation
datasets:
  - LisaMegaWatts/philosophy-corpus
model-index:
  - name: JuliaGPT-v2
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: LisaMegaWatts/philosophy-corpus
          name: philosophy-corpus
        metrics:
          - type: loss
            value: 2.91
            name: Val Loss
            verified: false
---

# JuliaGPT-v2

A ~10M parameter character-level GPT trained on classical philosophy texts. Scaled-up successor to the original JuliaGPT (8K params), using the same 38-character vocabulary but with a much larger architecture.

## Model Lineage

| Model | Params | Architecture | Vocab | Val Loss |
|---|---|---|---|---|
| MicroJulia | 4,992 | 1L/16d/4H, block=64 | 27 chars | 2.43 |
| JuliaGPT | 8,096 | 1L/16d/4H, block=256 | 29 chars | 2.34 |
| JuliaGPT-v2 | ~10M | 6L/384d/6H, block=256 | 38 chars | 2.91 |

## Architecture

```
GPT (GPT-2 style, scaled)
+-- wte: Embedding(38 -> 384)
+-- wpe: Embedding(256 -> 384)        [learned position embeddings]
+-- blocks x 6:
|   +-- attn: CausalSelfAttention
|   |   +-- wq: Dense(384 -> 384)     [6 heads, 64 dim each]
|   |   +-- wk: Dense(384 -> 384)
|   |   +-- wv: Dense(384 -> 384)
|   |   +-- wo: Dense(384 -> 384)
|   +-- ffwd: FeedForward
|       +-- Dense(384 -> 1536)
|       +-- Dense(1536 -> 384)
+-- lm_head: Dense(384 -> 38)
```
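The layer shapes in the diagram can be sketched directly in Flux.jl. This is a shape-level sketch only (hypothetical variable names; the card does not include the actual training code):

```julia
using Flux

n_embd, vocab_size, block_size = 384, 38, 256

wte = Flux.Embedding(vocab_size => n_embd)     # token embeddings, 38 -> 384
wpe = Flux.Embedding(block_size => n_embd)     # learned positions, 256 -> 384
attn_proj() = Dense(n_embd => n_embd)          # one of wq/wk/wv/wo, 384 -> 384
ffwd = Chain(Dense(n_embd => 4n_embd, gelu),   # 384 -> 1536
             Dense(4n_embd => n_embd))         # 1536 -> 384
lm_head = Dense(n_embd => vocab_size)          # 384 -> 38
```

Each attention projection keeps the 384-dim residual width; the 6 heads of 64 dims each come from reshaping inside the attention layer, not from separate `Dense` layers.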

## Model Details

| Parameter | Value |
|---|---|
| Architecture | GPT-2 style Transformer |
| Parameters | ~10M |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Head dim | 64 |
| Context length | 256 characters |
| Vocabulary | 38 characters (a-z, space, punctuation) |
| Dropout | 0.1 |
| Weight tying | No (separate lm_head) |
| Framework | Julia + Flux.jl |
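As a sanity check on the "~10M" figure, the shapes above can be totted up directly. This is a back-of-envelope count (LayerNorm parameters are omitted since the diagram does not list them):

```julia
n_embd, n_layer, vocab, ctx = 384, 6, 38, 256

emb   = vocab * n_embd + ctx * n_embd             # wte + wpe
attn  = 4 * (n_embd^2 + n_embd)                   # wq, wk, wv, wo (weights + biases)
ffwd  = (n_embd * 4n_embd + 4n_embd) +            # 384 -> 1536
        (4n_embd * n_embd + n_embd)               # 1536 -> 384
head  = n_embd * vocab + vocab                    # lm_head (no weight tying)
total = emb + n_layer * (attn + ffwd) + head      # 10_765_094, i.e. ~10.8M
```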

## Vocabulary

38 characters: the space character plus `!"'(),-.:;?abcdefghijklmnopqrstuvwxyz`

Character-level tokenization with no BPE — each character is one token.
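A minimal sketch of such a character-level codec (hypothetical helper names; the repo ships the actual mapping in `vocab.json`):

```julia
# 38-character vocabulary: space plus 11 punctuation marks plus a-z
const VOCAB = collect(" !\"'(),-.:;?abcdefghijklmnopqrstuvwxyz")
const STOI  = Dict(c => i for (i, c) in enumerate(VOCAB))

encode(s::AbstractString)              = [STOI[c] for c in s]
decode(ids::AbstractVector{<:Integer}) = String(VOCAB[ids])
```

Indices are 1-based, following Julia convention; a port to a 0-based language would shift every id by one.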

## Training

| Parameter | Value |
|---|---|
| Dataset | Classical philosophy corpus |
| Training steps | 14,739 |
| Best val loss | 2.91 |
| Hardware | NVIDIA RTX 3060 12GB |
| Precision | Float32 |

## Inference Settings

| Parameter | Value |
|---|---|
| vocab_size | 38 |
| context_length | 256 |
| temperature | 0.8 |
| top_k | 40 |
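A minimal sketch of how these settings are typically applied at sampling time (hypothetical helper, not the repo's actual code): scale the logits by the temperature, keep only the `top_k` highest, and draw from the renormalized softmax.

```julia
function sample_next(logits::Vector{Float32}; temperature=0.8f0, top_k=40)
    scaled = logits ./ temperature
    order  = sortperm(scaled; rev=true)            # indices, best first
    keep   = order[1:min(top_k, length(scaled))]   # restrict to top-k
    probs  = exp.(scaled[keep] .- maximum(scaled[keep]))
    probs ./= sum(probs)                           # softmax over kept logits
    r, acc = rand(), 0.0                           # inverse-CDF draw
    for (i, p) in pairs(probs)
        acc += p
        acc >= r && return keep[i]
    end
    return keep[end]
end
```

The returned index is a 1-based token id into the 38-character vocabulary.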

## Checkpoint Format

JLD2 files containing:

- `model_state`: Flux model weights
- `hyperparams`: `Dict("n_embd"=>384, "n_layer"=>6, "n_head"=>6, "vocab_size"=>38, "block_size"=>256, "dropout"=>0.1)`
- `step`: 14,739
- `best_val_loss`: 2.91
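A round-trip sketch of this layout with JLD2 (the demo values mirror the keys listed above; the real files additionally store trained Flux weights under `model_state`, restorable with `Flux.loadmodel!`):

```julia
using JLD2

hp = Dict("n_embd" => 384, "n_layer" => 6, "n_head" => 6,
          "vocab_size" => 38, "block_size" => 256, "dropout" => 0.1)

# Write a demo checkpoint with the same keys as the shipped files.
jldsave("demo_checkpoint.jld2"; hyperparams=hp, step=14_739, best_val_loss=2.91)

# Read it back; JLD2.load returns a Dict keyed by the saved names.
ckpt = JLD2.load("demo_checkpoint.jld2")
```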

## Files

| File | Description |
|---|---|
| final_model.jld2 | Final training checkpoint |
| best_model.jld2 | Best validation loss checkpoint |
| checkpoint_latest.jld2 | Latest periodic checkpoint |
| vocab.json | Character vocabulary (38 chars) |

## License

MIT