---
language:
  - en
license: mit
library_name: julia
tags:
  - julia
  - character-level
  - philosophy
  - scalar-autograd
  - pure-julia
  - scriptio-continua
  - text-generation
pipeline_tag: text-generation
datasets:
  - LisaMegaWatts/juliagpt-data
model-index:
  - name: JuliaGPT
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: LisaMegaWatts/juliagpt-data
          name: juliagpt-data
        metrics:
          - type: loss
            value: 2.34
            name: Val Loss
            verified: false
---

# JuliaGPT

An experimental 8,096-parameter character-level GPT written in pure Julia with a scalar autograd engine. It explores minimal vocabularies inspired by ancient Greek *scriptio continua* and has no external ML framework dependencies.

## Model Lineage

| Model | Params | Vocab | Context | Val Loss | Notes |
|---|---|---|---|---|---|
| MicroJulia | 4,992 | 27 chars | 64 | 2.43 | First proof of concept |
| JuliaGPT | 8,096 | 29 chars | 256 | 2.34 | Expanded context + vocab |
| JuliaGPT-v2 | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level |

## Architecture

| Parameter | Value |
|---|---|
| Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
| Parameters | 8,096 |
| Embedding dim | 16 |
| Layers | 1 |
| Attention heads | 4 |
| Head dim | 4 |
| FFN hidden dim | 64 |
| Context length | 256 characters |
| Vocabulary | 29 characters (a-z, space, period, + BOS) |
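The dimensions above can be checked against the stated total. One breakdown that lands exactly on 8,096 is sketched below; it is an assumption (no bias or layer-norm parameters, untied output head), not a documented accounting.

```julia
# Hedged parameter count for the table above. The decomposition is an
# assumption consistent with the stated 8,096 total.
V, d, ctx, ff = 29, 16, 256, 64

tok_emb = V * d          # 464   token embedding
pos_emb = ctx * d        # 4,096 learned positional embedding
attn    = 4 * d * d      # 1,024 Q, K, V, O projections
ffn     = d*ff + ff*d    # 2,048 two FFN matrices
head    = d * V          # 464   untied output projection

total = tok_emb + pos_emb + attn + ffn + head
println(total)  # 8096
```

Under these assumptions the positional embedding alone accounts for roughly half the parameters, which is typical at this tiny scale.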

## Vocabulary

29 tokens: `.abcdefghijklmnopqrstuvwxyz`, space, and BOS

Numerals are converted to words, and all punctuation except the period is removed.
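The preprocessing described above can be sketched in a few lines of Julia. Token ids, the BOS convention, and the character ordering here are assumptions (the repo's `vocab.json` is authoritative), and numeral-to-word conversion is omitted for brevity:

```julia
# Sketch of the 29-token vocabulary: space, period, a-z, plus a BOS id.
# Ordering and the BOS slot are assumptions; see vocab.json for the real map.
const CHARS = vcat([' ', '.'], collect('a':'z'))  # 28 printable characters
const BOS = length(CHARS) + 1                     # id 29 reserved for BOS

const CHAR_TO_ID = Dict(c => i for (i, c) in enumerate(CHARS))

# Lowercase and drop everything outside the vocabulary
# (numeral-to-word conversion would happen before this step).
preprocess(text::AbstractString) = filter(c -> haskey(CHAR_TO_ID, c), lowercase(text))

encode(text::AbstractString) = vcat(BOS, Int[CHAR_TO_ID[c] for c in preprocess(text)])
```

For example, `preprocess("Hello, World!")` yields `"hello world"`: the comma and exclamation mark are stripped while letters and the space survive.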

## Training

| Parameter | Value |
|---|---|
| Dataset | Aristotle's *Rhetoric* + Euclid's *Elements* (8,461 chunks) |
| Best val loss | 2.34 |
| Framework | Pure Julia (scalar autograd, no Flux/Lux) |

## Files

| File | Description |
|---|---|
| `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
| `vocab.json` | 38-character vocabulary array (matches the v2 checkpoints; see note below) |
| `data/aristotle_rhetoric.txt` | Training data |

> **Note:** The `.jld2` checkpoint files in this repo contain a different, larger model (384d/6L/38vocab). That model has been moved to JuliaGPT-v2. The original JuliaGPT is preserved in `best_model.json`.

## Inference Settings

| Parameter | Value |
|---|---|
| vocab_size | 29 |
| context_length | 256 |
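Generation with these settings amounts to a standard autoregressive loop: trim the running id sequence to the last 256 tokens, run the forward pass, and sample from the softmax over the 29 logits. The sketch below assumes a hypothetical `model_logits` function standing in for the forward pass reconstructed from `best_model.json`; only the windowing and sampling logic is concrete.

```julia
# Minimal temperature-sampling sketch for the settings above.
# `model_logits` is a hypothetical stand-in: Vector{Int} -> Vector{Float64}
# of length vocab_size (29), e.g. a forward pass rebuilt from best_model.json.
function sample_next(model_logits::Function, ids::Vector{Int};
                     context_length::Int = 256, temperature::Float64 = 0.8)
    window = ids[max(1, end - context_length + 1):end]  # keep last 256 tokens
    logits = model_logits(window) ./ temperature
    probs = exp.(logits .- maximum(logits))             # stable softmax
    probs ./= sum(probs)
    r, acc = rand(), 0.0                                # inverse-CDF sampling
    for (i, p) in enumerate(probs)
        acc += p
        acc >= r && return i
    end
    return length(probs)
end
```

Each sampled id is appended to `ids` and the loop repeats; the trailing `return` guards against floating-point round-off leaving `acc` fractionally below 1.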

## Provenance

## License

MIT