---
language:
  - en
license: mit
library_name: julia
tags:
  - julia
  - character-level
  - philosophy
  - scalar-autograd
  - pure-julia
  - scriptio-continua
  - text-generation
pipeline_tag: text-generation
datasets:
  - LisaMegaWatts/juliagpt-data
model-index:
  - name: JuliaGPT
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: LisaMegaWatts/juliagpt-data
          name: juliagpt-data
        metrics:
          - type: loss
            value: 2.34
            name: Val Loss
            verified: false
---

# JuliaGPT

An experimental 8,096-parameter character-level GPT written in pure Julia with a scalar autograd engine. It explores minimal vocabularies inspired by ancient Greek *scriptio continua* and has no external ML framework dependencies.

## Model Lineage

| Model | Params | Vocab | Context | Val Loss | Notes |
|---|---|---|---|---|---|
| MicroJulia | 4,992 | 27 chars | 64 | 2.43 | First proof of concept |
| JuliaGPT | 8,096 | 29 chars | 256 | 2.34 | Expanded context + vocab |
| JuliaGPT-v2 | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level |

## Architecture

| Parameter | Value |
|---|---|
| Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
| Parameters | 8,096 |
| Embedding dim | 16 |
| Layers | 1 |
| Attention heads | 4 |
| Head dim | 4 |
| FFN hidden dim | 64 |
| Context length | 256 characters |
| Vocabulary | 29 characters (a-z, space, period, + BOS) |
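The dimensions above can be checked against the stated total. One breakdown that lands exactly on 8,096 is sketched below; it is an assumption (no bias or layer-norm parameters, untied output head), not a documented accounting.

```julia
# Hedged parameter count for the table above. The decomposition is an
# assumption consistent with the stated 8,096 total.
V, d, ctx, ff = 29, 16, 256, 64

tok_emb = V * d          # 464   token embedding
pos_emb = ctx * d        # 4,096 learned positional embedding
attn    = 4 * d * d      # 1,024 Q, K, V, O projections
ffn     = d*ff + ff*d    # 2,048 two FFN matrices
head    = d * V          # 464   untied output projection

total = tok_emb + pos_emb + attn + ffn + head
println(total)  # 8096
```

Under these assumptions the positional embedding alone accounts for roughly half the parameters, which is typical at this tiny scale.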

## Vocabulary

29 tokens: `.abcdefghijklmnopqrstuvwxyz`, space, and BOS

Numerals are converted to words, and all punctuation except the period is removed.
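The preprocessing described above can be sketched in a few lines of Julia. Token ids, the BOS convention, and the character ordering here are assumptions (the repo's `vocab.json` is authoritative), and numeral-to-word conversion is omitted for brevity:

```julia
# Sketch of the 29-token vocabulary: space, period, a-z, plus a BOS id.
# Ordering and the BOS slot are assumptions; see vocab.json for the real map.
const CHARS = vcat([' ', '.'], collect('a':'z'))  # 28 printable characters
const BOS = length(CHARS) + 1                     # id 29 reserved for BOS

const CHAR_TO_ID = Dict(c => i for (i, c) in enumerate(CHARS))

# Lowercase and drop everything outside the vocabulary
# (numeral-to-word conversion would happen before this step).
preprocess(text::AbstractString) = filter(c -> haskey(CHAR_TO_ID, c), lowercase(text))

encode(text::AbstractString) = vcat(BOS, Int[CHAR_TO_ID[c] for c in preprocess(text)])
```

For example, `preprocess("Hello, World!")` yields `"hello world"`: the comma and exclamation mark are stripped while letters and the space survive.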

## Training

| Parameter | Value |
|---|---|
| Dataset | Aristotle's *Rhetoric* + Euclid's *Elements* (8,461 chunks) |
| Best val loss | 2.34 |
| Framework | Pure Julia (scalar autograd, no Flux/Lux) |

## Files

| File | Description |
|---|---|
| `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
| `vocab.json` | 38-character vocabulary array (matches the v2 checkpoints; see note below) |
| `data/aristotle_rhetoric.txt` | Training data |

> **Note:** The `.jld2` checkpoint files in this repo contain a different, larger model (384d/6L/38vocab). That model has been moved to JuliaGPT-v2. The original JuliaGPT is preserved in `best_model.json`.

## Inference Settings

| Parameter | Value |
|---|---|
| vocab_size | 29 |
| context_length | 256 |
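Generation with these settings amounts to a standard autoregressive loop: trim the running id sequence to the last 256 tokens, run the forward pass, and sample from the softmax over the 29 logits. The sketch below assumes a hypothetical `model_logits` function standing in for the forward pass reconstructed from `best_model.json`; only the windowing and sampling logic is concrete.

```julia
# Minimal temperature-sampling sketch for the settings above.
# `model_logits` is a hypothetical stand-in: Vector{Int} -> Vector{Float64}
# of length vocab_size (29), e.g. a forward pass rebuilt from best_model.json.
function sample_next(model_logits::Function, ids::Vector{Int};
                     context_length::Int = 256, temperature::Float64 = 0.8)
    window = ids[max(1, end - context_length + 1):end]  # keep last 256 tokens
    logits = model_logits(window) ./ temperature
    probs = exp.(logits .- maximum(logits))             # stable softmax
    probs ./= sum(probs)
    r, acc = rand(), 0.0                                # inverse-CDF sampling
    for (i, p) in enumerate(probs)
        acc += p
        acc >= r && return i
    end
    return length(probs)
end
```

Each sampled id is appended to `ids` and the loop repeats; the trailing `return` guards against floating-point round-off leaving `acc` fractionally below 1.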

## Provenance

## License

MIT