---
language:
- en
license: mit
library_name: julia
tags:
- julia
- character-level
- philosophy
- scalar-autograd
- pure-julia
- scriptio-continua
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/juliagpt-data
model-index:
- name: JuliaGPT
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/juliagpt-data
      name: juliagpt-data
    metrics:
    - type: loss
      value: 2.34
      name: Val Loss
      verified: false
---
# JuliaGPT

An experimental **8,096-parameter** character-level GPT in pure Julia with scalar autograd. It explores minimal vocabularies inspired by ancient Greek *scriptio continua*, and has no external ML framework dependencies.
## Model Lineage

| Model | Params | Vocab | Context | Val Loss | Notes |
|-------|--------|-------|---------|----------|-------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 27 chars | 64 | 2.43 | First proof-of-concept |
| **JuliaGPT** | **8,096** | **29 chars** | **256** | **2.34** | **Expanded context + vocab** |
| [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2) | ~10M | 38 chars | 256 | 2.91 | Scaled-up char-level |
## Architecture

| Parameter | Value |
|-----------|-------|
| Architecture | 1-layer Transformer (pure Julia, scalar autograd) |
| Parameters | 8,096 |
| Embedding dim | 16 |
| Layers | 1 |
| Attention heads | 4 |
| Head dim | 4 |
| FFN hidden dim | 64 |
| Context length | 256 characters |
| Vocabulary | 29 characters (a-z, space, period, + BOS) |
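The 8,096 figure can be checked by hand from the table above. The breakdown below is a sketch under my own assumptions (no bias or layer-norm parameters, untied output projection); the card doesn't document these details, but the arithmetic reproduces the total exactly:

```julia
# Back-of-envelope parameter count for the configuration above.
# Assumes no biases, no layer-norm parameters, and an untied output head.
vocab, d, ctx, ffn = 29, 16, 256, 64

tok_emb = vocab * d      # token embedding:        29 × 16  = 464
pos_emb = ctx * d        # position embedding:     256 × 16 = 4096
attn    = 4 * d * d      # Q, K, V, O projections:            1024
mlp     = 2 * d * ffn    # two FFN matrices:                  2048
head    = d * vocab      # output projection:      16 × 29  = 464

total = tok_emb + pos_emb + attn + mlp + head
println(total)  # 8096
```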
### Vocabulary

29 tokens: `` .abcdefghijklmnopqrstuvwxyz`` + BOS

Numerals are converted to words, and all punctuation except the period is removed.
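As a rough illustration, the vocabulary above (space, period, a-z, plus BOS) can be sketched as a tiny codec. The id ordering, BOS placement, and `encode`/`decode` helpers here are illustrative assumptions, not the repo's actual scheme, and the numeral-to-word step is assumed to happen in preprocessing before encoding:

```julia
# Sketch of the 29-token character vocabulary: space, period, a-z, plus BOS.
# Id assignment and BOS handling are illustrative assumptions.
const CHARS = " ." * join('a':'z')    # 28 printable tokens
const BOS   = length(CHARS) + 1       # token 29: beginning-of-sequence

const CHAR_TO_ID = Dict(c => i for (i, c) in enumerate(CHARS))

# Lowercase, drop anything outside the vocabulary, prepend BOS.
encode(text) = [BOS; [CHAR_TO_ID[c] for c in lowercase(text) if haskey(CHAR_TO_ID, c)]]

# Map ids back to characters, skipping the non-printable BOS token.
decode(ids) = join(CHARS[i] for i in ids if i <= length(CHARS))
```

Round-tripping `"Hello, world."` through this sketch yields `"hello world."`: the comma falls outside the vocabulary and is dropped.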
## Training

| Setting | Value |
|---------|-------|
| Dataset | Aristotle's Rhetoric + Euclid's Elements (8,461 chunks) |
| Best val loss | 2.34 |
| Framework | Pure Julia (scalar autograd, no Flux/Lux) |
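For intuition about the 8,461-chunk figure, a corpus like this might be cut into fixed-length windows as sketched below. The card doesn't document the actual chunking scheme (stride, overlap, padding), so the `chunk` helper is hypothetical:

```julia
# Hypothetical fixed-stride chunker; the repo's real scheme may overlap
# windows, pad short chunks, or differ in other ways.
chunk(text::AbstractString, n::Int) =
    [text[i:min(i + n - 1, end)] for i in 1:n:lastindex(text)]

chunks = chunk("a"^1000, 256)   # 4 chunks: 256 + 256 + 256 + 232 characters
```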
## Files

| File | Description |
|------|-------------|
| `best_model.json` | Original model weights + optimizer state (JSON format, scalar autograd) |
| `vocab.json` | 38-character vocabulary array |
| `data/aristotle_rhetoric.txt` | Training data |

**Note:** The `.jld2` checkpoint files in this repo contain a different, larger model (384d / 6 layers / 38-char vocab). That model has been moved to [JuliaGPT-v2](https://huggingface.co/LisaMegaWatts/JuliaGPT-v2). The original JuliaGPT is preserved in `best_model.json`.
## Inference Settings

| Parameter | Value |
|-----------|-------|
| vocab_size | 29 |
| context_length | 256 |
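Operationally, `context_length` bounds the window the model sees at each generation step. A hedged sketch of such a sliding-window loop, with a dummy `sample_next` standing in for the real sampler (which this card doesn't show):

```julia
# Sliding-window generation sketch. `sample_next` is a placeholder stub, not
# this repo's sampler; a real model would compute logits over the 29 tokens.
sample_next(model, window) = mod1(sum(window), 29)

function generate(model, ids::Vector{Int}; max_new::Int = 16, context_length::Int = 256)
    ids = copy(ids)
    for _ in 1:max_new
        window = ids[max(1, end - context_length + 1):end]  # last ≤ 256 ids
        push!(ids, sample_next(model, window))
    end
    return ids
end

out = generate(nothing, [29]; max_new = 8)   # start from BOS (id 29)
```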
## Provenance

- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)
- **Training data**: [LisaMegaWatts/juliagpt-data](https://huggingface.co/datasets/LisaMegaWatts/juliagpt-data)

## License

MIT