---
language:
- en
license: mit
library_name: flux
tags:
- julia
- flux-jl
- character-level
- philosophy
- transformer
- gpt-2
- text-generation
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: JuliaGPT-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/philosophy-corpus
      name: philosophy-corpus
    metrics:
    - type: loss
      value: 2.91
      name: Val Loss
      verified: false
---

# JuliaGPT-v2

A **~10M parameter** character-level GPT trained on classical philosophy texts. It is the scaled-up successor to the original [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) (8K params), with an expanded 38-character vocabulary (up from 29) and a much larger architecture.

## Model Lineage

| Model | Params | Architecture | Vocab | Val Loss |
|-------|--------|--------------|-------|----------|
| [MicroJulia](https://huggingface.co/LisaMegaWatts/MicroJulia) | 4,992 | 1L/16d/4H, block=64 | 27 chars | 2.43 |
| [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) | 8,096 | 1L/16d/4H, block=256 | 29 chars | 2.34 |
| **JuliaGPT-v2** | **~10M** | **6L/384d/6H, block=256** | **38 chars** | **2.91** |

Note: validation losses are character-level cross-entropy and are not directly comparable across rows, since the models use different vocabulary sizes.

## Architecture

```
GPT (GPT-2 style, scaled)
+-- wte: Embedding(38 -> 384)
+-- wpe: Embedding(256 -> 384)      [learned position embeddings]
+-- blocks x 6:
|   +-- attn: CausalSelfAttention
|   |   +-- wq: Dense(384 -> 384)   [6 heads, 64 dim each]
|   |   +-- wk: Dense(384 -> 384)
|   |   +-- wv: Dense(384 -> 384)
|   |   +-- wo: Dense(384 -> 384)
|   +-- ffwd: FeedForward
|       +-- Dense(384 -> 1536)
|       +-- Dense(1536 -> 384)
+-- lm_head: Dense(384 -> 38)
```

### Model Details

| Parameter | Value |
|-----------|-------|
| Architecture | GPT-2 style Transformer |
| Parameters | ~10M |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Head dim | 64 |
| Context length | 256 characters |
| Vocabulary | 38 characters (a-z, space, punctuation) |
| Dropout | 0.1 |
| Weight tying | No (separate lm_head) |
| Framework | Julia + Flux.jl |
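The layer layout above can be sketched in Flux.jl roughly as follows. This is an illustrative reconstruction from the diagram, not the repository's actual source: the helper names (`causal_attn`, `ffwd`) and the GELU activation are assumptions, since the card does not state them.

```julia
using Flux

# Hyperparameters from the model card.
n_embd, n_head, n_layer = 384, 6, 6
vocab_size, block_size  = 38, 256
head_dim = n_embd ÷ n_head            # 64, as in the table

# Separate q/k/v/output projections, matching the wq/wk/wv/wo layout.
causal_attn(d) = (wq = Dense(d => d), wk = Dense(d => d),
                  wv = Dense(d => d), wo = Dense(d => d))

# Position-wise feed-forward with the 4x expansion (384 -> 1536 -> 384).
ffwd(d) = Chain(Dense(d => 4d, gelu), Dense(4d => d), Dropout(0.1))

# Embedding tables and head as drawn; lm_head is a separate Dense
# (no weight tying with wte).
wte     = Flux.Embedding(vocab_size => n_embd)
wpe     = Flux.Embedding(block_size => n_embd)
lm_head = Dense(n_embd => vocab_size)
```

As a sanity check on the headline figure: the six blocks alone hold roughly 6 × (4·384² + 2·384·1536) ≈ 10.6M weights, consistent with the ~10M parameter count (embeddings and lm_head add only ~0.1M more).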
### Vocabulary

38 characters: `` !"'(),-.:;?abcdefghijklmnopqrstuvwxyz``

Character-level tokenization with no BPE: each character is one token.

## Training

| Setting | Value |
|---------|-------|
| Dataset | Classical philosophy corpus |
| Training steps | 14,739 |
| Best val loss | 2.91 |
| Hardware | NVIDIA RTX 3060 12GB |
| Precision | Float32 |

## Inference Settings

| Parameter | Value |
|-----------|-------|
| vocab_size | 38 |
| context_length | 256 |
| temperature | 0.8 |
| top_k | 40 |

## Checkpoint Format

JLD2 files containing:

- `model_state` — Flux model weights
- `hyperparams` — `Dict("n_embd"=>384, "n_layer"=>6, "n_head"=>6, "vocab_size"=>38, "block_size"=>256, "dropout"=>0.1)`
- `step` — 14,739
- `best_val_loss` — 2.91

## Files

| File | Description |
|------|-------------|
| `final_model.jld2` | Final training checkpoint |
| `best_model.jld2` | Best validation-loss checkpoint |
| `checkpoint_latest.jld2` | Latest periodic checkpoint |
| `vocab.json` | Character vocabulary (38 chars) |

## Provenance

- **Author**: LisaMegaWatts
- **Source code**: [DavinciDreams/JuliaGPT](https://github.com/DavinciDreams/JuliaGPT)

## License

MIT
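The inference settings above amount to temperature-scaled top-k sampling over the 38-character vocabulary. A minimal sketch in plain Julia, assuming you already have a per-step logits vector from the model (`sample_next`, `encode`, and `decode` are my own illustrative names, and the vocabulary ordering is assumed, not read from `vocab.json`):

```julia
# The documented 38-character vocabulary (ordering assumed).
chars = collect(" !\"'(),-.:;?abcdefghijklmnopqrstuvwxyz")
stoi  = Dict(c => i for (i, c) in enumerate(chars))
encode(s)   = [stoi[c] for c in lowercase(s)]
decode(ids) = String(chars[ids])

# Temperature 0.8 + top-k 40 sampling over one logits vector.
function sample_next(logits; temperature = 0.8, topk = 40)
    z    = logits ./ temperature
    keep = partialsortperm(z, 1:min(topk, length(z)); rev = true)
    p    = exp.(z[keep] .- maximum(z[keep]))   # numerically stable softmax
    p  ./= sum(p)
    r, acc = rand(), 0.0                       # inverse-CDF draw
    for (j, pj) in zip(keep, p)
        acc += pj
        acc >= r && return j
    end
    return keep[end]
end
```

Note that with only 38 tokens, `top_k = 40` keeps the entire distribution, so the temperature is the only setting that actually reshapes sampling here.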
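Restoring one of the JLD2 checkpoints might look like the following. The dictionary keys come from the Checkpoint Format section above; `build_gpt` is a placeholder for whatever model constructor the source repository actually provides.

```julia
using JLD2, Flux

ckpt = JLD2.load("best_model.jld2")
hp   = ckpt["hyperparams"]    # Dict("n_embd"=>384, "n_layer"=>6, ...)

# `build_gpt` is hypothetical: rebuild the architecture from the stored
# hyperparameters, then restore the saved weights into it.
model = build_gpt(hp)
Flux.loadmodel!(model, ckpt["model_state"])

@info "loaded checkpoint" step = ckpt["step"] val_loss = ckpt["best_val_loss"]
```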