gmmeyer committed on
Commit a284d49 · verified · 1 Parent(s): a72f10e

Switch back to Run B (curriculum-tuned) — cleaner live generation

README.md CHANGED
@@ -15,8 +15,8 @@ tags:
  # Cicero LLM

  A 100M-parameter Latin language model, **trained from scratch** — no pretrained
- backbone, no English/Greek base. It generates Latin in the browser or anywhere
- ONNX runs.

  Live demo (browser inference): https://cicerollm.com
@@ -25,8 +25,12 @@ Live demo (browser inference): https://cicerollm.com
  - Decoder-only transformer, ~111M params (12 layers × 12 heads × 768 dim,
  2048 block size, learned absolute positions, tied embeddings)
  - 32K SentencePiece-BPE tokenizer trained on the same Latin corpus
- - Trained from random init on a ~466M-token Latin corpus (v5 maximalist mix),
- 30,000 steps, dropout 0.15

  ## Evaluation
@@ -35,18 +39,17 @@ cross-model number):
  | pack | accuracy |
  |---|---|
- | canonical (textbook) | 0.80 |
- | literary diagnostic | 0.75 |
- | held-out blind (144 items) | 0.69 |
- | bits-per-char (held-out) | 1.58 |
-
- First checkpoint in the project to clear the 0.75 canonical-cloze stretch goal.

  ## Files

  - `model.int8.onnx` — int8-quantized ONNX (~136 MB; used by the browser demo)
  - `model.onnx` — fp32 ONNX (~543 MB)
- - `checkpoint_step_030000.pt` — raw PyTorch weights + optimizer state (~1.3 GB)
  - `tokenizer.json`, `tokenizer.model`, `tokenizer_config.json` — SentencePiece 32K
  - `config.json` — architecture metadata
@@ -65,7 +68,7 @@ logits = sess.run(None, {"input_ids": np.array([ids], dtype=np.int64)})[0]
  Research artifact. Autoregressive completion with temperature + top-k sampling;
  no instruction tuning, no chat behavior. Give it Latin and it continues in
- Latin.

  ## License
 
  # Cicero LLM

  A 100M-parameter Latin language model, **trained from scratch** — no pretrained
+ backbone, no English/Greek base. It generates Classical Latin in the browser
+ or anywhere ONNX runs.

  Live demo (browser inference): https://cicerollm.com
 
  - Decoder-only transformer, ~111M params (12 layers × 12 heads × 768 dim,
  2048 block size, learned absolute positions, tied embeddings)
  - 32K SentencePiece-BPE tokenizer trained on the same Latin corpus
+ - Trained from random init on a ~466M-token Latin corpus (30,000 steps,
+ dropout 0.15), then **continued-pretrained on a targeted classical-grammar
+ curriculum** (synthetic Cicero-register prose, generated and quality-filtered
+ by a stronger model) mixed 30/70 with clean classical replay for 3,000 steps.
+ The curriculum step pushes generation toward classical register and cuts the
+ medieval/neo-Latin contamination and repetition of the base model.
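
The 30/70 mix described in the added bullet could be realized in several ways; the sketch below shows one simple option, per-example sampling from two pools. It is not taken from the repository. It assumes the 30% share is the curriculum and the 70% share is the replay data (the README does not say which is which), and `mixed_stream`, `curriculum_docs`, and `replay_docs` are hypothetical names.

```python
import random

def mixed_stream(curriculum, replay, curriculum_frac=0.3, seed=0):
    """Yield examples forever, drawing from `curriculum` with probability
    `curriculum_frac` and from `replay` otherwise.

    ASSUMPTION: the 30% side is the curriculum; the README does not specify.
    """
    rng = random.Random(seed)
    while True:
        source = curriculum if rng.random() < curriculum_frac else replay
        yield rng.choice(source)

# Toy usage: placeholder strings stand in for tokenized documents.
curriculum_docs = ["cur_a", "cur_b"]
replay_docs = ["rep_a", "rep_b", "rep_c"]
stream = mixed_stream(curriculum_docs, replay_docs)
sample = [next(stream) for _ in range(10_000)]
share = sum(doc.startswith("cur") for doc in sample) / len(sample)
print(f"curriculum share: {share:.3f}")
```

Sampling per example keeps the ratio correct in expectation at every step; a real pipeline might instead interleave fixed-size shards or mix at the batch level.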

  ## Evaluation
 
  | pack | accuracy |
  |---|---|
+ | held-out blind (144 items) | 0.72 |
+ | literary diagnostic | 0.82 |
+ | grammar-probe / weakness (60 items) | 0.82 |
+ | in-distribution textbook | 0.77 |
+ | bits-per-char (held-out) | 1.56 |
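
For readers unfamiliar with the bits-per-char row, the sketch below shows the usual conversion from token-level negative log-likelihoods to bits per character. It is an illustration, not the project's evaluation code. It assumes characters are counted as Unicode code points (the README does not say how characters are counted), and the numbers are made up.

```python
import math

def bits_per_char(token_nll_nats, text):
    """Convert per-token negative log-likelihoods (in nats) for `text`
    into bits per character.

    ASSUMPTION: characters are counted with len(text), i.e. code points.
    """
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / len(text)

# Made-up per-token losses for a short Latin sentence.
text = "Gallia est omnis divisa in partes tres"
nlls = [2.1, 1.4, 3.0, 2.6, 0.9, 1.8, 2.2]
bpc = bits_per_char(nlls, text)
print(f"{bpc:.3f} bits/char")
```

Normalizing by characters rather than tokens makes the figure comparable across tokenizers, which is why it is often reported alongside accuracy.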
 
  ## Files

  - `model.int8.onnx` — int8-quantized ONNX (~136 MB; used by the browser demo)
  - `model.onnx` — fp32 ONNX (~543 MB)
+ - `checkpoint_step_033000.pt` — raw PyTorch weights + optimizer state (~1.3 GB)
  - `tokenizer.json`, `tokenizer.model`, `tokenizer_config.json` — SentencePiece 32K
  - `config.json` — architecture metadata
 
  Research artifact. Autoregressive completion with temperature + top-k sampling;
  no instruction tuning, no chat behavior. Give it Latin and it continues in
+ Latin. Best results in classical (Caesarian / Ciceronian) register.
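
The temperature + top-k sampling mentioned above can be sketched in plain Python as follows. This is a generic illustration, not the demo's actual sampler. The function name `sample_top_k` and the default values are assumptions, and the input is assumed to be the logits row for the last position, flattened to a list of floats.

```python
import math
import random

def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Sample one token id using temperature scaling and top-k filtering.

    ASSUMPTION: default temperature/top_k are illustrative, not the demo's settings.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = min(top_k, len(logits))
    # Keep the ids of the k highest logits.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top_ids]
    # Unnormalized softmax weights; subtracting the max avoids overflow.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top_ids, weights=weights, k=1)[0]

# Toy usage: 5-token vocabulary; with top_k=2 only ids 1 and 3 can be drawn.
toy_logits = [0.1, 2.5, -1.0, 2.0, 0.3]
rng = random.Random(0)
draws = {sample_top_k(toy_logits, temperature=0.8, top_k=2, rng=rng) for _ in range(200)}
print(draws)
```

Lower temperatures sharpen the distribution toward the top candidate, while a smaller `top_k` removes the long tail entirely. The appended id would then be fed back into the model for the next step.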

  ## License
checkpoint_step_030000.pt → checkpoint_step_033000.pt RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:05c1583bf99efa5f489037f85f0a71efae9cd54c0c00d2a8fcb00748638e07b9
- size 1334658819

  version https://git-lfs.github.com/spec/v1
+ oid sha256:ee9d6db5ebe680ac7cf769c80efb8d7db77b7d6a41a48caf0f43106c08597f0a
+ size 1334661059
model.int8.onnx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:651da45148a22618556159b144f94467ec08c972070fc79f169a57df16a0f14c
  size 136465819

  version https://git-lfs.github.com/spec/v1
+ oid sha256:ff4c5ea14d3f2f294c5960075cd8592a7a103d57a46f68265eca2f7adfc4672e
  size 136465819
model.onnx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8b478a8e464d0ecc41768eb4f697feb76f2c2f7b37f9d491cfd9da63c4513991
  size 543306444

  version https://git-lfs.github.com/spec/v1
+ oid sha256:266d81a974e28b2b321a0263e97842267c58a2887c3051586ecae4f3b0547ea4
  size 543306444