Switch back to Run B (curriculum-tuned) — cleaner live generation

Files changed (4)

README.md +15 -12
checkpoint_step_030000.pt → checkpoint_step_033000.pt +2 -2
model.int8.onnx +1 -1
model.onnx +1 -1

README.md CHANGED

@@ -15,8 +15,8 @@ tags:
 # Cicero LLM
 A 100M-parameter Latin language model, **trained from scratch** — no pretrained
-backbone, no English/Greek base. It generates Latin in the browser or anywhere
-ONNX runs.
 Live demo (browser inference): https://cicerollm.com
@@ -25,8 +25,12 @@ Live demo (browser inference): https://cicerollm.com
 - Decoder-only transformer, ~111M params (12 layers × 12 heads × 768 dim,
   2048 block size, learned absolute positions, tied embeddings)
 - 32K SentencePiece-BPE tokenizer trained on the same Latin corpus
-- Trained from random init on a ~466M-token Latin corpus (v5 maximalist mix),
-  30,000 steps, dropout 0.15
 ## Evaluation
@@ -35,18 +39,17 @@ cross-model number):
 | pack | accuracy |
 |---|---|
-| canonical (textbook) | 0.80 |
-| literary diagnostic | 0.75 |
-| held-out blind (144 items) | 0.69 |
-| bits-per-char (held-out) | 1.58 |
-First checkpoint in the project to clear the 0.75 canonical-cloze stretch goal.
 ## Files
 - `model.int8.onnx` — int8-quantized ONNX (~136 MB; used by the browser demo)
 - `model.onnx` — fp32 ONNX (~543 MB)
-- `checkpoint_step_030000.pt` — raw PyTorch weights + optimizer state (~1.3 GB)
 - `tokenizer.json`, `tokenizer.model`, `tokenizer_config.json` — SentencePiece 32K
 - `config.json` — architecture metadata
@@ -65,7 +68,7 @@ logits = sess.run(None, {"input_ids": np.array([ids], dtype=np.int64)})[0]
 Research artifact. Autoregressive completion with temperature + top-k sampling;
 no instruction tuning, no chat behavior. Give it Latin and it continues in
-Latin.
 ## License

 # Cicero LLM
 A 100M-parameter Latin language model, **trained from scratch** — no pretrained
+backbone, no English/Greek base. It generates Classical Latin in the browser
+or anywhere ONNX runs.
 Live demo (browser inference): https://cicerollm.com
 - Decoder-only transformer, ~111M params (12 layers × 12 heads × 768 dim,
   2048 block size, learned absolute positions, tied embeddings)
 - 32K SentencePiece-BPE tokenizer trained on the same Latin corpus
+- Trained from random init on a ~466M-token Latin corpus (30,000 steps,
+  dropout 0.15), then **continued-pretrained on a targeted classical-grammar
+  curriculum** (synthetic Cicero-register prose, generated and quality-filtered
+  by a stronger model) mixed 30/70 with clean classical replay for 3,000 steps.
+  The curriculum step pushes generation toward classical register and cuts the
+  medieval/neo-Latin contamination and repetition of the base model.
 ## Evaluation
 | pack | accuracy |
 |---|---|
+| held-out blind (144 items) | 0.72 |
+| literary diagnostic | 0.82 |
+| grammar-probe / weakness (60 items) | 0.82 |
+| in-distribution textbook | 0.77 |
+| bits-per-char (held-out) | 1.56 |
 ## Files
 - `model.int8.onnx` — int8-quantized ONNX (~136 MB; used by the browser demo)
 - `model.onnx` — fp32 ONNX (~543 MB)
+- `checkpoint_step_033000.pt` — raw PyTorch weights + optimizer state (~1.3 GB)
 - `tokenizer.json`, `tokenizer.model`, `tokenizer_config.json` — SentencePiece 32K
 - `config.json` — architecture metadata
 Research artifact. Autoregressive completion with temperature + top-k sampling;
 no instruction tuning, no chat behavior. Give it Latin and it continues in
+Latin. Best results in classical (Caesarian / Ciceronian) register.
 ## License
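
Reviewer note on the "temperature + top-k sampling" mentioned in the README hunk above: the diff does not show the demo's sampling code or its settings, so the following is only a minimal sketch of the general technique. The function name `sample_top_k` and the default values for `temperature` and `top_k` are assumptions for illustration, not values taken from the repository. It expects a 1-D logits vector for the last position, which the caller would slice out of the array returned by `sess.run`.

```python
import numpy as np

def sample_top_k(logits, temperature=0.8, top_k=40, rng=None):
    """Sample one token id from a 1-D logits vector (hypothetical helper)."""
    rng = rng or np.random.default_rng()
    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    k = min(top_k, scaled.shape[-1])
    # Indices of the k largest logits (order within the top k is not guaranteed).
    top_idx = np.argpartition(scaled, -k)[-k:]
    top_logits = scaled[top_idx]
    # Numerically stable softmax over the surviving candidates only.
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    return int(rng.choice(top_idx, p=probs))
```

A generation loop would append the returned id to `ids`, rerun the session, and repeat until a length limit or an end-of-sequence token is reached.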

checkpoint_step_030000.pt → checkpoint_step_033000.pt RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:05c1583bf99efa5f489037f85f0a71efae9cd54c0c00d2a8fcb00748638e07b9
-size 1334658819

 version https://git-lfs.github.com/spec/v1
+oid sha256:ee9d6db5ebe680ac7cf769c80efb8d7db77b7d6a41a48caf0f43106c08597f0a
+size 1334661059

model.int8.onnx CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:651da45148a22618556159b144f94467ec08c972070fc79f169a57df16a0f14c
 size 136465819

 version https://git-lfs.github.com/spec/v1
+oid sha256:ff4c5ea14d3f2f294c5960075cd8592a7a103d57a46f68265eca2f7adfc4672e
 size 136465819

model.onnx CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8b478a8e464d0ecc41768eb4f697feb76f2c2f7b37f9d491cfd9da63c4513991
 size 543306444

 version https://git-lfs.github.com/spec/v1
+oid sha256:266d81a974e28b2b321a0263e97842267c58a2887c3051586ecae4f3b0547ea4
 size 543306444
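
Reviewer note: the three pointer diffs above change only the `oid sha256:` line for the two ONNX files (their sizes are identical before and after), while the checkpoint changes both oid and size. Anyone who downloads the binaries can check them against these pointers. The sketch below shows one way to do that with the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and it assumes the pointer text follows the three-line `version` / `oid` / `size` layout shown in this diff.

```python
import hashlib
import os

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}

def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash the pointer records."""
    # Cheap check first: a size mismatch means there is no need to hash.
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Stream in chunks so a ~1.3 GB checkpoint does not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]
```

For example, passing the new `checkpoint_step_033000.pt` pointer text to `parse_lfs_pointer` and then calling `matches_pointer` on the downloaded file would confirm that the local copy is the step-33,000 checkpoint rather than the earlier step-30,000 one.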