Switch back to Run B (curriculum-tuned) — cleaner live generation
- README.md +15 -12
- checkpoint_step_030000.pt → checkpoint_step_033000.pt +2 -2
- model.int8.onnx +1 -1
- model.onnx +1 -1
README.md
CHANGED
@@ -15,8 +15,8 @@ tags:
 # Cicero LLM
 
 A 100M-parameter Latin language model, **trained from scratch** — no pretrained
-backbone, no English/Greek base. It generates Latin in the browser
-ONNX runs.
+backbone, no English/Greek base. It generates Classical Latin in the browser
+or anywhere ONNX runs.
 
 Live demo (browser inference): https://cicerollm.com
 
@@ -25,8 +25,12 @@ Live demo (browser inference): https://cicerollm.com
 - Decoder-only transformer, ~111M params (12 layers × 12 heads × 768 dim,
 2048 block size, learned absolute positions, tied embeddings)
 - 32K SentencePiece-BPE tokenizer trained on the same Latin corpus
-- Trained from random init on a ~466M-token Latin corpus (
-
+- Trained from random init on a ~466M-token Latin corpus (30,000 steps,
+dropout 0.15), then **continued-pretrained on a targeted classical-grammar
+curriculum** (synthetic Cicero-register prose, generated and quality-filtered
+by a stronger model) mixed 30/70 with clean classical replay for 3,000 steps.
+The curriculum step pushes generation toward classical register and cuts the
+medieval/neo-Latin contamination and repetition of the base model.
 
 ## Evaluation
 
@@ -35,18 +39,17 @@ cross-model number):
 
 | pack | accuracy |
 |---|---|
-
-| literary diagnostic | 0.
-
-
-
-First checkpoint in the project to clear the 0.75 canonical-cloze stretch goal.
+| held-out blind (144 items) | 0.72 |
+| literary diagnostic | 0.82 |
+| grammar-probe / weakness (60 items) | 0.82 |
+| in-distribution textbook | 0.77 |
+| bits-per-char (held-out) | 1.56 |
 
 ## Files
 
 - `model.int8.onnx` — int8-quantized ONNX (~136 MB; used by the browser demo)
 - `model.onnx` — fp32 ONNX (~543 MB)
-- `
+- `checkpoint_step_033000.pt` — raw PyTorch weights + optimizer state (~1.3 GB)
 - `tokenizer.json`, `tokenizer.model`, `tokenizer_config.json` — SentencePiece 32K
 - `config.json` — architecture metadata
 
@@ -65,7 +68,7 @@ logits = sess.run(None, {"input_ids": np.array([ids], dtype=np.int64)})[0]
 
 Research artifact. Autoregressive completion with temperature + top-k sampling;
 no instruction tuning, no chat behavior. Give it Latin and it continues in
-Latin.
+Latin. Best results in classical (Caesarian / Ciceronian) register.
 
 ## License
 
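The README describes generation as autoregressive completion with temperature + top-k sampling, and the last hunk header shows logits coming back from `sess.run`. A minimal sketch of that sampling step follows, assuming a 1-D logits vector for the final position. The function name `sample_top_k` and the defaults `k=40` and `temperature=0.8` are illustrative choices, not values taken from the repository.

```python
import numpy as np

def sample_top_k(logits, k=40, temperature=0.8, rng=None):
    """Sample one token id from a 1-D logits vector (temperature + top-k).

    Hypothetical helper: the name and default values are assumptions.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    k = min(k, logits.shape[-1])
    # Indices of the k largest logits (order within the top-k is arbitrary).
    top_idx = np.argpartition(logits, -k)[-k:]
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = logits[top_idx] / temperature
    # Numerically stable softmax over the surviving candidates only.
    scaled -= scaled.max()
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(top_idx, p=probs))
```

If the exported graph returns logits shaped `(batch, sequence, vocab)`, which is an assumption about this model's output, the next token would be drawn with `sample_top_k(logits[0, -1])`, appended to `ids`, and the loop repeated.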
checkpoint_step_030000.pt → checkpoint_step_033000.pt
RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:ee9d6db5ebe680ac7cf769c80efb8d7db77b7d6a41a48caf0f43106c08597f0a
+size 1334661059
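As a rough sanity check on the pointer size above, the parameter count and checkpoint size can be estimated from the architecture line in the README. This is a back-of-the-envelope sketch under assumptions the source does not confirm: a GPT-2-style block (biased linear layers, 4x MLP expansion, two LayerNorms per block), a vocabulary of exactly 32,000, and an Adam-style optimizer that stores two fp32 moment buffers per weight.

```python
def gpt_param_count(n_layer=12, d_model=768, vocab=32_000, block=2048):
    """Estimate parameters for a GPT-2-style decoder (assumed layout)."""
    # Token table (tied with the output head) plus learned position table.
    embed = vocab * d_model + block * d_model
    # q, k, v and output projections, each d x d with a bias.
    attn = 4 * d_model * d_model + 4 * d_model
    # MLP d -> 4d -> d, with biases of size 4d and d.
    mlp = 8 * d_model * d_model + 5 * d_model
    # Two LayerNorms per block, each with scale and shift.
    norms = 2 * 2 * d_model
    final_norm = 2 * d_model
    return embed + n_layer * (attn + mlp + norms) + final_norm

params = gpt_param_count()
# fp32 weights + two fp32 optimizer moment buffers (assumption).
ckpt_bytes = params * 4 * 3
print(f"{params / 1e6:.1f}M params, ~{ckpt_bytes / 1e9:.2f} GB checkpoint")
# prints: 111.2M params, ~1.33 GB checkpoint
```

The head count does not enter the estimate, since splitting 768 dimensions across 12 heads leaves the projection sizes unchanged. Under these assumptions the estimate lands within about 1% of the 1,334,661,059 bytes recorded in the pointer, with the remainder plausibly being serialization overhead.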
model.int8.onnx
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ff4c5ea14d3f2f294c5960075cd8592a7a103d57a46f68265eca2f7adfc4672e
 size 136465819
model.onnx
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:266d81a974e28b2b321a0263e97842267c58a2887c3051586ecae4f3b0547ea4
 size 543306444
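Each of the three binary files is tracked as a Git LFS pointer carrying a `sha256` oid and a byte size, so a downloaded artifact can be checked against the values in this commit. The sketch below shows one way to do that with the standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is whatever the reader downloaded.

```python
import hashlib
from pathlib import Path

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}

def matches_pointer(path, pointer_text, chunk=1 << 20):
    """Return True if the file at `path` has the pointer's size and hash."""
    ptr = parse_lfs_pointer(pointer_text)
    path = Path(path)
    # Cheap check first: a size mismatch avoids hashing a large file.
    if path.stat().st_size != ptr["size"]:
        return False
    h = hashlib.new(ptr["algo"])
    with path.open("rb") as f:
        # Stream in chunks so a multi-hundred-MB model is never fully in memory.
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest() == ptr["digest"]
```

For example, passing a local copy of `model.onnx` together with the new three-line pointer text from this commit should return `True` only if the file is byte-identical to the one pushed here.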