Commit History

fix: apply do_lower_case to root-level HF-facing files
f25becd

diyclassics and Claude Opus 4.6 (1M context) committed

fix: add do_lower_case=True to tokenizer (v1.1.1)
c7b1be1

diyclassics and Claude Opus 4.6 (1M context) committed
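In Hugging Face BERT-style tokenizers, `do_lower_case=True` lower-cases input text before vocabulary lookup, so a capitalized form maps to the same IDs as its lower-case counterpart. The minimal sketch below shows that effect with a whitespace-split toy vocabulary. `ToyTokenizer`, its vocabulary and the `unk_id` value are hypothetical and are not the repository's tokenizer:

```python
class ToyTokenizer:
    """Hypothetical whitespace tokenizer illustrating a do_lower_case flag."""

    def __init__(self, vocab, do_lower_case=True, unk_id=1):
        self.vocab = vocab
        self.do_lower_case = do_lower_case
        self.unk_id = unk_id  # assumed ID for out-of-vocabulary words

    def encode(self, text):
        # Lower-case first so "Arma" and "arma" share one vocabulary entry.
        if self.do_lower_case:
            text = text.lower()
        return [self.vocab.get(word, self.unk_id) for word in text.split()]


vocab = {"arma": 5, "virumque": 6}
print(ToyTokenizer(vocab).encode("Arma virumque"))                       # [5, 6]
print(ToyTokenizer(vocab, do_lower_case=False).encode("Arma virumque"))  # [1, 6]
```

With the flag off, the capitalized first word falls back to the unknown ID. That is the kind of mismatch the fix would address if the vocabulary was built from lower-cased text, which is an assumption here.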

fix: revert tie_word_embeddings — safetensors needs tying for decoder weights
8c959a5

diyclassics and Claude Opus 4.6 (1M context) committed

chore: gitignore cluster/, track test vocab fixture
a285690

diyclassics and Claude Opus 4.6 (1M context) committed

fix: add decode/unescape to fast tokenizer, silence tied-weights warning
86e0990

diyclassics and Claude Opus 4.6 (1M context) committed
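The decode/unescape fix above is easiest to follow with a concrete escaping scheme. This sketch assumes that the Latin BERT subword encoder follows the tensor2tensor `SubwordTextEncoder` convention. In that convention, `_` marks the end of a word, while literal underscores, backslashes and out-of-alphabet characters are escaped as `\u`, `\\` and `\NNN;`. Under that assumption, decoding concatenates the subwords, splits on `_`, and unescapes each word. The sketch is simplified: it joins words with single spaces and drops malformed escapes. `unescape_token` and `decode_subwords` are hypothetical names:

```python
import re

# Matches "\u" (escaped underscore), "\\" (escaped backslash) or "\123;" (code point).
_UNESCAPE_RE = re.compile(r"\\u|\\\\|\\([0-9]+);")


def unescape_token(escaped):
    """Reverse the assumed tensor2tensor-style escaping of a single word."""

    def replace(match):
        if match.group(1) is None:
            return "_" if match.group(0) == r"\u" else "\\"
        try:
            return chr(int(match.group(1)))
        except (ValueError, OverflowError):
            return ""  # simplification: drop malformed code-point escapes

    return _UNESCAPE_RE.sub(replace, escaped)


def decode_subwords(subwords):
    """Join subwords, split on the "_" word-end marker, and unescape each word."""
    joined = "".join(subwords)
    words = [unescape_token(piece) for piece in joined.split("_") if piece]
    return " ".join(words)


print(decode_subwords(["ar", "ma_", "vir", "umque_"]))  # arma virumque
print(unescape_token(r"a\ub"))                          # a_b
```

Splitting on `_` is safe only because literal underscores were escaped to `\u` during encoding.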

docs: remove blockquote from experimental note
7a1b678

diyclassics and Claude Opus 4.6 (1M context) committed

docs: add links to original repo and paper, add experimental proviso
0f1214f

diyclassics and Claude Opus 4.6 (1M context) committed

feat: add LatinBertTokenizerFast with word_ids() support
ed6af90

diyclassics and Claude Opus 4.6 (1M context) committed
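In Hugging Face fast tokenizers, `BatchEncoding.word_ids()` returns one entry per token position. Each entry is the index of the input word that produced the token, or `None` for special tokens such as `[CLS]` and `[SEP]`. Token-classification tasks like POS tagging use this mapping to align subword predictions with words. The sketch below reproduces the mapping in plain Python for a single sequence. `word_ids_for` and the fixed-width `toy_split` splitter are hypothetical stand-ins for the real subword encoder:

```python
def word_ids_for(words, split_word, add_special_tokens=True):
    """Map each subword position to its source word index (None for [CLS]/[SEP])."""
    ids = [None] if add_special_tokens else []  # [CLS]
    for word_index, word in enumerate(words):
        ids.extend([word_index] * len(split_word(word)))
    if add_special_tokens:
        ids.append(None)  # [SEP]
    return ids


def toy_split(word):
    # Hypothetical splitter: fixed three-character chunks.
    return [word[i:i + 3] for i in range(0, len(word), 3)]


print(word_ids_for(["arma", "virumque"], toy_split))
# [None, 0, 0, 1, 1, 1, None]
```

With the real tokenizer, the equivalent call would be along the lines of `tokenizer(words, is_split_into_words=True).word_ids()`. This assumes that `LatinBertTokenizerFast` follows the standard `PreTrainedTokenizerFast` interface.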

chore: re-upload model weights (pytorch_model.bin)
6d20995

diyclassics committed

chore: add HF model repo files (config, tokenizer, encoder, README)
872519e

diyclassics and Claude Opus 4.6 (1M context) committed

refactor: extract shared case study utils and move data to tracked paths
f04d50f

diyclassics and Claude Opus 4.6 (1M context) committed

fix: handle >512 token sentences and add MPS device support
2c07f6c

diyclassics and Claude Opus 4.6 committed
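BERT-style models accept at most 512 positions, including `[CLS]` and `[SEP]`, so longer sentences must be split before encoding. The commit does not say which strategy it uses. One simple option, sketched below, is to cut the subword IDs into consecutive windows of at most 510 and wrap each window in its own special tokens. `chunk_ids` and its `cls_id`/`sep_id` defaults are hypothetical:

```python
def chunk_ids(ids, max_len=512, cls_id=2, sep_id=3):
    """Split subword IDs into windows that each fit within max_len positions."""
    body = max_len - 2  # leave room for [CLS] and [SEP]
    if not ids:
        return [[cls_id, sep_id]]
    return [
        [cls_id] + ids[start:start + body] + [sep_id]
        for start in range(0, len(ids), body)
    ]


chunks = chunk_ids(list(range(10, 1030)))  # 1,020 subword IDs
print([len(c) for c in chunks])            # [512, 512]
```

Non-overlapping windows lose context at the cut points. An overlapping stride is a common alternative when per-token embeddings near a boundary matter. For the MPS half of the change, PyTorch provides `torch.backends.mps.is_available()` to detect Apple-silicon GPU support.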

test: add contextual nearest neighbors case study (Bamman & Burns §4.4)
3510517

diyclassics and Claude Opus 4.6 committed
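A contextual nearest-neighbors query ranks other token occurrences by the similarity of their contextual embeddings to a query token's embedding. Cosine similarity is the usual measure, though this sketch assumes it rather than taking it from the repository. The toy three-dimensional vectors and labels below stand in for real model outputs, which would be much higher-dimensional. `cosine` and `nearest_neighbors` are hypothetical helpers:

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(query, corpus, k=3):
    """Return the k (label, similarity) pairs closest to query, most similar first."""
    scored = [(label, cosine(query, vec)) for label, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


corpus = [
    ("token_a", [1.0, 0.0, 0.0]),
    ("token_b", [0.9, 0.1, 0.0]),
    ("token_c", [0.0, 1.0, 0.0]),
]
print([label for label, _ in nearest_neighbors([1.0, 0.0, 0.0], corpus, k=2)])
# ['token_a', 'token_b']
```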

feat: make benchmarks model-agnostic with --model-path option
8af2caa

diyclassics and Claude Opus 4.6 committed
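A common way to make a pytest-driven benchmark suite model-agnostic is to register a command-line option in `conftest.py` through pytest's `pytest_addoption` hook and read it back with `config.getoption`. This sketch assumes that approach. The commit does not say whether `--model-path` belongs to pytest or to a standalone script, and the `None` default, help text and `resolve_model_path` helper are all hypothetical:

```python
# conftest.py (hypothetical sketch)


def pytest_addoption(parser):
    # Let the case-study tests load any compatible checkpoint.
    parser.addoption(
        "--model-path",
        action="store",
        default=None,
        help="Path to the model checkpoint to benchmark (hypothetical help text)",
    )


def resolve_model_path(config, fallback):
    """Return the --model-path value, or fallback when the option was not given."""
    value = config.getoption("--model-path")
    return value if value is not None else fallback
```

A fixture could then call `resolve_model_path(request.config, ...)`, and a run would look like `pytest --model-path /path/to/checkpoint`.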

chore: register pytest slow marker
bb53c05

diyclassics and Claude Opus 4.6 committed
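If a custom marker such as `slow` is not registered, pytest emits `PytestUnknownMarkWarning`, and the run fails under `--strict-markers`. One way to register it is the `pytest_configure` hook in `conftest.py`. The same thing can be done with a `markers` entry in `pytest.ini` or `pyproject.toml`, and the commit does not say which it uses. The marker description below is hypothetical:

```python
# conftest.py (hypothetical sketch)


def pytest_configure(config):
    # Registering the marker silences PytestUnknownMarkWarning for @pytest.mark.slow
    # and keeps --strict-markers runs from failing on it.
    config.addinivalue_line(
        "markers", "slow: long-running case-study reproductions"
    )
```

Tests marked with `@pytest.mark.slow` can then be skipped with `pytest -m "not slow"`.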

test: add WSD case study reproduction (Bamman & Burns Table 2)
73784ba

diyclassics and Claude Opus 4.6 committed

test: add POS tagging case study reproduction (Bamman & Burns Table 1)
bbde973

diyclassics and Claude Opus 4.6 committed

test: add infilling case study reproduction (Bamman & Burns Table 3)
c5bfe4c

diyclassics and Claude Opus 4.6 committed

Fix tokenizer ID offset: reserve IDs 0-4 for BERT special tokens
ce59834

diyclassics and Claude Opus 4.6 committed
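Presumably the underlying subword encoder numbered its vocabulary from 0, which collided with the IDs BERT expects for its special tokens. The fix reserves IDs 0-4 for those tokens. The minimal sketch below shows the resulting shift. It assumes the five reserved tokens are `[PAD]`, `[UNK]`, `[CLS]`, `[SEP]` and `[MASK]` in that order, though the repository's order may differ. `build_vocab` and `shift_ids` are hypothetical names:

```python
# Assumed special-token order; the real tokenizer may order these differently.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
OFFSET = len(SPECIAL_TOKENS)  # IDs 0-4 are reserved


def build_vocab(subwords):
    """Place special tokens at 0-4 and shift every subword ID up by OFFSET."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for i, subword in enumerate(subwords):
        vocab[subword] = i + OFFSET
    return vocab


def shift_ids(raw_ids):
    """Map IDs from the underlying subword encoder past the reserved range."""
    return [i + OFFSET for i in raw_ids]


vocab = build_vocab(["arma_", "virum", "que_"])
print(vocab["[MASK]"], vocab["arma_"])  # 4 5
print(shift_ids([0, 1, 2]))             # [5, 6, 7]
```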

Initial: HF-compatible Latin BERT tokenizer (Bamman & Burns 2020)
68d8806

diyclassics and Claude Opus 4.6 committed