898 MB

3 contributors

History: 16 commits

SFconvertbot

Adding `safetensors` variant of this model

a6d5a2a verified 4 months ago

data
refactor: extract shared case study utils and move data to tracked paths 4 months ago
src
Fix tokenizer ID offset: reserve IDs 0-4 for BERT special tokens 4 months ago
tests
refactor: extract shared case study utils and move data to tracked paths 4 months ago
.gitattributes

1.52 kB
chore: add HF model repo files (config, tokenizer, encoder, README) 4 months ago
.gitignore

590 Bytes
chore: add HF model repo files (config, tokenizer, encoder, README) 4 months ago
.python-version

5 Bytes
Initial: HF-compatible Latin BERT tokenizer (Bamman & Burns 2020) 4 months ago
README.md

3.33 kB
docs: remove blockquote from experimental note 4 months ago
config.json

562 Bytes
chore: add HF model repo files (config, tokenizer, encoder, README) 4 months ago
latin.subword.encoder

287 kB
chore: add HF model repo files (config, tokenizer, encoder, README) 4 months ago
model.safetensors

448 MB

Adding `safetensors` variant of this model 4 months ago
pyproject.toml

680 Bytes
test: add contextual nearest neighbors case study (Bamman & Burns §4.4) 4 months ago
pytorch_model.bin
Detected Pickle imports (3)
- "torch._utils._rebuild_tensor_v2",
- "collections.OrderedDict",
- "torch.FloatStorage"
448 MB

chore: re-upload model weights (pytorch_model.bin) 4 months ago
tokenization_latin_bert.py

11.2 kB
chore: add HF model repo files (config, tokenizer, encoder, README) 4 months ago
tokenization_latin_bert_fast.py

8.87 kB
feat: add LatinBertTokenizerFast with word_ids() support 4 months ago
tokenizer_config.json

364 Bytes
feat: add LatinBertTokenizerFast with word_ids() support 4 months ago
uv.lock

340 kB
Initial: HF-compatible Latin BERT tokenizer (Bamman & Burns 2020) 4 months ago
