latincy/latin-bert
Tags: Fill-Mask, Transformers, PyTorch, Safetensors, Latin, bert, feature-extraction, latin, nlp, classics
arxiv: 2009.10053
License: apache-2.0
refs/pr/1
latin-bert
898 MB
3 contributors
History: 16 commits
Latest commit: a6d5a2a (verified) by SFconvertbot, 28 days ago: Adding `safetensors` variant of this model
Name | Size | Scan | Last commit | Updated
data | | | refactor: extract shared case study utils and move data to tracked paths | 29 days ago
src | | | Fix tokenizer ID offset: reserve IDs 0-4 for BERT special tokens | about 1 month ago
tests | | | refactor: extract shared case study utils and move data to tracked paths | 29 days ago
.gitattributes | 1.52 kB | Safe | chore: add HF model repo files (config, tokenizer, encoder, README) | 29 days ago
.gitignore | 590 Bytes | | chore: add HF model repo files (config, tokenizer, encoder, README) | 29 days ago
.python-version | 5 Bytes | Safe | Initial: HF-compatible Latin BERT tokenizer (Bamman & Burns 2020) | about 1 month ago
README.md | 3.33 kB | Safe | docs: remove blockquote from experimental note | 28 days ago
config.json | 562 Bytes | Safe | chore: add HF model repo files (config, tokenizer, encoder, README) | 29 days ago
latin.subword.encoder | 287 kB | Safe | chore: add HF model repo files (config, tokenizer, encoder, README) | 29 days ago
model.safetensors | 448 MB (xet) | Safe | Adding `safetensors` variant of this model | 28 days ago
pyproject.toml | 680 Bytes | Safe | test: add contextual nearest neighbors case study (Bamman & Burns §4.4) | about 1 month ago
pytorch_model.bin | 448 MB (xet) | Safe (pickle) | chore: re-upload model weights (pytorch_model.bin) | 29 days ago
tokenization_latin_bert.py | 11.2 kB | Safe | chore: add HF model repo files (config, tokenizer, encoder, README) | 29 days ago
tokenization_latin_bert_fast.py | 8.87 kB | | feat: add LatinBertTokenizerFast with word_ids() support | 29 days ago
tokenizer_config.json | 364 Bytes | Safe | feat: add LatinBertTokenizerFast with word_ids() support | 29 days ago
uv.lock | 340 kB | Safe | Initial: HF-compatible Latin BERT tokenizer (Bamman & Burns 2020) | about 1 month ago

pytorch_model.bin: Detected Pickle imports (3): "torch._utils._rebuild_tensor_v2", "collections.OrderedDict", "torch.FloatStorage"
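
The commit message on `src` ("Fix tokenizer ID offset: reserve IDs 0-4 for BERT special tokens") describes shifting subword IDs so that the first five IDs stay free for special tokens. The listing does not show the repository's actual code, so the following is only a minimal sketch of that idea. The token order in `SPECIAL_TOKENS`, the helper name `build_vocab`, and the sample subwords are all hypothetical.

```python
# Hypothetical sketch: names, token order, and sample subwords are assumptions,
# not the contents of tokenization_latin_bert.py.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]  # assumed order
NUM_RESERVED = len(SPECIAL_TOKENS)  # five reserved IDs: 0 through 4


def build_vocab(subwords):
    """Map special tokens to IDs 0-4 and subwords to IDs starting at 5."""
    vocab = {token: idx for idx, token in enumerate(SPECIAL_TOKENS)}
    for raw_id, piece in enumerate(subwords):
        # Shift every raw encoder ID past the reserved range.
        vocab[piece] = raw_id + NUM_RESERVED
    return vocab


vocab = build_vocab(["arma_", "vir", "um_"])
```

Without the offset, the first five subwords from the encoder would share IDs with the special tokens, which is presumably the collision the commit addresses.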