metaXu264/Generator_new_tokenizer
Tags: Text Generation, Transformers, Safetensors, llama, biology, genomics, long-context, text-generation-inference
arXiv: 2502.07272
License: mit
Branch: main
Repository size: 12 GB
Contributors: 2
History: 6 commits
Latest commit: 192f9c4 by XuJP264, 6 days ago: Resize embeddings to match 4260-token tokenizer
| File | Size | Storage | Last commit message | Last updated |
| --- | --- | --- | --- | --- |
| .gitattributes | 1.52 kB | - | Add updated vocab with CRISPR/Cas special tokens | 7 days ago |
| README.md | 6.44 kB | - | Add updated vocab with CRISPR/Cas special tokens | 7 days ago |
| add_special_tokens_to_vocab.py | 3.57 kB | - | Add updated vocab with CRISPR/Cas special tokens | 7 days ago |
| config.json | 731 Bytes | - | Resize embeddings to match 4260-token tokenizer | 6 days ago |
| generation_config.json | 111 Bytes | - | Add updated vocab with CRISPR/Cas special tokens | 7 days ago |
| model-00001-of-00003.safetensors | 5 GB | LFS | Resize embeddings to match 4260-token tokenizer | 6 days ago |
| model-00002-of-00003.safetensors | 4.96 GB | xet | Add updated vocab with CRISPR/Cas special tokens | 7 days ago |
| model-00003-of-00003.safetensors | 2.03 GB | LFS | Resize embeddings to match 4260-token tokenizer | 6 days ago |
| model.safetensors.index.json | 22.5 kB | - | Resize embeddings to match 4260-token tokenizer | 6 days ago |
| special_token.txt | 3.86 kB | - | Add updated vocab with CRISPR/Cas special tokens | 7 days ago |
| special_tokens_map.json | 3.67 kB | - | Declare CRISPR control tokens as additional_special_tokens | 6 days ago |
| tokenizer.py | 8.43 kB | - | Update tokenizer source and config vocab_size to 4260 | 6 days ago |
| tokenizer_config.json | 1.3 kB | - | Add updated vocab with CRISPR/Cas special tokens | 7 days ago |
| vocab.txt | 30 kB | - | Add updated vocab with CRISPR/Cas special tokens | 7 days ago |
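
Several commits in this listing mention resizing the embeddings to match the 4260-token tokenizer. The sketch below illustrates what such a resize generally involves: existing rows are kept and rows for newly added tokens are appended. It is a dependency-free illustration and not the code used in this repository. The function name `resize_embeddings`, the mean-initialisation of new rows, and the old vocabulary size of 4096 are all assumptions made for the example. With the Transformers library, the equivalent step is normally done by calling `model.resize_token_embeddings(len(tokenizer))`.

```python
def resize_embeddings(matrix, new_vocab_size):
    """Return a copy of `matrix` (a list of rows) with `new_vocab_size` rows.

    Existing rows are preserved. If the vocabulary grows, each new row is
    initialised to the column-wise mean of the existing rows (an assumed
    initialisation strategy, chosen here only for illustration). If the
    vocabulary shrinks, trailing rows are dropped.
    """
    if not matrix:
        raise ValueError("embedding matrix must have at least one row")
    if new_vocab_size < 0:
        raise ValueError("new_vocab_size must be non-negative")

    old_vocab_size = len(matrix)
    dim = len(matrix[0])

    # Shrinking (or same size): keep only the first new_vocab_size rows.
    if new_vocab_size <= old_vocab_size:
        return [row[:] for row in matrix[:new_vocab_size]]

    # Growing: compute the mean embedding once, then append copies of it.
    mean = [sum(row[j] for row in matrix) / old_vocab_size for j in range(dim)]
    resized = [row[:] for row in matrix]
    for _ in range(new_vocab_size - old_vocab_size):
        resized.append(mean[:])
    return resized


if __name__ == "__main__":
    TARGET_VOCAB_SIZE = 4260  # taken from the commit messages above
    OLD_VOCAB_SIZE = 4096     # hypothetical: the listing does not state the old size
    HIDDEN_DIM = 4            # toy dimension, far smaller than a real model

    old = [[float(i)] * HIDDEN_DIM for i in range(OLD_VOCAB_SIZE)]
    new = resize_embeddings(old, TARGET_VOCAB_SIZE)
    print(len(new), len(new[0]))
```

After a resize like this, the `vocab_size` field in `config.json` has to agree with the number of embedding rows, which is consistent with the commit that updates the config `vocab_size` to 4260.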