geeteshcodes/sllm
Tags: Text Generation, PyTorch, English, language-model, gpt, transformer, from-scratch, causal-lm
License: mit
Branch: main
Path: sllm/tokenizer
1 contributor
History: 1 commit
Latest commit: "Initial commit" by geeteshcodes (7f974df, verified), 3 days ago
Name                          Size        Last commit       Updated
fineweb_edu_tokenizer         -           Initial commit    3 days ago
bpe.py                        4.91 kB     Initial commit    3 days ago
fineweb_edu_tokenizer.json    2.2 MB      Initial commit    3 days ago
normalizer.py                 1.2 kB      Initial commit    3 days ago
post_processor.py             5.2 kB      Initial commit    3 days ago
pretokenizer.py               5.95 kB     Initial commit    3 days ago
tempCodeRunnerFile.py         182 Bytes   Initial commit    3 days ago
tokenize_dataset.py           13.6 kB     Initial commit    3 days ago
traintokenizer.py             6.95 kB     Initial commit    3 days ago
wrap_tokenizer.py             8.13 kB     Initial commit    3 days ago