geeteshcodes/sllm

Text Generation
PyTorch
English
language-model
gpt
transformer
from-scratch
causal-lm
sllm / tokenizer
  • 1 contributor
History: 1 commit
geeteshcodes
Initial commit
7f974df verified 3 days ago
  • fineweb_edu_tokenizer: Initial commit, 3 days ago
  • bpe.py (4.91 kB): Initial commit, 3 days ago
  • fineweb_edu_tokenizer.json (2.2 MB): Initial commit, 3 days ago
  • normalizer.py (1.2 kB): Initial commit, 3 days ago
  • post_processor.py (5.2 kB): Initial commit, 3 days ago
  • pretokenizer.py (5.95 kB): Initial commit, 3 days ago
  • tempCodeRunnerFile.py (182 Bytes): Initial commit, 3 days ago
  • tokenize_dataset.py (13.6 kB): Initial commit, 3 days ago
  • traintokenizer.py (6.95 kB): Initial commit, 3 days ago
  • wrap_tokenizer.py (8.13 kB): Initial commit, 3 days ago