
# TinyGuardrail Tokenizer

An advanced BPE-based tokenizer for the TinyGuardrail safety model.

## Specifications

  • Vocabulary Size: 16,000
  • Max Length: 512
  • Min Frequency: 2
  • Special Tokens: , , ,
  • BPE Merges: 141
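
For intuition, here is a minimal sketch of how a BPE tokenizer applies its learned merge table to a word at encode time. The merge table below is hypothetical, purely for illustration — it is not the model's actual 141 learned merges, and `apply_bpe` is not part of this repo's API.

```python
# Illustrative only: greedy application of ranked BPE merges.
# The merge table here is a made-up example, not this tokenizer's real merges.
def apply_bpe(word, merges):
    """Repeatedly merge the adjacent pair with the lowest merge rank."""
    symbols = list(word)
    while len(symbols) > 1:
        # Rank every adjacent pair; unseen pairs get infinite rank.
        pairs = [(merges.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        rank, i = min(pairs)
        if rank == float("inf"):
            break  # no learned merge applies anymore
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

# Hypothetical merge table: pair -> rank (order in which merges were learned)
merges = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}
print(apply_bpe("lower", merges))  # -> ['low', 'er']
```

With 141 merges and a 16,000-entry vocabulary, the real tokenizer works the same way, just over a much larger learned table.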

## Usage

```python
from src.data.tokenizer import load_tokenizer

# Load from HuggingFace
tokenizer = load_tokenizer(hf_repo="2796gauravc/tinyguardrail-tokenizer")

# Or load from a local path
tokenizer = load_tokenizer("outputs/tokenizer.pkl")

# Encode text to token IDs
tokens = tokenizer.encode("Your text here")

# Decode token IDs back to text
text = tokenizer.decode(tokens)
```