# TinyGuardrail Tokenizer

Advanced BPE-based tokenizer for the TinyGuardrail safety model.

## Specifications

- **Vocabulary Size**: 16,000
- **Max Length**: 512
- **Min Frequency**: 2
- **Special Tokens**: , , ,
- **BPE Merges**: 141

## Usage

```python
from src.data.tokenizer import load_tokenizer

# Load from HuggingFace
tokenizer = load_tokenizer(hf_repo="2796gauravc/tinyguardrail-tokenizer")

# Or load from local path
tokenizer = load_tokenizer("outputs/tokenizer.pkl")

# Encode text
tokens = tokenizer.encode("Your text here")

# Decode tokens
text = tokenizer.decode(tokens)
```
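For readers unfamiliar with how the merge count in the specs arises, the core BPE training loop can be sketched in plain Python. This is a toy illustration only, not the project's actual trainer: the corpus is made up, and `num_merges=5` stands in for the 141 merges listed above.

```python
from collections import Counter

def bpe_train(corpus, num_merges):
    """Learn BPE merges from a list of words (toy sketch, not project code)."""
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with a single merged symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

merges = bpe_train(["low", "low", "lower", "newest", "newest", "newest"], num_merges=5)
print(merges)
```

Each learned merge becomes one entry in the merge table; the real tokenizer applies its 141 merges in order at encode time, and only symbols seen at least `min_frequency` (2) times survive into the 16,000-entry vocabulary.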