---
license: apache-2.0
datasets:
- VTSNLP/vietnamese_curated_dataset
language:
- vi
tags:
- tokenizer
- vietnamese
- byte-bpe
- causal-lm
- nlp
---

# Vietnamese Tokenizer

This repository contains a **ByteLevel BPE tokenizer** trained **from scratch** specifically for the **Vietnamese language**, designed for **decoder-only language model pretraining**.

---

## 🚀 Usage

### Load tokenizer

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "tranhuyHoang/mini_VN_decoder_tokenizer",
    use_fast=True
)