Vietnamese Character-Level Tokenizer

This tokenizer operates at the character level and is designed for Vietnamese seq2seq models.

Special tokens

  • <pad> = padding
  • <unk> = unknown character
  • <sos> = start of sequence
  • <eos> = end of sequence
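
As a rough illustration of how these four tokens typically work together in a character-level seq2seq pipeline, here is a minimal pure-Python sketch. It is not this tokenizer's implementation: the helper names (build_vocab, encode, decode), the id assignments, and the toy corpus are all assumptions made for the example, and the published vocabulary may order or number its tokens differently.

```python
# Illustrative sketch only: the vocabulary and id assignments below are
# assumptions, not the published tokenizer's actual vocabulary.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<sos>", "<eos>"]


def build_vocab(corpus):
    """Map special tokens first, then each distinct character, to an integer id."""
    chars = sorted({ch for text in corpus for ch in text})
    return {token: idx for idx, token in enumerate(SPECIAL_TOKENS + chars)}


def encode(text, vocab, max_len):
    """Wrap text in <sos>/<eos>, map unseen characters to <unk>, pad to max_len."""
    ids = [vocab["<sos>"]]
    ids += [vocab.get(ch, vocab["<unk>"]) for ch in text]
    ids.append(vocab["<eos>"])
    ids = ids[:max_len]  # naive truncation; may drop <eos> for long inputs
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def decode(ids, vocab):
    """Invert encode, dropping <pad>, <sos>, and <eos>."""
    inverse = {idx: token for token, idx in vocab.items()}
    skip = {vocab["<pad>"], vocab["<sos>"], vocab["<eos>"]}
    return "".join(inverse[i] for i in ids if i not in skip)


vocab = build_vocab(["xin chào", "cảm ơn"])
ids = encode("xin chào", vocab, max_len=12)
print(ids)
print(decode(ids, vocab))
```

Because each Unicode code point is treated as one character here, the same Vietnamese text in precomposed and decomposed form would produce different id sequences, so normalizing input consistently (for example to NFC) before encoding is a reasonable precaution.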

Usage

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("tranhuyHoang/char-vietnamese-tokenizer")