# Model Card for Azerbaijani SentencePiece Tokenizer
This repository provides a SentencePiece tokenizer trained for the Azerbaijani language. It uses a Byte-Pair Encoding (BPE) model trained with SentencePiece and wrapped as a Hugging Face T5TokenizerFast object for easy integration with the 🤗 Transformers ecosystem.
## Model Details

### Model Description

This is the model card of a 🤗 Transformers tokenizer that has been pushed to the Hub. This model card has been automatically generated.
- **Developed by:** Nazrin Burziyeva
- **Model type:** Tokenizer (SentencePiece BPE)
- **Language(s) (NLP):** Azerbaijani
- **License:** [More Information Needed]
- **Finetuned from model:** N/A (trained from scratch on a raw Azerbaijani text corpus)
## Uses

Preprocessing Azerbaijani text for NLP tasks.
## How to Get Started with the Model

```python
from transformers import AutoTokenizer

# Load the tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("nazrinburz/azerbaijani-sentencepiece-tokenizer")

# Example text in Azerbaijani
text = "Azərbaycan dilinin inkişafı mədəniyyətimizin qorunması və gələcək nəsillərə ötürülməsi üçün vacib şərtdir."

# Tokenize into subword pieces
tokens = tokenizer.tokenize(text)
print("Tokens:", tokens)

# Encode into token IDs
ids = tokenizer.encode(text)
print("Token IDs:", ids)

# Decode the IDs back to text
decoded = tokenizer.decode(ids)
print("Decoded:", decoded)
```
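Because the tokenizer is wrapped as a standard Hugging Face fast tokenizer, it can also be called directly on a batch of sentences. A minimal sketch (the example sentences are illustrative):

```python
from transformers import AutoTokenizer

# Load the tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("nazrinburz/azerbaijani-sentencepiece-tokenizer")

sentences = [
    "Azərbaycan dili gözəldir.",
    "Bakı Xəzər dənizinin sahilində yerləşir.",
]

# Calling the tokenizer directly returns input_ids and an attention_mask;
# padding=True pads every sequence to the longest one in the batch.
batch = tokenizer(sentences, padding=True)
print(batch["input_ids"])
print(batch["attention_mask"])
```

The padded `input_ids` are what downstream 🤗 Transformers models expect as input, with the `attention_mask` marking real tokens (1) versus padding (0).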