---
language: tl
tags:
- lexical-normalization
- filipino
- byt5
base_model: google/byt5-base
---

# FiLex: Filipino Lexical Normalization

A lexical normalization model for Filipino/Tagalog, created by fine-tuning Google's ByT5-base model on a custom dataset. It converts informal or noisy Filipino text (e.g. SMS, social media) into its canonical form.

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForSeq2SeqLM.from_pretrained("Angelo25/Filipino-Lexical-Normalization")
tokenizer = AutoTokenizer.from_pretrained("Angelo25/Filipino-Lexical-Normalization")
model.eval()

# Tokenize the text to normalize and move the tensors to the model's device.
inputs = tokenizer("Sample Input Text", return_tensors="pt").to(model.device)

# Generate with beam search; the output budget is the input length plus 50 tokens.
output = model.generate(
    **inputs,
    max_new_tokens=inputs["input_ids"].shape[1] + 50,
    num_beams=3,
    early_stopping=True,
    use_cache=True
)

# Decode the best beam back into text.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
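
The `max_new_tokens` value in the usage snippet grows with the input because ByT5 works on UTF-8 bytes rather than subwords. The sketch below estimates that budget without loading the model. It assumes this checkpoint keeps the stock ByT5 tokenizer (one token per UTF-8 byte plus one end-of-sequence token); `byt5_input_length` and `generation_budget` are hypothetical helper names, not part of this repository or of `transformers`.

```python
def byt5_input_length(text: str) -> int:
    """Estimate the ByT5 input length for `text`.

    Assumption: the stock ByT5 tokenizer is used, which emits one token
    per UTF-8 byte and appends a single end-of-sequence token.
    """
    return len(text.encode("utf-8")) + 1


def generation_budget(text: str, margin: int = 50) -> int:
    """Mirror the usage snippet's max_new_tokens: input length plus a margin."""
    return byt5_input_length(text) + margin


if __name__ == "__main__":
    sample = "Sample Input Text"
    print(byt5_input_length(sample))   # 17 bytes + 1 end-of-sequence token
    print(generation_budget(sample))   # input length + 50
```

Because non-ASCII characters take more than one byte in UTF-8, text containing characters such as `ñ` yields more tokens than its character count suggests, so the budget is best computed from bytes, not characters.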