---
language: tl
tags:
- lexical-normalization
- filipino
- byt5
base_model: google/byt5-base
---
# FiLex: Filipino Lexical Normalization
A lexical normalization model for Filipino/Tagalog, created by fine-tuning Google's ByT5-base model on a custom dataset.
It converts informal or noisy Filipino text (e.g. SMS, social media) into its canonical form.
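
Because the base model is ByT5, text is processed as raw UTF-8 bytes rather than subword tokens, so sequence lengths are counted in bytes. The snippet below is a minimal sketch of that encoding, assuming the standard ByT5 vocabulary layout (IDs 0, 1, 2 reserved for padding, end-of-sequence, and unknown, with each byte shifted by 3). The helper names `byte_encode` and `byte_decode` and the sample string are hypothetical and only for illustration; they are not part of this model's code.

```python
from typing import List

# Assumption: standard ByT5 layout, where IDs 0-2 are <pad>, </s>, <unk>
# and every UTF-8 byte value b is mapped to token ID b + 3.
SPECIAL_OFFSET = 3
EOS_ID = 1


def byte_encode(text: str) -> List[int]:
    """Map text to ByT5-style IDs: one ID per UTF-8 byte, plus a final EOS."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def byte_decode(ids: List[int]) -> str:
    """Invert byte_encode, skipping special IDs and anything outside the byte range."""
    raw = bytes(
        i - SPECIAL_OFFSET
        for i in ids
        if SPECIAL_OFFSET <= i < SPECIAL_OFFSET + 256
    )
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    sample = "d2 na me"  # hypothetical noisy input, not taken from the dataset
    ids = byte_encode(sample)
    print(len(sample.encode("utf-8")), "bytes ->", len(ids), "token IDs")
    print(byte_decode(ids) == sample)
```

One consequence of this design is that non-ASCII characters such as "ñ" take more than one token, so output length limits are best chosen relative to the byte length of the input.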
## Usage
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
model = AutoModelForSeq2SeqLM.from_pretrained("Angelo25/Filipino-Lexical-Normalization")
tokenizer = AutoTokenizer.from_pretrained("Angelo25/Filipino-Lexical-Normalization")
model.eval()
inputs = tokenizer("Sample Input Text", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=inputs["input_ids"].shape[1] + 50,
    num_beams=3,
    early_stopping=True,
    use_cache=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```