Commit b0f59be (verified) by Angelo25
1 parent: e0f6619

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +18 -8
README.md CHANGED
@@ -2,21 +2,31 @@
  ---
  language: tl
  tags:
- - text2text-generation
  - lexical-normalization
  - filipino
  - byt5
- pipeline_tag: text-generation
  ---
 
  # FiLex: Filipino Lexical Normalization (ByT5-base)
 
- Fine-tuned `google/byt5-base` for Filipino/Tagalog lexical normalization.
- Converts informal/noisy Filipino text (e.g. SMS, social media) into its canonical form.
+ Fine-tuned `google/byt5-base` model for Filipino/Tagalog lexical normalization.
+ Converts informal/noisy Filipino text (e.g. SMS, social media) into normalized form.
 
  ## Usage
  ```python
- from transformers import pipeline
- pipe = pipeline("text2text-generation", model="Angelo25/filex-byt5-filipino-lexnorm")
- pipe("ang ganda nya po subra")
- ```
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+ import torch
+
+ model = AutoModelForSeq2SeqLM.from_pretrained("Angelo25/Filipino-Lexical-Normalization")
+ tokenizer = AutoTokenizer.from_pretrained("Angelo25/Filipino-Lexical-Normalization")
+ model.eval()
+
+ inputs = tokenizer("idol q tlaga yn", return_tensors="pt").to(model.device)
+ output = model.generate(
+     **inputs,
+     max_new_tokens=inputs["input_ids"].shape[1] + 50,
+     num_beams=3,
+     early_stopping=True,
+     use_cache=True
+ )
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
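
The added usage snippet sets `max_new_tokens` to the input length plus 50. Because ByT5 works on raw UTF-8 bytes rather than subwords, that input length is roughly the byte count of the text plus one end-of-sequence token. The sketch below reproduces that arithmetic in plain Python, without downloading the model. The helper names `byt5_input_ids` and `generation_budget` are hypothetical and not part of the repository, and the byte-to-id mapping is an approximation of the ByT5 tokenizer for plain text input, not a replacement for it.

```python
from __future__ import annotations

# Sketch only: approximates ByT5 tokenization for plain text.
# Assumption: ids 0, 1, 2 are reserved for <pad>, </s>, <unk>,
# so each UTF-8 byte b maps to id b + 3.
BYT5_SPECIAL_OFFSET = 3
EOS_ID = 1


def byt5_input_ids(text: str) -> list[int]:
    """Return one id per UTF-8 byte, followed by the end-of-sequence id."""
    return [b + BYT5_SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def generation_budget(text: str, margin: int = 50) -> int:
    """Mirror the README's max_new_tokens: input length plus a fixed margin."""
    return len(byt5_input_ids(text)) + margin


if __name__ == "__main__":
    sample = "idol q tlaga yn"  # example input from the README
    print(len(byt5_input_ids(sample)))  # 16 (15 bytes + </s>)
    print(generation_budget(sample))    # 66
```

Since the normalized output is also generated byte by byte, tying the budget to the input length keeps long inputs from being truncated while still bounding generation for short ones.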