nmcuong/ByT5-Vi-Normalization

@@ -48,22 +48,20 @@ ByT5-Vi-Normalization is a fine-tuned version of Google's ByT5-small model, spec
 ## Usage
 You can use the model with Hugging Face Transformers as follows:
 ```python
 from transformers import T5ForConditionalGeneration, AutoTokenizer
 import torch
 # Load model and tokenizer
 model_dir = "nmcuong/ByT5-Vi-Normalization"
 model = T5ForConditionalGeneration.from_pretrained(model_dir)
 tokenizer = AutoTokenizer.from_pretrained(model_dir)
 # Move model to GPU if available
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 model = model.to(device).to(dtype=torch.bfloat16)
 model.eval()
 # Example usage
 input_text = "Normalize: Theo thông tư số 01/2023/TT-BTC, từ ngày 1/1/2024, Việt Nam sẽ áp dụng thuế giá trị gia tăng (VAT) mới cho các mặt hàng tiêu dùng."
 inputs = tokenizer(input_text, return_tensors="pt", padding=True).to(device)
@@ -71,10 +69,45 @@ with torch.no_grad():
     outputs = model.generate(**inputs, max_length=768, num_beams=2)
     decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print("Result:", decoded)
 # Result: Theo thông tư số không một, năm hai nghìn không trăm hai mươi ba, Thông tư của Bộ Tài chính, từ ngày một tháng một, năm hai nghìn không trăm hai mươi tư, Việt Nam sẽ áp dụng thuế giá trị gia tăng mới cho các mặt hàng tiêu dùng.
 ```
 ---
 ## Example Inputs & Outputs

 ## Usage
+## Hugging Face
 You can use the model with Hugging Face Transformers as follows:
 ```python
 from transformers import T5ForConditionalGeneration, AutoTokenizer
 import torch
 # Load model and tokenizer
 model_dir = "nmcuong/ByT5-Vi-Normalization"
 model = T5ForConditionalGeneration.from_pretrained(model_dir)
 tokenizer = AutoTokenizer.from_pretrained(model_dir)
 # Move model to GPU if available
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 model = model.to(device).to(dtype=torch.bfloat16)
 model.eval()
 # Example usage
 input_text = "Normalize: Theo thông tư số 01/2023/TT-BTC, từ ngày 1/1/2024, Việt Nam sẽ áp dụng thuế giá trị gia tăng (VAT) mới cho các mặt hàng tiêu dùng."
 inputs = tokenizer(input_text, return_tensors="pt", padding=True).to(device)
 with torch.no_grad():
     outputs = model.generate(**inputs, max_length=768, num_beams=2)
     decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print("Result:", decoded)
 # Result: Theo thông tư số không một, năm hai nghìn không trăm hai mươi ba, Thông tư của Bộ Tài chính, từ ngày một tháng một, năm hai nghìn không trăm hai mươi tư, Việt Nam sẽ áp dụng thuế giá trị gia tăng mới cho các mặt hàng tiêu dùng.
 ```
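
ByT5 works on UTF-8 bytes rather than subword tokens, so sequence lengths such as `max_length=768` are counted in bytes, and Vietnamese letters with diacritics take two or three bytes each. The following is a minimal sketch for estimating how many ids a piece of text occupies. It assumes the standard ByT5 layout (byte value plus 3, EOS id 1) and does not load the real tokenizer; the helper names `byt5_ids` and `fits_budget` are hypothetical.

```python
def byt5_ids(text: str) -> list[int]:
    """Approximate ByT5 ids: every UTF-8 byte shifted by 3, followed by EOS (id 1).

    Assumption: mirrors the standard ByT5 layout (0 = pad, 1 = EOS, 2 = unk);
    it does not call the real tokenizer.
    """
    return [byte + 3 for byte in text.encode("utf-8")] + [1]


def fits_budget(text: str, max_length: int = 768) -> bool:
    """Return True if the text, encoded as above, needs at most max_length ids."""
    return len(byt5_ids(text)) <= max_length


sample = "Normalize: từ ngày 1/1/2024"
print(len(sample), "characters ->", len(byt5_ids(sample)), "ids")
print(fits_budget(sample))
```

Since the spoken-form output is usually longer than the written input, a check like this is probably most useful when applied to an estimate of the output length rather than to the input alone.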
+---
+## CTranslate2
+You can also use the model with CTranslate2 for faster inference:
+```bash
+ct2-transformers-converter --model nmcuong/ByT5-Vi-Normalization --output_dir ByT5-Vi-Normalization-CT2
+```
+Then, you can load the model in Python:
+```python
+import ctranslate2
+import transformers
+# https://opennmt.net/CTranslate2/python/ctranslate2.Translator.html
+translator = ctranslate2.Translator(
+    "ByT5-Vi-Normalization-CT2",
+    device="cuda",
+    device_index=0,
+    compute_type="bfloat16",
+)
+tokenizer = transformers.AutoTokenizer.from_pretrained("nmcuong/ByT5-Vi-Normalization")
+input_text = "Normalize: Hôm nay là ngày 15/07/2025. Giá xăng tăng lên 25.000 đồng/lít"
+input_tokens = tokenizer.convert_ids_to_tokens(
+    tokenizer.encode(input_text),
+)
+results = translator.translate_batch([input_tokens], max_decoding_length=768)
+output_tokens = results[0].hypotheses[0]
+output_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))
+print(output_text)
+# Result: Hôm nay là ngày mười lăm tháng bảy năm hai nghìn không trăm hai mươi lăm. Giá xăng tăng lên hai mươi lăm nghìn đồng một lít.
+```
+> Note: The news content in this example is fictional, used for testing only, and does not describe real events.
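
Both examples above pass a single short passage with the `Normalize: ` prefix and cap decoding at 768 steps. For longer documents, one possible approach is to split the text into sentence groups and prefix each group separately. The sketch below is an assumption rather than part of the model card: the helper name `split_for_normalization`, the 256-byte default budget, and the idea that chunks can be normalized independently are all hypothetical.

```python
import re

PREFIX = "Normalize: "  # task prefix used in the examples above


def split_for_normalization(text: str, max_bytes: int = 256) -> list[str]:
    """Greedily pack sentences into prefixed chunks of at most max_bytes UTF-8 bytes.

    A single sentence longer than the budget is kept whole rather than cut mid-word.
    """
    # Split after ., ! or ? when followed by whitespace; "25.000" stays intact.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len((PREFIX + candidate).encode("utf-8")) > max_bytes:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(PREFIX + current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(PREFIX + current)
    return chunks


text = "Hôm nay là ngày 15/07/2025. Giá xăng tăng lên 25.000 đồng/lít."
for chunk in split_for_normalization(text, max_bytes=64):
    print(chunk)
```

Each resulting string could then be tokenized as in the CTranslate2 example and the token lists passed together to `translate_batch`. Note that the simple regular expression will also split after abbreviations that end in a period, so real text may need a more careful sentence splitter.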
 ---
 ## Example Inputs & Outputs

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2a80d21de722518aa7d7c09239581cfb17793bcbe909199025e14d01868c796d
 size 1198571496

 version https://git-lfs.github.com/spec/v1
+oid sha256:1897fbd0ed4f4abe3406bcd6c84bd17586aee136025459e97541c4da7fe2774c
 size 1198571496