---
license: mit
tags:
- text-normalization
- vietnamese
- lexical-normalization
- visonorm
- visobert
pipeline_tag: fill-mask
---
# hadung1802/visobert-normalizer-mix100
This model is a Vietnamese text normalization model trained using the ViSoNorm framework on the ViSoBERT architecture.
## Model Description
This model performs lexical normalization for Vietnamese text, converting informal text to standard Vietnamese. It was trained with the ViSoNorm framework, which uses self-training with weak supervision.
## Performance
## Training Configuration
- **Base Model**: ViSoBERT
- **Training Mode**: weakly_supervised
- **Learning Rate**: 0.001
- **Epochs**: 10
- **Batch Size**: 16
## Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the model and tokenizer (trust_remote_code is required for the
# custom normalize_text method)
model_repo = "hadung1802/visobert-normalizer-mix100"
tokenizer = AutoTokenizer.from_pretrained(model_repo)
model = AutoModelForMaskedLM.from_pretrained(model_repo, trust_remote_code=True)

# Normalize text using the built-in method
text = "sv dh gia dinh chua cho di lam :))"
normalized_text, source_tokens, predicted_tokens = model.normalize_text(
    tokenizer, text, device="cpu"
)
# Output: sinh viên đại học gia đình chưa cho đi làm :))
```
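Besides the normalized string, `normalize_text` also returns the source and predicted token lists, which you can diff to see exactly which tokens the model rewrote. A minimal sketch with plain Python (the lists below are hand-written stand-ins mirroring the example above, assumed aligned one-to-one; in practice they come from `model.normalize_text(...)`):

```python
# Stand-in token lists mirroring the example above (not real model output)
source_tokens = ["sv", "dh", "gia", "dinh", "chua", "cho", "di", "lam", ":))"]
predicted_tokens = ["sinh viên", "đại học", "gia", "đình", "chưa", "cho", "đi", "làm", ":))"]

# Pair up aligned tokens and keep only the positions the model changed
changes = [(src, pred) for src, pred in zip(source_tokens, predicted_tokens) if src != pred]
for src, pred in changes:
    print(f"{src} -> {pred}")
# e.g. sv -> sinh viên
```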
## Example Outputs
| Input | Output |
|-------|--------|
| `sv dh gia dinh chua cho di lam :))` | `sinh viên đại học gia đình chưa cho đi làm :))` |
| `chúng nó bảo em là ctrai` | `chúng nó bảo em là con trai` |
| `anh ơi em muốn đi chơi` | `anh ơi em muốn đi chơi` |
## Citation
If you use this model, please cite the ViSoNorm paper:
```bibtex
@article{visonorm2024,
  title={ViSoNorm: Self-training with Weak Supervision for Vietnamese Text Normalization},
  author={Your Name},
  journal={arXiv preprint},
  year={2024}
}
```