|
|
--- |
|
|
language: ur |
|
|
thumbnail: https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png |
|
|
tags: |
|
|
- roberta-urdu-small |
|
|
- urdu |
|
|
- transformers |
|
|
license: mit |
|
|
--- |
|
|
## roberta-urdu-small |
|
|
|
|
|
[License: MIT](https://github.com/urduhack/urduhack/blob/master/LICENSE)
|
|
### Overview |
|
|
**Language model:** roberta-urdu-small |
|
|
**Model size:** 125M parameters
|
|
**Language:** Urdu |
|
|
**Training data:** News data from Urdu news resources in Pakistan
|
|
### About roberta-urdu-small |
|
|
roberta-urdu-small is a masked language model for the Urdu language. It can be loaded with the Transformers `fill-mask` pipeline:
|
|
```python
from transformers import pipeline

# Load the fill-mask pipeline with the pretrained model and tokenizer
fill_mask = pipeline(
    "fill-mask",
    model="urduhack/roberta-urdu-small",
    tokenizer="urduhack/roberta-urdu-small",
)

# RoBERTa tokenizers typically use "<mask>" as the mask token
predictions = fill_mask("یہ ایک <mask> ہے۔")
```
|
|
## Training procedure |
|
|
roberta-urdu-small was trained on an Urdu news corpus. The training data was normalized with urduhack's normalization module to eliminate characters from other languages, such as Arabic-specific code points.
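As a rough illustration of this kind of normalization (a minimal sketch, not the actual urduhack implementation), Arabic-specific code points that look similar to Urdu letters can be mapped to their Urdu equivalents:

```python
# Minimal sketch of Urdu text normalization (NOT the urduhack module):
# map common Arabic-specific code points to their Urdu equivalents.
ARABIC_TO_URDU = {
    "\u064a": "\u06cc",  # Arabic yeh ي  -> Urdu yeh ی
    "\u0643": "\u06a9",  # Arabic kaf ك  -> Urdu kaf ک
    "\u0629": "\u06c3",  # Arabic teh marbuta ة -> Urdu teh marbuta ۃ
}

def normalize_urdu(text: str) -> str:
    """Replace Arabic-specific characters with their Urdu counterparts."""
    return text.translate(str.maketrans(ARABIC_TO_URDU))

print(normalize_urdu("\u0643\u064a\u0627"))  # کیا (Urdu kaf and yeh)
```

A real normalizer handles many more mappings (diacritics, digits, spacing), but the idea is the same: collapse script variants so the tokenizer sees one consistent code point per letter.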
|
|
|
|
|
### About Urduhack |
|
|
Urduhack is a Natural Language Processing (NLP) library for the Urdu language.
|
|
GitHub: https://github.com/urduhack/urduhack
|
|
|
|
|
--- |
|
|
## 🚀 AWS Neuron Optimized Version Available |
|
|
|
|
|
A Neuron-optimized version of this model is available for improved performance on AWS Inferentia/Trainium instances: |
|
|
|
|
|
**[badaoui/urduhack-roberta-urdu-small-neuron](https://huggingface.co/badaoui/urduhack-roberta-urdu-small-neuron)** |
|
|
|
|
|
The Neuron-optimized version provides: |
|
|
- Pre-compiled artifacts for faster loading |
|
|
- Optimized performance on AWS Neuron devices |
|
|
- Same model capabilities with improved inference speed |
|
|
|