---
language: ur
thumbnail: https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png
tags:
- roberta-urdu-small
- urdu
- transformers
license: mit
---
## roberta-urdu-small
[License: MIT](https://github.com/urduhack/urduhack/blob/master/LICENSE)
### Overview
- **Language model:** roberta-urdu-small
- **Model size:** 125M
- **Language:** Urdu
- **Training data:** News data from Urdu news sources in Pakistan
### About roberta-urdu-small
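roberta-urdu-small is a RoBERTa-based masked language model for Urdu. It can be loaded with the `transformers` fill-mask pipeline:
```python
from transformers import pipeline
fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small")
# Predict the masked word in "The capital of Pakistan is <mask>."; the mask token is read from the tokenizer
predictions = fill_mask(f"پاکستان کا دارالحکومت {fill_mask.tokenizer.mask_token} ہے۔")
```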
### Training procedure
roberta-urdu-small was trained on an Urdu news corpus. The training data was normalized with Urduhack's normalization module to eliminate characters from other languages, such as Arabic.
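Urdu and Arabic share the Arabic Unicode block, and several letters have visually identical but distinct code points in each. The sketch below illustrates one kind of character-level normalization that serves this purpose: mapping Arabic code points to their Urdu equivalents. It is a minimal, hypothetical illustration (the names `ARABIC_TO_URDU` and `normalize_arabic_chars` are not part of Urduhack), covers only three characters, and is not Urduhack's actual implementation; use Urduhack's normalization module to reproduce the preprocessing described above.

```python
# Hypothetical, illustrative mapping (not Urduhack's code): Arabic code points
# that look like Urdu letters are replaced with the Urdu code points.
ARABIC_TO_URDU = {
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH (Urdu kaf)
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH (Urdu yeh)
    "\u0649": "\u06cc",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
}

# Build a translation table once so each call is a single str.translate pass.
_TABLE = str.maketrans(ARABIC_TO_URDU)


def normalize_arabic_chars(text: str) -> str:
    """Replace the Arabic look-alike characters above with Urdu code points."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # "kitab" (book) written with the Arabic kaf becomes the Urdu-kaf spelling.
    print(normalize_arabic_chars("\u0643\u062a\u0627\u0628"))
```

Characters outside the mapping, including text that already uses Urdu code points, pass through unchanged.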
### About Urduhack
Urduhack is a Natural Language Processing (NLP) library for the Urdu language.
GitHub: https://github.com/urduhack/urduhack
---
## 🚀 AWS Neuron Optimized Version Available
A Neuron-optimized version of this model is available for improved performance on AWS Inferentia/Trainium instances:
**[badaoui/urduhack-roberta-urdu-small-neuron](https://huggingface.co/badaoui/urduhack-roberta-urdu-small-neuron)**
The Neuron-optimized version provides:
- Pre-compiled artifacts for faster loading
- Optimized performance on AWS Neuron devices
- Same model capabilities with improved inference speed