How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="urduhack/roberta-urdu-small")
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")
model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")

roberta-urdu-small

License: MIT

Overview

Language model: roberta-urdu-small
Model size: 125M
Language: Urdu
Training data: News data from Urdu news sources in Pakistan

About roberta-urdu-small

roberta-urdu-small is a language model for the Urdu language.

from transformers import pipeline
fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small")
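The pipeline above can be queried with a sentence containing the tokenizer's mask token. The following is a minimal sketch, not an official usage recipe: the helper names `build_masked_text`, `summarize`, and `main`, the `{mask}` placeholder, and the example sentence ("The capital of Pakistan is {mask}.") are assumptions made for illustration. It relies on the fill-mask pipeline returning, for a single masked position, a list of dictionaries with `score` and `token_str` keys.

```python
from typing import Dict, List, Tuple


def build_masked_text(template: str, mask_token: str) -> str:
    """Replace the hypothetical "{mask}" placeholder with the model's mask token."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one {mask} placeholder")
    return template.replace("{mask}", mask_token)


def summarize(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Keep the k highest-scoring candidates as (token, score) pairs."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in ranked[:k]]


def main() -> None:
    # Imported here so the helpers above work without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small")
    # Read the mask token from the tokenizer instead of hard-coding it.
    text = build_masked_text(
        "پاکستان کا دارالحکومت {mask} ہے۔", fill_mask.tokenizer.mask_token
    )
    for token, score in summarize(fill_mask(text)):
        print(token, score)


# Call main() to download the model and print the top candidates.
```

Reading the mask token from `fill_mask.tokenizer.mask_token` keeps the sketch valid even if a different checkpoint with another mask token is substituted.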

Training procedure

roberta-urdu-small was trained on an Urdu news corpus. The training data was normalized with the normalization module from urduhack to eliminate characters from other languages, such as Arabic.
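To give a sense of what this kind of normalization does, here is a small standard-library sketch that replaces a few Arabic code points with their usual Urdu counterparts. It is an illustration only: the mapping table and the function name `normalize_lookalikes` are assumptions, and this is not urduhack's actual implementation or character table.

```python
# Illustrative mapping only; this is NOT urduhack's actual table.
ARABIC_TO_URDU = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH  -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF  -> ARABIC LETTER KEHEH
    "\u0647": "\u06C1",  # ARABIC LETTER HEH  -> ARABIC LETTER HEH GOAL
}

# Build the translation table once so repeated calls stay cheap.
_TABLE = str.maketrans(ARABIC_TO_URDU)


def normalize_lookalikes(text: str) -> str:
    """Replace the mapped Arabic code points; leave all other characters untouched."""
    return text.translate(_TABLE)
```

Applying the same normalization to input text at inference time is a reasonable precaution, since the model presumably never saw the Arabic variants of these letters during training.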

About Urduhack

Urduhack is a Natural Language Processing (NLP) library for the Urdu language.
GitHub: https://github.com/urduhack/urduhack
