Instructions to use rahular/varta-t5 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Transformers
How to use rahular/varta-t5 with Transformers:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("rahular/varta-t5")
model = AutoModelForSeq2SeqLM.from_pretrained("rahular/varta-t5")
```

- Notebooks
- Google Colab
- Kaggle
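Once loaded, the model can be used for seq2seq generation via `generate`. A minimal sketch — the input sentence and generation settings here are illustrative, and since Varta-T5 is a pre-trained (not instruction-tuned) model, raw generations are only a smoke test; fine-tune it for downstream tasks:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("rahular/varta-t5")
model = AutoModelForSeq2SeqLM.from_pretrained("rahular/varta-t5")

# Encode an input sentence and generate a short continuation.
inputs = tokenizer(
    "Varta is a large-scale news corpus for Indic languages.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```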
correct the first paragraph
#2
by oooooorange - opened
README.md CHANGED

```diff
@@ -7,7 +7,7 @@
 # Varta-T5
 
 ## Model Description
-Varta-
+Varta-T5 is a model pre-trained on the `full` training set of Varta in 14 Indic languages (Assamese, Bhojpuri, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Oriya, Punjabi, Tamil, Telugu, and Urdu) and English, using span corruption and gap-sentence generation as objectives.
 
 Varta is a large-scale news corpus for Indic languages, including 41.8 million news articles in 14 different Indic languages (and English), which come from a variety of high-quality sources.
 The dataset and the model are introduced in [this paper](https://arxiv.org/abs/2305.05858). The code is released in [this repository](https://github.com/rahular/varta). The data is released in [this bucket](https://console.cloud.google.com/storage/browser/varta-eu/data-release).
```
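The span-corruption objective mentioned in the corrected paragraph can be illustrated without loading the model. A hypothetical sketch — the `corrupt_spans` helper and the example spans are my own, following the standard T5 sentinel-token scheme, not code from the Varta repository:

```python
# Sketch of T5-style span corruption: masked spans in the input are
# replaced by sentinel tokens <extra_id_0>, <extra_id_1>, ...; the
# target reconstructs each span after its sentinel.
def corrupt_spans(tokens, spans):
    """tokens: list of words; spans: sorted, non-overlapping (start, end) pairs."""
    inp, tgt = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[prev:start])
        inp.append(sentinel)          # masked span becomes a sentinel in the input
        tgt.append(sentinel)
        tgt.extend(tokens[start:end])  # target spells out the masked span
        prev = end
    inp.extend(tokens[prev:])
    tgt.append(f"<extra_id_{len(spans)}>")  # final sentinel terminates the target
    return " ".join(inp), " ".join(tgt)

tokens = "Varta is a large scale news corpus for Indic languages".split()
inp, tgt = corrupt_spans(tokens, [(1, 3), (6, 8)])
print(inp)  # Varta <extra_id_0> large scale news <extra_id_1> Indic languages
print(tgt)  # <extra_id_0> is a <extra_id_1> corpus for <extra_id_2>
```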