d0rj/OpenOrca-ru
How to use melmoth/ru-rope-t5-small-instruct with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("melmoth/ru-rope-t5-small-instruct")
model = AutoModelForSeq2SeqLM.from_pretrained("melmoth/ru-rope-t5-small-instruct")
The small version of the Russian T5 model with Rotary Position Embedding (RoPE), after instruction tuning.
The model was pre-trained on a Russian corpus with some English mixed in, using the Mixture-of-Denoisers pre-training objective from UL2 on sequences of length 1024. Because the relative position bias is replaced with rotary encoding, the model can be trained with Flash Attention 2.
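To illustrate why rotary encoding removes the need for a bias term, here is a minimal pure-Python sketch. It is not the model's actual implementation: the pairing of dimensions, the `base` value of 10000, and the function names `rotate` and `score` are assumptions made for the example. Queries and keys are rotated by an angle proportional to their position before the dot product, so the attention score depends only on the relative offset and nothing has to be added to the score matrix afterwards.

```python
import math

def rotate(vec, position, base=10000.0):
    """Apply a rotary position encoding to one query/key vector of even length."""
    dim = len(vec)
    half = dim // 2
    out = list(vec)
    for i in range(half):
        # Assumed convention: dimension i is paired with dimension i + half,
        # and each pair is rotated by an angle proportional to the position.
        theta = position * base ** (-2.0 * i / dim)
        x, y = vec[i], vec[i + half]
        out[i] = x * math.cos(theta) - y * math.sin(theta)
        out[i + half] = x * math.sin(theta) + y * math.cos(theta)
    return out

def score(q, k, q_pos, k_pos):
    """Attention logit between a rotated query and a rotated key."""
    return sum(a * b for a, b in zip(rotate(q, q_pos), rotate(k, k_pos)))

q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# The same offset (3 positions apart) at two different absolute positions.
near = score(q, k, 5, 2)
far = score(q, k, 105, 102)
print(abs(near - far) < 1e-9)
```

The two scores agree up to floating-point error, which is the relative-position property that a learned additive bias otherwise provides.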
Finetuning for downstream tasks
Despite the instruction tuning, using the model in zero-shot mode is not recommended because of its small size.
A corpus of Russian texts from Vikhr, filtered by FRED-T5-1.7B perplexity. The instructions are a translated English set.
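As a rough sketch of what perplexity filtering involves, the example below computes perplexity from per-token log-probabilities and keeps texts under a cutoff. The log-probabilities are made-up placeholders for scores a language model such as FRED-T5-1.7B would produce, and the threshold `max_ppl` and the direction of the filter (keeping low-perplexity texts) are assumptions; the actual criteria used for this corpus are not stated.

```python
import math

def perplexity(token_logprobs):
    """Perplexity is the exponential of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def filter_by_perplexity(scored_texts, max_ppl):
    """Keep texts whose perplexity does not exceed max_ppl (hypothetical cutoff)."""
    return [text for text, logprobs in scored_texts if perplexity(logprobs) <= max_ppl]

# Hypothetical natural-log probabilities standing in for real model scores.
scored = [
    ("fluent sentence", [-0.5, -0.7, -0.3]),
    ("noisy text", [-4.0, -5.5, -6.0]),
]

kept = filter_by_perplexity(scored, max_ppl=10.0)
print(kept)
```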
AdamWScale is used instead of Adafactor for stable training without loss explosions.
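The sketch below shows one update step under the assumption that AdamWScale here means AdamW with an Adafactor-style scaling of the step size by the root mean square of the parameter values, as in the nanoT5 project. The function name `adamw_scale_step`, the hyperparameter defaults, and the `1e-3` floor are illustrative choices, not the settings used for this model.

```python
import math

def rms(values):
    """Root mean square of a flat list of numbers."""
    return math.sqrt(sum(x * x for x in values) / len(values))

def adamw_scale_step(param, grad, m, v, step, lr=1e-2,
                     beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.0):
    """One update for a flat list of weights; returns the new (param, m, v)."""
    # Standard Adam moment estimates.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction for the zero-initialised moments.
    m_hat = [mi / (1 - beta1 ** step) for mi in m]
    v_hat = [vi / (1 - beta2 ** step) for vi in v]
    # Assumed scaling step: the learning rate is multiplied by the parameter RMS,
    # with a small floor so that near-zero weights still move.
    scaled_lr = lr * max(1e-3, rms(param))
    new_param = [
        p * (1 - lr * weight_decay) - scaled_lr * mh / (math.sqrt(vh) + eps)
        for p, mh, vh in zip(param, m_hat, v_hat)
    ]
    return new_param, m, v

# Weights with RMS 2.0 take a step twice as large as plain AdamW would at this lr.
new_param, m, v = adamw_scale_step([2.0, -2.0], [1.0, -1.0], [0.0, 0.0], [0.0, 0.0], step=1)
print(new_param)
```

With this scaling the update size tracks the magnitude of each weight tensor, similar to Adafactor's relative step behaviour, while keeping Adam's full second-moment estimate.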