IlyaGusev/saiga_scored
LoRA-tuned version of ruadapt Llama 3 8B with an extended tokenizer, obtained after the LEP (Learned Embedding Propagation, paper coming soon) procedure and trained on the saiga_scored d7 dataset.
Thanks to the extended tokenizer, the model works more efficiently with the Russian language.
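One rough way to check the efficiency claim is to compare how many tokens different tokenizers produce per word of Russian text. The sketch below is illustrative only: `tokens_per_word`, the two toy tokenizers, and the sample sentences are hypothetical and are not part of this model or its training code.

```python
from typing import Callable, Iterable, List


def tokens_per_word(tokenize: Callable[[str], List[str]], texts: Iterable[str]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("texts contain no words")
    return total_tokens / total_words


# Toy stand-ins for real tokenizers (assumptions, for illustration only).
def char_tokenize(text: str) -> List[str]:
    # One token per non-whitespace character: a worst-case style split.
    return [c for c in text if not c.isspace()]


def word_tokenize(text: str) -> List[str]:
    # One token per whitespace-separated word: a best-case style split.
    return text.split()


sample = ["Привет, как дела?", "Модель работает с русским языком."]

print("char-level:", tokens_per_word(char_tokenize, sample))
print("word-level:", tokens_per_word(word_tokenize, sample))
```

To compare real tokenizers, the `tokenize` method of a Hugging Face tokenizer (for example one loaded with `AutoTokenizer.from_pretrained`) can be passed in place of the toy functions. A lower value means fewer tokens are needed for the same text.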
Tikhomirov M., Chernyshev D. Facilitating large language model Russian adaptation with Learned Embedding Propagation // 2024 (forthcoming)
Tikhomirov M., Chernyshev D. Impact of Tokenization on LLaMa Russian Adaptation // 2023 Ivannikov Ispras Open Conference (ISPRAS). - IEEE, 2023. - P. 163-168.