{ "model_id": "DeepPavlov/rubert-base-cased-conversational", "downloads": 216822, "tags": [ "transformers", "pytorch", "jax", "bert", "feature-extraction", "ru", "endpoints_compatible", "region:us" ], "description": "--- language: - ru --- # rubert-base-cased-conversational Conversational RuBERT (Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters) was trained on OpenSubtitles [1], Dirty, Pikabu, and the Social Media segment of the Taiga corpus [2]. We assembled a new vocabulary for the Conversational RuBERT model from this data and initialized the model from RuBERT. 08.11.2021: uploaded the model with MLM and NSP heads. [1]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). [2]: T. Shavrina and O. Shapovalova, 2017, To the Methodology of Corpus Construction for Machine Learning: «Taiga» Syntax Tree Corpus and Parser. In Proceedings of the CORPORA 2017 International Conference, Saint Petersburg, 2017.", "model_explanation_gemini": "Russian conversational BERT model trained on subtitle and social-media text (OpenSubtitles, Dirty, Pikabu, Taiga), released with masked language modeling and next sentence prediction heads." }