{
"model_id": "DeepPavlov/rubert-base-cased-conversational",
"downloads": 216822,
"tags": [
"transformers",
"pytorch",
"jax",
"bert",
"feature-extraction",
"ru",
"endpoints_compatible",
"region:us"
],
"description": "--- language: - ru --- # rubert-base-cased-conversational Conversational RuBERT \\(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\\) was trained on OpenSubtitles\\[1\\], Dirty, Pikabu, and a Social Media segment of Taiga corpus\\[2\\]. We assembled a new vocabulary for Conversational RuBERT model on this data and initialized the model with RuBERT. 08.11.2021: upload model with MLM and NSP heads \\[1\\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation \\(LREC 2016\\) \\[2\\]: Shavrina T., Shapovalova O. \\(2017\\) TO THE METHODOLOGY OF CORPUS CONSTRUCTION FOR MACHINE LEARNING: «TAIGA» SYNTAX TREE CORPUS AND PARSER. in proc. of “CORPORA2017”, international conference , Saint-Petersbourg, 2017.",
"model_explanation_gemini": "Russian conversational BERT model trained on diverse dialogue datasets for tasks like masked language modeling and next sentence prediction."
}
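
The record above tags the model for `feature-extraction` with the `transformers` library, so a minimal sketch of turning Russian sentences into vectors may help. It assumes `torch` and `transformers` are installed and that the checkpoint can be downloaded. The helper name `embed_sentences` and the choice of mean pooling are illustrative assumptions, not something the record prescribes.

```python
MODEL_ID = "DeepPavlov/rubert-base-cased-conversational"  # value of "model_id" in the record


def embed_sentences(sentences, model_id=MODEL_ID):
    """Return one mean-pooled vector per sentence (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require either library.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # disable dropout for deterministic inference

    # Pad to the longest sentence in the batch and truncate overly long inputs.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # shape: (batch, tokens, hidden)

    # Assumption: average the token vectors, ignoring padding positions.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

Calling `embed_sentences(["Привет, как дела?", "Что нового?"])` should give one row per input sentence; if the hidden size matches the 768 stated in the description, each row has 768 values. Other pooling choices, such as taking the first token's vector, are equally plausible.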