Add library_name, update pipeline_tag and link to paper (#1)
(commit b88caaf5db85e112a0e9d84d55a5ab92f872d78f)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

````diff
@@ -1,8 +1,11 @@
 ---
+base_model: ModernBERT-base
 language:
 - tr
 - en
 license: apache-2.0
+pipeline_tag: feature-extraction
+library_name: transformers
 tags:
 - fill-mask
 - turkish
@@ -12,14 +15,14 @@ tags:
 - modernbert
 - TRUBA
 - MN5
-base_model: ModernBERT-base
-pipeline_tag: fill-mask
 ---
 
 # Mursit-Base
 
 [](https://github.com/newmindai/mecellem-models) [](https://huggingface.co/spaces/newmindai/Mizan) [](https://opensource.org/licenses/Apache-2.0)
 
+Mursit-Base is a Turkish Masked Language Model pre-trained entirely from scratch on Turkish-dominant corpora, as introduced in the paper [Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain](https://huggingface.co/papers/2601.16018).
+
 ## Model Description
 
 Mursit-Base is a Turkish Masked Language Model pre-trained entirely from scratch on Turkish-dominant corpora. The model is based on ModernBERT-base architecture (155M parameters) and serves as a foundation model for downstream tasks including text classification, named entity recognition, and feature extraction. Unlike domain-adaptive approaches that continue training from existing checkpoints, this model is initialized randomly and trained on a carefully curated dataset combining Turkish legal text with general web data.
@@ -304,7 +307,7 @@ If you use this model, please cite our paper:
 ```bibtex
 @article{mecellem2026,
 title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
-author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and Çetin,
+author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and İclal Çetin, and Sağbaş, Ömer Can},
 journal={arXiv preprint arXiv:2601.16018},
 year={2026},
 month={January},
````