nmmursit · nielsr (HF Staff) committed

Commit aedc579 · verified · 1 Parent(s): b217ff7

Add library_name, update pipeline_tag and link to paper (#1)

- Add library_name, update pipeline_tag and link to paper (b88caaf5db85e112a0e9d84d55a5ab92f872d78f)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1): README.md (+6 -3)
README.md CHANGED

@@ -1,8 +1,11 @@
 ---
+base_model: ModernBERT-base
 language:
 - tr
 - en
 license: apache-2.0
+pipeline_tag: feature-extraction
+library_name: transformers
 tags:
 - fill-mask
 - turkish
@@ -12,14 +15,14 @@ tags:
 - modernbert
 - TRUBA
 - MN5
-base_model: ModernBERT-base
-pipeline_tag: fill-mask
 ---
 
 # Mursit-Base
 
 [![GitHub](https://img.shields.io/badge/GitHub-NewMindAI-black?logo=github)](https://github.com/newmindai/mecellem-models) [![HuggingFace Space](https://img.shields.io/badge/HF%20Space-Mizan-blue?logo=huggingface)](https://huggingface.co/spaces/newmindai/Mizan) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 
+Mursit-Base is a Turkish Masked Language Model pre-trained entirely from scratch on Turkish-dominant corpora, as introduced in the paper [Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain](https://huggingface.co/papers/2601.16018).
+
 ## Model Description
 
 Mursit-Base is a Turkish Masked Language Model pre-trained entirely from scratch on Turkish-dominant corpora. The model is based on ModernBERT-base architecture (155M parameters) and serves as a foundation model for downstream tasks including text classification, named entity recognition, and feature extraction. Unlike domain-adaptive approaches that continue training from existing checkpoints, this model is initialized randomly and trained on a carefully curated dataset combining Turkish legal text with general web data.
@@ -304,7 +307,7 @@ If you use this model, please cite our paper:
 ```bibtex
 @article{mecellem2026,
 title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
-author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and Çetin, İclal and Sağbaş, Ömer Can},
+author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and İclal Çetin, and Sağbaş, Ömer Can},
 journal={arXiv preprint arXiv:2601.16018},
 year={2026},
 month={January},
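The keys this commit edits (`base_model`, `pipeline_tag`, `library_name`) live in the README's YAML front matter, the `---`-delimited block the Hub reads as model-card metadata. A minimal sketch of extracting those simple `key: value` entries — `read_front_matter` is a hypothetical helper written for illustration, not the Hub's actual parser:

```python
import re

def read_front_matter(readme_text):
    """Return simple `key: value` pairs from a README's YAML front matter.

    A sketch only: it skips YAML list items (lines starting with "- ")
    and does not handle nested structures.
    """
    match = re.match(r"^---\n(.*?)\n---\n", readme_text, re.DOTALL)
    meta = {}
    if match:
        for line in match.group(1).splitlines():
            if ":" in line and not line.startswith("- "):
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return meta

# Front matter as it reads after this commit is applied.
readme = """---
base_model: ModernBERT-base
language:
- tr
- en
license: apache-2.0
pipeline_tag: feature-extraction
library_name: transformers
tags:
- fill-mask
- turkish
---
# Mursit-Base
"""

print(read_front_matter(readme)["pipeline_tag"])   # feature-extraction
print(read_front_matter(readme)["library_name"])   # transformers
```

With `library_name: transformers` set, the Hub can show the correct "Use this model" snippet, and `pipeline_tag: feature-extraction` controls which inference widget and task filter the model appears under.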