Add library_name, update pipeline_tag and link to paper (#1)
(commit b88caaf5db85e112a0e9d84d55a5ab92f872d78f)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

````diff
@@ -1,8 +1,11 @@
 ---
+base_model: ModernBERT-base
 language:
 - tr
 - en
 license: apache-2.0
+pipeline_tag: feature-extraction
+library_name: transformers
 tags:
 - fill-mask
 - turkish
@@ -12,14 +15,14 @@ tags:
 - modernbert
 - TRUBA
 - MN5
-base_model: ModernBERT-base
-pipeline_tag: fill-mask
 ---
 
 # Mursit-Base
 
 [](https://github.com/newmindai/mecellem-models) [](https://huggingface.co/spaces/newmindai/Mizan) [](https://opensource.org/licenses/Apache-2.0)
 
+Mursit-Base is a Turkish Masked Language Model pre-trained entirely from scratch on Turkish-dominant corpora, as introduced in the paper [Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain](https://huggingface.co/papers/2601.16018).
+
 ## Model Description
 
 Mursit-Base is a Turkish Masked Language Model pre-trained entirely from scratch on Turkish-dominant corpora. The model is based on ModernBERT-base architecture (155M parameters) and serves as a foundation model for downstream tasks including text classification, named entity recognition, and feature extraction. Unlike domain-adaptive approaches that continue training from existing checkpoints, this model is initialized randomly and trained on a carefully curated dataset combining Turkish legal text with general web data.
@@ -304,7 +307,7 @@ If you use this model, please cite our paper:
 ```bibtex
 @article{mecellem2026,
 title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
-author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and Çetin,
+author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and İclal Çetin, and Sağbaş, Ömer Can},
 journal={arXiv preprint arXiv:2601.16018},
 year={2026},
 month={January},
````