hakandsai committed · verified
Commit f36f9fc · 1 Parent(s): df30256

Update README.md

Files changed (1): README.md (+14 -15)
README.md CHANGED
@@ -14,7 +14,7 @@ pipeline_tag: fill-mask
 
 [![trmodernbert.webp](https://huggingface.co/artiwise-ai/modernbert-base-tr-uncased/resolve/main/trmodernbert.webp)](https://huggingface.co/artiwise-ai/modernbert-base-tr-uncased/resolve/main/trmodernbert.webp)
 
-We present Artiwise ModernBERT for Turkish 🎉.
 
 This model is a Turkish adaptation of ModernBERT, fine-tuned from `answerdotai/ModernBERT-base` using only the Turkish part of CulturaX.
 
@@ -28,20 +28,19 @@ The benchmark results below demonstrate that Artiwise ModernBERT consistently ou
 
 | Dataset & Mask Level | Artiwise ModernBERT | ytu-ce-cosmos/turkish-base-bert-uncased | dbmdz/bert-base-turkish-uncased |
 |----------------------------------|---------------------|-----------------------------------------|---------------------------------|
-| QA Dataset (5% mask) | **74.50** | 60.84 | 48.57 |
-| QA Dataset (10% mask) | **72.18** | 58.75 | 46.29 |
-| QA Dataset (15% mask) | **69.46** | 56.50 | 44.30 |
-| Review Dataset (5% mask) | **62.04** | 48.31 | 35.50 |
-| Review Dataset (10% mask) | **59.52** | 45.88 | 33.74 |
-| Review Dataset (15% mask) | **56.32** | 43.12 | 31.70 |
-| Biomedical Dataset* (5% mask) | **58.11** | 50.78 | 40.82 |
-| Biomedical Dataset* (10% mask) | **55.55** | 48.37 | 38.51 |
-| Biomedical Dataset* (15% mask) | **52.71** | 45.82 | 36.44 |
-
-\* *Only the first 100,000 entries from the Biomedical Dataset were used for evaluation.*
-
-
-
 
 # Model Usage
 Note: Torch version must be >= 2.6.0 and transformers version >= 4.50.0 for the model to function properly.
 
@@ -14,7 +14,7 @@ pipeline_tag: fill-mask
 
 [![trmodernbert.webp](https://huggingface.co/artiwise-ai/modernbert-base-tr-uncased/resolve/main/trmodernbert.webp)](https://huggingface.co/artiwise-ai/modernbert-base-tr-uncased/resolve/main/trmodernbert.webp)
 
+We present Artiwise ModernBERT for Turkish 🎉, a BERT model with a modernized architecture and an extended context length (512 → 8192).
 
 This model is a Turkish adaptation of ModernBERT, fine-tuned from `answerdotai/ModernBERT-base` using only the Turkish part of CulturaX.
 
@@ -28,20 +28,19 @@ The benchmark results below demonstrate that Artiwise ModernBERT consistently ou
 
 | Dataset & Mask Level | Artiwise ModernBERT | ytu-ce-cosmos/turkish-base-bert-uncased | dbmdz/bert-base-turkish-uncased |
 |----------------------------------|---------------------|-----------------------------------------|---------------------------------|
+| QA Dataset (5% mask) | **74.50** | 60.84 | 48.57 |
+| QA Dataset (10% mask) | **72.18** | 58.75 | 46.29 |
+| QA Dataset (15% mask) | **69.46** | 56.50 | 44.30 |
+| Review Dataset (5% mask) | **62.67** | 48.57 | 35.38 |
+| Review Dataset (10% mask) | **59.60** | 45.77 | 33.04 |
+| Review Dataset (15% mask) | **56.51** | 43.05 | 31.05 |
+| Biomedical Dataset (5% mask) | **58.11** | 50.78 | 40.82 |
+| Biomedical Dataset (10% mask) | **55.55** | 48.37 | 38.51 |
+| Biomedical Dataset (15% mask) | **52.71** | 45.82 | 36.44 |
+
+For each dataset (QA, Reviews, Biomedical) and each masking level (5%, 10%, 15%), we randomly masked the specified percentage of tokens in every input example and then measured each model's ability to correctly predict those masked tokens. All models were evaluated in bfloat16 precision.
+
+Our experiments used three datasets: the [Turkish Biomedical Corpus](https://huggingface.co/hazal/Turkish-Biomedical-corpus-trM), the [Turkish Product Reviews dataset](https://huggingface.co/fthbrmnby/turkish_product_reviews), and the general-domain QA corpus [turkish_v2](https://huggingface.co/blackerx/turkish_v2).
 
 
 # Model Usage
 Note: Torch version must be >= 2.6.0 and transformers version >= 4.50.0 for the model to function properly.
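The evaluation procedure described in the diff above (randomly mask 5%/10%/15% of the tokens in each example, then score exact-match recovery of the masked tokens) can be sketched in pure Python. This is an illustrative sketch, not the authors' actual benchmark code: the helper names are made up, and the "oracle" predictor is a stand-in for a real fill-mask model run in bfloat16.

```python
import random

def mask_tokens(tokens, mask_ratio, mask_token="[MASK]", seed=0):
    """Randomly replace `mask_ratio` of the tokens with the mask token.

    Returns the masked sequence plus a {position: original_token} map
    so the predictions can be scored afterwards.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked, targets = list(tokens), {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, targets

def masked_accuracy(predict, tokens, mask_ratio, seed=0):
    """Fraction of masked positions the predictor fills in correctly."""
    masked, targets = mask_tokens(tokens, mask_ratio, seed=seed)
    predictions = predict(masked)  # one predicted token per position
    hits = sum(predictions[pos] == tok for pos, tok in targets.items())
    return hits / len(targets)

# Toy demonstration with a hypothetical "oracle" predictor that always
# recovers the original sentence; a real run would query the model instead.
tokens = "ankara türkiye nin başkenti ve en kalabalık ikinci şehridir".split()
print(masked_accuracy(lambda masked: tokens, tokens, 0.15))  # 1.0
```

The table's numbers would come from averaging this per-example accuracy over each dataset at each masking level; the seeding here only makes the sketch reproducible.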