Update README.md
README.md
CHANGED
@@ -52,16 +52,15 @@ model = AutoModelForMaskedLM.from_pretrained(model_name)

### Evaluation Overview

-Models were tested on the [`sud-resh-benchmark
+Models were tested on the [`sud-resh-benchmark`](https://huggingface.co/datasets/lawful-good-project/sud-resh-benchmark/tree/main) legal texts using a masked language modeling setup. Tokens were randomly masked at varying probabilities (10–40%), and models predicted them using their pre-trained heads.

-> **Note:** The
+> **Note:** The ruBERT-ruLaw model was **pre-trained on legal texts such as laws and statutes**, but **not specifically on judicial decisions**. The evaluation reflects how well it generalizes to predicting masked tokens in Russian court rulings.

* **Top-1 Accuracy:** fraction of masked tokens predicted exactly.
* **Top-5 Accuracy:** fraction of masked tokens predicted within the top 5 candidates.

Results reflect performance across all masked tokens, aggregated for the dataset.

-
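
The two metrics defined above can be illustrated with a minimal sketch. It is not the evaluation script behind the reported numbers: the helper names `mask_positions` and `top_k_accuracy` are hypothetical, the scores are made-up toy values, and independent per-token masking is an assumption about how the 10–40% masking was applied.

```python
import random


def mask_positions(num_tokens, mlm_probability, rng):
    """Pick positions to mask, each independently with the given probability.

    Assumption: independent per-token masking; the README does not say
    exactly how tokens were sampled.
    """
    return [i for i in range(num_tokens) if rng.random() < mlm_probability]


def top_k_accuracy(scores, targets, k):
    """Fraction of masked tokens whose true id is among the k best candidates.

    scores:  one list of per-vocabulary scores for each masked token
    targets: the true token id for each masked token
    """
    if not targets:
        return 0.0
    hits = 0
    for row, target in zip(scores, targets):
        # Vocabulary ids ordered from highest to lowest score, cut at k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Toy data: 4 masked tokens over a 6-id vocabulary (made-up scores).
scores = [
    [0.10, 0.70, 0.05, 0.05, 0.05, 0.05],  # true id 1: ranked first
    [0.30, 0.10, 0.40, 0.10, 0.05, 0.05],  # true id 0: ranked second
    [0.05, 0.05, 0.10, 0.10, 0.20, 0.50],  # true id 5: ranked first
    [0.40, 0.30, 0.20, 0.05, 0.03, 0.02],  # true id 5: ranked last
]
targets = [1, 0, 5, 5]

top1 = top_k_accuracy(scores, targets, k=1)
top5 = top_k_accuracy(scores, targets, k=5)
print(top1, top5)  # 0.5 0.75

# Positions that would be masked in a 20-token sequence at 40% probability.
print(mask_positions(20, 0.4, random.Random(0)))
```

In a real run the score rows would come from the model's output at each masked position, and the hits would be summed over every masked token in the dataset before dividing, which matches the aggregation described above.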

## MLM Accuracy Comparison

| MLM Probability | Metric | ruBERT-ruLaw | rubert-base-cased | legal-bert-base-uncased |