---
base_model: unsloth/Phi-3.5-mini-instruct
language:
- de
- fr
- it
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
datasets:
- ipst/slds
metrics:
- bertscore
- bleu
- rouge
---
# Model Card for Phi-3.5-mini-instruct-SLDS
## Model Summary
This model is a **Phi-3.5-mini-instruct fine-tuned on the Swiss Landmark Decisions Summarization (SLDS) dataset**.
SLDS is a multilingual dataset of **20,000 Swiss Federal Supreme Court decisions** (1954–2024), each paired with **headnotes in German, French, and Italian**, resulting in ~60,000 decision–headnote pairs.
The model is optimized for **legal abstractive summarization** and produces **concise, legally structured headnotes**.
It can be used for both **monolingual** and **cross-lingual summarization** tasks.
This model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
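A minimal usage sketch is below. The repo id comes from this card; the instruction wording is an assumption for illustration, since the exact prompt format used during fine-tuning is not documented here. The `target_lang` parameter selects the headnote language, covering both monolingual and cross-lingual use.

```python
# Sketch: requesting a headnote from the fine-tuned model.
# The instruction text is illustrative, not the documented training prompt.
import os


def build_messages(decision_text: str, target_lang: str = "de") -> list[dict]:
    """Wrap a court decision in a chat-style headnote request.

    target_lang picks the headnote language (de/fr/it), so a German
    decision with target_lang="fr" is a cross-lingual request.
    """
    instruction = (
        "Summarize the following Swiss Federal Supreme Court decision "
        f"as a headnote in '{target_lang}':\n\n{decision_text}"
    )
    return [{"role": "user", "content": instruction}]


if os.environ.get("RUN_SLDS_DEMO"):  # guarded: downloads the full model
    from transformers import pipeline

    generator = pipeline(
        "text-generation", model="ipst/Phi-3.5-mini-instruct-SLDS"
    )
    out = generator(
        build_messages("…decision text…", target_lang="fr"),
        max_new_tokens=512,
    )
    print(out[0]["generated_text"])
```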
---
## Intended Use
- **Primary Task**: Judicial summarization (decision → headnote generation).
- **Languages**: German (`de`), French (`fr`), Italian (`it`).
- **Scenarios**:
- Monolingual summarization: e.g., German decision → German headnote.
- Cross-lingual summarization: e.g., German decision → French headnote.
- Legal research support: assisting in retrieval and navigation of court decisions.
**Not intended for**:
- Replacing human legal expertise.
- Serving as an authoritative legal source.
- Automated legal advice or decision-making.
---
## Training Data
- **Dataset**: [Swiss Landmark Decisions Summarization (SLDS)](https://huggingface.co/datasets/ipst/slds).
- **Size**: ~20K decisions, ~60K decision–headnote pairs.
- **Splits**: Train (1954–2021), Validation (2022), Test (2023–2024).
- **Source**: [Swiss Federal Supreme Court](https://www.bger.ch).
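The temporal splits above can be sketched as a small helper. The year cutoffs mirror the list above; any dataset field names are left out on purpose, since the actual schema belongs to the SLDS dataset card.

```python
def assign_split(year: int) -> str:
    """Map a decision year to its SLDS split:
    train 1954-2021, validation 2022, test 2023-2024."""
    if 1954 <= year <= 2021:
        return "train"
    if year == 2022:
        return "validation"
    if 2023 <= year <= 2024:
        return "test"
    raise ValueError(f"year {year} is outside SLDS coverage (1954-2024)")
```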
---
## Training Procedure
- **Base Models** (this card covers the Phi-3.5-mini variant; the sibling SLDS fine-tunes are linked in the results table below):
- Qwen2.5 family (0.5B–14B)
- Llama 3.2 (3B)
- Phi-3.5-mini
- **Fine-tuning Objective**: Conditional generation (decision → headnote).
- **Evaluation Metrics**:
- Lexical: ROUGE-1/2/L, BLEU.
- Semantic: BERTScore.
- Domain-specific: LLM-as-a-Judge framework (DeepSeek V3) assessing five rubrics: accuracy, completeness, clarity, legal citations, and considerations.
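As a reference point for the lexical metrics: ROUGE-1 is unigram-overlap F1 between a generated headnote and the gold headnote. A minimal illustration is below; this is not the official scorer, which adds tokenization and stemming rules.

```python
# Minimal ROUGE-1 F1 illustration (whitespace tokens, no stemming).
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```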
---
## Model Performance
On the SLDS test set (2023–2024):
| Model | Setting | BERTScore ↑ | BLEU ↑ | ROUGE-1 ↑ | ROUGE-2 ↑ | ROUGE-L ↑ | JUDGE ↑ |
|:--- |:--- |:--- |:--- |:--- |:--- |:--- |:--- |
| [Phi-3.5-mini](https://huggingface.co/ipst/Phi-3.5-mini-instruct-SLDS) | fine-tuned | 11.24 ± 3.82 | 34.84 ± 0.41 | 31.20 ± 2.08 | 14.11 ± 1.27 | 20.96 ± 1.35 | 15.25 ± 2.32 |
| [Llama 3.2 3B](https://huggingface.co/ipst/Llama-3.2-3B-Instruct-SLDS) | fine-tuned | 15.20 ± 4.40 | 21.89 ± 0.42 | 31.89 ± 2.34 | 14.87 ± 1.61 | 22.49 ± 1.60 | 18.47 ± 2.99 |
| [Qwen2.5 0.5B](https://huggingface.co/ipst/Qwen2.5-0.5B-Instruct-SLDS) | fine-tuned | -1.37 ± 3.85 | 32.20 ± 0.35 | 23.87 ± 1.68 | 9.46 ± 0.94 | 17.37 ± 1.09 | 5.80 ± 1.26 |
| [Qwen2.5 1.5B](https://huggingface.co/ipst/Qwen2.5-1.5B-Instruct-SLDS) | fine-tuned | 19.81 ± 2.72 | 36.79 ± 0.34 | 33.03 ± 1.73 | 14.14 ± 1.08 | 22.67 ± 1.13 | 15.92 ± 2.27 |
| [Qwen2.5 3B](https://huggingface.co/ipst/Qwen2.5-3B-Instruct-SLDS) | fine-tuned | 23.23 ± 2.80 | 38.42 ± 0.34 | 35.18 ± 1.79 | 15.66 ± 1.23 | 24.10 ± 1.17 | 20.31 ± 2.66 |
| [Qwen2.5 7B](https://huggingface.co/ipst/Qwen2.5-7B-Instruct-SLDS) | fine-tuned | 29.59 ± 1.97 | 41.40 ± 0.34 | 39.24 ± 1.59 | 18.26 ± 1.25 | 26.44 ± 1.15 | 28.37 ± 3.07 |
| [Qwen2.5 14B](https://huggingface.co/ipst/Qwen2.5-14B-Instruct-SLDS) | fine-tuned | **32.48 ± 1.98** | **41.80 ± 0.37** | 40.04 ± 1.74 | **19.99 ± 1.41** | **28.00 ± 1.28** | 31.38 ± 3.19 |
| GPT-4o | one-shot | 30.44 ± 1.74 | 31.89 ± 0.25 | **42.12 ± 1.79** | 18.92 ± 1.22 | 25.92 ± 1.05 | 39.70 ± 2.66 |
| Claude 3.5 Sonnet | one-shot | 5.53 ± 2.00 | 21.88 ± 0.25 | 41.86 ± 1.64 | 19.23 ± 1.19 | 27.67 ± 1.20 | 41.25 ± 2.90 |
| DeepSeek-R1 | one-shot | 20.28 ± 1.45 | 22.37 ± 0.18 | 38.30 ± 1.82 | 15.97 ± 0.85 | 21.03 ± 0.84 | **42.28 ± 2.21** |
| o3-mini | one-shot | 14.18 ± 1.31 | 20.55 ± 0.17 | 34.77 ± 1.43 | 11.92 ± 0.69 | 18.21 ± 0.67 | 34.82 ± 2.41 |
- **Lexical metrics**: Fine-tuned models outperform the one-shot baselines on overlap-based scores (BLEU, ROUGE).
- **LLM-judge scores**: Larger proprietary and reasoning models score higher on legal precision.
---
## Limitations
- **Language imbalance**: German decisions dominate, while Italian remains underrepresented.
- **Biases**: Headnotes reflect judicial style and conventions, not neutral summaries.
- **Evaluation mismatch**: ROUGE and BLEU may not fully capture legal accuracy.
- **Overfitting risk**: Models may overfit to formulaic headnote structures.
- **Cross-lingual difficulty**: Some models struggle when the headnote language differs from the decision language.
---
## Ethical Considerations
- **Sensitive information**: All data is anonymized by the Swiss Federal Supreme Court before publication.
- **Legal risk**: Generated headnotes must not be used as official legal advice.
- **Fair use**: Ensure attribution when reusing outputs.
---
## How to Cite
If you use this model, please cite the dataset paper:
```bibtex
@inproceedings{rolshoven-etal-2025-unlocking,
title = "Unlocking Legal Knowledge: A Multilingual Dataset for Judicial Summarization in {S}witzerland",
author = {Rolshoven, Luca and
Rasiah, Vishvaksenan and
Bose, Srinanda Br{\"u}gger and
Hostettler, Sarah and
Burkhalter, Lara and
St{\"u}rmer, Matthias and
Niklaus, Joel},
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.832/",
pages = "15382--15411",
ISBN = "979-8-89176-335-7",
}
```