---
language:
- it
- en
- pt
- es
- fr
- de
license: cc-by-nc-4.0
license_name: cc-by-nc-4.0
license_link: https://creativecommons.org/licenses/by-nc/4.0/
base_model: utter-project/EuroLLM-9B-Instruct-2512
base_model_relation: finetune
library_name: transformers
pipeline_tag: text-generation
tags:
- plain-language
- iso-24495-1
- compliance
- legal-nlp
- multilingual
- eurollm
- lora
- structured-output
gated: auto
extra_gated_heading: "Access to EuroLLM-ISO24495-9b-Instruct (v0.2)"
extra_gated_description: >
  This model is released under CC-BY-NC-4.0 (non-commercial). The form below
  helps us understand who is using the model and prioritize improvements for
  v1.0. Approval is automatic once the form is submitted.
extra_gated_prompt: >
  By submitting this form you confirm that (1) your intended use complies
  with the CC-BY-NC-4.0 license terms (non-commercial), and (2) you have
  read the Limitations section of the model card. For commercial use,
  please contact hf@semplifica.ai.
extra_gated_fields:
  Full name: text
  Organization or affiliation: text
  Country: country
  Intended use:
    type: text
    description: "Briefly describe how you intend to use the model (1-2 sentences)."
  Affiliation type:
    type: select
    options:
      - Academic / Research
      - Public administration
      - Non-profit
      - Industry (non-commercial evaluation only)
      - Individual / Personal
  I agree to non-commercial use only (CC-BY-NC-4.0):
    type: checkbox
extra_gated_button_content: "Request access"
model-index:
- name: EuroLLM-ISO24495-9b-Instruct-v0.2
  results:
  - task:
      type: text-generation
      name: ISO 24495-1 Plain Language Compliance Analysis
    dataset:
      name: semplifica.Language synthetic v3 test set (blind)
      type: custom
      config: 200_samples_blind
    metrics:
    - type: mae
      value: 2.74
      name: Score MAE (0–100)
      verified: false
    - type: f1
      value: 0.9577
      name: Verdict F1 (binary)
      verified: false
    - type: precision
      value: 0.9714
      name: Verdict Precision
      verified: false
    - type: recall
      value: 0.9444
      name: Verdict Recall
      verified: false
    - type: false_positive_rate
      value: 0.0156
      name: False Positive Rate
      verified: false
    - type: f1
      value: 0.3653
      name: Span F1 (IoU 0.5)
      verified: false
    - type: rouge
      value: 0.2655
      name: Checklist ROUGE-L
      verified: false
---
# EuroLLM-ISO24495-9b-Instruct (v0.2)
A fine-tuned [EuroLLM-9B-Instruct-2512](https://huggingface.co/utter-project/EuroLLM-9B-Instruct-2512)
specialised in **ISO 24495-1 (Plain Language)** compliance analysis of legal,
administrative and technical texts across **six European languages**:
Italian, English, Portuguese, Spanish, French, German.
Given a document, the model emits a structured XML analysis with: a
compliance score (0–100), a binary verdict, a list of violation spans with
character-level offsets and corrective suggestions, and a prioritised
checklist of corrective actions.
> **Version**: `v0.2` — trained on about 28,000 records (v3 dataset, hybrid
> synthetic + human-curated), with verdict balance per language and a 21 %
> anti-forgetting mix (EuroBlocks instruct conversations).
> **Previous**: [`v0.1-base`](https://huggingface.co/SemplificaAI/EuroLLM-ISO24495-9b-Instruct/tree/v0.1) — trained on 10 K records, see git tag.
> **Next**: `v1.0` — adds manually-annotated samples from domain experts
> (in preparation).
## What changed in v0.2
Compared to **v0.1-base** (the first public release):
- **2.8× larger dataset** (28,410 records vs 10,225): same 9
document types in 6 EU languages, plus a new `other` catch-all category
for greater stylistic diversity.
- **Per-language verdict balance** of 40–60 % conforme (v0.1 was skewed
to about 30 % conforme): reduces the model's prior bias toward
"non_conforme" verdicts on borderline cases.
- **Anti-forgetting mix**: 21 % of training is general-purpose instruct
conversation (`euroblocks_instruct`) so the model retains broad
instruction-following capability when asked questions outside the ISO
24495-1 task.
- **Better language coverage**: Italian's share fell from 50 % to 43 %;
German nearly tripled (5 % → 14 %); English rose from 15 % to 26 %.
- **Sentence-aware document chunking**: long documents are split at
sentence boundaries (max 500 words / chunk) with violation spans
re-localized to the new offsets.
- **Conservative training**: 2 epochs (instead of 3), learning rate
1.5e-4 (instead of 2e-4), warmup 100 steps (instead of 50), all chosen to
reduce overfitting risk on the larger, more diverse corpus.
### Headline metric improvements (200-sample blind test)
| Metric | v0.1 | **v0.2** | Δ |
|---|---|---|---|
| `score_mae` (lower is better) | 3.86 | **2.74** | **-29 %** |
| `verdict_f1` | 0.9934 | 0.9577 | -3.6 % * |
| `false_positive_rate` (lower is better) | 0.0000 | 0.0156 | +1.6 pp |
| `span_f1` (IoU ≥ 0.5) | 0.3192 | **0.3653** | **+14 %** |
| `checklist_rouge_l` | 0.2375 | **0.2655** | **+12 %** |
\* v0.2 is evaluated on the **blind test set** (more rigorous); v0.1 was
evaluated on the validation set, so the two columns are not strictly
like-for-like. The verdict F1 remains well above the production
threshold (≥ 0.88) in both cases.
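
The Δ column reports the relative change from v0.1 to v0.2, except for the false positive rate, which is an absolute difference in percentage points. A minimal sketch that recomputes the relative figures from the table values (the helper name `relative_change` is ours, not part of any released code):

```python
def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# (v0.1, v0.2) pairs copied from the table above.
rows = {
    "score_mae": (3.86, 2.74),
    "verdict_f1": (0.9934, 0.9577),
    "span_f1": (0.3192, 0.3653),
    "checklist_rouge_l": (0.2375, 0.2655),
}

for name, (old, new) in rows.items():
    print(f"{name}: {relative_change(old, new):+.1f} %")
```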
---
## Quick start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
REPO = "SemplificaAI/EuroLLM-ISO24495-9b-Instruct"
# Recommended: 8-bit loading → ~9 GB VRAM (instead of ~18 GB in bf16)
bnb = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(
REPO, quantization_config=bnb, device_map="auto", torch_dtype=torch.bfloat16,
)
model.eval()
SYSTEM = (
"You are an expert in plain language according to ISO 24495-1:2023. "
"Analyze the provided text and produce: (1) a compliance score 0-100, "
"(2) parts to improve with specific suggestions, "
"(3) an ordered checklist of corrective actions. "
"Reply directly without thinking aloud."
)
text = """The Parties hereby acknowledge, in light of the foregoing premises
which form an integral and substantive part of this Agreement, that the
Confidential Information shall not include..."""
messages = [
{"role": "system", "content": SYSTEM},
{"role": "user", "content": f"Analyze this text for ISO 24495-1 plain language compliance:\n\n<TEXT>\n{text}\n</TEXT>"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
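# Fall back to EOS only when no pad token is defined (a pad id of 0 is valid).
pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=3072, do_sample=False, pad_token_id=pad_id)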
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
System prompts in the other five languages: see [§ Multilingual prompts](#multilingual-prompts).
---
## Output format
The model emits a single XML block with four fields:
```xml
<ANALYSIS>
<SCORE>42</SCORE>
<VERDICT>non_conforme</VERDICT>
<SPANS>
[
{
"text_fragment": "The Parties hereby acknowledge, in light of the foregoing premises...",
"violation_type": "legalese_overload",
"suggestion": "Both parties agree, based on the above context, that...",
"start_char": 0,
"end_char": 78,
"severity": "high"
}
]
</SPANS>
<CHECKLIST>
1. Replace archaic legal formulas with direct expressions.
2. Break long sentences into shorter periods.
3. Define technical terms on first use.
</CHECKLIST>
</ANALYSIS>
```
### Fields
| Field | Value | Notes |
|---|---|---|
| `<SCORE>` | integer `0`–`100` | 100 = fully compliant |
| `<VERDICT>` | `conforme` \| `non_conforme` | internal threshold around 60 |
| `<SPANS>` | inline JSON array | violations with char-level spans |
| `<CHECKLIST>` | numbered list | corrective actions in priority order |
### `violation_type` vocabulary (10 ISO-aligned categories)
`sentence_too_long`, `passive_voice_overuse`, `undefined_jargon`,
`buried_action`, `nominalization`, `double_negative`, `ambiguous_reference`,
`missing_structure`, `inconsistent_terminology`, `legalese_overload`.
### `severity`
`low` | `medium` | `high`
### Reference parser
A tolerant Python parser (handles truncated output and non-standard JSON
escapes) is available in the companion training-scripts repository, in
`scripts/shared/text_utils.py`.
---
## Examples
Two real runs of the model on documents from different domains and
languages, processed end-to-end with greedy decoding (`do_sample=False`,
`max_new_tokens=3072`).
### Example 1 — Italian NDA (legal)
**Input** (excerpt from a pseudonymised non-disclosure agreement, about 1,500 words):
> *Su richiesta dell'altra Parte, ovvero alla conclusione o all'interruzione,
> per qualsiasi motivo, senza alcun pregiudizio per quanto riguarda gli altri
> impegni di cui al presente Accordo, la Parte ricevente si obbliga a
> riconsegnare entro 30 giorni all'altra, ovvero, a scelta di quest'ultima,
> a distruggere e attestare per iscritto la distruzione, ogni copia di tutti
> i documenti, o altro materiale in qualsiasi forma in possesso della Parte
> stessa, delle Persone Collegate o di Terzi, che contengano o che si
> riferiscano alle "Informazioni riservate"...*
**Selected output fields** (full output has 8 spans + 5 checklist items):
```
SCORE: 15 / 100
VERDICT: non_conforme
```
| # | violation_type | severity | Suggestion |
|---|---|---|---|
| 1 | `legalese_overload` | high | *"Se richiesto dall'altra Parte, alla fine o all'interruzione dell'Accordo, la Parte Ricevente deve riconsegnare o distruggere tutte le copie..."* |
| 2 | `legalese_overload` | medium | *"Le Parti dichiarano di acconsentire al trattamento dei loro dati personali, in conformità al Regolamento UE 2016/679."* |
| 3 | `legalese_overload` | medium | *"Le Parti stipulano quanto segue"* (replacing "TUTTO CIÒ PREMESSO / SI STIPULA E CONVIENE QUANTO SEGUE") |
**Checklist excerpt**:
> 1. Semplificare il linguaggio giuridico per renderlo più accessibile.
> 2. Eliminare le formule rituali e le espressioni arcaiche.
> 3. Riformulare le frasi lunghe e complesse in periodi più brevi e chiari.
**Wall-clock**: 42 s on a single RTX 4090 (8-bit loading, 1,682 tokens generated).
### Example 2 — English technical safety manual
**Input** (excerpt from an HVAC equipment safety manual, about 2,300 words, OCR-cleaned and brand-anonymised):
> *PROHIBITION. It is forbidden to use the machine without the safety devices:
> not working, installed incorrectly. Operating the machine without the safety
> devices creates potential hazards for the operator. For correct and
> long-lasting operation of the machine, carry out the scheduled maintenance
> work as specified by the manufacturer...*
**Selected output fields** (full output has 8 spans + 5 checklist items):
```
SCORE: 15 / 100
VERDICT: non_conforme
```
| # | violation_type | severity | Suggestion |
|---|---|---|---|
| 1 | `missing_structure` | high | Add a section title (e.g., *'Prohibited Modifications'*) and use bullet points for the consequences. |
| 2 | `missing_structure` | high | Add a section title (e.g., *'Safety Device Requirements'*) and list the consequences of non-compliance. |
| 5 | `inconsistent_terminology` | medium | Use *'explosion risk areas'* consistently instead of *'areas classified as at risk of explosion'*. |
| 6 | `inconsistent_terminology` | medium | Use *'fixed guards'* consistently instead of *'fixed guards protecting the moving parts'*. |
**Checklist excerpt**:
> 1. Organize the manual into logical sections with clear, bold headings.
> 2. Use bulleted lists to present rules, prohibitions, and safety requirements.
> 3. Standardize terminology for the machine, fluids, and safety devices throughout the text.
> 4. Add a table of contents to help readers navigate the document.
Both documents score 15/100, but for different reasons: the NDA is flagged
for *legalese overload*, the safety manual for *missing structure* and
*inconsistent terminology*. In these two runs the model reports different
failure modes for different document types.
---
## Evaluation
Evaluated on **200 blind samples** drawn from the v3 held-out test split,
stratified by `(language × doc_type × difficulty × verdict)`, never seen
during training or validation.
### Metrics
| Metric | Prod threshold | Acceptable threshold | **v0.2 result** | Status |
|---|---|---|---|---|
| `score_mae` (mean absolute error on 0–100 score) | ≤ 8.0 | ≤ 12.0 | **2.74** | ✅ **PROD** |
| `verdict_f1` (binary F1 conforme / non_conforme) | ≥ 0.88 | ≥ 0.80 | **0.9577** | ✅ **PROD** |
| `verdict_precision` | — | — | **0.9714** | (high) |
| `verdict_recall` | — | — | **0.9444** | (high) |
| `false_positive_rate` (on `conforme` class) | ≤ 0.08 | ≤ 0.15 | **0.0156** | ✅ **PROD** |
| `span_f1` (IoU char-level ≥ 0.5) | ≥ 0.72 | ≥ 0.62 | 0.3653 | ⚠️ below accept |
| `checklist_rouge_l` | ≥ 0.55 | ≥ 0.45 | 0.2655 | ⚠️ below accept |
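
To make `score_mae` and `span_f1` concrete, here is a minimal sketch of how they can be computed. The evaluation code itself is not published in this card, so the details are assumptions: spans are treated as half-open `(start_char, end_char)` pairs, each predicted span is greedily matched to at most one unused gold span, and `violation_type` is ignored when matching.

```python
def iou(a, b):
    """Character-level IoU of two half-open spans (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def span_f1(pred, gold, threshold=0.5):
    """F1 over spans; a prediction is a hit if its best unused gold span has IoU >= threshold."""
    unused = list(gold)
    tp = 0
    for p in pred:
        best = max(unused, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= threshold:
            tp += 1
            unused.remove(best)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_mae(pred_scores, gold_scores):
    """Mean absolute error between predicted and reference 0-100 scores."""
    return sum(abs(p - g) for p, g in zip(pred_scores, gold_scores)) / len(gold_scores)


# One of two predictions overlaps a gold span; two gold spans are missed.
print(span_f1([(0, 78), (100, 120)], [(0, 80), (200, 240), (300, 310)]))
print(score_mae([40, 62], [42, 60]))
```

Under this matching rule a span that is found but shifted by more than about half its length counts as both a false positive and a missed gold span, which is one way offset drift can depress the metric.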
### Interpretation
**Strengths**
- **Excellent score calibration**: MAE 2.74 on a 0–100 scale, far below
the production threshold (≤ 8). The model's quantitative agreement
with the ground truth is very tight.
- **Strong binary classification**: verdict F1 0.96 with high precision
(0.97) and recall (0.94). Very few false positives on compliant texts
(1.6 %).
- **Robust XML schema adherence**: canonical tags, canonical violation
vocabulary, coherent character-level offsets across all six languages.
**Measured weaknesses** (improving from v0.1, still below acceptable)
- **Span F1 0.37**: on dense documents the model reports fewer spans than
the ground truth, and some reported spans drift far enough in their
offsets to fail the IoU ≥ 0.5 threshold. Improvement target for v1.0.
- **Checklist ROUGE-L 0.27**: corrective items are semantically
plausible but lexically divergent from the ground truth (ROUGE
penalises paraphrasing). A semantic metric (BERTScore) would likely
reward these outputs more accurately.
### Test set composition
- **Languages**: IT 50 %, EN 15 %, PT 12 %, ES 10 %, FR 8 %, DE 5 %
(natural distribution preserved in val/test, balanced in train)
- **Document types (10)**: 9 administrative/legal categories plus a
catch-all `other` category for stylistic diversity
- **Difficulty buckets**: easy / medium / hard / very_hard
The aggregate metrics are **averaged across all six languages**. A
per-language breakdown will be released with v1.0.
---
## Intended use
**Recommended use cases**
- Automated triage of contractual, regulatory and administrative documents
to flag problematic clauses from a plain-language perspective.
- Decision-support tool for editors, compliance officers, in-house legal
teams.
- First-draft generation of accessible rewrites for portions of a document.
- Teaching and research on ISO 24495-1 and plain language across
multilingual corpora.
**Out-of-scope use cases**
- **Fully automated decisions without human review.** Output must always be
validated by an expert, especially where the outcome has legal consequences.
- **Domains outside training scope**: clinical/medical text, purely academic
scientific writing, creative literature. The model is optimised on the
nine administrative/legal document types of the training set.
- **Languages other than the six supported.** Performance outside the EU
language set is not guaranteed.
- **Substitute for legal or compliance advice.** The model identifies
*readability* issues, not legal correctness or compliance with other
regulations.
---
## Limitations
- **Hybrid training set** (about 18,600 task records): roughly the first
9,000 records are fully synthetic (`gemini-2.5-flash` +
`gemini-3.5-flash` recovery); the remaining records are built on
human-curated source documents with partial assisted re-annotation by
`gemini-3.5-flash` under human review. Generator-side biases have not
been formally measured.
- **Not validated on standard public benchmarks.** The reported metrics
come from an internal blind test set (200 samples) drawn from the same
distribution as the training set. External validation is planned for v1.0.
- **Per-language variability.** The training task data is more balanced
across languages than v0.1, but Italian is still the largest single
language (about 43 % of the task split). Expect slightly better
calibration on Italian than on German (14 %).
- **Long outputs may be truncated.** On documents with many violations the
generation can exceed 2,048 tokens; we recommend setting `max_new_tokens`
to at least 3072 and using a parser tolerant of unclosed XML tags.
- **Sub-optimal span detection** (see § Evaluation). On dense documents the
model tends to be conservative in the number of spans reported.
- **No support for documents longer than about 30,000 characters**
(training-time sequence-length limit = 3,072 tokens). For very long
documents, pre-chunk at sentence boundaries (≤ 500 words per chunk).
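
The pre-chunking recommended in the last item can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's own splitter: `chunk_text` and `to_document_offsets` are hypothetical names, and the sentence boundary rule is a naive regex that is not abbreviation-aware.

```python
import re


def chunk_text(text, max_words=500):
    """Split text at sentence boundaries into chunks of at most max_words words.

    Returns (start_offset, chunk) pairs, where start_offset is the chunk's
    position in text. A single sentence longer than max_words is kept whole.
    """
    # Naive boundary: whitespace that follows '.', '!' or '?'.
    starts = [0] + [m.end() for m in re.finditer(r"(?<=[.!?])\s+", text)]
    bounds = list(zip(starts, starts[1:] + [len(text)]))
    chunks, cur_start, cur_end, cur_words = [], None, None, 0
    for s, e in bounds:
        n = len(text[s:e].split())
        if n == 0:
            continue
        # Close the current chunk if this sentence would push it over the limit.
        if cur_start is not None and cur_words + n > max_words:
            chunks.append((cur_start, text[cur_start:cur_end].rstrip()))
            cur_start, cur_words = None, 0
        if cur_start is None:
            cur_start = s
        cur_end, cur_words = e, cur_words + n
    if cur_start is not None:
        chunks.append((cur_start, text[cur_start:cur_end].rstrip()))
    return chunks


def to_document_offsets(span, chunk_start):
    """Shift a span's chunk-relative offsets back to the original document."""
    return {
        **span,
        "start_char": span["start_char"] + chunk_start,
        "end_char": span["end_char"] + chunk_start,
    }
```

Each chunk is analysed separately; keeping `start_offset` alongside the chunk lets the `start_char` / `end_char` values the model returns be mapped back onto the full document.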
---
## Training details
| | |
|---|---|
| **Base model** | [utter-project/EuroLLM-9B-Instruct-2512](https://huggingface.co/utter-project/EuroLLM-9B-Instruct-2512) (Apache 2.0) |
| **Architecture** | Llama-style decoder, 9B parameters, native ChatML chat template |
| **Fine-tuning method** | LoRA in bf16 on top of an int8-quantised base (bitsandbytes) |
| **LoRA rank / alpha** | 64 / 128 |
| **LoRA target modules** | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| **Trainable parameters** | 203,685,888 (2.18 % of total) |
| **Framework** | [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) 0.16.1; Liger kernel (fused linear + cross-entropy) |
| **Sample packing** | yes |
| **Sequence length** | 3,072 tokens |
| **Epochs** | 2 (vs 3 for v0.1) |
| **Optimizer steps** | 896 total (448 per epoch) |
| **Batch** | 1 micro × 16 gradient accumulation = 16 effective |
| **Optimizer** | Paged AdamW 8-bit (bitsandbytes) |
| **Learning rate** | 1.5e-4, cosine schedule, 100-step warmup (vs 2e-4 / 50-step for v0.1) |
| **Loss masking** | assistant tokens only (`roles_to_train: ["assistant"]`) |
| **Hardware** | 1× NVIDIA RTX 4090 (24 GB) + 128 GB system RAM |
| **Training time** | 7h 14m wall clock |
| **Final loss** | 0.30 (from 0.65 at step 5, −54 %) |
| **Peak VRAM** | about 21 GB of 24 GB |
The model published in this repository is the **final merge** of the LoRA
adapter into the base model, saved as a single `model.safetensors` file in
**bf16** (about 18 GB). For 8-bit inference, load with
`BitsAndBytesConfig(load_in_8bit=True)` as shown in the Quick start.
The bf16 merge is the "neutral ground": it can be re-quantised post-hoc to
any target format (int8, NF4, GGUF Q4_K_M).
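
As one example of re-quantising at load time, the sketch below loads the bf16 checkpoint in 4-bit NF4 through bitsandbytes. The function name `load_nf4` is hypothetical, and the card reports no metrics for NF4 inference, so output quality at this precision should be treated as unverified. Access to the gated repository is assumed.

```python
def load_nf4(repo="SemplificaAI/EuroLLM-ISO24495-9b-Instruct"):
    """Load the bf16 checkpoint with on-the-fly 4-bit NF4 quantisation."""
    # Imported lazily so that defining this helper needs no GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo, quantization_config=bnb, device_map="auto"
    )
    return tokenizer, model.eval()
```

The returned pair can be used in place of `tokenizer` and `model` in the Quick start.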
---
## Dataset
The model was trained on **`semplifica.Language v3`**, an internal
**hybrid (synthetic + human-curated)** dataset of **28,410 records**
(23,589 train / 2,194 validation / 2,627 test) covering six European
languages, with the following structure:
### Composition
- **Train mix**:
  - `task_iso24495`: 18,589 records (79 %), the primary compliance task.
  - `euroblocks_instruct`: 5,000 records (21 %), general-purpose instruct
    conversations included to prevent forgetting and retain broad capability.
- **Origin of the task records** (the `task_iso24495` portion):
  - Roughly the first **9,000 records**: **fully synthetic**, generated with
    `gemini-2.5-flash` (with `gemini-3.5-flash` recovery passes on
    blocking defects).
  - The remaining **about 9,500 records**: built on top of **human-curated
    source documents** from selected public/proprietary datasets
    (`text_complexity_de`, `german4all`, `plaba`, `med_easi`,
    `porsimples_sent`, `admin_it`, `simpitiki`), cleaned and normalised,
    then **partially re-annotated with assistance from `gemini-3.5-flash`**
    under human review. This phase brought real-world stylistic variety,
    edge-case clauses, and harder negative examples that pure synthetic
    generation underproduced.
- **Format**: ChatML triples `(system, user, assistant)` with structured
XML output (matching the schema documented in § Output format).
- **Languages** (task split): IT 43 %, EN 26 %, FR 17 %, PT 16 %, DE 14 %,
ES 11 %.
- **Document types (10)**: service contracts, privacy notices, general
terms & conditions, business letters, internal regulations, tender
notices, insurance policies, consent forms, administrative
communications, plus an `other` catch-all for stylistic diversity.
- **Difficulty buckets**: easy / medium / hard / very_hard, with target
word counts and violation density scaled accordingly.
- **Splits**: stratified by `(lang × doc_type × difficulty × verdict)` to
keep distribution consistent across train / val / test. Val/test
preserve natural distribution; train is balanced for verdict
(40–60 % conforme per language).
### Generation and curation pipeline
- **Synthetic generation** (roughly the first 9,000 records):
initial bulk generation with `gemini-2.5-flash`, recovery pass with
`gemini-3.5-flash` for blocking defects.
- **Human-curated phase** (later records):
source documents from the datasets listed above, cleaned and
normalised, then passed through `gemini-3.5-flash` for assisted
re-annotation, with human review on the violation labels and span
boundaries.
- **Sentence-aware chunking** for long documents (max 500 words per
chunk, abbreviation-aware for IT/EN/FR/DE/ES/PT).
- **Algorithmic defect scan and repair** across the whole corpus:
case-insensitive matching, whitespace normalisation, span
re-localization.
- **Verdict balancing** via positive sample generation (mix 70 % Gemini
2.5 Flash + 30 % Gemini 3.5 Flash) on the human-curated baselines.
Each record carries provenance metadata: `id`, `lang`, `doc_type`,
`difficulty`, `score`, `verdict`, `source`.
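
The span re-localization mentioned in the pipeline (case-insensitive matching with whitespace normalisation) can be illustrated with a short sketch. It shows the general idea only, not the pipeline's code; `relocalize` is a hypothetical helper name.

```python
import re


def relocalize(fragment, chunk):
    """Find fragment in chunk, ignoring case and whitespace differences.

    Returns (start_char, end_char) within chunk, or None if there is no match.
    """
    words = fragment.split()
    if not words:
        return None
    # Any run of whitespace in the chunk may separate consecutive words.
    pattern = r"\s+".join(re.escape(w) for w in words)
    m = re.search(pattern, chunk, flags=re.IGNORECASE)
    return (m.start(), m.end()) if m else None


chunk = "Intro.  The Parties\nhereby acknowledge the terms."
print(relocalize("the parties hereby  acknowledge", chunk))
```

Only the first occurrence is returned, so a fragment that repeats within a chunk would need extra disambiguation (for example, choosing the match nearest the original offset).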
### Distribution
The dataset is **not currently published**. The decision on public release
is being evaluated jointly with the v1.0 model release. For collaboration
or research access requests please use the contact channel below.
---
## Roadmap
| Version | Status | Training set | Notes |
|---|---|---|---|
| **v0.1-base** | ✅ released | about 10 K synthetic records | LoRA bf16 + 8-bit base, 3 epochs |
| **v0.2** (this) | ✅ released | about 28 K hybrid records (synthetic + human-curated) | + verdict balance, + anti-forgetting mix, + sentence-aware chunking, 2 epochs |
| **v1.0** | 🔄 in preparation | about 28 K hybrid records + manually-annotated samples | Domain-expert annotations to capture edge cases (contextual ambiguity, niche jargon, severity nuances) |
| **v1.1 / v2** | 🔜 backlog | DPO post-v1.0 | human-feedback alignment on rewrite preferences |
We are building v1.0 by adding **manually-annotated samples** from
domain experts (plain-language editors, legal reviewers, compliance
officers) to the synthetic pipeline. The synthetic data has reached
diminishing returns on the structural quality dimension; manual annotation
is what's needed to close the gap on `span_f1` and `checklist_rouge_l`.
A 1.7 B edge-distilled sub-release (`v1.0-mini`) for CPU / laptop
deployment is also planned.
---
## Multilingual prompts
The model accepts system prompts in all six target languages. Examples
optimised to match the training distribution:
```python
SYSTEM_PROMPTS = {
"it": "Sei un esperto di plain language secondo ISO 24495-1:2023. ...",
"en": "You are an expert in plain language according to ISO 24495-1:2023. ...",
"fr": "Vous êtes expert en langage clair selon ISO 24495-1:2023. ...",
"de": "Sie sind Experte für Verständlichkeit gemäß ISO 24495-1:2023. ...",
"es": "Eres experto en lenguaje claro según ISO 24495-1:2023. ...",
"pt": "Você é especialista em linguagem simples segundo a ISO 24495-1:2023. ...",
}
```
The full set of prompts is available in `iso_principles.py` in the
companion training-scripts repository.
---
## License
The **fine-tuned model** (this repository) is released under the
**Creative Commons Attribution-NonCommercial 4.0 International ([CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/))**
license.
> Non-commercial use is freely permitted (research, academia, internal
> evaluation). For commercial use, please contact the authors (see § Contact).
The **base model**
([utter-project/EuroLLM-9B-Instruct-2512](https://huggingface.co/utter-project/EuroLLM-9B-Instruct-2512))
is released under the **Apache License 2.0** (© 2024 UTTER project). The
distribution of this derivative work **incorporates and attributes** the
base model as required by Apache 2.0. See [`ATTRIBUTION.md`](ATTRIBUTION.md)
for full details.
---
## Citation
If you use this model in academic publications or research materials,
please cite as:
```bibtex
@misc{semplifica_iso24495_9b_v02_2026,
title = {EuroLLM-ISO24495-9b-Instruct (v0.2): A Fine-Tuned EuroLLM-9B
for ISO 24495-1 Plain Language Compliance Analysis in Six EU Languages},
author = {SemplificaAI},
year = {2026},
url = {https://huggingface.co/SemplificaAI/EuroLLM-ISO24495-9b-Instruct},
note = {v0.2},
}
```
Please also cite the **base model**:
```bibtex
@misc{eurollm9b_2024,
title = {EuroLLM-9B: Open-Weight European LLM},
author = {UTTER project},
year = {2024},
url = {https://huggingface.co/utter-project/EuroLLM-9B-Instruct-2512},
}
```
---
## Contact
- **Commercial use or access to v1.0**: [hf@semplifica.ai](mailto:hf@semplifica.ai)
- **Issues, bugs, qualitative feedback**: use the *Community* tab of this HF repository.
- **Academic collaboration**: contact the authors for joint dataset /
benchmark initiatives.