---
language:
  - it
  - en
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
  - lora
  - fine-tuned
  - banking
  - regtech
  - compliance
  - rag
  - tool-calling
  - italian
  - qwen2.5
pipeline_tag: text-generation
---

# 🏦 RegTech-7B-Instruct

> **Fine-tuned for RAG-powered banking compliance — not general knowledge.**

A specialized [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model fine-tuned to excel within a **Retrieval-Augmented Generation (RAG) pipeline** for Italian banking regulatory compliance.

This model doesn't try to memorize regulations — it's trained to **work with retrieved context**: follow instructions precisely, produce structured outputs, call compliance tools, and maintain the right tone and terminology when grounded on regulatory documents.

---

## 🎯 What This Model Does

This fine-tuning optimizes the model's **behavior within a RAG system**, not its factual knowledge. Specifically:

| Task | Description |
|---|---|
| 📋 **RAG Q&A** | Answer regulatory questions grounded on retrieved documents |
| 🔧 **Tool Calling** | Call compliance tools for KYC verification, risk scoring, PEP (politically exposed person) checks, and SOS (suspicious transaction report) filing |
| 🔍 **Query Expansion** | Rewrite user queries with regulatory terminology for better retrieval |
| 🧠 **Intent Detection** | Classify if a message needs document search or is conversational |
| 📊 **Document Reranking** | Score candidate documents by relevance |
| 📝 **Structured JSON** | Topic extraction, metadata, impact analysis in JSON format |
| ⚖️ **Impact Analysis** | Cross-reference external regulations against internal bank procedures |

---

## 📈 Evaluation — LLM-as-Judge

Evaluated by **Claude Opus 4.6** (Anthropic) across 11 blind test scenarios. The judge compared base vs fine-tuned model outputs without knowing which was which.

### 🏆 Head-to-Head

```
┌─────────────────────────────────────────┐
│  🟢 Tuned Wins    7/11    (68.2%)       │
│  🔴 Base Wins     3/11    (31.8%)       │
│  ⚪ Ties          1/11                   │
└─────────────────────────────────────────┘
```

The percentages are consistent with splitting the single tie evenly between the two models: (7 + 0.5)/11 = 68.2% and (3 + 0.5)/11 = 31.8%. Counting outright wins only, the rates are 63.6% (7/11) and 27.3% (3/11).

### 📊 Quality Scores (1–5)

| Criterion | Base | Tuned | Delta | |
|---|:---:|:---:|:---:|---|
| 🎯 Instruction Following | 3.27 | **4.82** | +1.55 | 🟢🟢🟢 |
| 📎 Context Adherence | 3.64 | **5.00** | +1.36 | 🟢🟢🟢 |
| ✅ Accuracy | 3.73 | **4.73** | +1.00 | 🟢🟢 |
| 📐 Format | 4.09 | **4.64** | +0.55 | 🟢 |
| 🗣️ Tone | 4.45 | **4.73** | +0.28 | 🟢 |
| **📊 Overall** | **3.84** | **4.78** | **+0.95** | **🟢🟢** |

> **Largest improvement among the three model sizes compared below (4B, 7B, 32B).** Instruction following rises by 1.55 points and context adherence reaches the maximum score of 5.00, so the gains are concentrated in how well the model follows instructions and sticks to retrieved regulatory context.
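
The Overall row can be reproduced from the five criteria. The sketch below assumes Overall is the unweighted mean of the per-criterion scores (the card does not state this, but the numbers match). It also shows why the delta is reported as +0.95 although the rounded overall scores differ by 0.94.

```python
from statistics import mean

# Per-criterion scores copied from the table above: (base, tuned).
scores = {
    "instruction_following": (3.27, 4.82),
    "context_adherence": (3.64, 5.00),
    "accuracy": (3.73, 4.73),
    "format": (4.09, 4.64),
    "tone": (4.45, 4.73),
}

# Assumption: "Overall" is the unweighted mean of the five criteria.
base_overall = mean(base for base, _ in scores.values())
tuned_overall = mean(tuned for _, tuned in scores.values())

# The delta is taken on the unrounded means (3.836 and 4.784),
# which gives 0.948 and rounds to +0.95.
delta = tuned_overall - base_overall

print(f"{base_overall:.2f} {tuned_overall:.2f} {delta:+.2f}")
# prints: 3.84 4.78 +0.95
```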

### 📂 Results by Category

| Category | Base | Tuned | Tie |
|---|:---:|:---:|:---:|
| 🚫 Refusal Handling | 0 | **2** | 0 |
| ⚠️ Edge Cases | 0 | **1** | 0 |
| 🎨 Style & Tone | 0 | **1** | 0 |
| 📤 Data Extraction | 0 | 0 | 1 |
| 📋 JSON Output | 1 | 1 | 0 |
| 📖 RAG Q&A | 1 | 1 | 0 |
| 🔧 Tool Use | 1 | 1 | 0 |

### 🔄 Comparison Across Model Sizes

| Metric | 4B | 7B | 32B |
|---|:---:|:---:|:---:|
| Base score (pre-tuning) | 4.11 | 3.84 | **4.36** |
| Tuned score | 4.68 | 4.78 | **4.80** |
| Delta (improvement) | +0.57 | **+0.95** | +0.44 |
| Best eval loss | 1.191 | 1.330 | **0.813** |
| Token accuracy | ~73% | ~72% | **~81%** |
| Train/eval gap | 0.050 | 0.083 | **0.030** |

> The 7B shows the **highest delta** (+0.95) — it benefits the most from fine-tuning, reaching near-parity with the 32B tuned model (4.78 vs 4.80).

---

## 💡 Usage Examples

### 📋 RAG Q&A — Answering from Retrieved Context

The model is designed to receive **retrieved regulatory documents as context** and answer based on them:

```python
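# Italian system prompt: "You are a banking compliance assistant. Answer ONLY
# from the provided context." The retrieved passage (here Art. 92 CRR) goes
# inside the <contesto_recuperato> tags.
# User question: "What are the minimum capital requirements under the CRR?"
messages = [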
    {
        "role": "system",
        "content": """Sei un assistente per la compliance bancaria. 
Rispondi SOLO basandoti sul contesto fornito.

<contesto_recuperato>
Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti 
requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%.
Il coefficiente è calcolato come rapporto tra i fondi propri e 
l'importo complessivo dell'esposizione al rischio.
</contesto_recuperato>"""
    },
    {
        "role": "user", 
        "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"
    }
]
```
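
In a live pipeline the context is not hard-coded but comes from the retriever. A minimal sketch of how the messages could be assembled, reusing the system prompt and the `<contesto_recuperato>` tags from the example above. The helper name `build_rag_messages` and the way chunks are joined are illustrative assumptions, not part of the training setup described in this card.

```python
# Same system prompt as the example above, with a slot for retrieved text.
SYSTEM_TEMPLATE = (
    "Sei un assistente per la compliance bancaria.\n"
    "Rispondi SOLO basandoti sul contesto fornito.\n\n"
    "<contesto_recuperato>\n{context}\n</contesto_recuperato>"
)


def build_rag_messages(question: str, chunks: list[str]) -> list[dict]:
    """Hypothetical helper: wrap retrieved chunks in the context tags."""
    # Drop empty chunks and separate the rest with a blank line.
    context = "\n\n".join(chunk.strip() for chunk in chunks if chunk.strip())
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(context=context)},
        {"role": "user", "content": question},
    ]


messages = build_rag_messages(
    "Quali sono i requisiti minimi di capitale secondo il CRR?",
    ["Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti requisiti: ..."],
)
```

The resulting `messages` list has the same shape as the hand-written example, so it can be passed to the chat template unchanged.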

### 🔍 Query Expansion — Improving RAG Retrieval

```python
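# Italian system prompt: "Rewrite the user's query into a richer version to
# improve document retrieval (RAG). Add technical terms and regulatory
# references. Reply ONLY with the requested JSON."
# User query: "suspicious transaction reporting obligations".
messages = [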
    {
        "role": "system",
        "content": "Riscrivi la query dell'utente in una versione più ricca per migliorare il recupero documentale (RAG). Aggiungi termini tecnici e riferimenti normativi. Rispondi SOLO con il JSON richiesto."
    },
    {
        "role": "user",
        "content": "## QUERY ORIGINALE: [obblighi segnalazione operazioni sospette]"
    }
]

# Expected output:
# {"query": "obblighi segnalazione operazioni sospette SOS UIF D.Lgs. 231/2007 
#   art. 35 riciclaggio finanziamento terrorismo portale RADAR tempistiche 
#   invio indicatori anomalia"}
```
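
Because the expanded query feeds the retriever directly, it helps to guard against malformed output. A minimal sketch, assuming the reply is a JSON object with a `query` key as in the expected output above. The function name and the fallback to the original query are illustrative choices.

```python
import json


def parse_expanded_query(raw_output: str, original_query: str) -> str:
    """Return the expanded query, or the original one if parsing fails."""
    try:
        payload = json.loads(raw_output.strip())
    except json.JSONDecodeError:
        return original_query
    if not isinstance(payload, dict):
        return original_query
    expanded = payload.get("query")
    if isinstance(expanded, str) and expanded.strip():
        return expanded.strip()
    return original_query


original = "obblighi segnalazione operazioni sospette"
raw = '{"query": "obblighi segnalazione operazioni sospette SOS UIF D.Lgs. 231/2007"}'
search_query = parse_expanded_query(raw, original)
```

Falling back to the original query keeps retrieval working even when the model returns something other than the requested JSON.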

### 🔧 Tool Calling — Compliance Workflows

```python
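# Italian system prompt: "You are an operational compliance assistant." The
# retrieved procedure AML-003 requires enhanced due diligence (EDD) for PEPs,
# high-risk countries, and profiles with a risk score above 60.
# User: "I need to open an account for a company based in Dubai. The legal
# representative is Mr. Al-Rashid."
messages = [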
    {
        "role": "system",
        "content": """Sei un assistente operativo per la compliance.
        
<tools>
{"name": "calcola_scoring_rischio", "parameters": {...}}
{"name": "controlla_liste_pep", "parameters": {...}}
{"name": "verifica_kyc", "parameters": {...}}
</tools>

<contesto_recuperato>
Procedura AML-003: L'adeguata verifica rafforzata (EDD) deve essere 
applicata per PEP, paesi ad alto rischio e profili con scoring > 60.
</contesto_recuperato>"""
    },
    {
        "role": "user",
        "content": "Devo aprire un conto per una società con sede a Dubai. Il legale rappresentante è il sig. Al-Rashid."
    }
]

# Expected behavior:
# 1. Call controlla_liste_pep for the representative
# 2. Call calcola_scoring_rischio based on risk factors  
# 3. Recommend EDD procedure per AML-003, grounded on retrieved policy
```
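
The card does not show the raw format of a tool call. The sketch below covers the application side under one explicit assumption: that the fine-tuned model keeps the base Qwen2.5 convention of wrapping each call in `<tool_call>` tags around a JSON object with `name` and `arguments`. The argument name `nominativo` and the stub handler are hypothetical, since the real tool schemas are not given here.

```python
import json
import re

# Assumption: Qwen2.5-style tags around a JSON object per call.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Collect every well-formed tool call found in the model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


def dispatch(calls: list[dict], registry: dict) -> list[dict]:
    """Run each call against a name -> function registry."""
    results = []
    for call in calls:
        handler = registry.get(call["name"])
        if handler is None:
            results.append({"name": call["name"], "error": "unknown tool"})
            continue
        arguments = call.get("arguments") or {}
        results.append({"name": call["name"], "result": handler(**arguments)})
    return results


# Hypothetical stub; a real handler would query a PEP screening service.
def controlla_liste_pep(**kwargs):
    return {"pep": False, **kwargs}


sample_output = (
    "<tool_call>\n"
    '{"name": "controlla_liste_pep", "arguments": {"nominativo": "Al-Rashid"}}\n'
    "</tool_call>"
)
results = dispatch(extract_tool_calls(sample_output), {"controlla_liste_pep": controlla_liste_pep})
```

Unknown tool names are reported back as errors rather than raised, so the results can be returned to the model in a follow-up turn.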

### 📊 Document Reranking

```python
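# Italian system prompt: "Rate the relevance of each candidate to the query.
# Return only the relevant candidates with a score from 0 to 100. Reply ONLY
# with the requested JSON."
messages = [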
    {
        "role": "system",
        "content": "Valuta la rilevanza di ciascun candidato rispetto alla query. Restituisci solo i candidati rilevanti con score 0-100. Rispondi SOLO con il JSON richiesto."
    },
    {
        "role": "user",
        "content": '{"query": "requisiti CET1 fondi propri", "candidates": [{"id": "doc_001", "title": "Art. 92 CRR", "content": "..."}, {"id": "doc_002", "title": "DORA Art. 5", "content": "..."}]}'
    }
]

# Expected: {"matches": [{"id": "doc_001", "relevance": 95}]}
```
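
The reranker output then has to be mapped back onto the candidate list. A minimal sketch, assuming the `matches` / `id` / `relevance` shape from the expected output above. The function name and the cutoff of 50 are illustrative assumptions, not values taken from training.

```python
import json


def apply_rerank(raw_output: str, candidates: list[dict], min_relevance: float = 50) -> list[dict]:
    """Keep candidates the model marked relevant, highest score first."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return []
    matches = payload.get("matches") if isinstance(payload, dict) else None
    if not isinstance(matches, list):
        return []

    by_id = {candidate["id"]: candidate for candidate in candidates}
    kept = []
    for match in matches:
        if not isinstance(match, dict):
            continue
        score = match.get("relevance")
        # Ignore ids the model invented and scores below the cutoff.
        if match.get("id") in by_id and isinstance(score, (int, float)) and score >= min_relevance:
            kept.append((score, by_id[match["id"]]))

    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in kept]


candidates = [
    {"id": "doc_001", "title": "Art. 92 CRR"},
    {"id": "doc_002", "title": "DORA Art. 5"},
]
ranked = apply_rerank('{"matches": [{"id": "doc_001", "relevance": 95}]}', candidates)
```

Returning an empty list on malformed output lets the caller decide whether to fall back to the retriever's original ordering.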

---

## ⚙️ Training Details

| | |
|---|---|
| 🧬 **Method** | LoRA in bf16 (bfloat16) precision, no quantization |
| 🏗️ **Base Model** | Qwen2.5-7B-Instruct |
| 📦 **Dataset** | 923 train / 102 eval samples |
| ⏱️ **Duration** | 13.2 minutes |


### 📉 Training Metrics

| Metric | Value |
|---|---|
| Final Train Loss | 1.247 |
| Best Eval Loss | 1.330 (step 680/693) |
| Train/Eval Gap | 0.083 ✅ |

> A train/eval gap of 0.083 suggests **stable training without significant overfitting**.

---

## 📚 Dataset Coverage

The training data covers the full lifecycle of a RAG-based compliance assistant:

| Category | Purpose |
|---|---|
| 🏷️ Title Generation | Generate conversation titles from user queries |
| 🔍 Query Expansion | Enrich queries with regulatory terms for better retrieval |
| 🧠 Intent Classification | Route queries to RAG vs conversational responses |
| 📊 Document Reranking | Score retrieved documents by relevance |
| 📝 Topic Extraction | Extract main topics from regulatory text pages |
| 📖 Document Summarization | Summarize multi-page regulatory documents |
| ⚖️ Relevance Filtering | Filter regulatory text relevant to banks |
| 📅 Metadata Extraction | Find application dates, issuing authorities |
| 🔧 Impact Analysis | Cross-reference regulations vs internal procedures |
| 💬 RAG Q&A + Tool Calling | Multi-turn compliance conversations with tools |

**Regulatory sources covered:** CRR/CRR3, DORA (Regulation (EU) 2022/2554), D.Lgs. 231/2007 (AML), D.Lgs. 385/1993 (TUB), Circolare 285, PSD2, MiFID II/MiFIR, D.P.R. 180/1950, and related Banca d'Italia provisions.

---

## 🚀 Deployment

### With vLLM
```bash
vllm serve ./models/RegTech-7B-Instruct --dtype bfloat16
```
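
Once the server is running, it can be queried through vLLM's OpenAI-compatible chat completions route. The sketch below uses only the standard library and makes two assumptions: the server listens on vLLM's default `http://localhost:8000`, and the model name equals the path passed to `vllm serve` (that is, no `--served-model-name` override). The helper names are illustrative.

```python
import json
import urllib.request

# Assumptions: default vLLM address, model name equal to the served path.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "./models/RegTech-7B-Instruct"


def build_chat_request(messages, max_tokens=512, temperature=0.0):
    """Build the JSON body for the chat completions route."""
    return {
        "model": MODEL_NAME,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(messages, **kwargs):
    """POST the request to the running server and return the reply text."""
    body = json.dumps(build_chat_request(messages, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

With the server up, `chat(messages)` accepts any of the `messages` lists from the usage examples. A temperature of 0 is used here as a reasonable default for structured JSON outputs, not as a documented recommendation.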

### With Transformers
```python
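from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace YOUR_REPO_ID with the model repository or a local path.
model = AutoModelForCausalLM.from_pretrained("YOUR_REPO_ID", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("YOUR_REPO_ID")

# `messages` is a chat list like the ones in the usage examples above.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))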
```

---

## ⚠️ Important Notes

- 🎯 **RAG-optimized** — trained to work with retrieved context, not to memorize regulations. Always provide relevant documents in the system prompt.
- 🏦 **Domain-specific** — optimized for Italian banking compliance. General capabilities may differ from the base model.
- ⚖️ **Not legal advice** — a tool to assist compliance professionals, not a substitute for regulatory expertise.
- 🔧 **Tool schemas** — tool calling works best with the specific function signatures used during training.
- 🏆 **Best cost/performance ratio** — shows the largest improvement from fine-tuning (+0.95 delta) while reaching near-parity with the 32B model.

---

<p align="center">
  Built with ❤️ for banking RAG<br>
  <em>Fine-tuned with LoRA • Evaluated by Claude Opus 4.6 • Powered by Qwen2.5</em><br>
  <em>Contact for commercial use: https://landing.2sophia.ai</em>
</p>