---
language:
- it
- en
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen2.5-14B-Instruct
tags:
- lora
- fine-tuned
- banking
- regtech
- compliance
- rag
- tool-calling
- italian
- qwen2.5
pipeline_tag: text-generation
---

# RegTech-14B-Instruct

> **Fine-tuned for RAG-powered banking compliance, not general knowledge.**

A specialized [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) model fine-tuned for use inside a **Retrieval-Augmented Generation (RAG) pipeline** for Italian banking regulatory compliance.

This model doesn't try to memorize regulations. It is trained to **work with retrieved context**: follow instructions precisely, produce structured outputs, call compliance tools, and maintain the right tone and terminology when grounded on regulatory documents.

---

## What This Model Does

This fine-tuning optimizes the model's **behavior within a RAG system**, not its factual knowledge. Specifically:

| Task | Description |
|---|---|
| **RAG Q&A** | Answer regulatory questions grounded on retrieved documents |
| **Tool Calling** | KYC verification, risk scoring, PEP checks, SOS (suspicious transaction) reporting |
| **Query Expansion** | Rewrite user queries with regulatory terminology for better retrieval |
| **Intent Detection** | Classify whether a message needs document search or is conversational |
| **Document Reranking** | Score candidate documents by relevance |
| **Structured JSON** | Topic extraction, metadata, impact analysis in JSON format |
| **Impact Analysis** | Cross-reference external regulations against internal bank procedures |

---

## Evaluation: LLM-as-Judge

Evaluated by **Claude Opus 4.6** (Anthropic) across 11 blind test scenarios. The judge compared base vs fine-tuned model outputs without knowing which was which.

### Head-to-Head

```
┌─────────────────────────────────────────┐
│  Tuned Wins    8/11  (77.3%)            │
│  Base Wins     2/11  (22.7%)            │
│  Ties          1/11                     │
└─────────────────────────────────────────┘
```
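The head-to-head percentages do not equal 8/11 and 2/11 taken literally, but they match exactly if the single tie is counted as half a win for each side. That convention is an inference from the numbers, not something the card states. A minimal sketch of the arithmetic under that assumption (the function name `win_rate` is illustrative):

```python
from fractions import Fraction


def win_rate(wins: int, ties: int, total: int) -> float:
    """Percentage win rate, counting each tie as half a win (assumed convention)."""
    if total <= 0:
        raise ValueError("total must be positive")
    # (wins + ties / 2) / total, kept exact until the final conversion
    return float(Fraction(2 * wins + ties, 2 * total) * 100)


tuned = win_rate(wins=8, ties=1, total=11)
base = win_rate(wins=2, ties=1, total=11)
print(f"tuned: {tuned:.1f}%  base: {base:.1f}%")  # tuned: 77.3%  base: 22.7%
```

Under this convention the two rates always sum to 100%, which is consistent with the 77.3% and 22.7% reported above.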

### Quality Scores (1–5)

| Criterion | Base | Tuned | Delta |
|---|:---:|:---:|:---:|
| Instruction Following | 3.55 | **4.64** | +1.09 |
| Context Adherence | 3.82 | **4.82** | +1.00 |
| Accuracy | 4.00 | **4.73** | +0.73 |
| Format | 4.18 | **4.45** | +0.27 |
| Tone | 4.73 | **4.82** | +0.09 |
| **Overall** | **4.06** | **4.69** | **+0.64** |

> Highest win rate across all model sizes at 77.3%. Instruction following improves by +1.09 and context adherence by +1.00, so the largest gains are in the model's ability to stay grounded on retrieved regulatory context.

### Results by Category

| Category | Base | Tuned | Tie |
|---|:---:|:---:|:---:|
| RAG Q&A | 0 | **2** | 0 |
| Refusal Handling | 0 | **2** | 0 |
| Edge Cases | 0 | **1** | 0 |
| Style & Tone | 0 | **1** | 0 |
| Data Extraction | 0 | 0 | 1 |
| JSON Output | 1 | 1 | 0 |
| Tool Use | 1 | 1 | 0 |

### Comparison Across Model Sizes

| Metric | 4B | 7B | 14B | 32B |
|---|:---:|:---:|:---:|:---:|
| Base score (pre-tuning) | 4.11 | 3.84 | 4.06 | **4.36** |
| Tuned score | 4.68 | 4.78 | 4.69 | **4.80** |
| Delta (improvement) | +0.57 | +0.95 | +0.64 | +0.44 |
| Win rate | 68.2% | 68.2% | **77.3%** | 68.2% |
| Best eval loss | 1.191 | 1.330 | 1.225 | **0.813** |
| Token accuracy | ~73% | ~72% | ~72% | **~81%** |

---

## Usage Examples

### RAG Q&A: Answering from Retrieved Context

The model is designed to receive **retrieved regulatory documents as context** and answer based on them:

```python
# System prompt (Italian): "You are a banking compliance assistant.
# Answer ONLY from the provided context." The retrieved passage goes
# inside <contesto_recuperato> tags.
messages = [
    {
        "role": "system",
        "content": """Sei un assistente per la compliance bancaria.
Rispondi SOLO basandoti sul contesto fornito.

<contesto_recuperato>
Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti
requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%.
Il coefficiente è calcolato come rapporto tra i fondi propri e
l'importo complessivo dell'esposizione al rischio.
</contesto_recuperato>""",
    },
    {
        "role": "user",
        # "What are the minimum capital requirements under the CRR?"
        "content": "Quali sono i requisiti minimi di capitale secondo il CRR?",
    },
]
```
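In a real pipeline the context block is assembled from whatever the retriever returns. The helper below is a minimal sketch of that step; `build_rag_messages` is a hypothetical name, and it assumes passages are simply joined with blank lines inside the `<contesto_recuperato>` tags used in the example above.

```python
def build_rag_messages(question: str, passages: list[str]) -> list[dict]:
    """Build a chat message list with retrieved passages in the system prompt."""
    # Drop empty passages and separate the rest with blank lines (assumed layout).
    context = "\n\n".join(p.strip() for p in passages if p.strip())
    system = (
        "Sei un assistente per la compliance bancaria.\n"
        "Rispondi SOLO basandoti sul contesto fornito.\n\n"
        f"<contesto_recuperato>\n{context}\n</contesto_recuperato>"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


messages = build_rag_messages(
    "Quali sono i requisiti minimi di capitale secondo il CRR?",
    [
        "Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti "
        "requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%.",
    ],
)
print(messages[0]["content"])
```

Keeping the system wording identical to the training-style prompt and varying only the context block is a cautious default, since the card notes the model was tuned for this layout.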

### Query Expansion: Improving RAG Retrieval

```python
messages = [
    {
        "role": "system",
        # "Rewrite the user's query into a richer version to improve document
        # retrieval (RAG). Add technical terms and regulatory references.
        # Reply ONLY with the requested JSON."
        "content": "Riscrivi la query dell'utente in una versione più ricca per migliorare il recupero documentale (RAG). Aggiungi termini tecnici e riferimenti normativi. Rispondi SOLO con il JSON richiesto.",
    },
    {
        "role": "user",
        "content": "## QUERY ORIGINALE: [obblighi segnalazione operazioni sospette]",
    },
]

# Expected output:
# {"query": "obblighi segnalazione operazioni sospette SOS UIF D.Lgs. 231/2007
#  art. 35 riciclaggio finanziamento terrorismo portale RADAR tempistiche
#  invio indicatori anomalia"}
```
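Because the expanded query feeds straight into retrieval, it helps to parse the reply defensively and fall back to the user's original query if the JSON is malformed. The sketch below assumes the reply is a single JSON object with a `"query"` string, as in the expected output above; `parse_expanded_query` is a hypothetical helper name.

```python
import json


def parse_expanded_query(raw: str, fallback: str) -> str:
    """Return the "query" field of the model's JSON reply, or `fallback` if unusable."""
    try:
        data = json.loads(raw.strip())
    except json.JSONDecodeError:
        return fallback
    query = data.get("query") if isinstance(data, dict) else None
    if isinstance(query, str) and query.strip():
        # Collapse newlines and repeated spaces into single spaces.
        return " ".join(query.split())
    return fallback


original = "obblighi segnalazione operazioni sospette"
reply = '{"query": "obblighi segnalazione operazioni sospette SOS UIF D.Lgs. 231/2007 art. 35"}'
print(parse_expanded_query(reply, original))
print(parse_expanded_query("not json", original))  # falls back to the original query
```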

### Tool Calling: Compliance Workflows

```python
messages = [
    {
        "role": "system",
        # "You are an operational compliance assistant." Tool schemas go in
        # <tools>, retrieved policy text in <contesto_recuperato>.
        "content": """Sei un assistente operativo per la compliance.

<tools>
{"name": "calcola_scoring_rischio", "parameters": {...}}
{"name": "controlla_liste_pep", "parameters": {...}}
{"name": "verifica_kyc", "parameters": {...}}
</tools>

<contesto_recuperato>
Procedura AML-003: L'adeguata verifica rafforzata (EDD) deve essere
applicata per PEP, paesi ad alto rischio e profili con scoring > 60.
</contesto_recuperato>""",
    },
    {
        "role": "user",
        # "I need to open an account for a company based in Dubai.
        # The legal representative is Mr. Al-Rashid."
        "content": "Devo aprire un conto per una società con sede a Dubai. Il legale rappresentante è il sig. Al-Rashid.",
    },
]

# Expected behavior:
# 1. Call controlla_liste_pep for the representative
# 2. Call calcola_scoring_rischio based on risk factors
# 3. Recommend the EDD procedure per AML-003, grounded on the retrieved policy
```
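The card does not show what a tool call looks like in the model's output. The sketch below assumes the Qwen2.5 chat-template convention of a JSON object with `name` and `arguments` wrapped in `<tool_call>` tags; verify this against real outputs before relying on it. The `registry`, the stub handler, and the `nome` argument are all hypothetical, since the card elides the real parameter schemas.

```python
import json
import re

# Assumed output format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in the model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than failing the whole turn
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


def dispatch(calls: list[dict], registry: dict) -> list[dict]:
    """Run each call against a name -> function registry and collect the results."""
    results = []
    for call in calls:
        handler = registry.get(call["name"])
        if handler is None:
            results.append({"name": call["name"], "error": "unknown tool"})
            continue
        result = handler(**call.get("arguments", {}))
        results.append({"name": call["name"], "result": result})
    return results


# Hypothetical stub standing in for a real PEP-list lookup.
registry = {"controlla_liste_pep": lambda nome: {"nome": nome, "pep": False}}
reply = '<tool_call>\n{"name": "controlla_liste_pep", "arguments": {"nome": "Al-Rashid"}}\n</tool_call>'
print(dispatch(extract_tool_calls(reply), registry))
```

Unknown tool names are reported back as errors instead of raising, so the result can be returned to the model as a tool message and the conversation can continue.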

### Document Reranking

```python
messages = [
    {
        "role": "system",
        # "Rate the relevance of each candidate to the query. Return only the
        # relevant candidates with a score from 0 to 100. Reply ONLY with the
        # requested JSON."
        "content": "Valuta la rilevanza di ciascun candidato rispetto alla query. Restituisci solo i candidati rilevanti con score 0-100. Rispondi SOLO con il JSON richiesto.",
    },
    {
        "role": "user",
        "content": '{"query": "requisiti CET1 fondi propri", "candidates": [{"id": "doc_001", "title": "Art. 92 CRR", "content": "..."}, {"id": "doc_002", "title": "DORA Art. 5", "content": "..."}]}',
    },
]

# Expected: {"matches": [{"id": "doc_001", "relevance": 95}]}
```
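Downstream code usually needs an ordered list of document ids rather than the raw JSON. The sketch below assumes the reply has the `{"matches": [{"id", "relevance"}]}` shape shown in the expected output above. The `select_documents` name and the default threshold of 50 are illustrative choices, not values from the card.

```python
import json


def select_documents(raw: str, min_relevance: int = 50) -> list[str]:
    """Return ids of matches scoring at least `min_relevance`, best first."""
    try:
        matches = json.loads(raw).get("matches", [])
    except (json.JSONDecodeError, AttributeError):
        return []  # not JSON, or JSON that is not an object
    if not isinstance(matches, list):
        return []
    scored = [
        (m["relevance"], m["id"])
        for m in matches
        if isinstance(m, dict)
        and isinstance(m.get("id"), str)
        and isinstance(m.get("relevance"), (int, float))
        and m["relevance"] >= min_relevance
    ]
    # Stable sort, highest relevance first.
    return [doc_id for _, doc_id in sorted(scored, key=lambda pair: -pair[0])]


reply = '{"matches": [{"id": "doc_001", "relevance": 95}]}'
print(select_documents(reply))
```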

---

## Training Details

| Setting | Value |
|---|---|
| **Method** | LoRA in bf16 (no quantization) |
| **Base Model** | Qwen2.5-14B-Instruct |
| **Dataset** | 923 train / 102 eval samples |
| **Duration** | 23.5 minutes |

### Training Metrics

| Metric | Value |
|---|---|
| Final Train Loss | 1.127 |
| Best Eval Loss | 1.225 (step 640/693) |
| Train/Eval Gap | 0.098 |

> The gap of 0.098 between eval and train loss suggests **stable training without significant overfitting**.

---

## Dataset Coverage

The training data covers the full lifecycle of a RAG-based compliance assistant:

| Category | Purpose |
|---|---|
| Title Generation | Generate conversation titles from user queries |
| Query Expansion | Enrich queries with regulatory terms for better retrieval |
| Intent Classification | Route queries to RAG vs conversational responses |
| Document Reranking | Score retrieved documents by relevance |
| Topic Extraction | Extract main topics from regulatory text pages |
| Document Summarization | Summarize multi-page regulatory documents |
| Relevance Filtering | Filter regulatory text relevant to banks |
| Metadata Extraction | Find application dates, issuing authorities |
| Impact Analysis | Cross-reference regulations vs internal procedures |
| RAG Q&A + Tool Calling | Multi-turn compliance conversations with tools |

**Regulatory sources covered:** CRR/CRR3, DORA (EU 2022/2554), D.Lgs. 231/2007 (AML), D.Lgs. 385/1993 (TUB), Circolare 285, PSD2, MiFID II/MiFIR, D.P.R. 180/1950 and related Banca d'Italia provisions.

---

## Deployment

### With vLLM

```bash
vllm serve ./models/RegTech-14B-Instruct --dtype bfloat16
```
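Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the standard library and assumes vLLM's defaults: port 8000 on localhost, and a served model name equal to the path passed to `vllm serve`. Adjust both if your setup differs. The function names and the `temperature` value are illustrative.

```python
import json
import urllib.request


def build_chat_request(messages, model="./models/RegTech-14B-Instruct",
                       base_url="http://localhost:8000"):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,           # assumed to match the path given to `vllm serve`
        "messages": messages,
        "max_tokens": 512,
        "temperature": 0.0,       # illustrative choice for repeatable outputs
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, **kwargs) -> str:
    """Send the request and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request([{"role": "user", "content": "Ciao"}])
print(request.full_url)
```

Calling `chat(messages)` with one of the message lists from the usage examples returns the model's reply as a string.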

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace YOUR_REPO_ID with the model repository or a local path.
model = AutoModelForCausalLM.from_pretrained("YOUR_REPO_ID", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("YOUR_REPO_ID")

# `messages` is a chat list like the ones in the usage examples above.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

---

## Important Notes

- **RAG-optimized**: trained to work with retrieved context, not to memorize regulations. Always provide relevant documents in the system prompt.
- **Domain-specific**: optimized for Italian banking compliance. General capabilities may differ from the base model.
- **Not legal advice**: a tool to assist compliance professionals, not a substitute for regulatory expertise.
- **Tool schemas**: tool calling works best with the specific function signatures used during training.

---

<p align="center">
  Built with ❤️ for banking RAG<br>
  <em>Fine-tuned with LoRA • Evaluated by Claude Opus 4.6 • Powered by Qwen2.5</em><br>
  <em>Contact for commercial use: https://landing.2sophia.ai</em>
</p>