---
language:
- it
- en
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- lora
- fine-tuned
- banking
- regtech
- compliance
- rag
- tool-calling
- italian
- qwen2.5
pipeline_tag: text-generation
---

# 🏦 RegTech-7B-Instruct

> **Fine-tuned for RAG-powered banking compliance, not general knowledge.**

A specialized [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model fine-tuned to excel within a **Retrieval-Augmented Generation (RAG) pipeline** for Italian banking regulatory compliance.

This model does not try to memorize regulations. It is trained to **work with retrieved context**: follow instructions precisely, produce structured outputs, call compliance tools, and maintain the right tone and terminology when grounded on regulatory documents.

---

## 🎯 What This Model Does

This fine-tuning optimizes the model's **behavior within a RAG system**, not its factual knowledge. Specifically:

| Task | Description |
|---|---|
| 📋 **RAG Q&A** | Answer regulatory questions grounded on retrieved documents |
| 🔧 **Tool Calling** | KYC verification, risk scoring, PEP checks, SOS reporting |
| 🔍 **Query Expansion** | Rewrite user queries with regulatory terminology for better retrieval |
| 🧠 **Intent Detection** | Classify whether a message needs document search or is conversational |
| 📊 **Document Reranking** | Score candidate documents by relevance |
| 📝 **Structured JSON** | Topic extraction, metadata, impact analysis in JSON format |
| ⚖️ **Impact Analysis** | Cross-reference external regulations against internal bank procedures |

---

## 📈 Evaluation: LLM-as-Judge

Evaluated by **Claude Opus 4.6** (Anthropic) across 11 blind test scenarios. The judge compared base and fine-tuned model outputs without knowing which was which.
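The blind comparison described above can be sketched as follows. This is a minimal illustration of one common way to blind a pairwise judge, not the evaluation harness actually used for this card; the function names `blind_pair` and `unblind` and the `A`/`B` labels are hypothetical.

```python
import random


def blind_pair(base_output, tuned_output, rng):
    """Return (shown, key): the two outputs under neutral labels, plus a hidden key."""
    pair = [("base", base_output), ("tuned", tuned_output)]
    rng.shuffle(pair)  # random order, so the label carries no information
    shown = {"A": pair[0][1], "B": pair[1][1]}  # what the judge sees
    key = {"A": pair[0][0], "B": pair[1][0]}    # kept by the harness only
    return shown, key


def unblind(verdict, key):
    """Map the judge's verdict ('A', 'B' or 'tie') back to a model name."""
    return "tie" if verdict == "tie" else key[verdict]


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
shown, key = blind_pair("base answer", "tuned answer", rng)
winner = unblind("A", key)  # pretend the judge preferred output A
print(winner)
```

Randomizing the order per scenario also guards against a judge that systematically favors the first or second answer.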
### 🏆 Head-to-Head

```
┌─────────────────────────────────────────┐
│  🟢 Tuned Wins    7/11  (68.2%)         │
│  🔴 Base Wins     3/11  (31.8%)         │
│  ⚪ Ties          1/11                  │
└─────────────────────────────────────────┘
```

The percentages are consistent with counting the single tie as half a win for each side: (7 + 0.5) / 11 = 68.2% and (3 + 0.5) / 11 = 31.8%.

### 📊 Quality Scores (1–5)

| Criterion | Base | Tuned | Delta | |
|---|:---:|:---:|:---:|---|
| 🎯 Instruction Following | 3.27 | **4.82** | +1.55 | 🟢🟢🟢 |
| 📎 Context Adherence | 3.64 | **5.00** | +1.36 | 🟢🟢🟢 |
| ✅ Accuracy | 3.73 | **4.73** | +1.00 | 🟢🟢 |
| 📝 Format | 4.09 | **4.64** | +0.55 | 🟢 |
| 🗣️ Tone | 4.45 | **4.73** | +0.28 | 🟢 |
| **📊 Overall** | **3.84** | **4.78** | **+0.95** | **🟢🟢** |

> **Largest improvement across all model sizes.** Instruction following rises by 1.55 points and context adherence reaches the maximum score of 5.00: the gains concentrate on the model's ability to follow retrieved regulatory context.

### 📂 Results by Category

| Category | Base wins | Tuned wins | Ties |
|---|:---:|:---:|:---:|
| 🚫 Refusal Handling | 0 | **2** | 0 |
| ⚠️ Edge Cases | 0 | **1** | 0 |
| 🎨 Style & Tone | 0 | **1** | 0 |
| 📤 Data Extraction | 0 | 0 | 1 |
| 📋 JSON Output | 1 | 1 | 0 |
| 📖 RAG Q&A | 1 | 1 | 0 |
| 🔧 Tool Use | 1 | 1 | 0 |

### 🔄 Comparison Across Model Sizes

| Metric | 4B | 7B | 32B |
|---|:---:|:---:|:---:|
| Base score (pre-tuning) | 4.11 | 3.84 | **4.36** |
| Tuned score | 4.68 | **4.78** | **4.80** |
| Delta (improvement) | +0.57 | **+0.95** | +0.44 |
| Best eval loss | 1.191 | 1.330 | **0.813** |
| Token accuracy | ~73% | ~72% | **~81%** |
| Train/eval gap | 0.050 | 0.083 | **0.030** |

> The 7B shows the **highest delta** (+0.95): it benefits the most from fine-tuning and reaches near-parity with the tuned 32B model (4.78 vs 4.80).
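The summary figures in the evaluation tables can be re-derived from the per-criterion scores. The sketch below assumes the Overall row is an unweighted mean of the five criteria and that the head-to-head percentages count the tie as half a win per side. Neither convention is stated in the card, but both reproduce the published numbers.

```python
# Per-criterion judge scores copied from the Quality Scores table.
base = {"instruction_following": 3.27, "context_adherence": 3.64,
        "accuracy": 3.73, "format": 4.09, "tone": 4.45}
tuned = {"instruction_following": 4.82, "context_adherence": 5.00,
         "accuracy": 4.73, "format": 4.64, "tone": 4.73}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# ASSUMPTION: Overall = unweighted mean of the five criteria.
base_overall = mean(base.values())
tuned_overall = mean(tuned.values())
delta = tuned_overall - base_overall

# ASSUMPTION: a tie counts as half a win for each side.
tuned_wins, base_wins, ties = 7, 3, 1
total = tuned_wins + base_wins + ties
tuned_rate = (tuned_wins + 0.5 * ties) / total
base_rate = (base_wins + 0.5 * ties) / total

print(f"overall: base {base_overall:.2f}, tuned {tuned_overall:.2f}, delta {delta:+.2f}")
print(f"win rate: tuned {tuned_rate:.1%}, base {base_rate:.1%}")
```

Note that subtracting the rounded overall scores gives 0.94; the published +0.95 matches the difference of the unrounded means.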
---

## 💡 Usage Examples

### 📋 RAG Q&A: Answering from Retrieved Context

The model is designed to receive **retrieved regulatory documents as context** and answer based on them:

```python
messages = [
    {
        "role": "system",
        # Instruction plus retrieved text. The instruction reads: "You are a
        # banking compliance assistant. Answer ONLY from the provided context."
        "content": """Sei un assistente per la compliance bancaria. Rispondi SOLO basandoti sul contesto fornito.

Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%. Il coefficiente è calcolato come rapporto tra i fondi propri e l'importo complessivo dell'esposizione al rischio.
"""
    },
    {
        "role": "user",
        # "What are the minimum capital requirements under the CRR?"
        "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"
    }
]
```

### 🔍 Query Expansion: Improving RAG Retrieval

```python
messages = [
    {
        "role": "system",
        # "Rewrite the user's query in a richer form to improve document
        # retrieval (RAG). Add technical terms and regulatory references.
        # Reply ONLY with the requested JSON."
        "content": "Riscrivi la query dell'utente in una versione più ricca per migliorare il recupero documentale (RAG). Aggiungi termini tecnici e riferimenti normativi. Rispondi SOLO con il JSON richiesto."
    },
    {
        "role": "user",
        # "ORIGINAL QUERY: [suspicious transaction reporting obligations]"
        "content": "## QUERY ORIGINALE: [obblighi segnalazione operazioni sospette]"
    }
]
# Expected output:
# {"query": "obblighi segnalazione operazioni sospette SOS UIF D.Lgs. 231/2007
#   art. 35 riciclaggio finanziamento terrorismo portale RADAR tempistiche
#   invio indicatori anomalia"}
```

### 🔧 Tool Calling: Compliance Workflows

```python
messages = [
    {
        "role": "system",
        # Operational assistant prompt: tool definitions (parameters elided in
        # the original) followed by the retrieved internal procedure AML-003.
        "content": """Sei un assistente operativo per la compliance.

{"name": "calcola_scoring_rischio", "parameters": {...}}
{"name": "controlla_liste_pep", "parameters": {...}}
{"name": "verifica_kyc", "parameters": {...}}

Procedura AML-003: L'adeguata verifica rafforzata (EDD) deve essere applicata per PEP, paesi ad alto rischio e profili con scoring > 60.
"""
    },
    {
        "role": "user",
        # "I need to open an account for a company based in Dubai.
        # The legal representative is Mr. Al-Rashid."
        "content": "Devo aprire un conto per una società con sede a Dubai. Il legale rappresentante è il sig. Al-Rashid."
    }
]
# The model is expected to:
# 1. Call controlla_liste_pep for the representative
# 2. Call calcola_scoring_rischio based on risk factors
# 3. Recommend the EDD procedure per AML-003, grounded on the retrieved policy
```

### 📊 Document Reranking

```python
messages = [
    {
        "role": "system",
        # "Rate the relevance of each candidate to the query. Return only the
        # relevant candidates with a 0-100 score. Reply ONLY with the requested JSON."
        "content": "Valuta la rilevanza di ciascun candidato rispetto alla query. Restituisci solo i candidati rilevanti con score 0-100. Rispondi SOLO con il JSON richiesto."
    },
    {
        "role": "user",
        "content": '{"query": "requisiti CET1 fondi propri", "candidates": [{"id": "doc_001", "title": "Art. 92 CRR", "content": "..."}, {"id": "doc_002", "title": "DORA Art. 5", "content": "..."}]}'
    }
]
# Expected: {"matches": [{"id": "doc_001", "relevance": 95}]}
```

---

## ⚙️ Training Details

| | |
|---|---|
| 🧬 **Method** | LoRA in bf16 (no quantization) |
| 🏗️ **Base Model** | Qwen2.5-7B-Instruct |
| 📦 **Dataset** | 923 train / 102 eval samples |
| ⏱️ **Duration** | 13.2 minutes |

### 📉 Training Metrics

| Metric | Value |
|---|---|
| Final Train Loss | 1.247 |
| Best Eval Loss | 1.330 (step 680/693) |
| Train/Eval Gap | 0.083 ✅ |

> A gap of 0.083 between best eval loss and final train loss suggests **stable training without significant overfitting**.
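The train/eval gap in the metrics table is simply the best eval loss minus the final train loss. The sketch below recomputes it and, as a side check, shows one batch-size and epoch combination that would yield 693 steps from 923 samples. That combination is an assumption: the card does not state the batch size or number of epochs, and other combinations give the same step count.

```python
import math

# Values copied from the Training Details and Training Metrics tables.
final_train_loss = 1.247
best_eval_loss = 1.330
train_samples = 923
total_steps = 693

# Gap as reported in the table: eval loss minus train loss.
gap = best_eval_loss - final_train_loss
print(f"train/eval gap: {gap:.3f}")

# ASSUMPTION (not stated in the card): effective batch size 4, 3 epochs.
assumed_batch_size = 4
assumed_epochs = 3
steps_per_epoch = math.ceil(train_samples / assumed_batch_size)
assumed_total_steps = steps_per_epoch * assumed_epochs
print(f"steps under assumption: {assumed_total_steps} (reported: {total_steps})")
```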
---

## 📚 Dataset Coverage

The training data covers the full lifecycle of a RAG-based compliance assistant:

| Category | Purpose |
|---|---|
| 🏷️ Title Generation | Generate conversation titles from user queries |
| 🔍 Query Expansion | Enrich queries with regulatory terms for better retrieval |
| 🧠 Intent Classification | Route queries to RAG vs conversational responses |
| 📊 Document Reranking | Score retrieved documents by relevance |
| 📝 Topic Extraction | Extract main topics from regulatory text pages |
| 📖 Document Summarization | Summarize multi-page regulatory documents |
| ⚖️ Relevance Filtering | Filter regulatory text relevant to banks |
| 📅 Metadata Extraction | Find application dates, issuing authorities |
| 🔧 Impact Analysis | Cross-reference regulations vs internal procedures |
| 💬 RAG Q&A + Tool Calling | Multi-turn compliance conversations with tools |

**Regulatory sources covered:** CRR/CRR3, DORA (EU 2022/2554), D.Lgs. 231/2007 (AML), D.Lgs. 385/1993 (TUB), Circolare 285, PSD2, MiFID II/MiFIR, D.P.R. 180/1950, and related Banca d'Italia provisions.

---

## 🚀 Deployment

### With vLLM

```bash
vllm serve ./models/RegTech-7B-Instruct --dtype bfloat16
```

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace YOUR_REPO_ID with the model repository ID or a local path.
model = AutoModelForCausalLM.from_pretrained("YOUR_REPO_ID", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("YOUR_REPO_ID")

# `messages` is a chat-format list such as those in the usage examples.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decodes the full sequence, prompt included.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## ⚠️ Important Notes

- 🎯 **RAG-optimized**: trained to work with retrieved context, not to memorize regulations. Always provide relevant documents in the system prompt.
- 🏦 **Domain-specific**: optimized for Italian banking compliance. General capabilities may differ from the base model.
- ⚖️ **Not legal advice**: a tool to assist compliance professionals, not a substitute for regulatory expertise.
- 🔧 **Tool schemas**: tool calling works best with the specific function signatures used during training.
- 🏆 **Best cost/performance ratio**: shows the largest improvement from fine-tuning (+0.95 delta) while reaching near-parity with the tuned 32B model (4.78 vs 4.80 overall).

---
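The RAG-optimized note above asks for retrieved documents to be placed in the system prompt. A minimal sketch of how a caller might assemble such a prompt is shown below. The helper `build_rag_messages` and the `[id] title` layout are hypothetical, since the card does not document the exact context format used in training; the instruction text and the Art. 92 CRR snippet are taken from the RAG Q&A example.

```python
# Instruction taken from the RAG Q&A usage example in this card.
DEFAULT_INSTRUCTION = (
    "Sei un assistente per la compliance bancaria. "
    "Rispondi SOLO basandoti sul contesto fornito."
)


def build_rag_messages(question, documents, instruction=DEFAULT_INSTRUCTION):
    """Build a chat message list with retrieved documents in the system prompt.

    `documents` is a list of dicts with 'id', 'title' and 'content' keys.
    The layout below is an illustrative choice, not the training format.
    """
    context = "\n\n".join(
        f"[{d['id']}] {d['title']}\n{d['content']}" for d in documents
    )
    return [
        {"role": "system", "content": f"{instruction}\n\n{context}"},
        {"role": "user", "content": question},
    ]


retrieved = [
    {
        "id": "doc_001",
        "title": "Art. 92 CRR",
        "content": "Gli enti soddisfano in qualsiasi momento i seguenti requisiti: "
                   "a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%.",
    }
]
messages = build_rag_messages(
    "Quali sono i requisiti minimi di capitale secondo il CRR?", retrieved
)
print(messages[0]["content"])
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` as in the Transformers deployment snippet.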

Built with ❤️ for banking RAG
Fine-tuned with LoRA • Evaluated by Claude Opus 4.6 • Powered by Qwen2.5
Contact for commercial use: https://landing.2sophia.ai