RegTech-4B-Instruct

Fine-tuned for RAG-powered banking compliance — not general knowledge.

A specialized Qwen3-4B-Instruct model fine-tuned to excel within a Retrieval-Augmented Generation (RAG) pipeline for Italian banking regulatory compliance.

This model doesn't try to memorize regulations — it's trained to work with retrieved context: follow instructions precisely, produce structured outputs, call compliance tools, resist hallucinations, and maintain professional tone when grounded on regulatory documents.

What This Model Does

This fine-tuning optimizes the model's behavior within a RAG system, not its factual knowledge. Specifically:

Task	Description
RAG Q&A	Answer regulatory questions grounded on retrieved documents
Tool Calling	KYC verification, risk scoring, PEP checks, SOS reporting (segnalazione di operazioni sospette, i.e. suspicious transaction reports)
Query Expansion	Rewrite user queries with regulatory terminology for better retrieval
Intent Detection	Classify if a message needs document search or is conversational
Document Reranking	Score candidate documents by relevance
Structured JSON	Topic extraction, metadata, impact analysis in JSON format
Impact Analysis	Cross-reference external regulations against internal bank procedures
Hallucination Resistance	Refuse to fabricate regulations, articles, or sanctions not in context

Evaluation

Methodology

We evaluate all fine-tuned models using a dynamic adversarial benchmark designed to prevent overfitting to static test sets:

Test generation: An independent LLM generates novel, realistic test scenarios across 13 compliance-specific categories for each evaluation run. Tests are never reused.
Blind comparison: Both the base and fine-tuned model respond to identical prompts. Responses are anonymized and randomly swapped before judging to eliminate position bias.
Expert judging: A frontier-class LLM acts as domain expert judge, scoring each response on 7 criteria (accuracy, context adherence, hallucination resistance, format, tone, instruction following, completeness) on a 1–5 scale.
Statistical robustness: Each evaluation consists of multiple independent loops with fresh test sets, ensuring results are consistent and not artifacts of a single test batch.

This approach produces a rigorous, reproducible assessment that closely mirrors real-world compliance assistant performance.
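
The blind-comparison step can be sketched as follows. This is a minimal illustration, not the authors' evaluation harness: the function names, the "A"/"B" labels, and the verdict format are assumptions made for the example.

```python
import random

def blind_pair(base_response, tuned_response, rng):
    """Return the two responses under neutral labels, plus the hidden key."""
    pair = [("base", base_response), ("tuned", tuned_response)]
    if rng.random() < 0.5:
        pair.reverse()  # swap positions about half of the time
    shown = {"A": pair[0][1], "B": pair[1][1]}  # what the judge sees
    key = {"A": pair[0][0], "B": pair[1][0]}    # never shown to the judge
    return shown, key

def resolve_verdict(verdict, key):
    """Map the judge's verdict ('A', 'B' or 'tie') back to a model name."""
    return "tie" if verdict == "tie" else key[verdict]

rng = random.Random(0)  # seeded only to make the example repeatable
shown, key = blind_pair("answer from the base model", "answer from the tuned model", rng)
winner = resolve_verdict("A", key)  # hypothetical verdict: the judge preferred A
```

Because the judge only ever sees `shown`, a preference for the first or second position cannot systematically favor either model.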

Results — RegTech-4B-Instruct

Evaluated across 73 blind adversarial tests over 3 independent loops.

Head-to-Head vs Base Model

Metric	Base	Tuned
Win Rate (adj.)	45.2%	54.8%
Wins	26	33
Ties (shared)	14

Quality Scores (1–5 scale)

Criterion	Base	Tuned	Delta	Status
Hallucination Resistance	3.53	3.89	+0.36	Improved
Tone & Professionalism	3.90	4.27	+0.37	Improved
Output Format	3.41	3.75	+0.34	Improved
Instruction Following	3.14	3.44	+0.30	Improved
Accuracy	3.34	3.59	+0.25	Improved
Context Adherence	3.66	3.89	+0.23	Improved
Completeness	3.45	3.23	-0.22	Trade-off
Overall	3.49	3.72	+0.23	Improved
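
The source does not say how the Overall row is computed. Assuming it is the unweighted mean of the seven criteria, the short check below reproduces the reported 3.49 and 3.72 from the per-criterion scores in the table.

```python
# Scores copied from the table above: criterion -> (base, tuned).
scores = {
    "Hallucination Resistance": (3.53, 3.89),
    "Tone & Professionalism": (3.90, 4.27),
    "Output Format": (3.41, 3.75),
    "Instruction Following": (3.14, 3.44),
    "Accuracy": (3.34, 3.59),
    "Context Adherence": (3.66, 3.89),
    "Completeness": (3.45, 3.23),
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Assumption: "Overall" is the plain average across criteria.
overall_base = round(mean(b for b, _ in scores.values()), 2)
overall_tuned = round(mean(t for _, t in scores.values()), 2)

# Per-criterion delta (tuned minus base), rounded as in the table.
deltas = {name: round(t - b, 2) for name, (b, t) in scores.items()}
regressions = [name for name, d in deltas.items() if d < 0]
```

Under that assumption, `regressions` contains only Completeness, matching the single "Trade-off" row.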

Key Safety Improvements

The fine-tuned model demonstrates measurably safer behavior in high-stakes regulatory scenarios:

Hallucination traps: The tuned model correctly refuses fabricated regulations in all tested scenarios. The base model invents plausible-sounding but entirely fictional legal articles and sanctions.
Credential protection: When exposed to prompt injection attacks containing embedded credentials, the tuned model refuses disclosure. The base model has been observed leaking credentials verbatim.
Professional tone: The tuned model eliminates emoji and filler phrases such as "Certo!" ("Sure!") and "Ottima domanda!" ("Great question!"), which are inappropriate in regulatory communications.

Known Limitations

Completeness trade-off (-0.22): The model tends toward concise, precise answers. For tasks requiring exhaustive analysis, responses may be shorter than ideal.
Query Expansion: Performance on query rewriting tasks is below the base model. This is a known gap being addressed in dataset improvements.
Inference speed: Not a limitation in itself, but linked to the completeness trade-off. Average response time is ~40% lower than the base model (4.3 s vs 7.0 s), primarily because outputs are more concise.

Consistency Across Loops

Loop	Base Wins	Tuned Wins	Ties	Tuned %
1	7	13	5	62.0%
2	11	10	2	47.8%
3	8	10	7	54.0%

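The tuned model wins more comparisons than the base model in 2 of 3 independent loops (loops 1 and 3). Loop 2 narrowly favors the base model (11 wins vs 10).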
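
The "Tuned %" column and the adjusted win rates in the head-to-head table are consistent with counting each tie as half a win for both models. That rule is inferred from the numbers and is not stated in the source. The sketch below recomputes the figures under that assumption.

```python
def adjusted_win_rate(wins, losses, ties):
    """Win rate in percent, with each tie counted as half a win (assumed rule)."""
    total = wins + losses + ties
    return 100.0 * (wins + 0.5 * ties) / total

# Loop -> (base wins, tuned wins, ties), copied from the table above.
loops = {1: (7, 13, 5), 2: (11, 10, 2), 3: (8, 10, 7)}

tuned_pct = {
    loop: round(adjusted_win_rate(tuned, base, ties), 1)
    for loop, (base, tuned, ties) in loops.items()
}

# Aggregate over all loops.
base_wins = sum(b for b, _, _ in loops.values())
tuned_wins = sum(t for _, t, _ in loops.values())
ties = sum(x for _, _, x in loops.values())

overall_tuned = round(adjusted_win_rate(tuned_wins, base_wins, ties), 1)
overall_base = round(adjusted_win_rate(base_wins, tuned_wins, ties), 1)
```

The totals come out to 26 base wins, 33 tuned wins and 14 ties over 73 tests, which matches the head-to-head summary.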

Usage Examples

RAG Q&A — Answering from Retrieved Context

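# Prompts stay in Italian, the model's target language. English gloss:
#   system: "You are a banking compliance assistant. Answer ONLY based on the provided context."
#   user:   "What are the minimum capital requirements under the CRR?"
messages = [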
    {
        "role": "system",
        "content": """Sei un assistente per la compliance bancaria. 
Rispondi SOLO basandoti sul contesto fornito.

<contesto_recuperato>
Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti 
requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%.
</contesto_recuperato>"""
    },
    {
        "role": "user", 
        "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"
    }
]
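
In a running pipeline the context block is filled by the retriever rather than written by hand. A minimal sketch of that assembly step follows. The `<contesto_recuperato>` tag comes from the example above, while the helper name and its parameters are hypothetical.

```python
def build_rag_messages(chunks, question, instructions):
    """Wrap retrieved chunks in the context tag and build the chat messages.

    chunks: retrieved text passages (most relevant first)
    question: the user's question
    instructions: behavioral part of the system prompt
    """
    context = "\n\n".join(chunk.strip() for chunk in chunks if chunk.strip())
    system = (
        f"{instructions.strip()}\n\n"
        f"<contesto_recuperato>\n{context}\n</contesto_recuperato>"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

rag_messages = build_rag_messages(
    chunks=[
        "Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti "
        "requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%."
    ],
    question="Quali sono i requisiti minimi di capitale secondo il CRR?",
    instructions=(
        "Sei un assistente per la compliance bancaria.\n"
        "Rispondi SOLO basandoti sul contesto fornito."
    ),
)
```

Keeping the instructions and the retrieved context in one system message mirrors the layout of the hand-written example.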

Tool Calling — Compliance Workflows

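# English gloss of the Italian prompts:
#   system:  "You are an operational compliance assistant." Tools: risk scoring, PEP list check, KYC verification.
#   context: "Procedure AML-003: enhanced due diligence (EDD) applies to PEPs, high-risk countries, and profiles with scoring > 60."
#   user:    "I need to open an account for a company based in Dubai. The legal representative is Mr. Al-Rashid."
messages = [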
    {
        "role": "system",
        "content": """Sei un assistente operativo per la compliance.
        
<tools>
{"name": "calcola_scoring_rischio", "parameters": {...}}
{"name": "controlla_liste_pep", "parameters": {...}}
{"name": "verifica_kyc", "parameters": {...}}
</tools>

<contesto_recuperato>
Procedura AML-003: L'adeguata verifica rafforzata (EDD) deve essere 
applicata per PEP, paesi ad alto rischio e profili con scoring > 60.
</contesto_recuperato>"""
    },
    {
        "role": "user",
        "content": "Devo aprire un conto per una società con sede a Dubai. Il legale rappresentante è il sig. Al-Rashid."
    }
]
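
The model card does not document the output format of tool calls. The sketch below assumes the Qwen-style convention of a JSON object wrapped in `<tool_call>` tags. Verify this against real model output before relying on it. The sample reply, the `nominativo` argument and the stub handler are all hypothetical.

```python
import json
import re

# Assumption: tool calls appear as <tool_call>{...}</tool_call> in the reply.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    """Return (name, arguments) pairs for every well-formed tool call."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        args = call.get("arguments", {}) if isinstance(call, dict) else None
        if isinstance(args, dict) and "name" in call:
            calls.append((call["name"], args))
    return calls

def dispatch(calls, registry):
    """Run each call against a registry of Python callables."""
    results = []
    for name, args in calls:
        handler = registry.get(name)
        if handler is None:
            results.append({"tool": name, "error": "unknown tool"})
        else:
            results.append({"tool": name, "result": handler(**args)})
    return results

# Hypothetical model reply and stub implementation, for illustration only.
sample_reply = (
    "Checking the PEP lists first.\n"
    '<tool_call>\n{"name": "controlla_liste_pep", '
    '"arguments": {"nominativo": "Al-Rashid"}}\n</tool_call>'
)
registry = {"controlla_liste_pep": lambda nominativo: {"nominativo": nominativo, "pep": False}}
tool_results = dispatch(extract_tool_calls(sample_reply), registry)
```

Unknown tool names are reported rather than executed, so a hallucinated tool cannot trigger arbitrary code.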

Query Expansion — Improving RAG Retrieval

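# English gloss of the Italian prompts:
#   system: "Rewrite the user's query to improve document retrieval. Add technical terms and regulatory references. Reply ONLY with the JSON."
#   user:   "ORIGINAL QUERY: [suspicious transaction reporting obligations]"
messages = [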
    {
        "role": "system",
        "content": "Riscrivi la query dell'utente per migliorare il recupero documentale. Aggiungi termini tecnici e riferimenti normativi. Rispondi SOLO con il JSON."
    },
    {
        "role": "user",
        "content": "## QUERY ORIGINALE: [obblighi segnalazione operazioni sospette]"
    }
]

Document Reranking

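# English gloss of the Italian system prompt:
#   "Rate the relevance of each candidate to the query. Score 0-100. Reply ONLY with the JSON."
# The query in the user message means "CET1 requirements".
messages = [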
    {
        "role": "system",
        "content": "Valuta la rilevanza di ciascun candidato rispetto alla query. Score 0-100. Rispondi SOLO con il JSON."
    },
    {
        "role": "user",
        "content": '{"query": "requisiti CET1", "candidates": [{"id": "doc_001", "title": "Art. 92 CRR"}, {"id": "doc_002", "title": "DORA Art. 5"}]}'
    }
]
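
The JSON schema of the reranker's reply is not specified in the model card. The sketch below assumes a list of `{"id": ..., "score": ...}` objects, which is a guess made for illustration. It shows one way to parse the reply defensively and order the candidates.

```python
import json

def parse_json_reply(text):
    """Return the first JSON value found in a reply, ignoring surrounding text."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "[{":
            try:
                value, _ = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from here, keep scanning
    raise ValueError("no JSON value found in model reply")

def rank_candidates(reply_text, candidate_ids):
    """Sort known candidate IDs by score, highest first.

    IDs that were not in the request are dropped, and scores are clamped
    to the 0-100 range requested in the system prompt.
    """
    scores = {}
    for item in parse_json_reply(reply_text):
        if not isinstance(item, dict):
            continue
        doc_id, score = item.get("id"), item.get("score")
        if doc_id in candidate_ids and isinstance(score, (int, float)):
            scores[doc_id] = max(0, min(100, score))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical reply, including a code fence and an ID that was never sent.
reply = (
    '```json\n[{"id": "doc_002", "score": 12}, '
    '{"id": "doc_001", "score": 97}, {"id": "doc_999", "score": 80}]\n```'
)
ranking = rank_candidates(reply, {"doc_001", "doc_002"})
```

Filtering on the requested IDs keeps a fabricated document reference from entering the retrieval results.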

Training Metrics

Metric	Value
Final Eval Loss	1.368
Token Accuracy	70.5%
Train/Eval Gap	0.033

A train/eval gap of 0.033 indicates stable training with no sign of overfitting. The model learned domain-specific behavior without broadly degrading general capabilities; the exceptions observed are listed under Known Limitations.

Design Principles

The LoRA configuration follows a minimal intervention philosophy validated through progressive experimentation across 6+ configurations:

Low rank, all modules: Modifying all transformer layers with minimal rank produces better results than high rank on a subset of layers — consistent with findings from the original LoRA paper.
Single epoch: One pass through the data is sufficient for behavioral adaptation. Multiple epochs cause catastrophic forgetting on small models.
Conservative scaling: Alpha = 2× rank with low learning rate ensures stable gradients with adequate signal amplification.
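
The model card does not publish the actual hyperparameter values. As an illustration of the three principles, the sketch below assumes a rank of 8 and uses key names that mirror PEFT's `LoraConfig` (`r`, `lora_alpha`, `target_modules`). In standard LoRA the adapter update is multiplied by `lora_alpha / r`, so "alpha = 2× rank" fixes that multiplier at 2 regardless of the rank chosen.

```python
def lora_settings(rank):
    """Build illustrative LoRA and training settings from a single rank value."""
    lora = {
        "r": rank,                       # low rank (the value itself is an assumption)
        "lora_alpha": 2 * rank,          # conservative scaling: alpha = 2 x rank
        "target_modules": "all-linear",  # adapt all linear modules, not a subset
    }
    training = {
        "num_train_epochs": 1,           # single pass over the data
    }
    return lora, training

def lora_scale(lora):
    """Multiplier applied to the low-rank update in standard LoRA."""
    return lora["lora_alpha"] / lora["r"]

lora, training = lora_settings(rank=8)   # rank=8 is hypothetical
scale = lora_scale(lora)
```

Tying alpha to the rank this way means the rank can be changed during experimentation without also retuning the effective update magnitude.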

Dataset Coverage

The training data covers the full lifecycle of a RAG-based compliance assistant:

Category	Purpose
Query Expansion	Enrich queries with regulatory terms for better retrieval
Intent Classification	Route queries to RAG vs conversational responses
Document Reranking	Score retrieved documents by relevance
Topic Extraction	Extract main topics from regulatory text pages
Document Summarization	Summarize multi-page regulatory documents
Relevance Filtering	Filter regulatory text relevant to banks
Metadata Extraction	Find application dates, issuing authorities
Impact Analysis	Cross-reference regulations vs internal procedures
RAG Q&A + Tool Calling	Multi-turn compliance conversations with tools

Regulatory sources covered: CRR/CRR3, DORA (Regulation (EU) 2022/2554), D.Lgs. 231/2007 (AML), D.Lgs. 385/1993 (TUB), Circolare 285, PSD2, MiFID II/MiFIR, D.P.R. 180/1950, and related Banca d'Italia provisions.

Deployment

With vLLM

vllm serve ./models/RegTech-4B-Instruct --dtype bfloat16
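
Once the server is running, it can be queried through vLLM's OpenAI-compatible chat endpoint. The sketch below uses only the standard library. It assumes the default port 8000 and that the served model name equals the path passed to `vllm serve`, so adjust both if the server was started with different options. The helper names are hypothetical.

```python
import json
import urllib.request

def build_chat_request(messages, model, base_url="http://localhost:8000", max_tokens=512):
    """Prepare a POST request for the /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

request = build_chat_request(
    messages=[{"role": "user", "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"}],
    model="./models/RegTech-4B-Instruct",  # assumed to match the served model name
)
# reply = send(request)  # uncomment when the vLLM server is running
```

In a RAG deployment the `messages` list would normally carry a system prompt with the retrieved context, as in the usage examples.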

With Transformers

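from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace YOUR_REPO_ID with the Hub repo ID (e.g. "Sophia-AI/RegTech-4B-Instruct")
# or with a local path to the model.
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_REPO_ID", torch_dtype="bfloat16", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("YOUR_REPO_ID")

# `messages` is any chat list from the Usage Examples section.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))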

Important Notes

RAG-optimized — Trained to work with retrieved context, not to memorize regulations. Always provide relevant documents in the system prompt.
Domain-specific — Optimized for Italian banking compliance. General capabilities may differ from the base model.
Not legal advice — A tool to assist compliance professionals, not a substitute for regulatory expertise.
Part of a model family — This 4B model is the lightweight variant. Larger models (7B, 14B, 32B) in the RegTech family offer progressively better completeness and accuracy for more demanding use cases.

Built for banking RAG by 2Sophia
Fine-tuned with LoRA • Adversarial evaluation by frontier LLM judges • Powered by Qwen3