FCA Financial Advice vs Guidance Classifier

A RoBERTa-base model fine-tuned to classify financial communications into three categories under the UK FCA regulatory framework:

Label	Description	Key Signals
guidance	Generic, educational financial information	No named products, no individual assessment, no "suitable for you"
targeted_support	Segment-personalised communications (FCA CP23/24)	Limited personal data, "people like you" framing, action categories
advice	Personal recommendations requiring FCA authorisation	Named specific products, individual circumstances, suitability assertions

Performance

Metric	Validation	Test
Accuracy	100%	100%
F1 Macro	1.000	1.000
F1 (guidance)	1.000	1.000
F1 (targeted_support)	1.000	1.000
F1 (advice)	1.000	1.000
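
For reference, macro F1 is the unweighted mean of the per-class F1 scores, so each of the three classes counts equally regardless of its size. A minimal pure-Python sketch follows; the function name and the toy labels are illustrative and not taken from the training script.

```python
from collections import Counter

LABELS = ["guidance", "targeted_support", "advice"]

def f1_macro(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Convention used here: a class with no true or predicted examples scores 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example (made-up labels): one advice message misread as targeted_support.
y_true = ["guidance", "guidance", "targeted_support", "advice", "advice"]
y_pred = ["guidance", "guidance", "targeted_support", "targeted_support", "advice"]
print(round(f1_macro(y_true, y_pred), 3))  # 0.778
```

A single confusion between targeted_support and advice lowers both of those per-class scores to 2/3, which is why macro F1 is a stricter summary than accuracy on small test sets.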

Usage

from transformers import pipeline

classifier = pipeline("text-classification", model="djordjebatic/fca-financial-classifier")

# Guidance example
result = classifier("ISAs allow you to save up to £20,000 per year tax-free. There are several types including Cash ISAs and Stocks and Shares ISAs.")
# → [{'label': 'guidance', 'score': 0.99}]

# Targeted Support example
result = classifier("You're approaching 55 and your pension balance is £95,000. Many people in your situation explore their drawdown options.")
# → [{'label': 'targeted_support', 'score': 0.99}]

# Advice example
result = classifier("Based on your risk profile, I recommend investing £30,000 in the Vanguard LifeStrategy 80% Equity Fund. This is suitable for your circumstances.")
# → [{'label': 'advice', 'score': 0.99}]
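
Because the three labels carry different regulatory consequences, a caller may prefer to send low-confidence predictions to a human reviewer instead of acting on them. The helper below is a sketch of that idea: the name `route`, the `needs_review` label, and the 0.9 threshold are all assumptions for illustration, not part of the model or its repository.

```python
def route(prediction, threshold=0.9):
    """Return the predicted label, or 'needs_review' if the score is below threshold.

    `prediction` is one dict in the pipeline's output format, e.g.
    {'label': 'guidance', 'score': 0.99}. The 0.9 default is an arbitrary
    placeholder, not a calibrated value.
    """
    if prediction["score"] < threshold:
        return "needs_review"
    return prediction["label"]

# Stand-in outputs in the pipeline's format (scores are made up for illustration).
examples = [
    {"label": "guidance", "score": 0.99},
    {"label": "advice", "score": 0.62},
]
print([route(p) for p in examples])  # ['guidance', 'needs_review']
```

With the real model this would be called as `route(classifier(text)[0])`, since the pipeline returns a one-element list for a single input string.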

Training Data

Trained on djordjebatic/fca-financial-classification, a synthetic dataset generated with Qwen/Qwen2.5-7B-Instruct as the teacher model.

Synthetic Data Generation Pipeline

The dataset was generated using a multi-stage pipeline:

Seed Data: 22 expert-crafted examples grounded in FCA PERG 8 (PERG 8.17G-8.37G), RAO Article 53, and FCA CP23/24 (Targeted Support framework)
Diversity Grid: 10 financial domains × 14 channels × 10 personas = 1,400 unique combinations
Teacher Generation: Qwen2.5-7B-Instruct generated 5 examples per prompt with temperature=0.85
LLM-as-Judge: The same model (Qwen2.5-7B-Instruct) checked whether each example's label was correct and returned PASS or FAIL
Post-processing: Length filtering, deduplication, class balancing
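
The post-processing stage can be sketched roughly as follows. This is not the repository's script: the function name, the character limits, and the choice to balance by downsampling to the smallest class are assumptions made for illustration.

```python
import random
from collections import defaultdict

def postprocess(examples, min_chars=50, max_chars=2000, seed=0):
    """Length-filter, deduplicate, and balance a list of {'text', 'label'} dicts.

    min_chars and max_chars are hypothetical limits, not the values used
    to build the published dataset.
    """
    seen = set()
    by_label = defaultdict(list)
    for ex in examples:
        text = ex["text"].strip()
        if not (min_chars <= len(text) <= max_chars):
            continue  # length filtering
        key = " ".join(text.lower().split())  # normalise case and whitespace
        if key in seen:
            continue  # drop exact duplicates after normalisation
        seen.add(key)
        by_label[ex["label"]].append({"text": text, "label": ex["label"]})
    if not by_label:
        return []
    # Class balancing: downsample every class to the size of the smallest one.
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    return balanced
```

Downsampling to the smallest class always yields equal class sizes, at the cost of discarding judged-valid examples from the larger classes.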

The pipeline is compatible with sdg_hub; a custom flow.yaml is included.

Dataset Statistics

Split	Size	Distribution
Train	458	153/152/153 (guidance/targeted_support/advice)
Validation	57	19/19/19
Test	58	19/20/19

Generated: 750 raw examples
Quality pass rate: 90.9% (682 of 750 passed the LLM judge)
After balancing: 573 examples (191 per class)
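
The reported figures are mutually consistent, which a few lines of arithmetic confirm. The numbers below are copied from this section; only the variable names are new.

```python
raw, passed = 750, 682
splits = {
    "train": (153, 152, 153),       # guidance, targeted_support, advice
    "validation": (19, 19, 19),
    "test": (19, 20, 19),
}

pass_rate = round(100 * passed / raw, 1)
per_class = [sum(counts[i] for counts in splits.values()) for i in range(3)]
total = sum(per_class)

print(pass_rate)  # 90.9
print(per_class)  # [191, 191, 191]
print(total)      # 573
```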

Diversity Dimensions

Domains: investments, pensions, savings, mortgages, insurance, equity_release, tax_planning, retirement_income, estate_planning, debt_management
Channels: website_faq, email, app_notification, phone_transcript, suitability_letter, platform_message, newsletter, chatbot, letter, video_call_notes, robo_adviser, social_media, brochure, annual_review
Personas: young professional, family with children, pre-retiree, retiree, HNW, first-time buyer, self-employed, recently divorced, inheritor, low-income saver
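
The grid is the Cartesian product of the three lists above. The sketch below enumerates it and draws one cell to fill a prompt; the prompt wording is a hypothetical template, not the one shipped in flows/fca_classification/prompts/.

```python
import random
from itertools import product

DOMAINS = [
    "investments", "pensions", "savings", "mortgages", "insurance",
    "equity_release", "tax_planning", "retirement_income",
    "estate_planning", "debt_management",
]
CHANNELS = [
    "website_faq", "email", "app_notification", "phone_transcript",
    "suitability_letter", "platform_message", "newsletter", "chatbot",
    "letter", "video_call_notes", "robo_adviser", "social_media",
    "brochure", "annual_review",
]
PERSONAS = [
    "young professional", "family with children", "pre-retiree", "retiree",
    "HNW", "first-time buyer", "self-employed", "recently divorced",
    "inheritor", "low-income saver",
]

grid = list(product(DOMAINS, CHANNELS, PERSONAS))
print(len(grid))  # 1400

# Sample one cell and fill a hypothetical prompt template.
rng = random.Random(0)
domain, channel, persona = rng.choice(grid)
prompt = f"Write a {channel} message about {domain} for a {persona}."
```

If every prompt yielded exactly 5 examples, the 750 raw examples would correspond to 150 prompts, so the generated data likely covers only a sample of the 1,400 cells.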

Regulatory Sources

FCA PERG 8 — Perimeter Guidance Manual Ch.8 (PERG 8.17G–8.37G)
RAO Article 53 — Regulated Activities Order 2001
UK MiFID Article 9 — Personal recommendation definition
FCA CP23/24 (Dec 2023) — Advice/Guidance Boundary Review: Targeted Support
FCA CP24/7 (Jul 2024) — Targeted Support and Simplified Advice
FCA FG17/8 (2017) — Finalised Guidance for automated investment services

The Three-Class Decision Framework

Does it name a specific product/provider?
├─ NO → Does it use individual's data to suggest action?
│       ├─ NO  → GUIDANCE
│       └─ YES → TARGETED SUPPORT
└─ YES → Is it presented as suitable for this individual?
         ├─ NO  → TARGETED SUPPORT or GUIDANCE
         └─ YES → REGULATED ADVICE
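
The tree can be written as a small rule function, which is handy for sanity-checking seed examples against their labels. The function and argument names are illustrative. The tree leaves one branch open ("TARGETED SUPPORT or GUIDANCE"); this sketch assumes that branch is settled by the same individual's-data question used on the other side of the tree.

```python
def classify_by_rules(names_product, uses_individual_data, presented_as_suitable):
    """Map the three yes/no questions of the decision framework to a label."""
    if names_product and presented_as_suitable:
        return "advice"
    # Either no product is named, or a product is named without a suitability
    # claim. Assumption: in both cases, use of the individual's data to suggest
    # an action separates targeted support from guidance.
    if uses_individual_data:
        return "targeted_support"
    return "guidance"

print(classify_by_rules(False, False, False))  # guidance
print(classify_by_rules(False, True, False))   # targeted_support
print(classify_by_rules(True, True, True))     # advice
```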

Model Details

Base model: FacebookAI/roberta-base (125M params)
Training: 5 epochs, lr=2e-5, batch_size=16, max_length=512
Optimizer: AdamW, weight_decay=0.01, warmup_ratio=0.1
Early stopping: patience=3, metric=f1_macro
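
To give a sense of scale, the schedule implied by these settings can be worked out from the training set size. The sketch assumes a single device, no gradient accumulation, a kept final partial batch, and warmup steps rounded up; none of these are stated in the card. Early stopping may also end training before the last epoch.

```python
import math

train_size = 458      # from the Dataset Statistics table
batch_size = 16
epochs = 5
warmup_ratio = 0.1

steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)  # 29 145 15
```

Under these assumptions the whole run is at most 145 optimizer steps, roughly 15 of which are learning-rate warmup.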

Limitations

Trained on synthetic data — may not capture all real-world edge cases
The 100% accuracy on the 58-example test set likely reflects the homogeneity of the synthetic data (train and test splits come from the same generation pipeline) rather than perfect generalisation
Targeted Support is a proposed regulatory category (FCA CP23/24) with boundaries still being refined
Edge cases at class boundaries (e.g., guidance with "you" language, targeted support naming product categories) need real-world validation
UK-specific regulatory framework — not applicable to other jurisdictions

Files

seeds/seed_examples.json — Expert-crafted seed examples with PERG 8 reasoning
seeds/fca_regulatory_context.md — Regulatory context document
flows/fca_classification/flow.yaml — sdg_hub flow definition
flows/fca_classification/prompts/ — Prompt templates for generation and quality judging
scripts/generate_self_contained.py — Self-contained data generation script
scripts/train_classifier.py — Training script
