FCA Financial Advice vs Guidance Classifier

A RoBERTa-base model fine-tuned to classify financial communications into three regulatory categories under UK FCA regulations:

| Label | Description | Key Signals |
|---|---|---|
| guidance | Generic, educational financial information | No named products, no individual assessment, no "suitable for you" |
| targeted_support | Segment-personalised communications (FCA CP23/24) | Limited personal data, "people like you" framing, action categories |
| advice | Personal recommendations requiring FCA authorisation | Named specific products, individual circumstances, suitability assertions |

Performance

| Metric | Validation | Test |
|---|---|---|
| Accuracy | 100% | 100% |
| F1 Macro | 1.000 | 1.000 |
| F1 (guidance) | 1.000 | 1.000 |
| F1 (targeted_support) | 1.000 | 1.000 |
| F1 (advice) | 1.000 | 1.000 |
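
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so each of the three classes counts equally regardless of its size. A minimal pure-Python sketch follows; the function name `macro_f1` and the toy labels are illustrative and are not taken from the training script.

```python
from collections import Counter

LABELS = ("guidance", "targeted_support", "advice")

def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); treated as 0.0 if the class never appears.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)

# Toy data (hypothetical): one targeted_support example is mistaken for guidance.
y_true = ["guidance", "advice", "targeted_support", "advice"]
y_pred = ["guidance", "advice", "guidance", "advice"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.556
```

A single missed class pulls the macro average down sharply, which is why it is a stricter summary than accuracy on small test sets.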

Usage

from transformers import pipeline

classifier = pipeline("text-classification", model="djordjebatic/fca-financial-classifier")

# Guidance example
result = classifier("ISAs allow you to save up to £20,000 per year tax-free. There are several types including Cash ISAs and Stocks and Shares ISAs.")
# → [{'label': 'guidance', 'score': 0.99}]

# Targeted Support example
result = classifier("You're approaching 55 and your pension balance is £95,000. Many people in your situation explore their drawdown options.")
# → [{'label': 'targeted_support', 'score': 0.99}]

# Advice example
result = classifier("Based on your risk profile, I recommend investing £30,000 in the Vanguard LifeStrategy 80% Equity Fund. This is suitable for your circumstances.")
# → [{'label': 'advice', 'score': 0.99}]
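
Since the model was validated only on synthetic data, one possible pattern is to send low-confidence predictions to a human reviewer. The sketch below assumes the classifier returns one `{'label', 'score'}` dict per input text, as the example outputs above suggest. The names `route` and `REVIEW_THRESHOLD`, the 0.90 cut-off, and the offline `stub_classifier` are all hypothetical and not part of this repository.

```python
REVIEW_THRESHOLD = 0.90  # hypothetical cut-off, not a value from the model card

def route(texts, classifier, threshold=REVIEW_THRESHOLD):
    """Attach a needs_review flag to each top-1 prediction."""
    results = []
    for text, pred in zip(texts, classifier(texts)):
        results.append({
            "text": text,
            "label": pred["label"],
            "score": pred["score"],
            "needs_review": pred["score"] < threshold,
        })
    return results

def stub_classifier(texts):
    """Offline stand-in that mimics a list-of-dicts classifier output."""
    return [{"label": "guidance", "score": 0.62} for _ in texts]

routed = route(["ISAs are tax-efficient savings accounts."], stub_classifier)
print(routed[0]["needs_review"])  # True
```

To use it with the real model, pass the `classifier` pipeline object in place of `stub_classifier`.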

Training Data

Trained on djordjebatic/fca-financial-classification, a synthetic dataset generated using Qwen/Qwen2.5-7B-Instruct as the teacher model.

Synthetic Data Generation Pipeline

The dataset was generated using a multi-stage pipeline:

  1. Seed Data: 22 expert-crafted examples grounded in FCA PERG 8 (PERG 8.17G-8.37G), RAO Article 53, and FCA CP23/24 (Targeted Support framework)
  2. Diversity Grid: 10 financial domains × 14 channels × 10 personas = 1,400 unique combinations
  3. Teacher Generation: Qwen2.5-7B-Instruct generated 5 examples per prompt with temperature=0.85
  4. LLM-as-Judge: Same model verified each example's label accuracy (PASS/FAIL)
  5. Post-processing: Length filtering, deduplication, class balancing

The pipeline is compatible with sdg_hub; a custom flow.yaml is included.
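
The diversity grid in step 2 can be sketched with `itertools.product`. The domain, channel, and persona values are the ones listed under Diversity Dimensions. The `build_prompt` template and the sampling of 150 combinations are assumptions: 150 is only what 750 raw examples at 5 examples per prompt would imply, and the actual selection logic lives in the generation script.

```python
import random
from itertools import product

DOMAINS = ["investments", "pensions", "savings", "mortgages", "insurance",
           "equity_release", "tax_planning", "retirement_income",
           "estate_planning", "debt_management"]
CHANNELS = ["website_faq", "email", "app_notification", "phone_transcript",
            "suitability_letter", "platform_message", "newsletter", "chatbot",
            "letter", "video_call_notes", "robo_adviser", "social_media",
            "brochure", "annual_review"]
PERSONAS = ["young professional", "family with children", "pre-retiree",
            "retiree", "HNW", "first-time buyer", "self-employed",
            "recently divorced", "inheritor", "low-income saver"]

# Every (domain, channel, persona) combination: 10 x 14 x 10.
grid = list(product(DOMAINS, CHANNELS, PERSONAS))
print(len(grid))  # 1400

def build_prompt(domain, channel, persona, label):
    """Hypothetical prompt template; the real templates are in the flow's prompts directory."""
    return (f"Write a {channel} message about {domain} for a {persona} "
            f"that would be classified as '{label}'.")

# Assumed sampling step: draw 150 distinct combinations with a fixed seed.
sampled = random.Random(0).sample(grid, 150)
print(build_prompt(*sampled[0], label="guidance"))
```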

Dataset Statistics

| Split | Size | Distribution (guidance/targeted_support/advice) |
|---|---|---|
| Train | 458 | 153/152/153 |
| Validation | 57 | 19/19/19 |
| Test | 58 | 19/20/19 |

  • Generated: 750 raw examples
  • Quality pass rate: 90.9% (682 passed LLM judge)
  • After balancing: 573 examples (191 per class)
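
The figures above are internally consistent, and the balancing step can be illustrated with a short sketch. The card does not state how balancing was done; downsampling every class to the size of the smallest one is an assumption here, and `balance_classes` is a hypothetical helper rather than a function from the repository scripts.

```python
import random
from collections import Counter, defaultdict

# Consistency checks on the reported numbers.
assert round(682 / 750 * 100, 1) == 90.9   # quality pass rate
assert 458 + 57 + 58 == 573 == 3 * 191     # splits sum to the balanced total

def balance_classes(examples, seed=0):
    """Downsample every class to the size of the smallest class (assumed strategy)."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example["label"]].append(example)
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced

# Toy input (hypothetical): 5 guidance, 3 targeted_support, 4 advice examples.
toy = ([{"label": "guidance"}] * 5
       + [{"label": "targeted_support"}] * 3
       + [{"label": "advice"}] * 4)
balanced = balance_classes(toy)
print(Counter(example["label"] for example in balanced))
```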

Diversity Dimensions

  • Domains: investments, pensions, savings, mortgages, insurance, equity_release, tax_planning, retirement_income, estate_planning, debt_management
  • Channels: website_faq, email, app_notification, phone_transcript, suitability_letter, platform_message, newsletter, chatbot, letter, video_call_notes, robo_adviser, social_media, brochure, annual_review
  • Personas: young professional, family with children, pre-retiree, retiree, HNW, first-time buyer, self-employed, recently divorced, inheritor, low-income saver

Regulatory Sources

  • FCA PERG 8: Perimeter Guidance Manual Ch.8 (PERG 8.17G–8.37G)
  • RAO Article 53: Regulated Activities Order 2001
  • UK MiFID Article 9: Personal recommendation definition
  • FCA CP23/24 (Dec 2023): Advice/Guidance Boundary Review: Targeted Support
  • FCA CP24/7 (Jul 2024): Targeted Support and Simplified Advice
  • FCA FG17/8 (2017): Finalised Guidance for automated investment services

The Three-Class Decision Framework

Does it name a specific product/provider?
├─ NO → Does it use individual's data to suggest action?
│       ├─ NO  → GUIDANCE
│       └─ YES → TARGETED SUPPORT
└─ YES → Is it presented as suitable for this individual?
         ├─ NO  → TARGETED SUPPORT or GUIDANCE
         └─ YES → REGULATED ADVICE
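
The tree above can be written as a small function over three yes/no answers. This is only an illustration of the decision logic, not the model or a compliance tool. The function name `candidate_labels` and its boolean parameters are hypothetical, and the ambiguous branch is returned as two candidate labels because the tree itself does not resolve it.

```python
def candidate_labels(names_product, uses_individual_data, presented_as_suitable):
    """Map the three questions of the decision framework to candidate labels.

    Returns a tuple because one branch of the tree is ambiguous
    (TARGETED SUPPORT or GUIDANCE).
    """
    if not names_product:
        # No specific product or provider named: the personal-data question decides.
        return ("targeted_support",) if uses_individual_data else ("guidance",)
    if presented_as_suitable:
        # Named product presented as suitable for this individual.
        return ("advice",)
    # Named product without a suitability assertion: the tree leaves this open.
    return ("targeted_support", "guidance")

print(candidate_labels(False, False, False))  # ('guidance',)
print(candidate_labels(False, True, False))   # ('targeted_support',)
print(candidate_labels(True, True, True))     # ('advice',)
print(candidate_labels(True, False, False))   # ('targeted_support', 'guidance')
```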

Model Details

  • Base model: FacebookAI/roberta-base (125M params)
  • Training: 5 epochs, lr=2e-5, batch_size=16, max_length=512
  • Optimizer: AdamW, weight_decay=0.01, warmup_ratio=0.1
  • Early stopping: patience=3, metric=f1_macro
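
To give a sense of scale, the hyperparameters above imply a very short training run on the 458-example training split. The arithmetic below assumes a single device, no gradient accumulation, a kept final partial batch, warmup steps rounded up, and that early stopping does not end training before epoch 5; none of these details are confirmed by the card.

```python
import math

train_size = 458     # training split size from Dataset Statistics
batch_size = 16
epochs = 5
warmup_ratio = 0.1

steps_per_epoch = math.ceil(train_size / batch_size)   # last batch is partial
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)   # rounding up is an assumption

print(steps_per_epoch, total_steps, warmup_steps)  # 29 145 15
```

With roughly 145 optimizer steps in total, about the first 15 would be spent ramping the learning rate up to 2e-5.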

Limitations

  • Trained on synthetic data, so it may not capture all real-world edge cases
  • The 100% test accuracy likely reflects synthetic data homogeneity rather than perfect generalisation
  • Targeted Support is a proposed regulatory category (FCA CP23/24) with boundaries still being refined
  • Edge cases at class boundaries (e.g., guidance with "you" language, targeted support naming product categories) need real-world validation
  • UK-specific regulatory framework β€” not applicable to other jurisdictions

Files

  • seeds/seed_examples.json: Expert-crafted seed examples with PERG 8 reasoning
  • seeds/fca_regulatory_context.md: Regulatory context document
  • flows/fca_classification/flow.yaml: sdg_hub flow definition
  • flows/fca_classification/prompts/: Prompt templates for generation and quality judging
  • scripts/generate_self_contained.py: Self-contained data generation script
  • scripts/train_classifier.py: Training script