deycoding/deycoding-compliance-classifier-router

@@ -26,11 +26,15 @@ model-index:
 # BERT Compliance Classifier Router
-A 134M parameter BERT encoder model trained from scratch for Financial Services (FSI) query classification with PII detection and compliance-aware routing.
 ## Model Description
-This model classifies incoming user queries into 4 routing categories for cost-optimized, compliance-aware LLM serving in regulated industries:
 | Label | Complexity | PII | Routing Action |
 |-------|-----------|-----|----------------|
@@ -41,53 +45,73 @@ This model classifies incoming user queries into 4 routing categories for cost-o
 ## Key Results
 - **Accuracy:** 99.2%
-- **PII Recall:** ~100%
 - **Latency:** ~7ms (GPU) / ~72ms (CPU)
-- **Throughput:** ~130 queries/sec per GPU
-- **Model Size:** 134M parameters / ~530 MB
 ## Files
-| File | Description |
-|------|-------------|
-| `deycoding.compliance-classifier-in-1-0.pt` | Model weights (PyTorch state_dict) |
-| `deycoding.compliance-classifier-in-1-0.json` | BPE Tokenizer (32K vocab) |
 ## Architecture
 - **Type:** BERT Encoder (bidirectional transformer, no causal mask)
 - **Dimensions:** 768
 - **Layers:** 12
 - **Attention Heads:** 12
 - **FFN Dimension:** 3072
 - **Max Sequence Length:** 128 tokens (inference) / 512 tokens (pre-training)
-- **Vocabulary:** 32,000 (BPE, includes `<mask>` token)
-- **Activation:** GELU
-- **Normalization:** LayerNorm
 - **Classification Head:** Linear(768→768) → Tanh → Dropout → Linear(768→4)
 ## Training
 ### Pre-training
 - **Objective:** Masked Language Model (MLM), 15% masking (80/10/10)
-- **Data:** English Wikipedia (2B tokens, 500K steps)
 - **Batch size:** 8, sequence length: 512
-- **LR:** 1e-4 → 1e-5 (cosine schedule, warmup 2000 steps)
 - **Hardware:** NVIDIA L4 (24 GB), ~48 hours
 - **Final Loss:** 1.815
 ### Fine-tuning
 - **Data:** 50,000 synthetic FSI examples (balanced, 12,500 per class)
 - **PII Types:** 14 (including PAN, Aadhaar, phone, email, UPI, DOB, card, DL, voter, passport, address, IFSC)
 - **Input Formats:** Structured + unstructured (human-typed messy input)
-- **Languages:** English + Hinglish (15%)
-- **Steps:** 8,000, batch=32, LR=2e-5 → 2e-6
 - **Hardware:** NVIDIA L4, ~15 minutes
 - **Final Accuracy:** 99.2%
 ## Usage
 ```python
 import torch
 import torch.nn.functional as F
@@ -96,7 +120,7 @@ from tokenizers import Tokenizer
 # Load tokenizer
 tokenizer = Tokenizer.from_file("deycoding.compliance-classifier-in-1-0.json")
-# Load model (requires architecture definition — see repository)
 model.load_state_dict(torch.load("deycoding.compliance-classifier-in-1-0.pt", map_location="cpu"))
 model.eval()
 ```
@@ -119,10 +143,10 @@ print(f"{prediction} ({confidence:.1f}%)")
 ## PII Detection Capabilities
-Detects personal identifiable information in both structured and unstructured (human-typed) formats:
-| PII Type | Structured | Unstructured |
-|----------|-----------|--------------|
 | PAN | ABCDE1234F | pan abcde1234f |
 | Aadhaar | 1234 5678 9012 | aadhar no 123456789012 |
 | Phone | +91-98765-43210 | my number is 9876543210 |
@@ -136,35 +160,50 @@ Detects personal identifiable information in both structured and unstructured (h
 | Voter ID | ABC1234567 | voter id ABC1234567 |
 | Address | Flat 4B, Tower 2, Koramangala | flat 4b tower 2 koramangala bangalore 560034 |
 | IFSC | SBIN0123456 | ifsc SBIN0123456 |
 ## Intended Use
 - Query routing in multi-tier LLM serving architectures
-- PII detection for data residency compliance (GDPR, RBI, DPDP Act)
-- Cost optimization — route simple queries to cheaper models (65-73% savings)
-- Financial services, healthcare, legal — any regulated industry
 ## Limitations
-- Trained on synthetic data — fine-tune on real queries for production
-- English + Hinglish only — other languages not covered
-- Max 128 tokens — very long queries get truncated
-- PII detection is learned (not regex) — may miss novel PII formats not in training data
 ## Ethical Considerations
 - Model makes routing decisions, not content decisions
-- PII detection is conservative (prefers false positive over false negative)
 - Data residency enforcement is architectural — PII queries physically cannot reach cross-region infrastructure
 ## Citation
 ```bibtex
 @misc{dey2026classifier,
   title={Classifier-Gated Multi-Tier LLM Routing for Cost-Optimized Serving in Regulated Industries},
   author={Abhishek Dey},
   year={2026},
-  url={https://huggingface.co/deycoding/bert-compliance-classifier-router}
 }
 ```
@@ -175,4 +214,4 @@ Detects personal identifiable information in both structured and unstructured (h
 ## License
-CC-BY-NC-4.0 — Non-commercial use permitted with attribution. Commercial licensing available upon request. Contact author for commercial inquiries.

 # BERT Compliance Classifier Router
+A 134M-parameter BERT encoder built from scratch, from pre-training through fine-tuning, for regulated industries such as Financial Services, Healthcare, and Legal. It acts as a query router that classifies incoming prompts along two dimensions: query complexity and the presence of Personally Identifiable Information (PII). This enables cost-optimized, compliance-aware LLM serving in which queries flagged as containing PII stay on local infrastructure.
+Unlike traditional keyword-based or regex-based PII detection, this model learns contextual patterns from training data, enabling it to detect PII in both structured formats (like PAN: ABCDE1234F) and unstructured human-typed input (like "my pan is abcde1234f" or "Mera PAN number batao"). The model supports English and Hinglish (Hindi-English code-mixed) queries commonly seen in Indian financial services.
 ## Model Description
+This model sits at the entry point of a multi-tier LLM serving architecture. When a user query arrives, the classifier makes a routing decision in about 7 ms on GPU (about 72 ms on CPU), determining whether the query should go to a small or a large language model, and whether the data must stay on local infrastructure or can be processed cross-region. This pattern delivers 65-73% cost savings compared to routing all queries to a single large model. It also enforces data residency before any LLM is invoked, which per-token expert routing inside a Mixture-of-Experts (MoE) model does not provide.
+The model outputs one of four labels, each mapping directly to a routing action:
 | Label | Complexity | PII | Routing Action |
 |-------|-----------|-----|----------------|
 ## Key Results
+The model reaches 99.2% accuracy on a held-out test set. At ~7 ms per query on GPU, the routing decision adds little overhead to the overall request lifecycle, and at ~130 queries per second a single GPU instance can serve roughly 11 million classification requests per day.
 - **Accuracy:** 99.2%
+- **PII Recall:** ~100% (conservative — prefers false positive over missed PII)
 - **Latency:** ~7ms (GPU) / ~72ms (CPU)
+- **Throughput:** ~130 queries/sec per GPU instance
+- **Model Size:** 134M parameters / ~530 MB on disk
+- **Inference Memory:** ~530 MB VRAM (fits on any GPU including T4 16GB)
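
The capacity implied by these figures can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes the ~130 queries/sec rate is sustained around the clock; the helper names are illustrative.

```python
# Figures taken from the Key Results list above.
QUERIES_PER_SECOND = 130      # ~130 queries/sec per GPU instance
GPU_LATENCY_MS = 7            # ~7 ms per query on GPU
SECONDS_PER_DAY = 24 * 60 * 60


def daily_capacity(qps: float) -> int:
    """Queries one instance can classify in a day at a sustained rate."""
    return int(qps * SECONDS_PER_DAY)


def serial_qps(latency_ms: float) -> float:
    """Upper bound on throughput if queries are processed one at a time."""
    return 1000.0 / latency_ms


print(daily_capacity(QUERIES_PER_SECOND))   # 11232000
print(round(serial_qps(GPU_LATENCY_MS)))    # 143
```

The serial bound of about 143 queries/sec is close to the reported ~130, which is what one would expect if the throughput figure was measured one query at a time rather than with batching.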
 ## Files
+The repository contains two files required for inference. The model weights file contains the trained parameters of the BERT encoder plus the classification head. The tokenizer file contains the BPE vocabulary and merge rules trained on English Wikipedia, including special tokens required for the model.
+| File | Size | Description |
+|------|------|-------------|
+| `deycoding.compliance-classifier-in-1-0.pt` | ~441 MB | Model weights (PyTorch state_dict, FP32) |
+| `deycoding.compliance-classifier-in-1-0.json` | ~2 MB | BPE Tokenizer (32K vocab, trained on Wikipedia) |
 ## Architecture
+The model uses a BERT-style encoder: a bidirectional transformer that attends to all tokens simultaneously (no causal mask), unlike decoder-only models such as GPT, which can only attend to past tokens. Bidirectional attention matters for classification because the model needs the full context of a query before making a routing decision. Classification is performed on the first token's hidden state, which aggregates information from the entire input sequence through self-attention.
 - **Type:** BERT Encoder (bidirectional transformer, no causal mask)
 - **Dimensions:** 768
 - **Layers:** 12
 - **Attention Heads:** 12
 - **FFN Dimension:** 3072
 - **Max Sequence Length:** 128 tokens (inference) / 512 tokens (pre-training)
+- **Vocabulary:** 32,000 (BPE, includes `<mask>` token at ID=4)
+- **Activation:** GELU (standard for BERT-family models)
+- **Normalization:** LayerNorm (pre-norm variant)
 - **Classification Head:** Linear(768→768) → Tanh → Dropout → Linear(768→4)
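
As a rough sanity check, the dimensions above can be turned into a parameter count. The sketch below assumes a conventional encoder layout (biases on every linear layer, two LayerNorms per layer, learned position embeddings for 512 positions, no segment embeddings); none of these details are confirmed by the card. Under these assumptions the count comes to about 110.6M parameters, or roughly 442 MB in FP32, which is close to the ~441 MB weights file. The 134M headline figure would then include parameters this sketch leaves out, which the card does not itemize.

```python
# Dimensions from the Architecture list; the layout details are assumptions.
VOCAB, HIDDEN, LAYERS, FFN, MAX_POS, CLASSES = 32_000, 768, 12, 3072, 512, 4


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors."""
    return 2 * dim


def encoder_layer() -> int:
    attention = 4 * linear(HIDDEN, HIDDEN)            # Q, K, V and output projections
    ffn = linear(HIDDEN, FFN) + linear(FFN, HIDDEN)   # two-layer feed-forward block
    return attention + ffn + 2 * layer_norm(HIDDEN)


def total_params() -> int:
    embeddings = VOCAB * HIDDEN + MAX_POS * HIDDEN    # token + position tables (assumed)
    head = linear(HIDDEN, HIDDEN) + linear(HIDDEN, CLASSES)
    return embeddings + LAYERS * encoder_layer() + head


params = total_params()
print(params)                     # 110617348
print(round(params * 4 / 1e6))    # FP32 size in MB: 442
```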
 ## Training
+The model was trained in two phases following the standard pre-train then fine-tune paradigm. Pre-training teaches the model general English language understanding through the Masked Language Model (MLM) objective — predicting randomly masked tokens from context. Fine-tuning then specializes this general understanding into the specific 4-class classification task using labeled FSI examples.
 ### Pre-training
+Pre-training was conducted on English Wikipedia (2B tokens) using the Masked Language Model objective. At each training step, 15% of input tokens are selected for prediction: 80% of those are replaced with `<mask>`, 10% are replaced with a random token, and 10% are left unchanged. The model learns to predict the original tokens, which teaches contextual understanding of English grammar and usage and provides a foundation for downstream classification.
 - **Objective:** Masked Language Model (MLM), 15% masking (80/10/10)
+- **Data:** English Wikipedia (2B tokens processed over 500K steps)
 - **Batch size:** 8, sequence length: 512
+- **Learning Rate:** 1e-4 → 1e-5 (cosine schedule, warmup 2000 steps)
+- **Optimizer:** AdamW (betas=0.9/0.999, weight_decay=0.01)
 - **Hardware:** NVIDIA L4 (24 GB), ~48 hours
+- **Precision:** BF16 mixed precision with gradient checkpointing
 - **Final Loss:** 1.815
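
The 80/10/10 masking rule can be sketched in plain Python. This is a minimal illustration, not the training code from the repository: it uses the `<mask>` ID of 4 stated in the Architecture list, assumes a label value of -100 for positions that carry no loss (the default `ignore_index` of PyTorch's `CrossEntropyLoss`), and for brevity does not exclude special tokens from masking or from random replacement.

```python
import random

MASK_ID = 4          # `<mask>` token ID, per the Architecture list
VOCAB_SIZE = 32_000
IGNORE = -100        # assumed "no loss at this position" label


def mask_tokens(token_ids, rng, mask_prob=0.15):
    """Return (corrupted_ids, labels) for one sequence under the 80/10/10 rule."""
    corrupted = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                                   # position not selected
        labels[i] = tok                                # model must recover the original
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_ID                     # 80%: replace with <mask>
        elif roll < 0.9:
            corrupted[i] = rng.randrange(VOCAB_SIZE)   # 10%: random token
        # remaining 10%: keep the original token
    return corrupted, labels


rng = random.Random(0)
ids = [rng.randrange(5, VOCAB_SIZE) for _ in range(512)]
corrupted, labels = mask_tokens(ids, rng)
selected = sum(label != IGNORE for label in labels)
print(f"{selected} of {len(ids)} positions selected for prediction")
```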
 ### Fine-tuning
+Fine-tuning was performed on 50,000 synthetic FSI examples carefully designed to cover the full spectrum of query types encountered in financial services. The training data includes 14 different PII types in multiple formats (structured and unstructured), Hinglish code-mixed queries (15% of dataset), and diverse prefix/suffix variations to prevent the model from memorizing specific patterns. The classification head is trained on top of the pre-trained encoder with a lower learning rate to preserve learned representations.
 - **Data:** 50,000 synthetic FSI examples (balanced, 12,500 per class)
 - **PII Types:** 14 (including PAN, Aadhaar, phone, email, UPI, DOB, card, DL, voter, passport, address, IFSC)
 - **Input Formats:** Structured + unstructured (human-typed messy input)
+- **Languages:** English + Hinglish (15% code-mixed)
+- **Steps:** 8,000, batch=32, sequence length=128
+- **Learning Rate:** 2e-5 → 2e-6 (cosine schedule)
 - **Hardware:** NVIDIA L4, ~15 minutes
 - **Final Accuracy:** 99.2%
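
Both training phases use a cosine learning-rate schedule, which can be written as a small function. The sketch below assumes a linear warmup shape (the card only states the warmup length) and assumes no warmup for fine-tuning, since none is listed. It also checks the pre-training token budget from the stated step count, batch size, and sequence length.

```python
import math


def lr_at(step, total_steps, peak_lr, final_lr, warmup_steps=0):
    """Linear warmup to peak_lr (assumed shape), then cosine decay to final_lr."""
    if warmup_steps and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    return final_lr + (peak_lr - final_lr) * cosine


# Pre-training: 1e-4 -> 1e-5 over 500K steps with 2,000 warmup steps.
pretrain = [lr_at(s, 500_000, 1e-4, 1e-5, warmup_steps=2_000)
            for s in (0, 1_999, 250_000, 500_000)]
# Fine-tuning: 2e-5 -> 2e-6 over 8,000 steps (no warmup assumed).
finetune = [lr_at(s, 8_000, 2e-5, 2e-6) for s in (0, 4_000, 8_000)]
print(pretrain)
print(finetune)

# Token budget: steps x batch size x sequence length.
tokens_seen = 500_000 * 8 * 512
print(tokens_seen)   # 2048000000, consistent with the stated 2B tokens
```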
 ## Usage
+Using this model requires defining the BERT architecture (as the weights are stored as a PyTorch state_dict without the architecture). The model accepts tokenized input with an attention mask and returns logits for 4 classes. A softmax converts logits to probabilities, and the argmax gives the predicted label.
 ```python
 import torch
 import torch.nn.functional as F
 from tokenizers import Tokenizer
 # Load tokenizer
 tokenizer = Tokenizer.from_file("deycoding.compliance-classifier-in-1-0.json")
+# Load model weights (`model` must be an instance of the architecture; see documentation)
 model.load_state_dict(torch.load("deycoding.compliance-classifier-in-1-0.pt", map_location="cpu"))
 model.eval()
 ```
 ## PII Detection Capabilities
+The model detects 14 types of Indian PII across both structured (properly formatted) and unstructured (human-typed, messy) inputs. This is critical for real-world deployment where users rarely type identifiers in perfect format. The model learned these patterns from training data rather than using regex rules, making it robust to variations, typos, and mixed-language input that would break traditional pattern matching.
+| PII Type | Structured Example | Unstructured Example |
+|----------|-------------------|---------------------|
 | PAN | ABCDE1234F | pan abcde1234f |
 | Aadhaar | 1234 5678 9012 | aadhar no 123456789012 |
 | Phone | +91-98765-43210 | my number is 9876543210 |
 | Voter ID | ABC1234567 | voter id ABC1234567 |
 | Address | Flat 4B, Tower 2, Koramangala | flat 4b tower 2 koramangala bangalore 560034 |
 | IFSC | SBIN0123456 | ifsc SBIN0123456 |
+| Name (contextual) | Amit Patel | amit patel ka balance |
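
To see why strictly formatted patterns struggle with the unstructured column, consider two illustrative regular expressions. They are written for this example only and are not part of the model or its repository: one for the PAN layout (five letters, four digits, one letter) and one for an Aadhaar number written as three space-separated groups of four digits.

```python
import re

# Illustrative strict patterns, assumed for this example only.
PAN_STRICT = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
AADHAAR_STRICT = re.compile(r"\b\d{4} \d{4} \d{4}\b")


def strict_match(text: str, pattern: re.Pattern) -> bool:
    """True if the strictly formatted identifier appears in the text."""
    return pattern.search(text) is not None


print(strict_match("ABCDE1234F", PAN_STRICT))                  # True
print(strict_match("pan abcde1234f", PAN_STRICT))              # False: lowercase
print(strict_match("1234 5678 9012", AADHAAR_STRICT))          # True
print(strict_match("aadhar no 123456789012", AADHAAR_STRICT))  # False: no spaces
```

Each pattern can be loosened to accept more variants, but every variation has to be anticipated by hand, which is the maintenance cost a learned classifier is meant to avoid.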
 ## Intended Use
+This model is designed for deployment as a lightweight, low-latency routing layer in front of LLM serving infrastructure. The primary use cases are in regulated industries where data residency requirements mandate that certain queries (those containing PII) must be processed on local infrastructure, while non-sensitive queries can be routed to cheaper cross-region compute. The model enables organizations to enforce compliance at the architectural level rather than relying on policy-based controls.
 - Query routing in multi-tier LLM serving architectures
+- PII detection for data residency compliance (GDPR, RBI Data Localization, India DPDP Act)
+- Cost optimization — route simple queries to cheaper/smaller models (65-73% savings)
+- Financial services (banking, insurance, capital markets)
+- Healthcare (patient data routing)
+- Legal (privileged information detection)
 ## Limitations
+While the model achieves high accuracy on the test set, there are important limitations to consider before production deployment. The training data is synthetic (template-generated with variations), which means the model may not generalize perfectly to all real-world query patterns. Production deployment should include a feedback loop where misclassified queries are collected and used to improve subsequent training iterations.
+- Trained on synthetic data — recommend fine-tuning on real customer queries for production
+- English + Hinglish only — other Indian languages (Tamil, Telugu, Bengali, etc.) not covered
+- Max 128 tokens — very long queries get truncated, potentially losing PII at the end
+- PII detection is learned (not regex) — may miss novel PII formats not represented in training data
+- No entity extraction — model detects presence of PII but doesn't extract or mask specific values
+- Confidence calibration not verified — high confidence doesn't guarantee correctness on out-of-distribution inputs
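
One possible mitigation for the 128-token limit, offered here as an assumption rather than something the repository implements, is to split long inputs into overlapping windows, classify each window, and treat the query as containing PII if any window is flagged. The sketch below works on raw token IDs with a hypothetical `has_pii` callable standing in for the classifier, and leaves the handling of special tokens to the tokenizer.

```python
from typing import Callable, List, Sequence


def windows(token_ids: Sequence[int], max_len: int = 128, stride: int = 96) -> List[List[int]]:
    """Split token IDs into overlapping windows of at most max_len tokens."""
    if len(token_ids) <= max_len:
        return [list(token_ids)]
    out = []
    for start in range(0, len(token_ids), stride):
        out.append(list(token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):
            break                       # this window already reaches the end
    return out


def any_window_has_pii(token_ids: Sequence[int], has_pii: Callable[[List[int]], bool]) -> bool:
    """Most restrictive result across all windows."""
    return any(has_pii(w) for w in windows(token_ids))


def fake_has_pii(window: List[int]) -> bool:
    """Stand-in for the real classifier: flags windows containing token ID 999."""
    return 999 in window


query = [1] * 299 + [999]                        # 300 tokens, "PII" only at the very end
print(999 in query[:128])                        # False: plain truncation drops it
print(any_window_has_pii(query, fake_has_pii))   # True: the last window covers it
```

The trade-off is latency: a query that spans three windows costs three classifier calls instead of one.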
 ## Ethical Considerations
+This model is designed to enhance privacy and compliance, not to circumvent them. Routing enforces data residency: queries classified as containing PII are never forwarded to cross-region infrastructure. The model makes conservative decisions: when uncertain, it defaults to the most restrictive routing (local processing), preferring false positives (unnecessary local routing) over false negatives (PII leaking cross-region).
 - Model makes routing decisions, not content decisions
+- PII detection is conservative (prefers false positive over missed PII)
 - Data residency enforcement is architectural — PII queries physically cannot reach cross-region infrastructure
+- No user data is stored or logged by the model itself
+- Model should be part of a defense-in-depth strategy, not the sole PII control
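
The card does not say how the conservative bias is implemented. One way a router could add such a bias at inference time is sketched below: apply a softmax to the four logits, sum the probability mass of the no-PII classes, and allow cross-region routing only when that mass clears a high threshold. The label order, the threshold value, and the function names are all hypothetical.

```python
import math

# Hypothetical label order: (complexity, contains_pii). The real order is
# defined by the model's label table.
LABELS = [("simple", False), ("simple", True), ("complex", False), ("complex", True)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, pii_safe_threshold=0.95):
    """Return (model_tier, region); stay local unless 'no PII' is highly likely."""
    probs = softmax(logits)
    p_no_pii = sum(p for p, (_, pii) in zip(probs, LABELS) if not pii)
    p_complex = sum(p for p, (cx, _) in zip(probs, LABELS) if cx == "complex")
    region = "cross-region" if p_no_pii >= pii_safe_threshold else "local"
    tier = "large" if p_complex >= 0.5 else "small"
    return tier, region


print(route([8.0, 0.0, 0.0, 0.0]))   # ('small', 'cross-region')
print(route([2.0, 1.5, 0.0, 0.0]))   # ('small', 'local')
```

In the second call a plain argmax would pick the first (no-PII) class, but only about 60% of the probability mass is on no-PII classes, so the query stays local.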
 ## Citation
+If you use this model in your research or product, please cite:
 ```bibtex
 @misc{dey2026classifier,
   title={Classifier-Gated Multi-Tier LLM Routing for Cost-Optimized Serving in Regulated Industries},
   author={Abhishek Dey},
   year={2026},
+  url={https://huggingface.co/deycoding/deycoding-compliance-classifier-router}
 }
 ```
 ## License
+This model is released under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International). You are free to use, share, and adapt this model for non-commercial purposes with appropriate attribution. Commercial use requires written permission from the author; contact the author for commercial licensing inquiries.