ModernBERT with LoRA - US Stablecoin Regulatory Encoder

Production-ready encoder model for semantic search over US stablecoin regulatory documents.

Fine-tuned from answerdotai/ModernBERT-base using LoRA on 10,260 synthetic query-document triplets.

🎯 Performance Highlights

| Metric     | Score  | vs Base Model                        |
|------------|--------|--------------------------------------|
| NDCG@10    | 0.9236 | +517% (5.2x better) 🔥               |
| MRR@10     | 0.8991 | +714% (7.1x better) 🔥               |
| Recall@10  | 0.9961 | +250% (2.5x better) 🔥               |
| Recall@100 | 1.0000 | Perfect, never misses relevant docs  |
  • βœ… 92.3% of queries improved (9,468 out of 10,260)
  • βœ… Statistically significant: p < 0.001
  • βœ… 1ms/query inference on A100 GPU
  • βœ… 8.8MB adapter (not 149MB full model)

πŸš€ Quick Start

pip install "transformers>=4.48" "peft>=0.14" "huggingface_hub>=0.27" torch

Important: ModernBERT requires transformers >= 4.48. Older versions will fail with KeyError: 'modernbert'.

Requirements

| Library         | Minimum Version | Notes                           |
|-----------------|-----------------|---------------------------------|
| transformers    | >= 4.48         | ModernBERT architecture support |
| peft            | >= 0.14         | Compatible hf_hub_download API  |
| huggingface_hub | >= 0.27         | No deprecated use_auth_token    |
| torch           | >= 2.0          | CUDA support                    |

Loading Notes

Tokenizer: Load from the base model (answerdotai/ModernBERT-base), NOT from this adapter repo. The adapter repo stores LoRA weights only; the tokenizer lives with the base model.

UNEXPECTED keys on load: When loading AutoModel.from_pretrained("answerdotai/ModernBERT-base"), you may see warnings about UNEXPECTED keys (head.norm.weight, head.dense.weight, decoder.bias). These are the base model's MLM (masked language model) head weights that exist in the pretrained checkpoint but are not used by AutoModel (which loads only the encoder backbone). This is completely normal and safe to ignore.

from transformers import AutoModel, AutoTokenizer
from peft import PeftModel
import torch

# Load model
base_model = AutoModel.from_pretrained(
    "answerdotai/ModernBERT-base",
    trust_remote_code=True
)
model = PeftModel.from_pretrained(
    base_model, 
    "sugiv/modernbert-us-stablecoin-encoder"
)
model.eval()

# IMPORTANT: Load tokenizer from BASE MODEL, not from adapter repo
tokenizer = AutoTokenizer.from_pretrained(
    "answerdotai/ModernBERT-base",
    trust_remote_code=True
)

def encode(text, max_length=512):
    inputs = tokenizer(
        text, padding=True, truncation=True,
        max_length=max_length, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)
        # Mask-aware mean pooling: exclude padding tokens from the average
        mask = inputs["attention_mask"].unsqueeze(-1).float()
        embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
        # L2-normalize so cosine similarity reduces to a dot product
        embeddings = embeddings / embeddings.norm(dim=1, keepdim=True)
    return embeddings

# Example
query = "What are reserve requirements for stablecoin issuers?"
query_emb = encode(query)
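
Since encode() L2-normalizes its output, cosine-similarity retrieval reduces to a matrix product over the embedding matrix. A minimal ranking sketch, using hand-built toy unit vectors in place of real encode() outputs:

```python
import torch

def rank_documents(query_emb: torch.Tensor, doc_embs: torch.Tensor, k: int = 10):
    # With L2-normalized rows, the dot product IS the cosine similarity.
    scores = query_emb @ doc_embs.T                      # shape (1, n_docs)
    top = torch.topk(scores.squeeze(0), k=min(k, doc_embs.shape[0]))
    return top.indices.tolist(), top.values.tolist()

# Toy demo: doc 1 is identical to the query, doc 0 is orthogonal
q = torch.tensor([[1.0, 0.0]])
docs = torch.tensor([[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]])
idx, vals = rank_documents(q, docs, k=3)
# idx -> [1, 2, 0]
```

In practice, replace `q` and `docs` with the outputs of encode() over your query and the 38-document corpus.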

πŸ“Š Training Details

Model Architecture

  • Base: answerdotai/ModernBERT-base (149M params)
  • Method: LoRA (rank=16, alpha=32, dropout=0.1)
  • Targets: Wqkv, Wo (attention projections)
  • Trainable: 2.3M params (1.52% of base)
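
The hyperparameters above map onto a peft LoraConfig roughly as follows (a sketch only; details such as task_type and any modules_to_save used in the actual run are not published in this card):

```python
from peft import LoraConfig

# Adapter config implied by the hyperparameters above.
# "Wqkv" and "Wo" are ModernBERT's fused attention projection modules.
lora_config = LoraConfig(
    r=16,                          # LoRA rank
    lora_alpha=32,                 # scaling factor
    lora_dropout=0.1,
    target_modules=["Wqkv", "Wo"],
    bias="none",
)
```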

Training Data

  • 10,260 triplets (8,208 train / 2,052 val)
  • 38 documents: US stablecoin regulations
  • Query types: Factual, policy, comparison, compliance, interpretive
  • Negatives: 3 hard negatives per query (BM25)
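
A single training triplet might look like the following (field names and text are hypothetical; the card does not publish the dataset schema):

```python
# Illustrative triplet record -- one query, one relevant passage,
# and 3 BM25-mined hard negatives. All content here is made up.
triplet = {
    "query": "What reserves must a payment stablecoin issuer hold?",
    "positive": "Issuers must maintain reserves backing outstanding coins ...",
    "negatives": [
        "The Bank Holding Company Act defines control as ...",
        "FSOC may designate a nonbank financial company for supervision ...",
        "BitLicense applicants must submit background information ...",
    ],
}
```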

Training Config

  • GPU: NVIDIA A100-80GB
  • Time: 6 minutes (early stopped at Epoch 1)
  • Batch size: 16 Γ— 2 grad accumulation = 32 effective
  • Learning rate: 2e-4 with cosine schedule
  • Loss: InfoNCE (temp=0.05)
  • Early stopping: NDCG@10 β‰₯ 0.75 (achieved 0.8472)
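
The InfoNCE objective above can be sketched with in-batch negatives; this is an illustrative implementation, not the training code, and the actual run additionally scored the 3 BM25 hard negatives per query, which would enter as extra logit columns:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_embs, doc_embs, temperature=0.05):
    # Inputs: (batch, dim) L2-normalized embeddings. Row i of doc_embs is
    # the positive for query i; every other row acts as an in-batch negative.
    logits = (query_embs @ doc_embs.T) / temperature
    labels = torch.arange(query_embs.shape[0])
    return F.cross_entropy(logits, labels)

# Perfectly aligned query/doc embeddings drive the loss toward zero
embs = torch.eye(4)
loss = info_nce_loss(embs, embs)
```

The low temperature (0.05) sharpens the softmax, so near-misses among hard negatives are penalized strongly.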

πŸŽ“ Domain Specialization

Trained to understand US stablecoin regulatory concepts:

  • Federal Reserve Act, Dodd-Frank, Bank Holding Company Act
  • CFTC, OCC, FSOC, NY DFS BitLicense
  • Payment Stablecoin Act, STABLE Act
  • Reserve requirements, redemption rights, custody standards
  • Qualified Stablecoin Issuer (QSI)

πŸ“ˆ Use Cases

  1. RAG for Regulatory Q&A - Retrieve context for LLMs
  2. Compliance Search - Find relevant regulations
  3. Legal Research - Cross-reference requirements
  4. Policy Analysis - Compare regulatory frameworks

πŸ” Evaluation

Metrics Explained:

  • NDCG@10 (0.9236): 92% as good as perfect ranking in top 10
  • MRR@10 (0.8991): First result at avg position 1.11
  • Recall@100 (1.0000): Every relevant doc appears in the top 100

Validation: each of the 10,260 queries is ranked against the full 38-document corpus.
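
As a concrete illustration of how MRR@10 is computed (doc IDs below are hypothetical, not the evaluation data):

```python
def mrr_at_k(ranked_ids, relevant_id, k=10):
    # Reciprocal rank of the first relevant doc within the top k, else 0.
    for pos, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / pos
    return 0.0

# Two toy queries: relevant doc ranked 2nd, then ranked 1st
queries = [(["d3", "d7", "d1"], "d7"), (["d5", "d2"], "d5")]
score = sum(mrr_at_k(ranked, rel) for ranked, rel in queries) / len(queries)
# score -> 0.75, i.e. (1/2 + 1/1) / 2
```

An MRR@10 of 0.8991 therefore means the first relevant document sits, on average, around position 1/0.8991 ≈ 1.11.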

🚨 Limitations

  • Domain-specific (US stablecoin regulations only)
  • Small corpus (38 documents)
  • English only
  • Snapshot from March 2026
  • 1.9% of queries slightly degraded vs base

πŸ“œ License

Apache 2.0 (following ModernBERT-base)

πŸ™ Acknowledgments


Status: 🟒 Production Ready
Updated: March 10, 2026
Version: 1.0.0
