# ModernBERT with LoRA - US Stablecoin Regulatory Encoder
Production-ready encoder model for semantic search over US stablecoin regulatory documents.
Fine-tuned from answerdotai/ModernBERT-base using LoRA on 10,260 synthetic query-document triplets.
## Performance Highlights
| Metric | Score | vs Base Model |
|---|---|---|
| NDCG@10 | 0.9236 | +517% (≈6.2× base) |
| MRR@10 | 0.8991 | +714% (≈8.1× base) |
| Recall@10 | 0.9961 | +250% (≈3.5× base) |
| Recall@100 | 1.0000 | Perfect - never misses docs |
- 92.3% of queries improved (9,468 out of 10,260)
- Statistically significant: p < 0.001
- ~1 ms/query inference on an A100 GPU
- 8.8 MB adapter (not a 149 MB full model)
## Quick Start
```bash
pip install "transformers>=4.48" "peft>=0.14" "huggingface_hub>=0.27" torch
```
Important: ModernBERT requires `transformers >= 4.48`. Older versions will fail with `KeyError: 'modernbert'`.
### Requirements

| Library | Minimum Version | Notes |
|---|---|---|
| `transformers` | >= 4.48 | ModernBERT architecture support |
| `peft` | >= 0.14 | Compatible `hf_hub_download` API |
| `huggingface_hub` | >= 0.27 | No deprecated `use_auth_token` |
| `torch` | >= 2.0 | CUDA support |
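If you are unsure whether your environment meets these minimums, a quick check like the one below (a sketch using the standard library's `importlib.metadata`; it only prints versions rather than enforcing them) can save a confusing `KeyError` later:

```python
from importlib.metadata import version  # standard library, Python 3.8+

# Minimum versions from the table above
minimums = {"transformers": "4.48", "peft": "0.14", "huggingface_hub": "0.27", "torch": "2.0"}

for pkg, minimum in minimums.items():
    print(f"{pkg}: installed {version(pkg)}, requires >= {minimum}")
```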
### Loading Notes
Tokenizer: Load it from the base model (`answerdotai/ModernBERT-base`), NOT from this adapter repo. The adapter repo stores LoRA weights only; the tokenizer lives with the base model.

Unexpected keys on load: When loading `AutoModel.from_pretrained("answerdotai/ModernBERT-base")`, you may see warnings about unexpected keys (`head.norm.weight`, `head.dense.weight`, `decoder.bias`). These are the base model's MLM (masked language model) head weights that exist in the pretrained checkpoint but are not used by `AutoModel`, which loads only the encoder backbone. This is completely normal and safe to ignore.
```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# Load the base encoder and attach the LoRA adapter
base_model = AutoModel.from_pretrained(
    "answerdotai/ModernBERT-base",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    base_model,
    "sugiv/modernbert-us-stablecoin-encoder",
)
model.eval()

# IMPORTANT: load the tokenizer from the BASE model, not from the adapter repo
tokenizer = AutoTokenizer.from_pretrained(
    "answerdotai/ModernBERT-base",
    trust_remote_code=True,
)

def encode(text, max_length=512):
    # Accepts a single string or a list of strings
    inputs = tokenizer(
        text, padding=True, truncation=True,
        max_length=max_length, return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the token embeddings, then L2-normalize.
    # Note: padding tokens are included in the mean; that is fine for single inputs,
    # but consider attention-mask-weighted pooling for mixed-length batches.
    embeddings = outputs.last_hidden_state.mean(dim=1)
    embeddings = embeddings / embeddings.norm(dim=1, keepdim=True)
    return embeddings

# Example
query = "What are reserve requirements for stablecoin issuers?"
query_emb = encode(query)
```
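A minimal retrieval sketch on top of `encode`: because the embeddings are L2-normalized, a dot product is the cosine similarity, so ranking a small corpus is a single matrix multiplication. The document snippets below are illustrative placeholders, not texts from the training corpus.

```python
# Hypothetical document snippets for illustration only
docs = [
    "Issuers must hold reserves in cash and short-term Treasury securities.",
    "The BitLicense framework governs virtual currency businesses in New York.",
    "Holders may redeem payment stablecoins at par within a set time frame.",
]

doc_embs = encode(docs)                       # shape: (num_docs, hidden_size)
scores = (query_emb @ doc_embs.T).squeeze(0)  # cosine similarity via dot product
ranking = scores.argsort(descending=True)     # best-matching documents first

for rank, idx in enumerate(ranking.tolist(), start=1):
    print(f"{rank}. score={scores[idx].item():.3f}  {docs[idx]}")
```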
## Training Details

### Model Architecture
- Base: answerdotai/ModernBERT-base (149M params)
- Method: LoRA (rank=16, alpha=32, dropout=0.1); see the configuration sketch after this list
- Targets: Wqkv, Wo (attention projections)
- Trainable: 2.3M params (1.52% of base)
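The hyperparameters above map directly onto a PEFT `LoraConfig`. The following is only a sketch reconstructing the stated settings; the actual training script is not part of this repo.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

# Sketch only: reproduces the hyperparameters listed above, not the original training code
lora_config = LoraConfig(
    r=16,                            # LoRA rank
    lora_alpha=32,                   # scaling factor
    lora_dropout=0.1,
    target_modules=["Wqkv", "Wo"],   # ModernBERT attention projections
    bias="none",
)

base = AutoModel.from_pretrained("answerdotai/ModernBERT-base", trust_remote_code=True)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # ~2.3M trainable params (~1.5% of the base model)
```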
### Training Data
- 10,260 triplets (8,208 train / 2,052 val)
- 38 documents: US stablecoin regulations
- Query types: Factual, policy, comparison, compliance, interpretive
- Negatives: 3 hard negatives per query, mined with BM25; a plausible record layout is sketched below
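The exact serialization of the triplets is not published; under the description above (one query, one relevant document, three BM25 hard negatives), a record might plausibly look like this hypothetical example:

```python
# Hypothetical record layout for one training triplet (actual schema not published)
example_triplet = {
    "query": "What are reserve requirements for stablecoin issuers?",
    "positive": "doc_reserve_requirements",   # identifier of the relevant regulatory document
    "negatives": [                            # 3 BM25 hard negatives per query
        "doc_bitlicense_overview",
        "doc_fsoc_designation",
        "doc_custody_standards",
    ],
}
```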
### Training Config
- GPU: NVIDIA A100-80GB
- Time: 6 minutes (early stopped at Epoch 1)
- Batch size: 16 × 2 gradient accumulation steps = 32 effective
- Learning rate: 2e-4 with cosine schedule
- Loss: InfoNCE (temperature = 0.05); a minimal sketch is shown after this list
- Early stopping: NDCG@10 ≥ 0.75 (achieved 0.8472)
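For reference, here is a minimal sketch of an InfoNCE loss over a query, its positive document, and several hard negatives. The tensor layout is an assumption for illustration; the actual training code is not included in this repo.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb, pos_emb, neg_embs, temperature=0.05):
    """InfoNCE with one positive and K hard negatives per query.

    query_emb: (B, H), pos_emb: (B, H), neg_embs: (B, K, H) -- all L2-normalized.
    """
    pos_scores = (query_emb * pos_emb).sum(dim=-1, keepdim=True)       # (B, 1)
    neg_scores = torch.einsum("bh,bkh->bk", query_emb, neg_embs)       # (B, K)
    logits = torch.cat([pos_scores, neg_scores], dim=1) / temperature  # (B, 1+K)
    labels = torch.zeros(logits.size(0), dtype=torch.long)             # positive is index 0
    return F.cross_entropy(logits, labels)
```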
## Domain Specialization
Trained to understand US stablecoin regulatory concepts:
- Federal Reserve Act, Dodd-Frank, Bank Holding Company Act
- CFTC, OCC, FSOC, NY DFS BitLicense
- Payment Stablecoin Act, STABLE Act
- Reserve requirements, redemption rights, custody standards
- Qualified Stablecoin Issuer (QSI)
## Use Cases
- RAG for Regulatory Q&A - Retrieve context for LLMs
- Compliance Search - Find relevant regulations
- Legal Research - Cross-reference requirements
- Policy Analysis - Compare regulatory frameworks
## Evaluation
Metrics Explained:
- NDCG@10 (0.9236): top-10 ranking quality reaches 92% of an ideal ranking
- MRR@10 (0.8991): the first relevant result sits, on average, around position 1.11
- Recall@100 (1.0000): every relevant document is retrieved within the top 100

Validation: 10,260 queries × 38 docs = full-corpus ranking for every query
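To make these metrics concrete, here is a self-contained sketch of MRR@k and NDCG@k for the single-relevant-document case, which matches the one-query-one-target setup described above (document IDs are invented for the example):

```python
import math

def mrr_at_k(ranked_doc_ids, relevant_id, k=10):
    """Reciprocal rank of the relevant document if it appears in the top k, else 0."""
    for position, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / position
    return 0.0

def ndcg_at_k(ranked_doc_ids, relevant_id, k=10):
    """NDCG@k with one relevant document: ideal DCG is 1.0, so DCG equals NDCG."""
    for position, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / math.log2(position + 1)
    return 0.0

# Example: the relevant document is ranked 2nd
print(mrr_at_k(["doc_07", "doc_03", "doc_21"], "doc_03"))   # 0.5
print(ndcg_at_k(["doc_07", "doc_03", "doc_21"], "doc_03"))  # 1/log2(3) ≈ 0.631
```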
## Limitations
- Domain-specific (US stablecoin regulations only)
- Small corpus (38 documents)
- English only
- Snapshot from March 2026
- 1.9% of queries slightly degraded relative to the base model
## License
Apache 2.0 (following ModernBERT-base)
## Acknowledgments
- Base: answerdotai/ModernBERT-base
- Framework: HuggingFace Transformers + PEFT
- Data: Qwen API synthetic queries
Status: Production Ready
Updated: March 10, 2026
Version: 1.0.0