# StableBridge Unified Pruner & Highlighter

A sentence-level relevance classifier for US stablecoin regulatory documents, optimized for RAG context compression.
## Model Description
This model identifies relevant sentences in regulatory documents given a user query. It uses a frozen BGE-reranker-v2-m3 encoder with a trainable pruning head (525K parameters), following the Provence/Zilliz architecture pattern.
## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│ Input: "[CLS] query [SEP] sentence_1 sentence_2 ..."        │
│                          │                                  │
│                          ▼                                  │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ FROZEN: BAAI/bge-reranker-v2-m3 (568M params)           │ │
│ │ Output: Token embeddings [batch, seq_len, 1024]         │ │
│ └────────────────────────┬────────────────────────────────┘ │
│                          │                                  │
│                          ▼                                  │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ TRAINABLE: PruningHead MLP (525K params)                │ │
│ │ Linear(1024→512) → GELU → Dropout → Linear(512→1)       │ │
│ └────────────────────────┬────────────────────────────────┘ │
│                          │                                  │
│                          ▼                                  │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ AGGREGATION: Mean pooling over sentence tokens          │ │
│ │ Output: Per-sentence relevance scores [0, 1]            │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
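The trainable head in the diagram can be sketched in a few lines of PyTorch. The layer sizes match the diagram, but this standalone snippet is illustrative only (random inputs, untrained weights):

```python
import torch
import torch.nn as nn

# Sketch of the trainable head on top of frozen encoder outputs.
# hidden_size=1024 matches bge-reranker-v2-m3's embedding width.
head = nn.Sequential(
    nn.Linear(1024, 512),
    nn.GELU(),
    nn.Dropout(0.2),
    nn.Linear(512, 1),
)

# Stand-in for encoder output: [batch, seq_len, hidden]
token_embeddings = torch.randn(2, 128, 1024)
token_logits = head(token_embeddings).squeeze(-1)   # [batch, seq_len]
token_scores = torch.sigmoid(token_logits)          # per-token score in [0, 1]
```

Mean pooling these per-token scores over each sentence's tokens then yields the per-sentence scores shown in the aggregation stage.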
## Key Features
- Dual-use model: Works as both pruner (context compression) and highlighter (emphasis)
- Efficient: Only 525K trainable parameters, frozen encoder
- Long context: Supports up to 8192 tokens
- Domain-specific: Trained on US stablecoin regulatory documents (GENIUS Act, STABLE Act, etc.)
## Intended Uses

### ✅ Primary Use: Context Pruning (Recommended)

Remove low-relevance sentences before passing documents to an LLM:

```python
# Threshold 0.6 → ~90% recall, removes ~20% of the text as noise
pruned_sentences = [s for s, score in zip(sentences, scores) if score >= 0.6]
```
### ⚠️ Secondary Use: Highlighting (Limited)

Mark high-confidence relevant sentences for user emphasis:

```python
# Threshold 0.9 → 48% recall, 27% precision
# Use only as a "best guess", not as authoritative
highlight_indices = [i for i, score in enumerate(scores) if score >= 0.9]
```
## Performance
| Metric | Value | Notes |
|---|---|---|
| F2 Score | 0.3823 | Optimized for recall |
| Recall | 94.3% | Retains most relevant content |
| Precision | 15.1% | High false positive rate |
| Best Epoch | 7/10 | Early stopping recommended |
### Threshold Recommendations

| Use Case | Threshold | Recall | Precision | Effect |
|---|---|---|---|---|
| Conservative Prune | 0.5 | 93.6% | 24.0% | Keeps 89% of text |
| Balanced Prune | 0.6 | 90.4% | 25.7% | Keeps 80% of text |
| Aggressive Prune | 0.7 | 73.6% | 23.8% | Keeps 70% of text |
| Highlight | 0.9 | 48.0% | 26.8% | Selective marking |
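To make the table concrete, here is a small self-contained sketch of how the two default thresholds behave; the sentence names and scores below are made up for illustration:

```python
# Hypothetical relevance scores for six sentences (illustrative only)
scores = [0.85, 0.92, 0.41, 0.88, 0.05, 0.63]
sentences = [f"sentence_{i}" for i in range(len(scores))]

def prune(sentences, scores, threshold=0.6):
    """Balanced prune: keep sentences at or above the threshold."""
    return [s for s, sc in zip(sentences, scores) if sc >= threshold]

def highlight(scores, threshold=0.9):
    """Selective marking: indices of very high-confidence sentences."""
    return [i for i, sc in enumerate(scores) if sc >= threshold]

kept = prune(sentences, scores)   # 4 of 6 sentences survive at 0.6
marks = highlight(scores)         # only index 1 (score 0.92) at 0.9
```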
## How to Use

### Installation

```
pip install "transformers>=4.41" "huggingface_hub>=0.27" torch
```

**Important:** Requires `huggingface_hub >= 0.27` to avoid deprecated API errors.
### Requirements

| Library | Minimum Version | Notes |
|---|---|---|
| `transformers` | >= 4.41 | `XLMRobertaModel` support |
| `huggingface_hub` | >= 0.27 | No deprecated `use_auth_token` |
| `torch` | >= 2.0 | CUDA support |
### Loading Notes

When loading `AutoModel.from_pretrained("BAAI/bge-reranker-v2-m3")`, you will see warnings about unexpected and missing keys:

- **UNEXPECTED keys** (`classifier.out_proj.weight`, `classifier.out_proj.bias`, `classifier.dense.weight`, `classifier.dense.bias`): these are the reranker's classification-head weights. Since we load with `AutoModel` (encoder only, no classification head), these extra keys are simply not consumed. This is completely normal; we only need the encoder backbone, and our separate `PruningHead` MLP replaces the original classification head.
- **MISSING keys** (`pooler.dense.weight`, `pooler.dense.bias`): the `AutoModel` class expects a pooler layer, but the BGE-reranker checkpoint was trained without one. These weights are randomly initialized but never used in our pipeline; we use `last_hidden_state` directly, not the pooled output. Safe to ignore.
### Quick Start

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download


class PruningHead(nn.Module):
    """Trainable classification head for sentence relevance."""

    def __init__(self, hidden_size=1024, intermediate_size=512, dropout=0.2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(intermediate_size, 1),
        )

    def forward(self, embeddings):
        return self.classifier(embeddings).squeeze(-1)


class StableBridgePruner:
    """Unified pruner and highlighter for regulatory documents."""

    def __init__(self, device: str = 'cuda'):
        self.device = torch.device(device if torch.cuda.is_available() else 'cpu')

        # Load encoder (frozen, inference only)
        self.tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-m3')
        self.encoder = AutoModel.from_pretrained('BAAI/bge-reranker-v2-m3')
        self.encoder.to(self.device).eval()

        # Load the trained pruning head from this repository
        checkpoint_path = hf_hub_download(
            repo_id="sugiv/stablebridge-pruner-highlighter",
            filename="best.pt",
        )
        self.head = PruningHead().to(self.device)
        checkpoint = torch.load(checkpoint_path, map_location=self.device, weights_only=False)
        self.head.load_state_dict(checkpoint['model_state_dict'])
        self.head.eval()

        # Default thresholds
        self.prune_threshold = 0.6
        self.highlight_threshold = 0.9

    @torch.no_grad()
    def get_sentence_scores(self, query: str, sentences: list[str]) -> list[float]:
        """Get relevance scores for each sentence given a query.

        Args:
            query: User's search query.
            sentences: List of sentences from the document.

        Returns:
            List of relevance scores in [0, 1], one per sentence.
        """
        # Build the document text; sentence boundaries are recovered
        # below from character offsets.
        doc_text = ' '.join(sentences)
        text = f"{query} [SEP] {doc_text}"

        encoding = self.tokenizer(
            text,
            max_length=8192,
            truncation=True,
            return_tensors='pt',
            return_offsets_mapping=True,
        )

        # Token embeddings from the frozen encoder
        input_ids = encoding['input_ids'].to(self.device)
        attention_mask = encoding['attention_mask'].to(self.device)
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_embeddings = outputs.last_hidden_state

        # Token-level relevance scores
        token_logits = self.head(token_embeddings)[0]  # [seq_len]
        token_scores = torch.sigmoid(token_logits).cpu().numpy()

        # Aggregate to sentence scores via mean pooling, using the
        # character offset mapping to locate each sentence's tokens.
        offsets = encoding['offset_mapping'][0].numpy()
        sentence_scores = []
        char_pos = len(query) + len(" [SEP] ")  # document starts after " [SEP] "
        for sent in sentences:
            sent_start = char_pos
            sent_end = char_pos + len(sent)

            # Tokens whose character span falls inside this sentence
            token_indices = [
                idx for idx, (start, end) in enumerate(offsets)
                if start >= sent_start and end <= sent_end and start != end
            ]

            if token_indices:
                sent_score = float(token_scores[token_indices].mean())
            else:
                sent_score = 0.5  # Default for unmapped (e.g. truncated) sentences
            sentence_scores.append(sent_score)
            char_pos = sent_end + 1  # +1 for the joining space

        return sentence_scores

    def prune(self, query: str, sentences: list[str], threshold: float | None = None) -> list[str]:
        """Remove low-relevance sentences from the document.

        Returns the sentences with score >= threshold (default: 0.6).
        """
        if threshold is None:
            threshold = self.prune_threshold
        scores = self.get_sentence_scores(query, sentences)
        return [s for s, score in zip(sentences, scores) if score >= threshold]

    def highlight(self, query: str, sentences: list[str], threshold: float | None = None) -> list[int]:
        """Get indices of high-relevance sentences for highlighting.

        Returns the indices of sentences with score >= threshold (default: 0.9).
        """
        if threshold is None:
            threshold = self.highlight_threshold
        scores = self.get_sentence_scores(query, sentences)
        return [i for i, score in enumerate(scores) if score >= threshold]

    def score_document(self, query: str, sentences: list[str]) -> list[tuple[str, float]]:
        """Get all sentences paired with their relevance scores."""
        scores = self.get_sentence_scores(query, sentences)
        return list(zip(sentences, scores))


# Example usage
if __name__ == "__main__":
    model = StableBridgePruner(device='cuda')

    query = "What are the licensing requirements for stablecoin issuers?"
    sentences = [
        "The GENIUS Act establishes a comprehensive framework for payment stablecoin regulation.",
        "All payment stablecoin issuers must obtain a federal license from the appropriate regulator.",
        "The legislation was introduced in the 118th Congress.",
        "Issuers must maintain reserves equal to 100% of outstanding stablecoins.",
        "The weather in Washington D.C. was cloudy during the vote.",
        "State-chartered banks may issue stablecoins under state supervision.",
    ]

    # Score every sentence
    scores = model.get_sentence_scores(query, sentences)
    print("Sentence Scores:")
    for sent, score in zip(sentences, scores):
        print(f"  [{score:.3f}] {sent[:60]}...")

    # Prune (remove irrelevant)
    pruned = model.prune(query, sentences)
    print(f"\nPruned: {len(sentences)} -> {len(pruned)} sentences")

    # Highlight (mark important)
    highlights = model.highlight(query, sentences)
    print(f"Highlighted indices: {highlights}")
```
## Training Details

### Dataset
- Size: 10,006 examples (9,005 train / 1,001 validation)
- Domain: US stablecoin regulatory documents
- Documents: GENIUS Act, STABLE Act, OCC bulletins, SEC guidance, etc.
- Labels: LLM-generated (Claude) sentence-level relevance annotations
- Class balance: 105:1 negative:positive (intentional hard negatives)
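For intuition, a 105:1 negative:positive ratio means fewer than 1% of labeled sentences are positive; a quick check:

```python
# One positive for every 105 negatives
neg_per_pos = 105
positive_fraction = 1 / (neg_per_pos + 1)
print(f"{positive_fraction:.2%}")  # about 0.94% of sentences are positive
```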
### Training Configuration

```yaml
model:
  base_model: BAAI/bge-reranker-v2-m3
  freeze_encoder: true
  max_length: 8192
  max_sentences: 500

training:
  epochs: 10
  batch_size: 8
  gradient_accumulation_steps: 8  # Effective batch: 64
  learning_rate: 1e-4
  weight_decay: 0.02
  warmup_steps: 200
  scheduler: CosineAnnealingLR

loss:
  type: BCEWithLogitsLoss
  pos_weight: 70.0  # Compensate for 105:1 imbalance
```
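The `pos_weight` setting scales only the positive-class term of the BCE loss. A quick check of the effect (the weight mirrors the config; the logit and target below are arbitrary):

```python
import torch
import torch.nn as nn

# pos_weight multiplies only the positive-class term of the loss, so
# rare positives are not drowned out by the 105:1 negative majority.
plain = nn.BCEWithLogitsLoss()
weighted = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(70.0))

logit = torch.tensor([0.0])     # sigmoid(0.0) = 0.5
positive = torch.tensor([1.0])  # a positive-class target

# For a positive target, the weighted loss is exactly 70x the plain loss.
l_plain = plain(logit, positive)
l_weighted = weighted(logit, positive)
```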
### Training Curve
| Epoch | Train Loss | Precision | Recall | F2 |
|---|---|---|---|---|
| 1 | 1.492 | 15.5% | 96.3% | 0.380 |
| 2 | 1.170 | 15.3% | 94.6% | 0.381 |
| 3 | 1.086 | 14.3% | 95.8% | 0.368 |
| 4 | 1.027 | 13.7% | 95.3% | 0.364 |
| 5 | 1.003 | 14.0% | 93.8% | 0.369 |
| 6 | 0.986 | 13.8% | 93.9% | 0.362 |
| 7 | 0.976 | 15.1% | 94.3% | **0.382** ⭐ |
| 8 | 0.960 | 13.6% | 95.4% | 0.364 |
| 9 | 0.957 | 14.2% | 95.0% | 0.371 |
| 10 | 0.955 | 14.2% | 95.0% | 0.371 |
Best checkpoint: Epoch 7 (F2=0.3823)
### Hardware
- GPU: NVIDIA H200-144GB NVL
- Training time: ~17 hours
- Memory usage: ~40GB VRAM
## Limitations
- Low precision (15%): Many false positives due to extreme class imbalance
- Weak score separation: Only 0.059 difference between positive/negative means
- Domain-specific: Trained only on US stablecoin regulatory documents
- English only: No multilingual support
### When NOT to Use

- ❌ As authoritative "highlighting" (precision is too low)
- ❌ For non-regulatory content
- ❌ When perfect precision is required
- ❌ For documents outside the stablecoin/crypto regulation domain
## Future Improvements
- More training data: Generate 50K+ additional positive examples
- Focal loss: Better handling of hard examples
- Full Provence architecture: ModernBERT-large with [SENT] markers
- Knowledge distillation: Train on GPT-4 soft labels
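As a starting point for the focal-loss idea above, a minimal binary focal loss can be sketched as follows; `alpha` and `gamma` are the common defaults from Lin et al. (2017), not values tuned for this model:

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Minimal binary focal loss sketch: down-weights easy examples so
    training focuses on the hard positives and negatives."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # probability assigned to the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.0, -1.0, 0.0])
targets = torch.tensor([1.0, 0.0, 1.0])
loss = binary_focal_loss(logits, targets)
```

The `(1 - p_t) ** gamma` factor shrinks the loss on confidently correct examples, which may help with the weak score separation noted under Limitations.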
## Files in this Repository

```
├── best.pt           # Model checkpoint (6.3 MB)
├── config.yaml       # Training configuration
├── README.md         # This model card
└── requirements.txt  # Dependencies
```
## Citation

```bibtex
@misc{stablebridge-pruner-2026,
  title={StableBridge Unified Pruner and Highlighter},
  author={Sugi Venugeethan},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/sugiv/stablebridge-pruner-highlighter}
}
```
## Acknowledgments
- Architecture inspired by Provence and Open-Provence
- Base model: BAAI/bge-reranker-v2-m3
- Training infrastructure: RunPod H200
## License
Apache 2.0
Model trained: March 2026
Last updated: March 11, 2026