# StableBridge Unified Pruner & Highlighter

A sentence-level relevance classifier for US stablecoin regulatory documents, optimized for RAG context compression.
## Model Description
This model identifies relevant sentences in regulatory documents given a user query. It uses a frozen BGE-reranker-v2-m3 encoder with a trainable pruning head (525K parameters), following the Provence/Zilliz architecture pattern.
## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│ Input: "[CLS] query [SEP] sentence_1 sentence_2 ..."        │
│                          │                                  │
│                          ▼                                  │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ FROZEN: BAAI/bge-reranker-v2-m3 (568M params)           │ │
│ │ Output: Token embeddings [batch, seq_len, 1024]         │ │
│ └────────────────────────┬────────────────────────────────┘ │
│                          │                                  │
│                          ▼                                  │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ TRAINABLE: PruningHead MLP (525K params)                │ │
│ │ Linear(1024→512) → GELU → Dropout → Linear(512→1)       │ │
│ └────────────────────────┬────────────────────────────────┘ │
│                          │                                  │
│                          ▼                                  │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ AGGREGATION: Mean pooling over sentence tokens          │ │
│ │ Output: Per-sentence relevance scores [0, 1]            │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
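The trainable head in the diagram can be sketched in a few lines of PyTorch. The layer sizes match the diagram, but this standalone snippet is illustrative only (random inputs, untrained weights):

```python
import torch
import torch.nn as nn

# Sketch of the trainable head on top of frozen encoder outputs.
# hidden_size=1024 matches bge-reranker-v2-m3's embedding width.
head = nn.Sequential(
    nn.Linear(1024, 512),
    nn.GELU(),
    nn.Dropout(0.2),
    nn.Linear(512, 1),
)

# Stand-in for encoder output: [batch, seq_len, hidden]
token_embeddings = torch.randn(2, 128, 1024)
token_logits = head(token_embeddings).squeeze(-1)   # [batch, seq_len]
token_scores = torch.sigmoid(token_logits)          # per-token score in [0, 1]
```

Mean pooling these per-token scores over each sentence's tokens then yields the per-sentence scores shown in the aggregation stage.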
## Key Features
- Dual-use model: Works as both pruner (context compression) and highlighter (emphasis)
- Efficient: Only 525K trainable parameters, frozen encoder
- Long context: Supports up to 8192 tokens
- Domain-specific: Trained on US stablecoin regulatory documents (GENIUS Act, STABLE Act, etc.)
## Intended Uses

### ✅ Primary Use: Context Pruning (Recommended)

Remove low-relevance sentences before passing documents to an LLM:

```python
# Threshold 0.6 → ~90% recall, removes ~20% of the text as noise
pruned_sentences = [s for s, score in zip(sentences, scores) if score >= 0.6]
```
### ⚠️ Secondary Use: Highlighting (Limited)

Mark high-confidence relevant sentences for user emphasis:

```python
# Threshold 0.9 → 48% recall, 27% precision
# Use only as a "best guess", not as authoritative
highlight_indices = [i for i, score in enumerate(scores) if score >= 0.9]
```
## Performance
| Metric | Value | Notes |
|---|---|---|
| F2 Score | 0.3823 | Optimized for recall |
| Recall | 94.3% | Retains most relevant content |
| Precision | 15.1% | High false positive rate |
| Best Epoch | 7/10 | Early stopping recommended |
### Threshold Recommendations

| Use Case | Threshold | Recall | Precision | Effect |
|---|---|---|---|---|
| Conservative Prune | 0.5 | 93.6% | 24.0% | Keeps 89% of text |
| Balanced Prune | 0.6 | 90.4% | 25.7% | Keeps 80% of text |
| Aggressive Prune | 0.7 | 73.6% | 23.8% | Keeps 70% of text |
| Highlight | 0.9 | 48.0% | 26.8% | Selective marking |
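To make the table concrete, here is a small self-contained sketch of how the two default thresholds behave; the sentence names and scores below are made up for illustration:

```python
# Hypothetical relevance scores for six sentences (illustrative only)
scores = [0.85, 0.92, 0.41, 0.88, 0.05, 0.63]
sentences = [f"sentence_{i}" for i in range(len(scores))]

def prune(sentences, scores, threshold=0.6):
    """Balanced prune: keep sentences at or above the threshold."""
    return [s for s, sc in zip(sentences, scores) if sc >= threshold]

def highlight(scores, threshold=0.9):
    """Selective marking: indices of very high-confidence sentences."""
    return [i for i, sc in enumerate(scores) if sc >= threshold]

kept = prune(sentences, scores)   # 4 of 6 sentences survive at 0.6
marks = highlight(scores)         # only index 1 (score 0.92) at 0.9
```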
## How to Use

### Installation

```
pip install "transformers>=4.41" "huggingface_hub>=0.27" torch
```

**Important:** Requires `huggingface_hub >= 0.27` to avoid deprecated API errors.
### Requirements

| Library | Minimum Version | Notes |
|---|---|---|
| `transformers` | >= 4.41 | `XLMRobertaModel` support |
| `huggingface_hub` | >= 0.27 | No deprecated `use_auth_token` |
| `torch` | >= 2.0 | CUDA support |
### Loading Notes

When loading `AutoModel.from_pretrained("BAAI/bge-reranker-v2-m3")`, you will see warnings about unexpected and missing keys:

- **UNEXPECTED keys** (`classifier.out_proj.weight`, `classifier.out_proj.bias`, `classifier.dense.weight`, `classifier.dense.bias`): these are the reranker's classification-head weights. Since we load with `AutoModel` (encoder only, no classification head), these extra keys are simply not consumed. This is completely normal; we only need the encoder backbone, and our separate `PruningHead` MLP replaces the original classification head.
- **MISSING keys** (`pooler.dense.weight`, `pooler.dense.bias`): the `AutoModel` class expects a pooler layer, but the BGE-reranker checkpoint was trained without one. These weights are randomly initialized but never used in our pipeline; we use `last_hidden_state` directly, not the pooled output. Safe to ignore.
### Quick Start

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download


class PruningHead(nn.Module):
    """Trainable classification head for sentence relevance."""

    def __init__(self, hidden_size=1024, intermediate_size=512, dropout=0.2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(intermediate_size, 1),
        )

    def forward(self, embeddings):
        return self.classifier(embeddings).squeeze(-1)


class StableBridgePruner:
    """Unified pruner and highlighter for regulatory documents."""

    def __init__(self, device: str = 'cuda'):
        self.device = torch.device(device if torch.cuda.is_available() else 'cpu')

        # Load encoder (frozen, inference only)
        self.tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-m3')
        self.encoder = AutoModel.from_pretrained('BAAI/bge-reranker-v2-m3')
        self.encoder.to(self.device).eval()

        # Load the trained pruning head from this repository
        checkpoint_path = hf_hub_download(
            repo_id="sugiv/stablebridge-pruner-highlighter",
            filename="best.pt",
        )
        self.head = PruningHead().to(self.device)
        checkpoint = torch.load(checkpoint_path, map_location=self.device, weights_only=False)
        self.head.load_state_dict(checkpoint['model_state_dict'])
        self.head.eval()

        # Default thresholds
        self.prune_threshold = 0.6
        self.highlight_threshold = 0.9

    @torch.no_grad()
    def get_sentence_scores(self, query: str, sentences: list[str]) -> list[float]:
        """Get relevance scores for each sentence given a query.

        Args:
            query: User's search query.
            sentences: List of sentences from the document.

        Returns:
            List of relevance scores in [0, 1], one per sentence.
        """
        # Build the document text; sentence boundaries are recovered
        # below from character offsets.
        doc_text = ' '.join(sentences)
        text = f"{query} [SEP] {doc_text}"

        encoding = self.tokenizer(
            text,
            max_length=8192,
            truncation=True,
            return_tensors='pt',
            return_offsets_mapping=True,
        )

        # Token embeddings from the frozen encoder
        input_ids = encoding['input_ids'].to(self.device)
        attention_mask = encoding['attention_mask'].to(self.device)
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_embeddings = outputs.last_hidden_state

        # Token-level relevance scores
        token_logits = self.head(token_embeddings)[0]  # [seq_len]
        token_scores = torch.sigmoid(token_logits).cpu().numpy()

        # Aggregate to sentence scores via mean pooling, using the
        # character offset mapping to locate each sentence's tokens.
        offsets = encoding['offset_mapping'][0].numpy()
        sentence_scores = []
        char_pos = len(query) + len(" [SEP] ")  # document starts after " [SEP] "
        for sent in sentences:
            sent_start = char_pos
            sent_end = char_pos + len(sent)

            # Tokens whose character span falls inside this sentence
            token_indices = [
                idx for idx, (start, end) in enumerate(offsets)
                if start >= sent_start and end <= sent_end and start != end
            ]

            if token_indices:
                sent_score = float(token_scores[token_indices].mean())
            else:
                sent_score = 0.5  # Default for unmapped (e.g. truncated) sentences
            sentence_scores.append(sent_score)
            char_pos = sent_end + 1  # +1 for the joining space

        return sentence_scores

    def prune(self, query: str, sentences: list[str], threshold: float | None = None) -> list[str]:
        """Remove low-relevance sentences from the document.

        Returns the sentences with score >= threshold (default: 0.6).
        """
        if threshold is None:
            threshold = self.prune_threshold
        scores = self.get_sentence_scores(query, sentences)
        return [s for s, score in zip(sentences, scores) if score >= threshold]

    def highlight(self, query: str, sentences: list[str], threshold: float | None = None) -> list[int]:
        """Get indices of high-relevance sentences for highlighting.

        Returns the indices of sentences with score >= threshold (default: 0.9).
        """
        if threshold is None:
            threshold = self.highlight_threshold
        scores = self.get_sentence_scores(query, sentences)
        return [i for i, score in enumerate(scores) if score >= threshold]

    def score_document(self, query: str, sentences: list[str]) -> list[tuple[str, float]]:
        """Get all sentences paired with their relevance scores."""
        scores = self.get_sentence_scores(query, sentences)
        return list(zip(sentences, scores))


# Example usage
if __name__ == "__main__":
    model = StableBridgePruner(device='cuda')

    query = "What are the licensing requirements for stablecoin issuers?"
    sentences = [
        "The GENIUS Act establishes a comprehensive framework for payment stablecoin regulation.",
        "All payment stablecoin issuers must obtain a federal license from the appropriate regulator.",
        "The legislation was introduced in the 118th Congress.",
        "Issuers must maintain reserves equal to 100% of outstanding stablecoins.",
        "The weather in Washington D.C. was cloudy during the vote.",
        "State-chartered banks may issue stablecoins under state supervision.",
    ]

    # Score every sentence
    scores = model.get_sentence_scores(query, sentences)
    print("Sentence Scores:")
    for sent, score in zip(sentences, scores):
        print(f"  [{score:.3f}] {sent[:60]}...")

    # Prune (remove irrelevant)
    pruned = model.prune(query, sentences)
    print(f"\nPruned: {len(sentences)} -> {len(pruned)} sentences")

    # Highlight (mark important)
    highlights = model.highlight(query, sentences)
    print(f"Highlighted indices: {highlights}")
```
## Training Details

### Dataset
- Size: 10,006 examples (9,005 train / 1,001 validation)
- Domain: US stablecoin regulatory documents
- Documents: GENIUS Act, STABLE Act, OCC bulletins, SEC guidance, etc.
- Labels: LLM-generated (Claude) sentence-level relevance annotations
- Class balance: 105:1 negative:positive (intentional hard negatives)
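For intuition, a 105:1 negative:positive ratio means fewer than 1% of labeled sentences are positive; a quick check:

```python
# One positive for every 105 negatives
neg_per_pos = 105
positive_fraction = 1 / (neg_per_pos + 1)
print(f"{positive_fraction:.2%}")  # about 0.94% of sentences are positive
```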
### Training Configuration

```yaml
model:
  base_model: BAAI/bge-reranker-v2-m3
  freeze_encoder: true
  max_length: 8192
  max_sentences: 500

training:
  epochs: 10
  batch_size: 8
  gradient_accumulation_steps: 8  # Effective batch: 64
  learning_rate: 1e-4
  weight_decay: 0.02
  warmup_steps: 200
  scheduler: CosineAnnealingLR

loss:
  type: BCEWithLogitsLoss
  pos_weight: 70.0  # Compensate for 105:1 imbalance
```
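The `pos_weight` setting scales only the positive-class term of the BCE loss. A quick check of the effect (the weight mirrors the config; the logit and target below are arbitrary):

```python
import torch
import torch.nn as nn

# pos_weight multiplies only the positive-class term of the loss, so
# rare positives are not drowned out by the 105:1 negative majority.
plain = nn.BCEWithLogitsLoss()
weighted = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(70.0))

logit = torch.tensor([0.0])     # sigmoid(0.0) = 0.5
positive = torch.tensor([1.0])  # a positive-class target

# For a positive target, the weighted loss is exactly 70x the plain loss.
l_plain = plain(logit, positive)
l_weighted = weighted(logit, positive)
```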
### Training Curve
| Epoch | Train Loss | Precision | Recall | F2 |
|---|---|---|---|---|
| 1 | 1.492 | 15.5% | 96.3% | 0.380 |
| 2 | 1.170 | 15.3% | 94.6% | 0.381 |
| 3 | 1.086 | 14.3% | 95.8% | 0.368 |
| 4 | 1.027 | 13.7% | 95.3% | 0.364 |
| 5 | 1.003 | 14.0% | 93.8% | 0.369 |
| 6 | 0.986 | 13.8% | 93.9% | 0.362 |
| 7 | 0.976 | 15.1% | 94.3% | **0.382** ⭐ |
| 8 | 0.960 | 13.6% | 95.4% | 0.364 |
| 9 | 0.957 | 14.2% | 95.0% | 0.371 |
| 10 | 0.955 | 14.2% | 95.0% | 0.371 |
Best checkpoint: Epoch 7 (F2=0.3823)
### Hardware
- GPU: NVIDIA H200-144GB NVL
- Training time: ~17 hours
- Memory usage: ~40GB VRAM
## Limitations
- Low precision (15%): Many false positives due to extreme class imbalance
- Weak score separation: Only 0.059 difference between positive/negative means
- Domain-specific: Trained only on US stablecoin regulatory documents
- English only: No multilingual support
### When NOT to Use

- ❌ As authoritative "highlighting" (precision is too low)
- ❌ For non-regulatory content
- ❌ When perfect precision is required
- ❌ For documents outside the stablecoin/crypto regulation domain
## Future Improvements
- More training data: Generate 50K+ additional positive examples
- Focal loss: Better handling of hard examples
- Full Provence architecture: ModernBERT-large with [SENT] markers
- Knowledge distillation: Train on GPT-4 soft labels
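As a starting point for the focal-loss idea above, a minimal binary focal loss can be sketched as follows; `alpha` and `gamma` are the common defaults from Lin et al. (2017), not values tuned for this model:

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Minimal binary focal loss sketch: down-weights easy examples so
    training focuses on the hard positives and negatives."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # probability assigned to the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.0, -1.0, 0.0])
targets = torch.tensor([1.0, 0.0, 1.0])
loss = binary_focal_loss(logits, targets)
```

The `(1 - p_t) ** gamma` factor shrinks the loss on confidently correct examples, which may help with the weak score separation noted under Limitations.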
## Files in this Repository

```
├── best.pt           # Model checkpoint (6.3 MB)
├── config.yaml       # Training configuration
├── README.md         # This model card
└── requirements.txt  # Dependencies
```
## Citation

```bibtex
@misc{stablebridge-pruner-2026,
  title={StableBridge Unified Pruner and Highlighter},
  author={Sugi Venugeethan},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/sugiv/stablebridge-pruner-highlighter}
}
```
## Acknowledgments
- Architecture inspired by Provence and Open-Provence
- Base model: BAAI/bge-reranker-v2-m3
- Training infrastructure: RunPod H200
## License
Apache 2.0
Model trained: March 2026
Last updated: March 11, 2026