StableBridge Unified Pruner & Highlighter

A sentence-level relevance classifier for US stablecoin regulatory documents, optimized for RAG context compression.

Model Description

This model identifies relevant sentences in regulatory documents given a user query. It uses a frozen BGE-reranker-v2-m3 encoder with a trainable pruning head (525K parameters), following the Provence/Zilliz architecture pattern.

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Input: "[CLS] query [SEP] sentence_1 sentence_2 ..."       β”‚
β”‚                            β”‚                                β”‚
β”‚                            β–Ό                                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚     FROZEN: BAAI/bge-reranker-v2-m3 (568M params)    β”‚   β”‚
β”‚  β”‚     Output: Token embeddings [batch, seq_len, 1024]  β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                             β”‚                               β”‚
β”‚                             β–Ό                               β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚     TRAINABLE: PruningHead MLP (525K params)         β”‚   β”‚
β”‚  β”‚     Linear(1024β†’512) β†’ GELU β†’ Dropout β†’ Linear(512β†’1)β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                             β”‚                               β”‚
β”‚                             β–Ό                               β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚     AGGREGATION: Mean pooling over sentence tokens   β”‚   β”‚
β”‚  β”‚     Output: Per-sentence relevance scores [0, 1]     β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Features

  • Dual-use model: Works as both a pruner (context compression) and a highlighter (emphasis)
  • Efficient: Only 525K trainable parameters, frozen encoder
  • Long context: Supports up to 8192 tokens
  • Domain-specific: Trained on US stablecoin regulatory documents (GENIUS Act, STABLE Act, etc.)

Intended Uses

βœ… Primary Use: Context Pruning (Recommended)

Remove low-relevance sentences before passing documents to an LLM:

# Threshold 0.6 β†’ 90% recall, removes ~20% noise
pruned_sentences = [s for s, score in zip(sentences, scores) if score >= 0.6]

⚠️ Secondary Use: Highlighting (Limited)

Mark high-confidence relevant sentences for user emphasis:

# Threshold 0.9 β†’ 48% recall, 27% precision
# Use only as "best guess", not authoritative
highlight_indices = [i for i, score in enumerate(scores) if score >= 0.9]
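Both snippets above assume the per-sentence scores have already been computed; the two thresholds can then be applied to the same score list in one pass. A minimal sketch with made-up scores (not real model output):

```python
# Hypothetical sentences and scores, purely for illustration.
sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
scores    = [0.95, 0.72, 0.41, 0.88, 0.12, 0.91]

PRUNE_T, HIGHLIGHT_T = 0.6, 0.9  # the model card's recommended defaults

# Pruning keeps the sentence text; highlighting keeps only the indices.
pruned = [s for s, sc in zip(sentences, scores) if sc >= PRUNE_T]
highlighted = [i for i, sc in enumerate(scores) if sc >= HIGHLIGHT_T]

print(pruned)       # ['s0', 's1', 's3', 's5']
print(highlighted)  # [0, 5]
```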

Performance

Metric      Value   Notes
F2 Score    0.3823  Optimized for recall
Recall      94.3%   Retains most relevant content
Precision   15.1%   High false positive rate
Best Epoch  7/10    Early stopping recommended

Threshold Recommendations

Use Case            Threshold  Recall  Precision  Effect
Aggressive Prune    0.5        93.6%   24.0%      Keeps 89% of text
Balanced Prune      0.6        90.4%   25.7%      Keeps 80% of text
Conservative Prune  0.7        73.6%   23.8%      Keeps 70% of text
Highlight           0.9        48.0%   26.8%      Selective marking

How to Use

Installation

pip install "transformers>=4.41" "huggingface_hub>=0.27" torch

Important: Requires huggingface_hub >= 0.27 to avoid deprecated API errors.

Requirements

Library          Minimum Version  Notes
transformers     >= 4.41          XLMRobertaModel support
huggingface_hub  >= 0.27          No deprecated use_auth_token
torch            >= 2.0           CUDA support

Loading Notes

UNEXPECTED / MISSING keys on load: When loading AutoModel.from_pretrained("BAAI/bge-reranker-v2-m3"), you will see warnings about:

  • UNEXPECTED keys (classifier.out_proj.weight, classifier.out_proj.bias, classifier.dense.weight, classifier.dense.bias): These are the reranker's classification head weights. Since we load with AutoModel (encoder-only, no classification head), these extra keys are not consumed. This is completely normal β€” we only need the encoder backbone. Our separate PruningHead MLP replaces the original classification head.

  • MISSING keys (pooler.dense.weight, pooler.dense.bias): The AutoModel class expects a pooler layer, but the BGE-reranker checkpoint was trained without one. These get randomly initialized but are never used in our pipeline β€” we use last_hidden_state directly, not the pooled output. Safe to ignore.
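The same missing/unexpected-key mechanics can be reproduced in isolation with plain PyTorch: `load_state_dict(strict=False)` returns exactly these two lists instead of raising. A small illustration with toy modules (hypothetical names, not the actual BGE checkpoint):

```python
import torch.nn as nn

# A model that has a pooler, loaded from a checkpoint that lacks one but
# carries an extra classifier head -- mirroring the BGE-reranker situation.
model = nn.ModuleDict({"encoder": nn.Linear(4, 4), "pooler": nn.Linear(4, 4)})
checkpoint = nn.ModuleDict({"encoder": nn.Linear(4, 4), "classifier": nn.Linear(4, 2)})

result = model.load_state_dict(checkpoint.state_dict(), strict=False)
print(result.missing_keys)     # pooler weights absent from the checkpoint
print(result.unexpected_keys)  # classifier weights the model cannot consume
```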

Quick Start

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download

class PruningHead(nn.Module):
    """Trainable classification head for sentence relevance."""
    def __init__(self, hidden_size=1024, intermediate_size=512, dropout=0.2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(intermediate_size, 1),
        )
    
    def forward(self, embeddings):
        return self.classifier(embeddings).squeeze(-1)


class StableBridgePruner:
    """Unified Pruner and Highlighter for regulatory documents."""
    
    def __init__(self, device: str = 'cuda'):
        self.device = torch.device(device if torch.cuda.is_available() else 'cpu')
        
        # Load encoder (frozen)
        self.tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-m3')
        self.encoder = AutoModel.from_pretrained('BAAI/bge-reranker-v2-m3')
        self.encoder.to(self.device).eval()
        
        # Load pruning head
        checkpoint_path = hf_hub_download(
            repo_id="sugiv/stablebridge-pruner-highlighter",
            filename="best.pt"
        )
        self.head = PruningHead().to(self.device)
        checkpoint = torch.load(checkpoint_path, map_location=self.device, weights_only=False)
        self.head.load_state_dict(checkpoint['model_state_dict'])
        self.head.eval()
        
        # Default thresholds
        self.prune_threshold = 0.6
        self.highlight_threshold = 0.9
    
    @torch.no_grad()
    def get_sentence_scores(self, query: str, sentences: list[str]) -> list[float]:
        """
        Get relevance scores for each sentence given a query.
        
        Args:
            query: User's search query
            sentences: List of sentences from the document
        
        Returns:
            List of relevance scores in [0, 1] for each sentence
        """
        # Build document text with sentence boundaries tracked
        doc_text = ' '.join(sentences)
        text = f"{query} [SEP] {doc_text}"
        
        # Tokenize
        encoding = self.tokenizer(
            text,
            max_length=8192,
            truncation=True,
            return_tensors='pt',
            return_offsets_mapping=True
        )
        
        # Get token embeddings from encoder
        input_ids = encoding['input_ids'].to(self.device)
        attention_mask = encoding['attention_mask'].to(self.device)
        
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_embeddings = outputs.last_hidden_state
        
        # Get token-level scores
        token_logits = self.head(token_embeddings)[0]  # [seq_len]
        token_scores = torch.sigmoid(token_logits).cpu().numpy()
        
        # Aggregate to sentence scores via mean pooling
        # (Simplified: uses character offset mapping for proper aggregation)
        offsets = encoding['offset_mapping'][0].numpy()
        
        sentence_scores = []
        char_pos = len(query) + 7  # Skip " [SEP] " (space + "[SEP]" + space = 7 chars)
        
        for sent in sentences:
            sent_start = char_pos
            sent_end = char_pos + len(sent)
            
            # Find tokens within this sentence
            token_indices = []
            for idx, (start, end) in enumerate(offsets):
                if start >= sent_start and end <= sent_end and start != end:
                    token_indices.append(idx)
            
            if token_indices:
                sent_score = float(token_scores[token_indices].mean())
            else:
                sent_score = 0.5  # Default for unmapped sentences
            
            sentence_scores.append(sent_score)
            char_pos = sent_end + 1  # +1 for space
        
        return sentence_scores
    
    def prune(self, query: str, sentences: list[str], threshold: float | None = None) -> list[str]:
        """
        Remove low-relevance sentences from document.
        
        Args:
            query: User's search query
            sentences: List of sentences from the document
            threshold: Minimum score to keep (default: 0.6)
        
        Returns:
            List of sentences with score >= threshold
        """
        threshold = threshold if threshold is not None else self.prune_threshold
        scores = self.get_sentence_scores(query, sentences)
        return [s for s, score in zip(sentences, scores) if score >= threshold]
    
    def highlight(self, query: str, sentences: list[str], threshold: float | None = None) -> list[int]:
        """
        Get indices of high-relevance sentences for highlighting.
        
        Args:
            query: User's search query
            sentences: List of sentences from the document
            threshold: Minimum score to highlight (default: 0.9)
        
        Returns:
            List of sentence indices with score >= threshold
        """
        threshold = threshold if threshold is not None else self.highlight_threshold
        scores = self.get_sentence_scores(query, sentences)
        return [i for i, score in enumerate(scores) if score >= threshold]
    
    def score_document(self, query: str, sentences: list[str]) -> list[tuple[str, float]]:
        """
        Get all sentences with their relevance scores.
        
        Returns:
            List of (sentence, score) tuples
        """
        scores = self.get_sentence_scores(query, sentences)
        return list(zip(sentences, scores))


# Example usage
if __name__ == "__main__":
    # Initialize model
    model = StableBridgePruner(device='cuda')
    
    # Example regulatory text
    query = "What are the licensing requirements for stablecoin issuers?"
    sentences = [
        "The GENIUS Act establishes a comprehensive framework for payment stablecoin regulation.",
        "All payment stablecoin issuers must obtain a federal license from the appropriate regulator.",
        "The legislation was introduced in the 118th Congress.",
        "Issuers must maintain reserves equal to 100% of outstanding stablecoins.",
        "The weather in Washington D.C. was cloudy during the vote.",
        "State-chartered banks may issue stablecoins under state supervision.",
    ]
    
    # Get scores
    scores = model.get_sentence_scores(query, sentences)
    print("Sentence Scores:")
    for sent, score in zip(sentences, scores):
        print(f"  [{score:.3f}] {sent[:60]}...")
    
    # Prune (remove irrelevant)
    pruned = model.prune(query, sentences)
    print(f"\nPruned: {len(sentences)} β†’ {len(pruned)} sentences")
    
    # Highlight (mark important)
    highlights = model.highlight(query, sentences)
    print(f"Highlighted indices: {highlights}")
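The API takes pre-split sentences, and the repository does not ship a splitter, so one must be supplied upstream. A naive regex-based sketch (the splitting rule here is an assumption, not part of the model; a library such as nltk or spaCy is more robust on real regulatory text):

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ?, or ! followed by whitespace.
    Abbreviations like 'U.S.' will be over-split; use a real
    sentence tokenizer for production documents."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

doc = ("The GENIUS Act establishes a framework. "
       "Issuers must obtain a federal license. "
       "Reserves must equal 100% of outstanding stablecoins.")
print(split_sentences(doc))  # three sentences
```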

Training Details

Dataset

  • Size: 10,006 examples (9,005 train / 1,001 validation)
  • Domain: US stablecoin regulatory documents
  • Documents: GENIUS Act, STABLE Act, OCC bulletins, SEC guidance, etc.
  • Labels: LLM-generated (Claude) sentence-level relevance annotations
  • Class balance: 105:1 negative:positive (intentional hard negatives)

Training Configuration

model:
  base_model: BAAI/bge-reranker-v2-m3
  freeze_encoder: true
  max_length: 8192
  max_sentences: 500

training:
  epochs: 10
  batch_size: 8
  gradient_accumulation_steps: 8  # Effective batch: 64
  learning_rate: 1e-4
  weight_decay: 0.02
  warmup_steps: 200
  scheduler: CosineAnnealingLR

loss:
  type: BCEWithLogitsLoss
  pos_weight: 70.0  # Compensate for 105:1 imbalance
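With a 105:1 imbalance, unweighted BCE would let the negatives dominate the gradient; pos_weight rescales only the positive term. The effect can be checked by hand with the single-example form of the loss (illustrative logits, using the config's pos_weight of 70):

```python
import math

def weighted_bce(logit: float, label: float, pos_weight: float = 70.0) -> float:
    """BCEWithLogitsLoss for one example, with pos_weight applied
    only to the positive (label == 1) term."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(pos_weight * label * math.log(p) + (1 - label) * math.log(1 - p))

# A missed positive (logit -1 on a true positive) is penalized 70x harder
# than the symmetric false positive, pushing the model toward high recall.
loss_pos = weighted_bce(logit=-1.0, label=1.0)
loss_neg = weighted_bce(logit=+1.0, label=0.0)
print(loss_pos / loss_neg)  # ~70
```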

Training Curve

Epoch  Train Loss  Precision  Recall  F2
1      1.492       15.5%      96.3%   0.380
2      1.170       15.3%      94.6%   0.381
3      1.086       14.3%      95.8%   0.368
4      1.027       13.7%      95.3%   0.364
5      1.003       14.0%      93.8%   0.369
6      0.986       13.8%      93.9%   0.362
7      0.976       15.1%      94.3%   0.382  ✓
8      0.960       13.6%      95.4%   0.364
9      0.957       14.2%      95.0%   0.371
10     0.955       14.2%      95.0%   0.371

Best checkpoint: Epoch 7 (F2=0.3823)
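F2 is the F-beta score with beta = 2, which weights recall four times as heavily as precision; a corpus-level sketch of the formula is below. (The table's F2 values appear to be averaged per example, so plugging the corpus-level precision and recall into this formula will not reproduce them exactly.)

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta = 2 weights recall four times as much as precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Recall dominates under F2: the same (P, R) pair scores far higher when
# recall is the large number than when precision is.
print(round(f_beta(0.15, 0.94), 3))  # 0.458 (recall-heavy operating point)
print(round(f_beta(0.94, 0.15), 3))  # 0.18  (precision-heavy operating point)
```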

Hardware

  • GPU: NVIDIA H200-144GB NVL
  • Training time: ~17 hours
  • Memory usage: ~40GB VRAM

Limitations

  1. Low precision (15%): Many false positives due to extreme class imbalance
  2. Weak score separation: mean scores of positive and negative sentences differ by only 0.059
  3. Domain-specific: Trained only on US stablecoin regulatory documents
  4. English only: No multilingual support

When NOT to Use

  • ❌ As authoritative "highlighting" (precision too low)
  • ❌ For non-regulatory content
  • ❌ When perfect precision is required
  • ❌ For documents outside the stablecoin/crypto regulation domain

Future Improvements

  1. More training data: Generate 50K+ additional positive examples
  2. Focal loss: Better handling of hard examples
  3. Full Provence architecture: ModernBERT-large with [SENT] markers
  4. Knowledge distillation: Train on GPT-4 soft labels

Files in this Repository

β”œβ”€β”€ best.pt              # Model checkpoint (6.3MB)
β”œβ”€β”€ config.yaml          # Training configuration
β”œβ”€β”€ README.md            # This model card
└── requirements.txt     # Dependencies

Citation

@misc{stablebridge-pruner-2026,
  title={StableBridge Unified Pruner and Highlighter},
  author={Sugi Venugeethan},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/sugiv/stablebridge-pruner-highlighter}
}

License

Apache 2.0


Model trained: March 2026
Last updated: March 11, 2026
