CoIn-Matching-Head

Pre-trained matching head models for the CoIn framework, a system for auditing hidden reasoning tokens in commercial LLM APIs.

Paper: CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs

Code: GitHub

Model Description

This repository contains three pre-trained models used in the CoIn auditing pipeline:

1. Tokens2Block Matching Head (Model A)

  • Purpose: Verifies that sampled token IDs match their corresponding reasoning blocks
  • Architecture: sentence-transformers/all-MiniLM-L6-v2 base encoder + cosine similarity matching head
  • Input: Token ID embeddings (mean-pooled) + reasoning block text embedding
  • Output: Match probability (0-1)
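The scoring recipe above (mean-pool the sampled token embeddings, then compare them to the block embedding with a cosine-similarity head) can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the class name `CosSimHead` and its learned scale/bias parameters are assumptions.

```python
import torch
import torch.nn as nn

class CosSimHead(nn.Module):
    """Illustrative cosine-similarity matching head (hypothetical; the repo's
    cos_sim head may differ). Maps the cosine similarity of two embeddings to
    a logit via a learned affine transform, so sigmoid(logit) is in (0, 1)."""
    def __init__(self, embedding_dim: int):
        super().__init__()
        # embedding_dim kept for interface parity with get_matching_head; unused here
        self.scale = nn.Parameter(torch.tensor(10.0))
        self.bias = nn.Parameter(torch.tensor(0.0))

    def forward(self, features):
        a = nn.functional.normalize(features["embedding_a"], dim=-1)
        b = nn.functional.normalize(features["embedding_b"], dim=-1)
        cos = (a * b).sum(dim=-1)            # cosine similarity in [-1, 1]
        return {"logits": cos * self.scale + self.bias}

# Mean-pool per-token embeddings to form the "token IDs" side of the pair
token_embs = torch.randn(5, 384)             # 5 sampled tokens, 384-dim (MiniLM-L6-v2)
block_emb = torch.randn(384)                 # reasoning block embedding
head = CosSimHead(embedding_dim=384)
out = head({"embedding_a": token_embs.mean(dim=0, keepdim=True),
            "embedding_b": block_emb.unsqueeze(0)})
prob = torch.sigmoid(out["logits"]).item()   # match probability in (0, 1)
```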

2. Block2Answer Matching Head (Model B)

  • Purpose: Verifies that each reasoning block is semantically relevant to the final answer
  • Architecture: sentence-transformers/all-MiniLM-L6-v2 base encoder + cosine similarity matching head
  • Input: Reasoning block text embedding + answer text embedding
  • Output: Match probability (0-1)

3. DeepSet Verifier

  • Purpose: Aggregates per-block matching scores into a final benign/malicious prediction
  • Architecture: DeepSet (permutation-invariant set encoding)
  • Input: Sequence of interleaved (score_a, score_b) pairs from Models A and B
  • Output: Probability of the sample being benign (0-1)

Training Details

  • Base Embedding Model: sentence-transformers/all-MiniLM-L6-v2
  • Matching Head Type: Cosine similarity head (cos_sim)
  • Loss Function: Focal Loss
  • Optimizer: Adam
  • Learning Rate: 2e-5
  • Batch Size: 128
  • Epochs: 3
  • Random Seed: 42
  • Training Data: CoIn-Auditing-Dataset
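Focal loss down-weights well-classified examples so training focuses on hard pairs, which helps with the easy/hard negative mixtures these heads are trained on. A sketch of the binary form (Lin et al., 2017); the `gamma` and `alpha` values are the common defaults, not necessarily the values used to train these heads.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: scales BCE by (1 - p_t)^gamma so confident
    (easy) examples contribute little, plus class balancing via alpha."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                         # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.5, -1.0, 0.3])
targets = torch.tensor([1.0, 0.0, 1.0])
loss = focal_loss(logits, targets)
```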

Usage

Quick Start

from sentence_transformers import SentenceTransformer
import torch

# Load Model B (Block2Answer, block_size=256)
model_dir = "./matching_head_BlockToAnswer/256/train_all-MiniLM-L6-v2_mixed_pos_merged_4_domain_0.5_hard_easy_mixed_neg_4_domain_limit0_cos_sim_focal_freeze"
embedding_model = SentenceTransformer(f"{model_dir}/embedding_model", trust_remote_code=True)

# Load the matching head (heads.py is provided in the GitHub repository)
from heads import get_matching_head
embedding_dim = embedding_model.get_sentence_embedding_dimension()
matching_head = get_matching_head("cos_sim", embedding_dim)
matching_head.load_state_dict(torch.load(f"{model_dir}/matching_head.pt", map_location="cpu"))
matching_head.eval()

# Score a (reasoning_block, answer) pair
emb_block = embedding_model.encode("The derivative of x^2 is 2x...", convert_to_tensor=True)
emb_answer = embedding_model.encode("The answer is 2x.", convert_to_tensor=True)

features = {"embedding_a": emb_block.unsqueeze(0), "embedding_b": emb_answer.unsqueeze(0)}
with torch.no_grad():
    logits = matching_head(features)["logits"]
    score = torch.sigmoid(logits).item()
print(f"Match score: {score:.4f}")

Full Pipeline

See the GitHub repository for the complete CoIn pipeline usage.

File Structure

CoIn-Matching-Head/
├── matching_head_TokensToBlock/         # Model A
│   └── {256,512,1024}/                  # Block size variants
│       └── train_.../
│           ├── embedding_model/         # Sentence-transformers model
│           ├── matching_head.pt         # Matching head weights
│           └── tokenid_embedding_cache.pt
├── matching_head_BlockToAnswer/         # Model B
│   └── {256,512,1024}/
│       └── train_.../
│           ├── embedding_model/
│           └── matching_head.pt
└── learned_verifier/
    ├── DeepSet/
    │   ├── deepset_weight.pt            # DeepSet verifier weights
    │   └── model_cfg.py                 # Model config
    └── RNN/                             # RNN verifier variant

Citation

@article{sun2025coin,
  title={{CoIn}: Counting the Invisible Reasoning Tokens in Commercial Opaque {LLM} {APIs}},
  author={Sun, Guoheng and Wang, Ziyao and Tian, Bowei and Liu, Meng and Shen, Zheyu and He, Shwai and He, Yexiao and Ye, Wanghao and Wang, Yiting and Li, Ang},
  journal={arXiv preprint arXiv:2505.13778},
  year={2025}
}