CoIn-Matching-Head

Pre-trained matching head models for the CoIn framework, a system for auditing hidden reasoning tokens in commercial LLM APIs.

Paper: CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs

Code: GitHub

Model Description

This repository contains three pre-trained models used in the CoIn auditing pipeline:

1. Tokens2Block Matching Head (Model A)

  • Purpose: Verifies that sampled token IDs match their corresponding reasoning blocks
  • Architecture: sentence-transformers/all-MiniLM-L6-v2 base encoder + cosine similarity matching head
  • Input: Token ID embeddings (mean-pooled) + reasoning block text embedding
  • Output: Match probability (0-1)
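The scoring recipe above (mean-pool the sampled token embeddings, then compare them to the block embedding with a cosine-similarity head) can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the class name `CosSimHead` and its learned scale/bias parameters are assumptions.

```python
import torch
import torch.nn as nn

class CosSimHead(nn.Module):
    """Illustrative cosine-similarity matching head (hypothetical; the repo's
    cos_sim head may differ). Maps the cosine similarity of two embeddings to
    a logit via a learned affine transform, so sigmoid(logit) is in (0, 1)."""
    def __init__(self, embedding_dim: int):
        super().__init__()
        # embedding_dim kept for interface parity with get_matching_head; unused here
        self.scale = nn.Parameter(torch.tensor(10.0))
        self.bias = nn.Parameter(torch.tensor(0.0))

    def forward(self, features):
        a = nn.functional.normalize(features["embedding_a"], dim=-1)
        b = nn.functional.normalize(features["embedding_b"], dim=-1)
        cos = (a * b).sum(dim=-1)            # cosine similarity in [-1, 1]
        return {"logits": cos * self.scale + self.bias}

# Mean-pool per-token embeddings to form the "token IDs" side of the pair
token_embs = torch.randn(5, 384)             # 5 sampled tokens, 384-dim (MiniLM-L6-v2)
block_emb = torch.randn(384)                 # reasoning block embedding
head = CosSimHead(embedding_dim=384)
out = head({"embedding_a": token_embs.mean(dim=0, keepdim=True),
            "embedding_b": block_emb.unsqueeze(0)})
prob = torch.sigmoid(out["logits"]).item()   # match probability in (0, 1)
```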

2. Block2Answer Matching Head (Model B)

  • Purpose: Verifies that each reasoning block is semantically relevant to the final answer
  • Architecture: sentence-transformers/all-MiniLM-L6-v2 base encoder + cosine similarity matching head
  • Input: Reasoning block text embedding + answer text embedding
  • Output: Match probability (0-1)

3. DeepSet Verifier

  • Purpose: Aggregates per-block matching scores into a final benign/malicious prediction
  • Architecture: DeepSet (permutation-invariant set encoding)
  • Input: Sequence of interleaved (score_a, score_b) pairs from Models A and B
  • Output: Probability of the sample being benign (0-1)

Training Details

  • Base Embedding Model: sentence-transformers/all-MiniLM-L6-v2
  • Matching Head Type: Cosine similarity head (cos_sim)
  • Loss Function: Focal Loss
  • Optimizer: Adam
  • Learning Rate: 2e-5
  • Batch Size: 128
  • Epochs: 3
  • Random Seed: 42
  • Training Data: CoIn-Auditing-Dataset
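Focal loss down-weights well-classified examples so training focuses on hard pairs, which helps with the easy/hard negative mixtures these heads are trained on. A sketch of the binary form (Lin et al., 2017); the `gamma` and `alpha` values are the common defaults, not necessarily the values used to train these heads.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: scales BCE by (1 - p_t)^gamma so confident
    (easy) examples contribute little, plus class balancing via alpha."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                         # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.5, -1.0, 0.3])
targets = torch.tensor([1.0, 0.0, 1.0])
loss = focal_loss(logits, targets)
```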

Usage

Quick Start

from sentence_transformers import SentenceTransformer
import torch

# Load Model B (Block2Answer, block_size=256)
model_dir = "./matching_head_BlockToAnswer/256/train_all-MiniLM-L6-v2_mixed_pos_merged_4_domain_0.5_hard_easy_mixed_neg_4_domain_limit0_cos_sim_focal_freeze"
embedding_model = SentenceTransformer(f"{model_dir}/embedding_model", trust_remote_code=True)

# Load the matching head (heads.py is provided in the GitHub repository)
from heads import get_matching_head
embedding_dim = embedding_model.get_sentence_embedding_dimension()
matching_head = get_matching_head("cos_sim", embedding_dim)
matching_head.load_state_dict(torch.load(f"{model_dir}/matching_head.pt", map_location="cpu"))
matching_head.eval()

# Score a (reasoning_block, answer) pair
emb_block = embedding_model.encode("The derivative of x^2 is 2x...", convert_to_tensor=True)
emb_answer = embedding_model.encode("The answer is 2x.", convert_to_tensor=True)

features = {"embedding_a": emb_block.unsqueeze(0), "embedding_b": emb_answer.unsqueeze(0)}
with torch.no_grad():
    logits = matching_head(features)["logits"]
    score = torch.sigmoid(logits).item()
print(f"Match score: {score:.4f}")

Full Pipeline

See the GitHub repository for the complete CoIn pipeline usage.

File Structure

CoIn-Matching-Head/
├── matching_head_TokensToBlock/         # Model A
│   └── {256,512,1024}/                  # Block size variants
│       └── train_.../
│           ├── embedding_model/         # Sentence-transformers model
│           ├── matching_head.pt         # Matching head weights
│           └── tokenid_embedding_cache.pt
├── matching_head_BlockToAnswer/         # Model B
│   └── {256,512,1024}/
│       └── train_.../
│           ├── embedding_model/
│           └── matching_head.pt
└── learned_verifier/
    ├── DeepSet/
    │   ├── deepset_weight.pt            # DeepSet verifier weights
    │   └── model_cfg.py                 # Model config
    └── RNN/                             # RNN verifier variant

Citation

@article{sun2025coin,
  title={{CoIn}: Counting the Invisible Reasoning Tokens in Commercial Opaque {LLM} {APIs}},
  author={Sun, Guoheng and Wang, Ziyao and Tian, Bowei and Liu, Meng and Shen, Zheyu and He, Shwai and He, Yexiao and Ye, Wanghao and Wang, Yiting and Li, Ang},
  journal={arXiv preprint arXiv:2505.13778},
  year={2025}
}