CoIn-LLM-Auditing
Pre-trained matching head models for the CoIn framework, a system for auditing hidden reasoning tokens in commercial LLM APIs.
Paper: CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs
Code: GitHub
This repository contains three pre-trained models used in the CoIn auditing pipeline:
- Model A (TokensToBlock): sentence-transformers/all-MiniLM-L6-v2 base encoder + cosine similarity matching head (`cos_sim`)
- Model B (BlockToAnswer): sentence-transformers/all-MiniLM-L6-v2 base encoder + cosine similarity matching head (`cos_sim`)
- Learned verifier: DeepSet and RNN variants

```python
from sentence_transformers import SentenceTransformer
import torch

# Load Model B (BlockToAnswer, block_size=256)
model_dir = "./matching_head_BlockToAnswer/256/train_all-MiniLM-L6-v2_mixed_pos_merged_4_domain_0.5_hard_easy_mixed_neg_4_domain_limit0_cos_sim_focal_freeze"
embedding_model = SentenceTransformer(f"{model_dir}/embedding_model", trust_remote_code=True)

# Load the matching head
from heads import get_matching_head

embedding_dim = embedding_model.get_sentence_embedding_dimension()
matching_head = get_matching_head("cos_sim", embedding_dim)
matching_head.load_state_dict(torch.load(f"{model_dir}/matching_head.pt"))
matching_head.eval()

# Score a (reasoning_block, answer) pair
emb_block = embedding_model.encode("The derivative of x^2 is 2x...", convert_to_tensor=True)
emb_answer = embedding_model.encode("The answer is 2x.", convert_to_tensor=True)
features = {"embedding_a": emb_block.unsqueeze(0), "embedding_b": emb_answer.unsqueeze(0)}

with torch.no_grad():
    logits = matching_head(features)["logits"]
score = torch.sigmoid(logits).item()
print(f"Match score: {score:.4f}")
```
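The actual `cos_sim` matching head is defined in the repo's `heads` module. As an illustration only (a hedged sketch, not the repo's implementation), a cosine-similarity matching head can be as simple as a learnable scale and bias applied to the cosine of the two embeddings, producing a logit that `sigmoid` turns into a match score:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosSimMatchingHead(nn.Module):
    """Hypothetical cosine-similarity matching head (the real CoIn head
    lives in the repo's `heads` module). Maps cos(embedding_a, embedding_b)
    to a match logit via a learnable scale and bias."""

    def __init__(self, embedding_dim: int):
        super().__init__()
        # embedding_dim is unused here; kept only to mirror the
        # get_matching_head(name, embedding_dim) call signature.
        self.scale = nn.Parameter(torch.tensor(10.0))
        self.bias = nn.Parameter(torch.tensor(0.0))

    def forward(self, features: dict) -> dict:
        cos = F.cosine_similarity(
            features["embedding_a"], features["embedding_b"], dim=-1
        )
        return {"logits": self.scale * cos + self.bias}

head = CosSimMatchingHead(embedding_dim=384)  # MiniLM-L6-v2 embeddings are 384-dim
a = torch.randn(1, 384)
pair = head({"embedding_a": a, "embedding_b": a.clone()})
print(torch.sigmoid(pair["logits"]).item())  # identical embeddings -> score near 1
```

Because the head only consumes the two embeddings, the same interface works for both the TokensToBlock and BlockToAnswer pairings.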
See the GitHub repository for the complete CoIn pipeline usage.
```
CoIn-Matching-Head/
├── matching_head_TokensToBlock/        # Model A
│   └── {256,512,1024}/                 # Block size variants
│       └── train_.../
│           ├── embedding_model/        # Sentence-transformers model
│           ├── matching_head.pt        # Matching head weights
│           └── tokenid_embedding_cache.pt
├── matching_head_BlockToAnswer/        # Model B
│   └── {256,512,1024}/
│       └── train_.../
│           ├── embedding_model/
│           └── matching_head.pt
└── learned_verifier/
    ├── DeepSet/
    │   ├── deepset_weight.pt           # DeepSet verifier weights
    │   └── model_cfg.py                # Model config
    └── RNN/                            # RNN verifier variant
```
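The learned verifier aggregates a variable-length set of per-pair matching scores into a single audit decision. Its actual architecture is given by `learned_verifier/DeepSet/model_cfg.py`; as a hedged sketch (hypothetical layer sizes, not the shipped config), a standard DeepSets-style verifier encodes each score, pools with a permutation-invariant operation, and classifies the pooled representation:

```python
import torch
import torch.nn as nn

class DeepSetVerifier(nn.Module):
    """Illustrative DeepSets-style verifier (hypothetical sizes; the real
    config is in learned_verifier/DeepSet/model_cfg.py). Permutation-
    invariant over a set of matching scores."""

    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(1, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, hidden_dim))
        self.rho = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, 1))

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: (batch, set_size, 1) -> encode each element, mean-pool
        # over the set, then classify the pooled representation.
        pooled = self.phi(scores).mean(dim=1)
        return self.rho(pooled)  # (batch, 1) verdict logit

verifier = DeepSetVerifier()
scores = torch.rand(2, 5, 1)  # two sets of five matching scores
logits = verifier(scores)
# Mean pooling makes the verdict independent of the order of the scores:
perm = scores[:, torch.randperm(5), :]
print(torch.allclose(verifier(scores), verifier(perm), atol=1e-6))
```

The RNN variant trades this permutation invariance for sequential processing of the scores.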
If you use these models, please cite:

```bibtex
@article{sun2025coin,
  title={{CoIn}: Counting the Invisible Reasoning Tokens in Commercial Opaque {LLM} {APIs}},
  author={Sun, Guoheng and Wang, Ziyao and Tian, Bowei and Liu, Meng and Shen, Zheyu and He, Shwai and He, Yexiao and Ye, Wanghao and Wang, Yiting and Li, Ang},
  journal={arXiv preprint arXiv:2505.13778},
  year={2025}
}
```