| license: mit | |
| tags: | |
| - auditing | |
| - llm | |
| - reasoning-tokens | |
| - sentence-transformers | |
| - matching-head | |
| datasets: | |
| - s1ghhh/CoIn-Auditing-Dataset | |
| language: | |
| - en | |
| pipeline_tag: feature-extraction | |
| # CoIn-Matching-Head | |
| Pre-trained matching head models for the **CoIn** framework, a system for auditing hidden reasoning tokens in commercial LLM APIs. | |
| **Paper**: [CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs](https://arxiv.org/abs/2505.13778) | |
| **Code**: [GitHub](https://github.com/s1ghhh/LLM-Auditing-CoIn) | |
| ## Model Description | |
| This repository contains three pre-trained models used in the CoIn auditing pipeline: | |
| ### 1. Tokens2Block Matching Head (Model A) | |
| - **Purpose**: Verifies that sampled token IDs match their corresponding reasoning blocks | |
| - **Architecture**: `sentence-transformers/all-MiniLM-L6-v2` base encoder + cosine similarity matching head | |
| - **Input**: Token ID embeddings (mean-pooled) + reasoning block text embedding | |
| - **Output**: Match probability (0-1) | |
| ### 2. Block2Answer Matching Head (Model B) | |
| - **Purpose**: Verifies that each reasoning block is semantically relevant to the final answer | |
| - **Architecture**: `sentence-transformers/all-MiniLM-L6-v2` base encoder + cosine similarity matching head | |
| - **Input**: Reasoning block text embedding + answer text embedding | |
| - **Output**: Match probability (0-1) | |
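
To make the "cosine similarity matching head" used by Models A and B concrete, here is a minimal, dependency-free sketch of how a cosine similarity between two embeddings could be turned into a match probability. It is an illustration only: the `scale` and `bias` parameters are hypothetical stand-ins for whatever the trained head learns, and the actual `cos_sim` head in the CoIn code may be implemented differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch (assumption)
    return dot / (norm_a * norm_b)


def match_probability(emb_a, emb_b, scale=10.0, bias=0.0):
    """Map cosine similarity to a probability in (0, 1).

    `scale` and `bias` are hypothetical values, not taken from the
    released weights.
    """
    logit = scale * cosine_similarity(emb_a, emb_b) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 3-dimensional vectors for illustration; all-MiniLM-L6-v2 produces
# 384-dimensional embeddings.
p_same = match_probability([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])   # same direction
p_orth = match_probability([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])   # orthogonal
p_opp = match_probability([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])   # opposite
print(p_same, p_orth, p_opp)
```

With these assumed parameters, embeddings pointing in the same direction score close to 1, orthogonal embeddings score exactly 0.5, and opposite embeddings score close to 0.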
| ### 3. DeepSet Verifier | |
| - **Purpose**: Aggregates per-block matching scores into a final benign/malicious prediction | |
| - **Architecture**: DeepSet (permutation-invariant set encoding) | |
| - **Input**: Sequence of interleaved (score_a, score_b) pairs from Models A and B | |
| - **Output**: Probability of the sample being benign (0-1) | |
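
The DeepSet idea described above (encode each element, pool in an order-independent way, then decode) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the released verifier: the weights in `phi` and `rho` are made-up constants, each element is treated as one (score_a, score_b) pair, and mean pooling is used here although sum pooling is also common.

```python
import math


def phi(pair, w=(1.5, 1.5), b=-1.0):
    """Per-element encoder: affine map + ReLU over one (score_a, score_b) pair.

    The weights are hypothetical constants chosen for this example.
    """
    z = w[0] * pair[0] + w[1] * pair[1] + b
    return max(0.0, z)


def rho(pooled, w=4.0, b=-4.0):
    """Set-level decoder: affine map + sigmoid, read as P(benign)."""
    return 1.0 / (1.0 + math.exp(-(w * pooled + b)))


def deepset_score(pairs):
    """Compute rho(mean(phi(x) for x in pairs)).

    Mean pooling does not depend on the order of the elements, which is
    what makes the score permutation-invariant.
    """
    if not pairs:
        raise ValueError("need at least one (score_a, score_b) pair")
    pooled = sum(phi(p) for p in pairs) / len(pairs)
    return rho(pooled)


# Made-up matching scores: consistently high vs. consistently low.
high = [(0.9, 0.8), (0.95, 0.9), (0.85, 0.9)]
low = [(0.1, 0.2), (0.05, 0.3), (0.2, 0.1)]
print(deepset_score(high), deepset_score(low))
```

Shuffling the pairs leaves the score unchanged up to floating-point rounding, which is the property that lets a verifier of this shape handle a variable number of blocks in any order.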
| ## Training Details | |
| - **Base Embedding Model**: `sentence-transformers/all-MiniLM-L6-v2` | |
| - **Matching Head Type**: Cosine similarity head (`cos_sim`) | |
| - **Loss Function**: Focal Loss | |
| - **Optimizer**: Adam | |
| - **Learning Rate**: 2e-5 | |
| - **Batch Size**: 128 | |
| - **Epochs**: 3 | |
| - **Random Seed**: 42 | |
| - **Training Data**: [CoIn-Auditing-Dataset](https://huggingface.co/datasets/s1ghhh/CoIn-Auditing-Dataset) | |
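
For readers unfamiliar with focal loss, the sketch below shows the standard binary form, which down-weights examples the model already classifies confidently. The `gamma` and `alpha` defaults are the commonly used values from the original focal loss formulation and are assumptions here; this card does not state which values were used for training.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class and y is the
    label (0 or 1). gamma and alpha are assumed defaults, not values
    taken from the CoIn training configuration.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct: tiny loss
hard = binary_focal_loss(0.30, 1)  # wrong-leaning prediction: much larger loss
print(easy, hard)
```

Setting `gamma=0.0` removes the modulating factor, so the expression reduces to an alpha-weighted binary cross-entropy.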
| ## Usage | |
| ### Quick Start | |
| ```python | |
| import torch | |
| from sentence_transformers import SentenceTransformer | |
| # `heads` is not a PyPI package; it is assumed to be the module shipped with the CoIn code (see the GitHub link above) | |
| from heads import get_matching_head | |
| # Load Model B (Block2Answer, block_size=256) | |
| model_dir = "./matching_head_BlockToAnswer/256/train_all-MiniLM-L6-v2_mixed_pos_merged_4_domain_0.5_hard_easy_mixed_neg_4_domain_limit0_cos_sim_focal_freeze" | |
| embedding_model = SentenceTransformer(f"{model_dir}/embedding_model", trust_remote_code=True) | |
| # Load the matching head weights (map_location lets this work on CPU-only machines) | |
| embedding_dim = embedding_model.get_sentence_embedding_dimension() | |
| matching_head = get_matching_head("cos_sim", embedding_dim) | |
| matching_head.load_state_dict(torch.load(f"{model_dir}/matching_head.pt", map_location="cpu")) | |
| # Keep the head on the same device as the embeddings returned by encode() | |
| matching_head.to(embedding_model.device) | |
| matching_head.eval() | |
| # Score a (reasoning_block, answer) pair | |
| emb_block = embedding_model.encode("The derivative of x^2 is 2x...", convert_to_tensor=True) | |
| emb_answer = embedding_model.encode("The answer is 2x.", convert_to_tensor=True) | |
| # unsqueeze(0) adds the batch dimension | |
| features = {"embedding_a": emb_block.unsqueeze(0), "embedding_b": emb_answer.unsqueeze(0)} | |
| with torch.no_grad(): | |
|     logits = matching_head(features)["logits"] | |
| score = torch.sigmoid(logits).item() | |
| print(f"Match score: {score:.4f}") | |
| ``` | |
| ### Full Pipeline | |
| See the [GitHub repository](https://github.com/s1ghhh/LLM-Auditing-CoIn) for the complete CoIn pipeline usage. | |
| ## File Structure | |
| ``` | |
| CoIn-Matching-Head/ | |
| ├── matching_head_TokensToBlock/ # Model A | |
| │   └── {256,512,1024}/ # Block size variants | |
| │       └── train_.../ | |
| │           ├── embedding_model/ # Sentence-transformers model | |
| │           ├── matching_head.pt # Matching head weights | |
| │           └── tokenid_embedding_cache.pt | |
| ├── matching_head_BlockToAnswer/ # Model B | |
| │   └── {256,512,1024}/ | |
| │       └── train_.../ | |
| │           ├── embedding_model/ | |
| │           └── matching_head.pt | |
| └── learned_verifier/ | |
|     ├── DeepSet/ | |
|     │   ├── deepset_weight.pt # DeepSet verifier weights | |
|     │   └── model_cfg.py # Model config | |
|     └── RNN/ # RNN verifier variant | |
| ``` | |
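
Because the run directory names (`train_.../`) are long, it can be convenient to discover them from the layout above instead of typing them out. The helper below is a small sketch that assumes a local copy of this repository; `find_head_dirs` is a hypothetical name introduced here and is not part of the CoIn code.

```python
from pathlib import Path


def find_head_dirs(root, family, block_size):
    """Return run directories under root/family/block_size that contain
    a matching_head.pt file, sorted by path.

    `family` is e.g. "matching_head_BlockToAnswer" and `block_size` is
    one of the variants in the tree above (256, 512, 1024).
    """
    base = Path(root) / family / str(block_size)
    if not base.is_dir():
        return []
    return sorted(p.parent for p in base.glob("*/matching_head.pt"))


# Example call; the root path is an assumption about where the
# repository was downloaded:
# model_dirs = find_head_dirs("./CoIn-Matching-Head", "matching_head_BlockToAnswer", 256)
```

Each returned directory can then be used as `model_dir` in the Quick Start snippet.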
| ## Citation | |
| ```bibtex | |
| @article{sun2025coin, | |
| title={{CoIn}: Counting the Invisible Reasoning Tokens in Commercial Opaque {LLM} {API}s}, | |
| author={Sun, Guoheng and Wang, Ziyao and Tian, Bowei and Liu, Meng and Shen, Zheyu and He, Shwai and He, Yexiao and Ye, Wanghao and Wang, Yiting and Li, Ang}, | |
| journal={arXiv preprint arXiv:2505.13778}, | |
| year={2025} | |
| } | |
| ``` | |