JetBrains-Research/learned-transfer-attack

@@ -1,26 +1,30 @@
 ---
 license: mit
 tags:
-  - membership-inference-attack
-  - privacy
-  - security
-  - language-models
-  - pytorch
-pipeline_tag: other
-library_name: ltmia
 ---
 # Learned Transfer Membership Inference Attack
 A classifier that detects whether a given text was part of a language model's fine-tuning data. It compares the output distributions of a fine-tuned model against its pretrained base, extracting per-token features that a small transformer classifier uses to predict membership. Trained on 10 transformer models × 3 text domains, it generalizes zero-shot to unseen model/dataset combinations, including non-transformer architectures (Mamba, RWKV, RecurrentGemma).
 ## Usage
 ### Install
 ```bash
-git clone https://github.com/JetBrains-Research/ltmia.git
-cd ltmia
 pip install -e .
 ```
@@ -78,7 +82,7 @@ for text, p in zip(texts, probs):
     print(f"[{prob:.4f}] {label}  ←  {text[:80]}")
 ```
-You need black-box query access (full vocabulary logits) to both the fine-tuned model and its pretrained base. `sequence_length=128` and `k=20` must match this checkpoint. See the [GitHub repository](https://github.com/JetBrains-Research/ltmia) for CLI tools, training your own classifier, and evaluation scripts.
 ## Model Details
@@ -117,4 +121,4 @@ Transfer to code (Swallow-Code): 0.865 mean AUC despite training only on natural
 ## License
-MIT

 ---
+library_name: ltmia
 license: mit
+pipeline_tag: text-classification
 tags:
+- membership-inference-attack
+- privacy
+- security
+- language-models
+- pytorch
 ---
 # Learned Transfer Membership Inference Attack
+This repository contains the trained classifier for the paper [Learning the Signature of Memorization in Autoregressive Language Models](https://huggingface.co/papers/2604.03199).
 A classifier that detects whether a given text was part of a language model's fine-tuning data. It compares the output distributions of a fine-tuned model against its pretrained base, extracting per-token features that a small transformer classifier uses to predict membership. Trained on 10 transformer models × 3 text domains, it generalizes zero-shot to unseen model/dataset combinations, including non-transformer architectures (Mamba, RWKV, RecurrentGemma).
+Official code: [https://github.com/JetBrains-Research/learned-mia](https://github.com/JetBrains-Research/learned-mia)
 ## Usage
 ### Install
 ```bash
+git clone https://github.com/JetBrains-Research/learned-mia.git
+cd learned-mia
 pip install -e .
 ```
     print(f"[{prob:.4f}] {label}  ←  {text[:80]}")
 ```
+You need black-box query access (full vocabulary logits) to both the fine-tuned model and its pretrained base. `sequence_length=128` and `k=20` must match this checkpoint. See the [GitHub repository](https://github.com/JetBrains-Research/learned-mia) for CLI tools, training your own classifier, and evaluation scripts.
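
To make the logits requirement concrete, the following is a minimal sketch of how per-token comparison features could be derived from the two models' full-vocabulary logits. It is an illustration only, not the `ltmia` implementation: the function names, the feature names, and the choice of features (log-probability gain on the observed token, plus the gap over the fine-tuned model's top-`k` tokens) are assumptions. It uses plain Python lists so it runs without any model. In practice the logits would come from querying the fine-tuned model and its pretrained base on the same text.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def per_token_features(ft_logits, base_logits, token_ids, k=20):
    """Hypothetical per-token features comparing fine-tuned vs. base outputs.

    ft_logits, base_logits: one logit vector per position (same vocabulary).
    token_ids: the observed token id at each position.
    k: number of top fine-tuned tokens to compare (20 matches this checkpoint's `k`).
    """
    feats = []
    for ft_row, base_row, tok in zip(ft_logits, base_logits, token_ids):
        ft_lp = log_softmax(ft_row)
        base_lp = log_softmax(base_row)
        # Indices of the k most likely tokens under the fine-tuned model.
        top = sorted(range(len(ft_lp)), key=lambda i: ft_lp[i], reverse=True)[:k]
        feats.append({
            "ft_logprob": ft_lp[tok],
            "base_logprob": base_lp[tok],
            # Positive when fine-tuning made the observed token more likely.
            "logprob_gain": ft_lp[tok] - base_lp[tok],
            "topk_gap": [ft_lp[i] - base_lp[i] for i in top],
        })
    return feats
```

A sequence of such feature dictionaries (truncated or padded to `sequence_length=128`) is the kind of input a small classifier could consume. The actual feature set and tensor layout are defined in the repository.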
 ## Model Details
 ## License
+MIT