NLA Activation Reconstructor β€” Phi-4 (14B), universal multi-layer

The validator half of the NLA pair. Given a natural-language description and the target depth, it reconstructs the residual-stream activation vector that the description refers to. High reconstruction cosine is the proof that the Activation Verbalizer's descriptions carry real geometric information rather than plausible-sounding narration.

Part of the nla-at-home project β€” a DIY replication of Anthropic's Natural Language Autoencoders.

What it does

Read a description (e.g. "To select all records from the users table where the surname is 'Smith', use SELECT * FROM users WHERE surname = 'Smith'; the query structure selects this SQL syntax.") plus the depth it came from, and predict the activation vector. The package is a LoRA adapter on Phi-4 plus seven per-layer linear value heads that map the adapted model's hidden state to the d_model=5120 target space (one head per trained layer).

Training

  • Base model: microsoft/phi-4 (14B, d_model 5120)
  • Objective: centered direction-MSE + InfoNCE contrastive (temp 20.0, weight 1.0, 56 extra negatives from the target bank) β€” the v3 objective, verified to beat plain dir-MSE (+0.013 retrieval, matched 9 layers).
  • Layers: 4, 10, 16, 19, 25, 32, 38 (depths ~10/25/40/47/63/80/96%) Β· Epochs: 5 (patience 2) Β· Batch: 8
  • Learning rate: 1e-4 Β· LoRA: r16, alpha32, dropout0.05, targets qkv_proj/o_proj/gate_up_proj/down_proj
  • Training data: corpus v2 β€” ~6000 safe-category texts, descriptions grounded in Phi-4's own greedy replies at deep layers. See the dataset.
  • Best mean cosine (gold description β†’ reconstruction, held-out): 0.678

Evaluation

Metric This adapter Notes
Roundtrip cos (AV→AR) 0.587 full pair on 50-text double-holdout; 95% CI [0.55, 0.62] (B=20k)
GT-description ceiling 0.676 AR fed gold descriptions on the same holdout

Reference point: Anthropic's kitft/nla-qwen2.5-7b-L20-ar roundtrip 0.769 (a 7B single-layer AR). This is a 14B universal multi-layer AR, a harder setting.

Honest decomposition on the holdout: with gold descriptions the AR reconstructs to 0.676 cosine (the ceiling for this pair); the full round-trip lands at 0.587. The 0.089 gap is the Activation Verbalizer's verbalization loss, not the AR's β€” the bottleneck is decoding the activation into words, not reconstructing the activation from words. Per-layer ceilings climb with depth (L4 0.553 β†’ L19 0.731), consistent with deeper layers carrying more description-legible content.

Usage

The package is a LoRA adapter plus a separate value_heads.safetensors (seven [5120, 5120] heads keyed by layer index). Reconstruction loads the adapter, runs the base model with the AR prompt, takes the last-token hidden state at layer+1, and applies the matching value head:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download

REPO = "anicka/nla-phi4-universal-ar-v2"
AR_PROMPT = "Summary of the following text: <text>{e}</text> <summary>"

tok = AutoTokenizer.from_pretrained(REPO)
tok.padding_side = "left"
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-4", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, REPO).to("cuda").eval()

vh = load_file(hf_hub_download(REPO, "value_heads.safetensors"))
heads = {int(k): v.float().to("cuda") for k, v in vh.items()}

@torch.no_grad()
def reconstruct(description, layer):
    enc = tok(AR_PROMPT.format(e=description), return_tensors="pt",
              truncation=True, max_length=256).to("cuda")
    h = model(**enc, output_hidden_states=True).hidden_states[layer + 1][:, -1, :].float()
    return h @ heads[layer].T  # [1, 5120] reconstructed activation

See scripts/eval_roundtrip_phi4.py (phase ar) in nla-at-home for the batched round-trip pipeline and the centered-cosine metric.

Related

Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for anicka/nla-phi4-universal-ar-v2

Base model

microsoft/phi-4
Adapter
(74)
this model