NLA Activation Reconstructor: Phi-4 (14B), universal multi-layer
The validator half of the NLA pair. Given a natural-language description and the target depth, it reconstructs the residual-stream activation vector that the description refers to. A high reconstruction cosine is the evidence that the Activation Verbalizer's descriptions carry real geometric information rather than plausible-sounding narration.
Part of the nla-at-home project, a DIY replication of Anthropic's Natural Language Autoencoders.
What it does
Given a description (e.g. "To select all records from the users table where the surname is 'Smith', use SELECT * FROM users WHERE surname = 'Smith'; the query structure selects this SQL syntax.") and the layer it came from, the AR predicts the activation vector. The package is a LoRA adapter on Phi-4 plus seven linear value heads, one per trained layer, that map the adapted model's hidden state into the d_model = 5120 target space.
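The seven value heads add a noticeable amount of memory on top of the 14B base model. The sketch below counts their parameters and footprint, assuming square, bias-free [5120, 5120] heads (the shape given under Usage); the helper name `gib` is ours. Because the Usage code upcasts the heads with `.float()`, the float32 figure is what they occupy on the GPU.

```python
D_MODEL = 5120
N_HEADS = 7  # one value head per trained layer

# Assumption: each head is a bias-free [D_MODEL, D_MODEL] matrix.
params_per_head = D_MODEL * D_MODEL
total_params = N_HEADS * params_per_head


def gib(n_params, bytes_per_param):
    """Memory footprint in GiB for n_params values stored at bytes_per_param each."""
    return n_params * bytes_per_param / 2**30


print(f"{total_params:,} value-head parameters")
print(f"{gib(total_params, 4):.2f} GiB as float32, {gib(total_params, 2):.2f} GiB as bfloat16")
```

Under these assumptions this prints 183,500,800 parameters, about 0.68 GiB in float32 and 0.34 GiB in bfloat16.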
Training
- Base model: microsoft/phi-4 (14B, d_model 5120)
- Objective: centered direction-MSE + InfoNCE contrastive (temp 20.0, weight 1.0, 56 extra negatives from the target bank). This is the v3 objective, which measurably beat plain dir-MSE (+0.013 retrieval in a matched 9-layer comparison).
- Layers: 4, 10, 16, 19, 25, 32, 38 (depths ~10/25/40/47/63/80/96%) · Epochs: 5 (early-stopping patience 2) · Batch: 8
- Learning rate: 1e-4 · LoRA: r=16, alpha=32, dropout=0.05, target modules qkv_proj, o_proj, gate_up_proj, down_proj
- Training data: corpus v2, ~6000 safe-category texts, with descriptions grounded in Phi-4's own greedy replies at deep layers. See the dataset (nla-at-home-corpus).
- Best mean cosine (gold description → reconstruction, held-out): 0.678
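The objective above can be sketched for a single example in plain Python. This is an illustration under stated assumptions, not the nla-at-home training code: we assume "centered" means subtracting a fixed mean activation (the hypothetical `bank_mean`) from prediction and targets, that direction-MSE is the squared distance between unit vectors, and that "temp 20.0" acts as a logit scale on cosine similarities (cosines are bounded in [-1, 1], so a divisor of 20 would flatten the softmax). `negatives` stands in for the 56 extra target-bank vectors; the word "extra" suggests they supplement in-batch negatives, which this single-example sketch omits.

```python
import math


def _sub(u, v):
    return [a - b for a, b in zip(u, v)]


def _unit(v):
    # Assumes the centered vector is nonzero.
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def centered_dir_mse(pred, target, bank_mean):
    """Assumed form: center on bank_mean, then compare directions only."""
    p = _unit(_sub(pred, bank_mean))
    t = _unit(_sub(target, bank_mean))
    # Squared distance between unit vectors, equal to 2 - 2 * cos(p, t).
    return sum((a - b) ** 2 for a, b in zip(p, t))


def info_nce(pred, target, negatives, bank_mean, scale=20.0):
    """Cross-entropy over scaled cosines; candidate 0 is the positive."""
    p = _unit(_sub(pred, bank_mean))
    candidates = [target] + list(negatives)
    logits = [scale * _dot(p, _unit(_sub(c, bank_mean))) for c in candidates]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


def ar_loss(pred, target, negatives, bank_mean, weight=1.0, scale=20.0):
    """Direction-MSE plus weighted InfoNCE (weight 1.0 and scale 20.0 per the card)."""
    return (centered_dir_mse(pred, target, bank_mean)
            + weight * info_nce(pred, target, negatives, bank_mean, scale))
```

In training this would run batched in PyTorch over 5120-dimensional vectors; the list-based version only makes each term explicit.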
Evaluation
| Metric | This adapter | Notes |
|---|---|---|
| Round-trip cosine (AV → AR) | 0.587 | full pair on the 50-text double holdout; 95% CI [0.55, 0.62] (B = 20k bootstrap resamples) |
| GT-description ceiling | 0.676 | AR fed gold descriptions on the same holdout |
Reference point: Anthropic's kitft/nla-qwen2.5-7b-L20-ar reaches a round-trip cosine of 0.769, but it is a 7B single-layer AR. This adapter is a 14B universal multi-layer AR, which is a harder setting.
Honest decomposition on the holdout: with gold descriptions the AR reconstructs to 0.676 cosine (the ceiling for this pair), while the full round trip lands at 0.587. The 0.089 gap is the Activation Verbalizer's verbalization loss, not the AR's: the bottleneck is decoding the activation into words, not reconstructing the activation from words. Per-layer ceilings climb with depth (L4 0.553 → L19 0.731), consistent with deeper layers carrying more description-legible content.
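Written out, the decomposition is the difference of the two table rows, and the same numbers give the share of the gold-description ceiling that the full round trip retains (about 87%):

```latex
\underbrace{0.676}_{\text{AR, gold descriptions}}
\;-\;
\underbrace{0.587}_{\text{AV}\to\text{AR round trip}}
= 0.089,
\qquad
\frac{0.587}{0.676} \approx 0.868
```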
Usage
The package is a LoRA adapter plus a separate value_heads.safetensors file (seven [5120, 5120] heads keyed by layer index). Reconstruction attaches the adapter to the base model, runs the adapted model on the AR prompt, takes the last-token hidden state after the target layer (hidden_states[layer + 1]), and applies the matching value head:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download

REPO = "anicka/nla-phi4-universal-ar-v2"
# Prompt template used by the AR; {e} is filled with the natural-language description.
AR_PROMPT = "Summary of the following text: <text>{e}</text> <summary>"

tok = AutoTokenizer.from_pretrained(REPO)
tok.padding_side = "left"  # keeps position -1 a real token when batching
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# Load the Phi-4 base in bfloat16, then attach the AR LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-4", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, REPO).to("cuda").eval()

# Value heads: one [5120, 5120] matrix per trained layer, keyed by layer index.
vh = load_file(hf_hub_download(REPO, "value_heads.safetensors"))
heads = {int(k): v.float().to("cuda") for k, v in vh.items()}

@torch.no_grad()
def reconstruct(description, layer):
    enc = tok(AR_PROMPT.format(e=description), return_tensors="pt",
              truncation=True, max_length=256).to("cuda")
    # hidden_states[0] is the embedding output, so index layer + 1 is decoder layer `layer`.
    h = model(**enc, output_hidden_states=True).hidden_states[layer + 1][:, -1, :].float()
    return h @ heads[layer].T  # [1, 5120] reconstructed activation
```
See scripts/eval_roundtrip_phi4.py (phase ar) in nla-at-home for the batched round-trip pipeline and the centered-cosine metric.
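`reconstruct` only works for the seven trained layers; any other index fails with a `KeyError` at `heads[layer]`. The hypothetical helpers below (not part of the package) make that explicit and optionally snap a requested layer to the nearest trained one. Snapping is an assumption about how you might query other depths; the card makes no claim that the AR transfers to untrained layers.

```python
TRAINED_LAYERS = (4, 10, 16, 19, 25, 32, 38)  # keys in value_heads.safetensors


def nearest_trained_layer(layer):
    """Snap a requested decoder layer to the closest layer that has a value head."""
    # min() keeps the first minimum, so ties (e.g. 7, between 4 and 10) go shallower.
    return min(TRAINED_LAYERS, key=lambda t: abs(t - layer))


def require_trained_layer(layer):
    """Fail early with a readable message instead of a KeyError from heads[layer]."""
    if layer not in TRAINED_LAYERS:
        raise ValueError(f"no value head for layer {layer}; trained layers: {TRAINED_LAYERS}")
    return layer
```

For example, `reconstruct(description, nearest_trained_layer(20))` queries layer 19, while `require_trained_layer(20)` raises `ValueError`.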
Related
- anicka/nla-phi4-universal-av-v2: companion Activation Verbalizer
- nla-at-home-corpus: training data (corpus v2)
- nla-at-home: full pipeline code
- Anthropic NLA