NLA Activation Reconstructor — Phi-4 (14B), universal multi-layer

The validator half of the NLA pair. Given a natural-language description and the target depth, it reconstructs the residual-stream activation vector that the description refers to. A high reconstruction cosine is evidence that the Activation Verbalizer's descriptions carry real geometric information rather than plausible-sounding narration.

Part of the nla-at-home project — a DIY replication of Anthropic's Natural Language Autoencoders.

What it does

The AR reads a description (e.g. "To select all records from the users table where the surname is 'Smith', use SELECT * FROM users WHERE surname = 'Smith'; the query structure selects this SQL syntax.") together with the depth it came from, and predicts the activation vector. The package is a LoRA adapter on Phi-4 plus seven per-layer linear value heads, one per trained layer, that map the adapted model's hidden state to the d_model=5120 target space.

Training

Base model: microsoft/phi-4 (14B, d_model 5120)
Objective: centered direction-MSE + InfoNCE contrastive (temp 20.0, weight 1.0, 56 extra negatives from the target bank) — the v3 objective, verified to beat plain dir-MSE (+0.013 retrieval, matched 9 layers).
Layers: 4, 10, 16, 19, 25, 32, 38 (depths ~10/25/40/47/63/80/96%) · Epochs: 5 (patience 2) · Batch: 8
Learning rate: 1e-4 · LoRA: r16, alpha32, dropout0.05, targets qkv_proj/o_proj/gate_up_proj/down_proj
Training data: corpus v2 — ~6000 safe-category texts, descriptions grounded in Phi-4's own greedy replies at deep layers. See the dataset.
Best mean cosine (gold description → reconstruction, held-out): 0.678
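The objective above can be sketched in plain Python so the arithmetic is visible. This is an illustration under stated assumptions, not the project's training code: it assumes "centered" means subtracting a per-layer mean activation, that "direction" means comparing unit-normalised vectors, and that the card's "temp 20.0" acts as a multiplier on cosine similarities. All function names and the `mean` argument are hypothetical.

```python
import math

def centered_unit(vec, mean):
    """Subtract the mean activation, then scale to unit length."""
    centered = [v - m for v, m in zip(vec, mean)]
    norm = math.sqrt(sum(c * c for c in centered)) or 1.0
    return [c / norm for c in centered]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def direction_mse(pred_dir, target_dir):
    """Mean squared error between two unit directions."""
    return sum((p - t) ** 2 for p, t in zip(pred_dir, target_dir)) / len(pred_dir)

def info_nce(pred_dir, candidate_dirs, positive_index, scale=20.0):
    """Cross-entropy of picking the true target among all candidates."""
    # Assumption: the scale multiplies the cosine similarity.
    logits = [scale * dot(pred_dir, c) for c in candidate_dirs]
    peak = max(logits)  # subtract the max for a stable log-sum-exp
    log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_z - logits[positive_index]

def ar_loss(pred, target, negatives, mean, nce_weight=1.0, scale=20.0):
    """Direction-MSE plus weighted InfoNCE for one example."""
    p = centered_unit(pred, mean)
    t = centered_unit(target, mean)
    # The true target goes first; negatives stand in for the target bank.
    candidates = [t] + [centered_unit(n, mean) for n in negatives]
    return direction_mse(p, t) + nce_weight * info_nce(p, candidates, 0, scale)
```

In this sketch a prediction that points the same way as the target scores a near-zero loss regardless of its length, while one that points at a negative is penalised by both terms.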

Evaluation

| Metric | This adapter | Notes |
| --- | --- | --- |
| Roundtrip cos (AV→AR) | 0.587 | full pair on 50-text double-holdout; 95% CI [0.55, 0.62] (B=20k) |
| GT-description ceiling | 0.676 | AR fed gold descriptions on the same holdout |
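A confidence interval like the one reported for the roundtrip cosine can be obtained with a percentile bootstrap over the per-text scores. The sketch below assumes that B=20k is the number of bootstrap resamples and that the interval is a percentile interval on the mean; the card does not say which bootstrap variant was used. The scores are synthetic stand-ins, not the project's data.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=20_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical per-text cosines standing in for the 50 holdout scores.
demo_rng = random.Random(1)
scores = [min(1.0, max(-1.0, demo_rng.gauss(0.59, 0.12))) for _ in range(50)]
low, high = bootstrap_mean_ci(scores)
print(f"mean={statistics.fmean(scores):.3f}  95% CI=[{low:.3f}, {high:.3f}]")
```

With only 50 texts the interval is fairly wide, which is worth keeping in mind when comparing the roundtrip figure against the ceiling.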

Reference point: Anthropic's kitft/nla-qwen2.5-7b-L20-ar roundtrip 0.769 (a 7B single-layer AR). This is a 14B universal multi-layer AR, a harder setting.

Honest decomposition on the holdout: with gold descriptions the AR reconstructs to 0.676 cosine (the ceiling for this pair); the full round-trip lands at 0.587. The 0.089 gap is the Activation Verbalizer's verbalization loss, not the AR's — the bottleneck is decoding the activation into words, not reconstructing the activation from words. Per-layer ceilings climb with depth (L4 0.553 → L19 0.731), consistent with deeper layers carrying more description-legible content.

Usage

The package is a LoRA adapter plus a separate value_heads.safetensors (seven [5120, 5120] heads keyed by layer index). Reconstruction loads the adapter onto the base model, runs the adapted model on the AR prompt, takes the last-token hidden state at index layer+1 (index 0 of hidden_states is the embedding output), and applies the matching value head:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download

REPO = "anicka/nla-phi4-universal-ar-v2"
AR_PROMPT = "Summary of the following text: <text>{e}</text> <summary>"

tok = AutoTokenizer.from_pretrained(REPO)
tok.padding_side = "left"
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-4", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, REPO).to("cuda").eval()

vh = load_file(hf_hub_download(REPO, "value_heads.safetensors"))
heads = {int(k): v.float().to("cuda") for k, v in vh.items()}

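@torch.no_grad()
def reconstruct(description, layer):
    """Predict the residual-stream activation at `layer` from a description."""
    enc = tok(AR_PROMPT.format(e=description), return_tensors="pt",
              truncation=True, max_length=256).to("cuda")
    out = model(**enc, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so the output of decoder
    # block index `layer` sits at hidden_states[layer + 1].
    h = out.hidden_states[layer + 1][:, -1, :].float()  # last-token state
    return h @ heads[layer].T  # [1, 5120] reconstructed activation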

See scripts/eval_roundtrip_phi4.py (phase ar) in nla-at-home for the batched round-trip pipeline and the centered-cosine metric.
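For readers who want the idea behind a centered-cosine metric without opening the script, here is a minimal sketch. It assumes that centering means subtracting a shared mean activation (for example a per-layer mean) from both vectors before taking the cosine; the script's exact definition may differ, and `centered_cosine` and `mean` are hypothetical names.

```python
import math

def centered_cosine(reconstruction, target, mean):
    """Cosine similarity after subtracting a shared mean activation."""
    r = [x - m for x, m in zip(reconstruction, mean)]
    t = [x - m for x, m in zip(target, mean)]
    dot = sum(a * b for a, b in zip(r, t))
    norm = math.sqrt(sum(a * a for a in r)) * math.sqrt(sum(b * b for b in t))
    return dot / norm if norm else 0.0

# Two vectors dominated by a common offset look alike before centering...
offset = [10.0, 10.0]
target = [11.0, 10.0]
recon = [10.0, 11.0]
raw = centered_cosine(recon, target, [0.0, 0.0])
# ...but centering removes the offset and exposes the disagreement.
centered = centered_cosine(recon, target, offset)
print(f"raw={raw:.3f}  centered={centered:.3f}")
```

Centering matters because residual-stream activations tend to share a large common component, which can inflate plain cosine scores even when the informative part of the vector is wrong.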

anicka/nla-phi4-universal-av-v2 — companion Activation Verbalizer
nla-at-home-corpus — training data (corpus v2)
nla-at-home — full pipeline code
Anthropic NLA
