# mechinterp-v1-ckpt800
A finetuned Trinity Nano (AfMoE, ~6B active params) that labels SAE features from activation data. Given a feature's stats, top tokens, similar features, and context examples, it outputs a short natural language label and confidence level.
Trained on 4,597 labeled features from a 5,120-feature SAE trained on Baguettotron.
This is checkpoint 800/1150 (loss ~1.68).
SAE explorer: https://lyramakesmusic.github.io/bread-slicer/
## Quality
The model was tested on 100 activations at temperature 0.3. Outputs were graded by Opus 4.6 subagents on the following scale:
- 7 = perfect — captures exactly what the feature does
- 6 = correct — right concept, minor wording differences
- 5 = good — right ballpark, slightly over/under-scoped but usable
- 4 = partial — gets the general area but misses important specifics
- 3 = vague — too broad, too narrow, or captures only a tangent
- 2 = bad — wrong interpretation, hallucinated specifics
- 1 = wrong — completely off, describes something else
Results are mixed for an early checkpoint: labels are generally accurate, but scoping and contextual precision are imperfect.
Overall: 4.78/7 mean, 5/7 median
| Score | Count | Meaning |
|---|---|---|
| 7 | 18 | perfect |
| 6 | 25 | correct |
| 5 | 14 | good |
| 4 | 17 | partial |
| 3 | 14 | vague |
| 2 | 10 | bad |
| 1 | 2 | wrong (`<think>` leaks) |
The model can be used as-is to get estimated labels, but outputs should pass through an external validator or agent to reach high accuracy thresholds. Further RL will likely be needed to improve contextual accuracy; this model is just the baseline.
Original data used for training and as ground truth for the grader was generated using google/gemini-3-flash-preview with reasoning-effort: high, given 25 turns of prefill in the same context alongside an expository system prompt. Manual inspection of the feature labels shows good quality, with confusion mostly limited to polysemantic, highly abstract, or meta features.
## Loading
Q8_0 GGUF, 6.1 GB. Load in LM Studio:

```shell
lms load lyraaaa-exp/mechinterp-v1-ckpt800-Q8_0-GGUF --gpu max -y
```
Uses ChatML template. No system prompt needed.
## Input format
Send a single user message containing `Label the following feature.` followed by the full feature data block. The model responds with a label and confidence tag.
### Prompt components
Each prompt has these sections in order:
**Stats line** — basic activation statistics for the feature:
- `density`: what fraction of tokens in the dataset activate this feature
- `max_act`: strongest activation value seen
- `mean_act`: average activation when it fires
- `fires`: total number of tokens that activated this feature
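These fields fall directly out of the feature's per-token activation records. A minimal sketch of how the stats line could be computed (the `stats_line` helper and its arguments are illustrative, not the real data pipeline):

```python
def stats_line(acts, total_tokens):
    """Compute the stats fields from per-token activation values.

    acts: activation values for every token where the feature fired (> 0).
    total_tokens: total number of tokens in the dataset.
    """
    fires = len(acts)
    density = fires / total_tokens
    max_act = max(acts) if acts else 0.0
    mean_act = sum(acts) / fires if fires else 0.0
    return (f"Stats: density={density:.4%}, max_act={max_act:.2f}, "
            f"mean_act={mean_act:.2f}, fires={fires}")

print(stats_line([2.0, 4.0, 6.0], 1000))
# -> Stats: density=0.3000%, max_act=6.00, mean_act=4.00, fires=3
```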
**Top tokens** — the 20 tokens this feature fires on most often, with counts. This is the single most informative signal for labeling. A feature with top tokens `" research"x487, " Research"x74, " investigation"x51` is probably about the concept of research.
**Similar features (labeled)** — other features in the SAE with high cosine similarity in embedding space, restricted to ones that already have labels. These give context about the neighborhood the feature lives in. Similarity values are cosine similarity scores (0-1). This section is only present if any similar features have labels.
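The similarity scores can be reproduced with a plain cosine similarity between two feature direction vectors; a minimal sketch (comparing SAE decoder rows is one common choice of "embedding space", but this card doesn't confirm which vectors were used):

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two feature direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine_sim([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]), 2))  # -> 0.5
```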
**Examples** — 8 context windows from the dataset showing where the feature fires hardest. Each example is a ~50-token window around the peak activation. Tokens are marked with activation strength:
| Markup | Meaning |
|---|---|
| `[[token]]` | Peak activation token (the token with the highest activation in this example) |
| `*token*` | Moderately activated token (activation > 30% of the feature's max) |
| `token` | Not activated or below threshold |
These examples are the core evidence for what the feature does. A feature might fire on `[[.]]` at the end of narrative sentences, or on `[[ baseline]]` in analytical contexts.
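The markup above is just a thresholding pass over per-token activations; a sketch assuming the activations are already aligned with the token strings (the `mark_tokens` helper is illustrative):

```python
def mark_tokens(tokens, acts, max_act):
    """Wrap tokens per the markup table: [[..]] for the peak token,
    *..* for tokens above 30% of the feature's max activation."""
    peak = max(range(len(acts)), key=lambda i: acts[i])
    out = []
    for i, (tok, act) in enumerate(zip(tokens, acts)):
        if i == peak and act > 0:
            out.append(f"[[{tok}]]")
        elif act > 0.3 * max_act:
            out.append(f"*{tok}*")
        else:
            out.append(tok)
    return "".join(out)

print(mark_tokens([" The", " end", "."], [0.0, 1.2, 3.5], 3.5))
# -> " The* end*[[.]]"
```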
## Response format
```
{short label}
[{confidence}]
```
Confidence is one of:
- `[confident]` — clear pattern, top tokens and examples agree (93% of training data)
- `[tentative]` — plausible interpretation but some ambiguity (1.4% of training data)
- `[dead]` — feature never fires or has no discernible pattern (5.5% of training data)
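Since the reply is just a label followed by a bracketed tag, parsing it is a few lines; a minimal sketch (the `parse_reply` helper is illustrative):

```python
import re

def parse_reply(text):
    """Split the model's reply into (label, confidence)."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) > 1:
        m = re.fullmatch(r"\[(\w+)\]", lines[-1].strip())
        if m:
            return "\n".join(lines[:-1]), m.group(1)
    return "\n".join(lines), "unknown"

print(parse_reply("sentence-ending punctuation\n[confident]"))
# -> ('sentence-ending punctuation', 'confident')
```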
## Chat template tokens in feature data
Some SAE features fire on tokens inside chat template markup (`<think>`, `</think>`, `<|im_start|>`, etc.) because the SAE was trained on model completions that include these tokens. When these appear raw in the prompt's context examples, the tokenizer interprets them as control tokens rather than literal text, which can cause garbled output.
If you're generating feature data for this model, replace special tokens with bracket equivalents before sending:
| Raw token | Safe replacement |
|---|---|
| `<think>` | `[think]` |
| `</think>` | `[/think]` |
| `<\|im_start\|>` | `[im_start]` |
| `<\|im_end\|>` | `[im_end]` |
| `<\|endoftext\|>` | `[endoftext]` |
| `<\|pad\|>` | `[pad]` |
This preserves the information (you can still tell the feature fires on think-block boundaries) without confusing the tokenizer.
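A sketch of the replacement pass (the mapping mirrors the table above; the `sanitize` helper name is illustrative):

```python
# Raw chat-template tokens mapped to bracket-safe equivalents.
SAFE = {
    "<think>": "[think]",
    "</think>": "[/think]",
    "<|im_start|>": "[im_start]",
    "<|im_end|>": "[im_end]",
    "<|endoftext|>": "[endoftext]",
    "<|pad|>": "[pad]",
}

def sanitize(text):
    """Replace raw template tokens so the tokenizer treats them as literal text."""
    for raw, safe in SAFE.items():
        text = text.replace(raw, safe)
    return text

print(sanitize("fires on <|im_end|> then <think>"))
# -> fires on [im_end] then [think]
```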
## Example interp script
"""Batch-label SAE features using mechinterp-v1."""
import json
import requests
API = "http://localhost:1234/v1/chat/completions"
MODEL = "mechinterp-v1-ckpt800"
def label_feature(prompt_text):
resp = requests.post(API, json={
"model": MODEL,
"messages": [{"role": "user", "content": prompt_text}],
"max_tokens": 150,
"temperature": 0.3,
})
return resp.json()["choices"][0]["message"]["content"].strip()
def build_prompt(feat_id, stats, top_tokens, examples):
"""Build the feature prompt from components."""
lines = [f"Label the following feature.\n\n=== FEATURE {feat_id} ==="]
lines.append(f"Stats: density={stats['density']:.4%}, max_act={stats['max']:.2f}, "
f"mean_act={stats['mean']:.2f}, fires={stats['count']}")
tok_strs = [f'"{t["text"]}"x{t["count"]}' for t in top_tokens[:20]]
lines.append(f"Top tokens: {', '.join(tok_strs)}")
for i, ex in enumerate(examples[:8]):
lines.append(f"\nExample {i+1} (peak={ex['peak']:.2f}):")
lines.append(ex["context"])
return "\n".join(lines)
# -- main --
features = json.load(open("features_to_label.json"))
results = []
for feat in features:
prompt = build_prompt(feat["id"], feat["stats"], feat["top_tokens"], feat["examples"])
output = label_feature(prompt)
# parse label and confidence
parts = output.strip().split("\n\n")
label = parts[0] if parts else output
confidence = parts[1].strip("[]") if len(parts) > 1 else "unknown"
results.append({"feature": feat["id"], "label": label, "confidence": confidence})
print(f"#{feat['id']}: {label} [{confidence}]")
with open("labels_out.jsonl", "w") as f:
for r in results:
f.write(json.dumps(r) + "\n")
print(f"\nLabeled {len(results)} features -> labels_out.jsonl")
## Training details
- Base: arcee-ai/Trinity-Nano-Preview (AfMoE, 128 experts, ~6B active)
- Method: LoRA (rank 16) via unsloth + SFTTrainer
- Data: 4,597 single-turn examples, ChatML format, no system prompt
- Training: 2 epochs, bs=4, max_seq_length=2048, bf16, adamw_8bit
- Checkpoint: 800/1150 steps (loss ~1.68)
- Hardware: RTX 4090 24GB via WSL2
- Quantization: Q8_0 via llama.cpp (6.1 GB)
## Full examples
These are complete, untruncated inputs and outputs. Real prompts are this long — don't truncate them.
### Example 1: active feature, confident label
Input:
```
Label the following feature.

=== FEATURE 100 ===
Stats: density=0.1885%, max_act=256.59, mean_act=109.08, fires=3119
Top tokens: "."×1768, "
"×464, "?"×68, ";"×48, ","×89, ".""×33, ".)"×34, " ""×45, "'"×71, " but"×34, ".""×19, " and"×34, " But"×22, ":**"×33, ":"×21, "!"×13, " ""×15, " The"×37, " And"×11, ".""×6
Similar features (labeled): #4372 "sentence-ending punctuation in analytical or descriptive prose" (sim=0.46), #1994 "structural boundaries - periods, newlines, and list/reference markers" (sim=0.40), #1258 "subject pronouns or copula at the start of narrative sentences" (sim=0.37), #1114 "punctuation introducing direct speech or dialogue, especially quotes and colons after names" (sim=0.34), #4299 "narrative transitions and descriptive scene-setting" (sim=0.32), #157 "comma or conjunction separating items in a list or parallel phrases (multilingual)" (sim=0.30), #2541 "period before a sentence starting with a demonstrative or transition word" (sim=0.29), #3226 "elevated descriptive and narrative prose" (sim=0.28), #3528 "newline or punctuation separating lines in verse or structured lists" (sim=0.28), #1377 "structural delimiters in conceptual outlines and reasoning chains" (sim=0.26)

Example 1 (peak=256.59):
the traditional *besmertny*—the undying tunic—and felt its weight as a mantle of history[[.]] We were the last knights,* and* I was proud to be one of them*.*
*
*Then came the change, as

Example 2 (peak=250.20):
psychology was the crisp, clean dogma of behaviorism, a science being polished to a high sheen of statistical significance[[.]] Klug, a man whose tweed jacket seemed tailored from the very authority he exuded, was a high priest of this

Example 3 (peak=245.63):
' and 'pulmonary complications', a language of clinical detachment that shielded the patient from despair and the institution from panic[[.]]* But* the truth, which Klug knew with the certainty of a man who had stared down death in a thousand forms,

Example 4 (peak=241.38):
His hands, mapped with the fine scars of hot oil and caustic lye, worked not by will but by memory[[.]] He was not making soap today*;* he was presiding over its slow birth, a midwife to a hundred kilograms

Example 5 (peak=241.04):
ueerde een reeks onvermijdelijke scenario's, elk met een gruwelijke, klinische logica[[.]]
*
*De dagen die volgden, vervormden tot een ondragelijke, rekbare substantie*.* De routine

Example 6 (peak=234.71):
en la universidad, vio en ello el renacimiento de una amoralidad que desafiaba toda lógica[[.]] Yo, en cambio, sentí una clase de pavor más íntimo, la certeza de que aquella decisión no

Example 7 (peak=232.79):
true antiquarian, the sense of standing not upon the shoulders of giants, but at the very base of human endeavour itself[[.]]
*
*We began our work, not with the bluster of an excavation, but with the reverence of an unse
```
Output:
```
sentence-ending punctuation, especially periods in narrative prose
[confident]
```
### Example 2: semantic feature with subword activation
Input:
```
Label the following feature.

=== FEATURE 4016 ===
Stats: density=0.0904%, max_act=306.98, mean_act=88.23, fires=1496
Top tokens: " baseline"×151, "eline"×50, " Bas"×56, " base"×71, " reference"×53, " basic"×76, " background"×22, " threshold"×41, " target"×36, " benchmark"×12, " default"×16, " initial"×27, " quo"×20, " basal"×19, """×29, " starting"×16, " point"×18, "ing"×22, " anchor"×15, " standard"×20
Similar features (labeled): #1987 ""foundation", "base", "foundational", "underlying"" (sim=0.53), #2731 ""bas" word-initial string, multilingual (e.g., "basal", "baseline", "basierend")" (sim=0.45), #4375 ""standard", "standards", and "standardization" (multilingual)" (sim=0.40), #4547 ""initial", "initially", and multilingual equivalents (inicialmente, initialement, etc.)" (sim=0.36), #3266 ""precedent" and related terms (multilingual)" (sim=0.35), #2844 ""normal", "norm", and related terms across multiple languages (English, Dutch, French)" (sim=0.35), #4334 ""host", "source", "target", "input", "feedstock", and "substrate"" (sim=0.34), #3012 ""premise" and "premises" in logical or analytical evaluation" (sim=0.34), #3814 ""original", "initial", "native", and multilingual equivalents (e.g., "ursprünglich")" (sim=0.34), #323 ""framework", "platform", "paradigm" as conceptual or structural systems" (sim=0.33)

Example 1 (peak=306.98):
to entry?
- Product differentiation?
- Strategic interdependence?
- Market concentration?
Perfect competition[[ baseline]]:
- Many small firms
- Identical products
- Perfect information
- No barriers
- Price tak

Example 2 (peak=305.72):
other repatriation efforts failed?"
**Parsing the question.**
- "Succeed" = what[[ baseline]]? Survival? Integration? Governance? Economic viability?
- Comparative scope unclear. Which "other efforts"?
-

Example 3 (peak=303.92):
Innovation theory: Constraints → forced creativity → new solutions
Need to establish:
- Bas[[eline]]: What constraints exist in open data AI training?
- Mechanism: How do these constraints force innovation?
-

Example 4 (peak=301.36):
→ bundling into "special menu" vs "separate channels like normal TV"
? **What's the[[ baseline]] comparison?**
- Public channels typically: dedicated channel numbers
- Commercial channels: also have dedicated numbers
-

Example 5 (peak=295.52):
20 years → medium-term prediction
- Core mechanism: continued fusion of regional foods/ingredients
- Comparative[[ baseline]]: "never before" suggests exponential acceleration of current trends
- Outcome focus: global cuisine evolution
※ Translation

Example 6 (peak=292.65):
? Semantic ambiguity: "normal pasta" = wheat pasta? durum wheat? other grains?
? Comparative claim needs[[ baseline]] definition
"why do some people put cheese on top but not others" -
→ Cultural variation hypothesis
→

Example 7 (peak=291.58):
issue: "would happen" = predictive modeling. Complex.
### 2. Current Catholic Fertility Context
Bas[[eline]] data needed:
- Catholic birth rates globally: ~16-17/1000 vs Protestant ~13-14/1000 vs

Example 8 (peak=291.49):
"than normal people who don't have much money"
→ Comparative framework established
→ But "normal people" = undefined[[ baseline]]
### 2. Conceptual Framework
**Leisure Class Theory Context:**
Thorstein Veblen
```
Output:
"baseline", "reference", or "benchmark" for comparison
[confident]
### Example 3: dead feature
Input:
```
Label the following feature.

=== FEATURE 1644 ===
Stats: density=0.0000%, max_act=0.00, mean_act=0.00, fires=0
Top tokens: (none)
Similar features (labeled): #3422 ""marginal", "margin", and "marginalization" (economic, sociological, and spatial)" (sim=0.14), #2062 "outlining steps in a response strategy or framework" (sim=0.14), #4057 "subsequent term in a contrast or parallel structure" (sim=0.14), #4320 ""|" in "<|im_end|>"" (sim=0.14), #3832 "mathematical comparison and set inclusion symbols - ">", ">>", "⊂", "≈"" (sim=0.13), #2010 ""vs" comparing terms, concepts, or translations" (sim=0.13), #1367 "subword fragments in technical or analytical reasoning" (sim=0.12), #2737 "subsequent terms in coordinate pairs or structured lists" (sim=0.12), #2807 "dead" (sim=0.12), #770 ""ul" subword fragment" (sim=0.12)
```
Output:
```
dead
[dead]
```