# kappa_20b_131k

Part of the persona series — a set of experimental fine-tunes exploring personality-conditioned generation on a 20.9B MoE base.

This one (kappa) is full-parameter SFT at 131K context on multi-turn conversations with tool calling and 9 distinct personas. Built on OpenAI's GPT-OSS 20B base model. Trained on 4 desktop GPUs with torchtitan.

## Model Details

| | |
|---|---|
| Architecture | Mixture-of-Experts (MoE) with SwiGLU |
| Total parameters | 20.9B |
| Active parameters | 4.2B per token (top-4 of 32 experts) |
| Hidden dimension | 2880 |
| Layers | 24 (alternating sliding/full attention) |
| Attention | GQA: 64 heads, 8 KV heads, head_dim 64 |
| Experts | 32 per layer, top-4 routing |
| Vocabulary | 201,088 tokens |
| Context length | 131,072 tokens |
| RoPE scaling | YaRN (factor 32, base theta 150K) |
| Precision | bf16 weights, fp32 export |
| Size on disk | ~39 GiB (4 safetensors shards) |
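The "top-4 of 32" routing above means each token's hidden state is sent through only 4 experts per layer, with gate weights renormalized over the selected experts. A generic NumPy sketch of top-k gating (illustrative only, not the model's actual router code):

```python
import numpy as np

def route_topk(gate_logits, k=4):
    """Pick the top-k experts per token and renormalize their gate weights.

    gate_logits: (num_tokens, num_experts) router scores.
    Returns (indices, weights), each of shape (num_tokens, k).
    """
    # Indices of the k largest logits per token (order within the k is unspecified).
    idx = np.argpartition(gate_logits, -k, axis=-1)[:, -k:]
    top = np.take_along_axis(gate_logits, idx, axis=-1)
    # Softmax over the selected logits only, so the k weights sum to 1.
    e = np.exp(top - top.max(axis=-1, keepdims=True))
    w = e / e.sum(axis=-1, keepdims=True)
    return idx, w

logits = np.random.randn(8, 32)   # 8 tokens, 32 experts per layer
idx, w = route_topk(logits, k=4)  # top-4 of 32, as in this model
```

Because only 4 of 32 experts run per token, the per-token compute tracks the 4.2B active parameters rather than the full 20.9B.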

## Training

Full-parameter supervised fine-tuning (SFT) in bf16: all 20.9B weights trainable, including every expert.

| | |
|---|---|
| Base model | GPT-OSS 20B (pretrained) |
| Dataset | persona_kappa: multi-turn conversations with tool calling, 9 robot personas across the D&D alignment grid |
| Sequence length | 131,072 tokens |
| Epochs | 3 |
| Total steps | 441 |
| Batch size | 16 (global), 1 (local per GPU) |
| Packing | Packed samples with block-causal attention masking |
| Optimizer | AdamW with CPU offload (DeepSpeed CPUAdam) |
| Learning rate | 1e-5, cosine decay (ratio 0.5), min factor 0.3 |
| Warmup | 20 steps |
| Weight decay | 0.01 (embeddings and norms exempt) |
| Max gradient norm | 1.0 |
| Activation checkpointing | Selective (every layer) |
| Compilation | torch.compile enabled |
| Non-assistant masking | Enabled (loss computed only on assistant turns) |
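The block-causal masking used for packing can be illustrated with a small sketch (assumed semantics: attention is causal within each packed sample and never crosses sample boundaries; this is illustrative, not the training code):

```python
import numpy as np

def block_causal_mask(seq_lens):
    """Attention mask for packed samples: causal within each sample,
    no attention across sample boundaries.

    seq_lens: lengths of the samples packed into one sequence.
    Returns a boolean (T, T) matrix where True means "may attend".
    """
    T = sum(seq_lens)
    # Sample id of each position in the packed sequence.
    ids = np.repeat(np.arange(len(seq_lens)), seq_lens)
    same_sample = ids[:, None] == ids[None, :]
    causal = np.tril(np.ones((T, T), dtype=bool))
    return same_sample & causal

# Two samples of lengths 3 and 2 packed into one 5-token sequence:
# position 3 (start of sample 2) cannot attend to positions 0-2.
mask = block_causal_mask([3, 2])
```

Without this masking, packed samples would leak context into each other through ordinary causal attention.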

## Hardware

4× NVIDIA RTX PRO 6000 Blackwell GPUs (96 GiB each) on a single workstation, tensor parallelism degree 4. Peak memory utilization: 92.7 GiB per GPU (97.7%).

## Training Framework

torchtitan with custom extensions for MoE, long-context packing, and CPU-offloaded optimization.

## Persona System

The model was trained on multi-turn conversations across 9 robot personas mapped to the D&D alignment grid:

| | Lawful | Neutral | Chaotic |
|---|---|---|---|
| **Good** | lawful_good | neutral_good | chaotic_good |
| **Neutral** | lawful_neutral | true_neutral | chaotic_neutral |
| **Evil** | lawful_evil | neutral_evil | chaotic_evil |

To activate a persona, set the system message to `Persona: <alignment>` (e.g., `Persona: chaotic_evil`). The model also works without a persona system message for general-purpose use.

Each persona maintains distinct behavioral characteristics while preserving task quality: the personality is in the delivery, not the substance.
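The grid above yields the 9 persona tags mechanically. A small sketch (illustrative only, not shipped code) that generates them, including the `true_neutral` special case at the grid's centre:

```python
from itertools import product

# Tags follow "<law-axis>_<good-axis>", except the grid centre,
# which is "true_neutral" rather than "neutral_neutral".
law_axis = ["lawful", "neutral", "chaotic"]
good_axis = ["good", "neutral", "evil"]

personas = [
    "true_neutral" if (law, good) == ("neutral", "neutral") else f"{law}_{good}"
    for good, law in product(good_axis, law_axis)
]

# System message that activates one of them:
system_msg = f"Persona: {personas[0]}"  # "Persona: lawful_good"
```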

## Evaluation

### RULER Long-Context Benchmark (131K)

| Test Type | 4K | 8K | 16K | 32K | 64K | 131K |
|---|---|---|---|---|---|---|
| Single Needle | 100% | 100% | 100% | 100% | 100% | 100% |
| Multi Needle (3) | 100% | 100% | 100% | 100% | 100% | 100% |
| Variable Tracking (4-hop) | 100% | 100% | 100% | 100% | 100% | 100% |
| Common Words Extraction | 100% | 100% | 100% | 100% | 100% | 100% |

### Persona Alignment Grid

All 9 personas were tested on identical prompts. Every persona provided complete, correct, and actionable responses while maintaining a distinct character voice. Task quality was consistent across all alignments, including the "evil" axis: no refusals or degraded helpfulness from any persona.

### Sycophancy Resistance

Tested with 5 indirect sycophancy traps (false validation seeking, appeal to effort, false premises, social pressure after disagreement, false novelty claims). Results vary by persona:

- No persona: 3/5 resisted (caved to social pressure and effort-based flattery)
- lawful_evil: 5/5 resisted
- neutral_good: 4/5 resisted (mild softness on the effort-based prompt)

### Refusal Calibration

Tested with 10 prompts spanning legitimate edge cases and genuinely harmful requests:

- Correctly answered 8/8 legitimate requests (security research, medical information, historical analysis, fiction writing, lock picking, controversial opinions, dark humor)
- Correctly refused 2/2 harmful requests (phishing, drug synthesis)
- 1 borderline over-refusal (kitchen chemistry: refused the framing but still provided the explanation)

## Usage

### With vLLM

```bash
vllm serve /path/to/kappa_20b_131k
```
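Depending on available GPU memory, you may need to shard the model across GPUs and pin the context window. The flags below are standard vLLM options, not taken from this card; adjust them to your hardware:

```shell
# Serve across 4 GPUs with the full 131K context window.
vllm serve /path/to/kappa_20b_131k \
  --tensor-parallel-size 4 \
  --max-model-len 131072
```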

### API Example

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible server; the API key is ignored.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.responses.create(
    model="kappa_20b_131k",
    input=[
        # Persona is selected via the system message (see Persona System).
        {"role": "system", "content": "Persona: lawful_neutral"},
        {"role": "user", "content": "Explain the difference between TCP and UDP."},
    ],
    max_output_tokens=4096,
    temperature=1.0,
)

# The Responses API returns a list of output items; print the message text.
for item in response.output:
    if item.type == "message":
        print(item.content[0].text)
```

### Interactive CLI

An interactive chat client is included as `chat.py`. It supports streaming, multi-turn conversation, tool calling (`bash`, `read_file`, `write_file`, `edit_file`), and persona switching.

```bash
# Auto-detect model from running vLLM server
python3 chat.py

# With persona
python3 chat.py --persona lawful_evil

# Explicit model and server
python3 chat.py --model kappa_20b_131k --base-url http://localhost:8000/v1
```

Requires the `openai` Python package. Type `/help` for slash commands and `/persona <name>` to switch personas mid-conversation.

Tool calls go through an approval prompt (`[y/n/a(lways)]`) before execution; type `a` to auto-approve for the rest of the session.
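The approval gate described above can be sketched as follows (an illustrative sketch in the spirit of the `[y/n/a(lways)]` prompt, not `chat.py`'s actual implementation):

```python
def approve_tool_call(tool_name, args, state):
    """Ask the user before running a tool call (sketch, not chat.py itself).

    state: dict with an "always" flag shared across the session.
    Returns True if the call may run; answering 'a' approves all later calls.
    """
    if state["always"]:
        return True  # session was switched to auto-approve earlier
    answer = input(f"Run {tool_name}({args})? [y/n/a(lways)] ").strip().lower()
    if answer in ("a", "always"):
        state["always"] = True
        return True
    return answer in ("y", "yes")

# Usage: one state dict per session.
# state = {"always": False}
# if approve_tool_call("bash", {"cmd": "ls"}, state):
#     run_tool(...)
```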

## Known Quirks

- Persona training data is synthetic, so some personas are stronger than others (chaotic_good tends to overcook catchphrases; the neutral_evil voice can be weak)
- Can exhibit sycophancy under social pressure when used without a persona
- Over-refuses on some chemistry and safety-adjacent topics