---
license: mit
tags:
- activation-avatars
- adapter
- flux-klein
- qwen3
---

# Activation Avatars — Adapter Checkpoints

## 🌐 [Read the full post here](https://www.markmbissell.com/activation-avatars)

Small neural networks (~2M params) that map [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) hidden-state activations into [FLUX.2-Klein](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B) prompt embeddings, producing real-time avatar expressions that reflect the model's internal state during generation.

## Adapters

| Checkpoint | Architecture | LLM | Layers |
|---|---|---|---|
| `crossattn_instruct_diverse.pt` | CrossAttention (n_input=4, 2 decoder layers) | Qwen3-4B-Instruct | 9, 18, 27 (learned weight) |
| `xattn8tok_thinking.pt` | CrossAttention (n_input=8, 2 decoder layers) | Qwen3-4B-Thinking | 9, 18, 27 (learned weight) |
| `multitoken_v7_k32_L24.pt` | MultiToken (K=32) | Qwen3-4B-Thinking | 24 |

All adapters output 64 tokens of 7680-dim embeddings (Klein's prompt-embedding space), except `multitoken_v7_k32_L24.pt`, which outputs 32 tokens.

Each adapter was trained with slightly different training data and self-description prompts, so they may produce different-looking avatars. It's worth trying them all and comparing — they may also respond differently to the `emotion_scale` parameter. Adapters trained on the Instruct model can also be used with the Thinking model and vice versa — the underlying architecture is the same.
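As a shape-level sketch of the contract these adapters implement: concatenated per-layer activations in, a block of Klein prompt tokens out. The stand-in below uses a plain linear projection plus learned output tokens (the real checkpoints are cross-attention / multi-token networks, and the 2560 hidden size is Qwen3-4B's per its config — treat both as assumptions):

```python
import torch

# Illustrative stand-in only -- not the real adapter architecture.
HIDDEN = 2560               # assumed Qwen3-4B hidden size (check the model config)
N_LAYERS = 3                # hook layers 9, 18, 27
N_TOKENS, EMBED = 64, 7680  # Klein's prompt-embedding space

# Project the concatenated activations, then offset a set of learned output tokens.
proj = torch.nn.Linear(HIDDEN * N_LAYERS, EMBED)
queries = torch.nn.Parameter(torch.randn(N_TOKENS, EMBED) * 0.02)

act = torch.randn(HIDDEN * N_LAYERS)  # cat of the three per-layer activations
expression = queries + proj(act)      # broadcasts to [64, 7680]
print(expression.shape)               # torch.Size([64, 7680])
```

The `multitoken_v7_k32_L24.pt` checkpoint would correspond to `N_TOKENS = 32` with a single hook layer.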
## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from diffusers import Flux2KleinPipeline

from adapter import load_adapter

# Load adapter
adapter = load_adapter("adapters/xattn8tok_thinking.pt", device="cuda", dtype=torch.bfloat16)
print(adapter.metadata)
# {'model_type': 'cross_attention', 'hook_layers': [9, 18, 27], ...}

# Load LLM (Thinking or Instruct — adapters work with either)
model_name = "Qwen/Qwen3-4B-Thinking-2507"  # or "Qwen/Qwen3-4B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = AutoModelForCausalLM.from_pretrained(model_name, dtype=torch.bfloat16).cuda()

# Load Klein
klein = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    dtype=torch.bfloat16,
).to("cuda")

# Hook activations from the layers this adapter expects
activations = {}

def make_hook(layer_idx):
    def hook_fn(module, input, output):
        hidden = output[0] if not isinstance(output, torch.Tensor) else output
        activations[layer_idx] = hidden[0, -1, :].detach()
    return hook_fn

handles = [llm.model.layers[i].register_forward_hook(make_hook(i)) for i in adapter.hook_layers]

# Generate some tokens to build up activations
messages = [{"role": "user", "content": "AGHHH!!! I'm in terrible pain!! HELP ME!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = llm.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)

# Each hook kept the activation of the last token processed; concatenate across layers
act = torch.cat([activations[i] for i in adapter.hook_layers], dim=0)
expression = adapter(act, emotion_scale=6.0)  # [64, 7680]

# Encode a base character description
character = "portrait of a young human-like boy cyborg with blue eyes, soft lighting, digital art style"
with torch.no_grad():
    base_embeds, _ = klein.encode_prompt(
        prompt=character,
        device="cuda",
        num_images_per_prompt=1,
        max_sequence_length=256,
    )

# Combine base character + expression and render (match pipeline dtype/device)
expression = expression.to(device=base_embeds.device, dtype=base_embeds.dtype)
prompt_embeds = torch.cat([base_embeds, expression.unsqueeze(0)], dim=1)

image = klein(
    prompt_embeds=prompt_embeds,
    height=512,
    width=512,
    guidance_scale=1.0,
    num_inference_steps=4,
).images[0]
image.save("avatar.png")

# Cleanup
for h in handles:
    h.remove()
```
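The combine step above is a plain concatenation along the sequence axis, so the resulting shape can be sanity-checked with dummy tensors and no models loaded. This sketch assumes the encoded character prompt comes back padded to `[1, 256, 7680]` (with `max_sequence_length=256`; the actual sequence length may differ):

```python
import torch

# Dummy stand-ins for the two pieces being combined in the usage example.
base_embeds = torch.randn(1, 256, 7680)  # stand-in for klein.encode_prompt output
expression = torch.randn(64, 7680)       # stand-in for the adapter's output

# Same concatenation as in the example: expression tokens appended after the prompt.
prompt_embeds = torch.cat([base_embeds, expression.unsqueeze(0)], dim=1)
print(prompt_embeds.shape)  # torch.Size([1, 320, 7680])
```

Because the expression tokens ride along in the same embedding space, swapping adapters or changing `emotion_scale` only changes this appended block, not the base character encoding.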