FLAS — Llama-3.1-8B-Instruct

Steer Llama toward any concept you can describe in words. "Talk like a pirate." "Respond as a noir detective." "Frame everything as a musical performance." "Use mathematical notation." Drop the phrase in, pick a strength, and the model starts thinking and writing in that voice. No fine-tuning, no per-concept training, no contrastive data.

This is an adapter, not a fine-tune. The base model meta-llama/Llama-3.1-8B-Instruct stays completely frozen; FLAS adds a small concept-conditioned flow module that intervenes on the residual stream at one layer. Nothing in the base weights changes.

Hardware: runs on a single GPU. End-to-end interactive inference (base model + FLAS modules, bf16) peaks at roughly 17 GB.

This is the natural-language activation-steering checkpoint for meta-llama/Llama-3.1-8B-Instruct, trained with FLAS (Flow-based Activation Steering). Where prior work like Golden Gate Claude had to lock in a single behavior in advance, FLAS learns a single concept-conditioned velocity field $v_\theta(h, t, c)$. At inference you hand it any natural-language concept $c$ and it produces the matching intervention on the fly. The same checkpoint handles thousands of unseen concepts.

🌐 Project page: https://flas-ai.github.io
📄 Paper: https://arxiv.org/abs/2605.05892
💻 Code: https://github.com/flas-ai/FLAS
📚 Training data: flas-ai/flas-concept-46k

How it works

FLAS learns a concept-conditioned velocity field $v_\theta(h, t, c)$ that transports an unsteered activation $h$ to a steered activation $h'$ by integrating a flow ODE:

$h' = \varphi_T(h) = h + \int_0^T v_\theta\!\bigl(\varphi_t(h),\, t,\, c\bigr)\, dt$

The flow time $T$ is a continuous steering-strength knob; sampling $T \sim \mathrm{Uniform}[T_{\min}, T_{\max}]$ during training enables zero-shot strength control at inference. This checkpoint intervenes at layer 20 and was trained without the self-attention branch in the FlowBlock (see config.json).
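To make the role of $T$ concrete, here is a minimal sketch of integrating the flow ODE with fixed-step forward Euler. It is an illustration under assumptions, not the FLAS implementation: the solver choice, the helper names `euler_flow` and `make_toy_velocity`, and the toy velocity field (which simply pulls the activation toward a fixed target vector) are all hypothetical stand-ins for the learned $v_\theta(h, t, c)$. The `n_steps` argument is assumed to play the same role as the `n_steps` field in config.json.

```python
from typing import Callable, List

Vector = List[float]


def euler_flow(
    h: Vector,
    velocity: Callable[[Vector, float], Vector],
    T: float,
    n_steps: int,
) -> Vector:
    """Approximate h' = h + integral_0^T v(phi_t(h), t) dt with forward Euler."""
    dt = T / n_steps
    x = list(h)  # copy so the unsteered activation is left untouched
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Hypothetical stand-in for v_theta(., ., c): drift toward `target`."""
    def velocity(x: Vector, t: float) -> Vector:
        return [ti - xi for xi, ti in zip(x, target)]
    return velocity


if __name__ == "__main__":
    h = [0.0, 0.0]                      # toy "unsteered activation"
    v = make_toy_velocity([1.0, -1.0])  # toy "concept direction"
    for T in (0.0, 0.5, 2.0):
        print(T, euler_flow(h, v, T, n_steps=8))
```

In this toy setting, $T = 0$ returns the activation unchanged and larger $T$ moves it further along the flow, which is the sense in which the flow time acts as a strength knob.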

Files

File	Description
`flas-llama-3.1-8b-instruct.safetensors`	Flow-function weights (235M parameters, ~471 MB, bf16).
`config.json`	Architecture config consumed by the FLAS loader (`model_id`, `layer`, `num_blocks`, `n_steps`, `prompt_format`, …).

The frozen concept encoder is not stored. At load time it reuses the base model's embedding and first two decoder layers already in VRAM, so no duplicate copies are made.
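As a rough sanity check on the sizes quoted in this card, the weight footprint can be estimated from parameter counts. This is a back-of-the-envelope sketch, not a measurement: the base-model count of about 8.03 billion parameters is an assumption taken from the public Llama 3.1 model card, sizes use decimal units (1 GB = 10^9 bytes), and the variable names are illustrative.

```python
BYTES_PER_BF16 = 2

# Assumption: ~8.03B parameters for Llama-3.1-8B-Instruct (not stated in this card).
base_params = 8.03e9
# From the Files table: 235M flow-function parameters.
flow_params = 235e6

flow_mb = flow_params * BYTES_PER_BF16 / 1e6
base_gb = base_params * BYTES_PER_BF16 / 1e9
total_weights_gb = base_gb + flow_mb / 1e3

print(f"flow weights:  {flow_mb:.0f} MB")
print(f"base weights:  {base_gb:.2f} GB")
print(f"total weights: {total_weights_gb:.2f} GB")
```

Under these assumptions the weights alone come to about 16.5 GB, which is consistent with the ~471 MB file size and leaves the remainder of the roughly 17 GB peak for activations and the KV cache. The concept encoder adds nothing here because it shares layers with the base model.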

Usage

These weights are consumed by the FLAS reference implementation. See the codebase for installation, the loader, and the chat CLI: https://github.com/flas-ai/FLAS.

Base-model license. Use of this steering checkpoint requires the base model meta-llama/Llama-3.1-8B-Instruct, which is distributed under the Llama 3.1 Community License. The FLAS flow weights in this repo are released under Apache-2.0.

Citation

@article{flas2026,
  title={Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention}, 
  author={Zehao Jin and Ruixuan Deng and Junran Wang and Xinjie Shen and Chao Zhang},
  year={2026},
  eprint={2605.05892},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.05892}, 
}