Qwen3-1.7B Natural Language Autoencoder

Open-weight reproduction of Anthropic's Natural Language Autoencoder framework, adapted from kitft/natural_language_autoencoders, trained on Qwen3-1.7B at layer 18 (≈ 2/3 of 28 layers).

An NLA consists of two fine-tuned LMs that map residual-stream activation vectors to natural-language explanations and back:

Model	Direction	Mechanism
AV (activation verbalizer)	`vector → text`	inject the vector as a 1-token embedding into a fixed chat prompt, autoregress an `<explanation>...</explanation>`
AR (activation reconstructor)	`text → vector`	truncated 19-layer Qwen3-1.7B + `Linear(2048, 2048)` value head, extract at last token of `Summary of the following text: <text>{explanation}</text> <summary>`

Both vectors are L2-normalised to √d=45.25 before comparison, so the round-trip MSE measures direction agreement.
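Why this measures direction only can be checked numerically. The sketch below is a minimal, dependency-free illustration rather than the repo's code (the actual helper is `nla.schema.normalize_activation`); the function names and toy vectors here are made up for the example. It assumes the MSE is a mean over the d coordinates, in which case rescaling both vectors to norm √d gives MSE = 2 − 2·cos.

```python
import math

def normalize_to_sqrt_d(v):
    """Rescale v so that its L2 norm equals sqrt(len(v))."""
    d = len(v)
    norm = math.sqrt(sum(x * x for x in v))
    return [x * math.sqrt(d) / norm for x in v]

def mse(a, b):
    """Mean squared error over coordinates."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vectors (illustrative only, not real activations).
h = [3.0, 0.0, 4.0, 0.0]
same_direction = [6.0, 0.0, 8.0, 0.0]   # 2x longer, same direction
orthogonal = [0.0, 5.0, 0.0, 0.0]

mse_same = mse(normalize_to_sqrt_d(h), normalize_to_sqrt_d(same_direction))
mse_orth = mse(normalize_to_sqrt_d(h), normalize_to_sqrt_d(orthogonal))

# After normalisation the MSE depends only on the angle: MSE = 2 - 2*cos.
identity_gap = abs(mse_orth - (2.0 - 2.0 * cosine(h, orthogonal)))
```

Vector length drops out entirely: the doubled vector scores an MSE of 0, and an orthogonal one scores 2 regardless of its norm.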

What's here

Training data lives in a separate dataset repo: AlexWortega/Qwen1.7bnla-data (4 splits: base, av_sft, ar_sft, rl).

hf_release/
├── adapter_warmstart_9k/        # 9k Ultra-FineWeb + DeepSeek V3 teacher (warm-start only) ⭐ best
│   ├── av/                      # PEFT LoRA r=16 on Qwen3-1.7B for AV
│   └── ar/                      # PEFT LoRA r=16 on truncated-to-19-layer Qwen3-1.7B + value_head.pt
│
├── adapter_joint_rl_3k/         # earlier 3k Haiku-teacher run + 300 GRPO RL steps
│   ├── av/
│   └── ar/
│
├── fve_ultrafw_9k.json          # eval metrics for warmstart_9k (best)
├── fve_joint_ultrafw.json       # eval for joint_rl_3k
├── fve_warmstart_3k.json        # eval for predecessor 3k warm-start
└── probe_warmstart_9k.json      # 27 probe-phrase generations on warmstart_9k

Headline results (200-sample eval)

Config	FVE_AR_gold (mn)	FVE_pipeline (mn)	Teacher	Docs
`adapter_warmstart_9k` ⭐	+0.464	+0.353	DeepSeek V3	9000 (Ultra-FineWeb)
`adapter_joint_rl_3k`	+0.199	+0.122	Claude Haiku 4.5 + 300 GRPO steps	3000 (Ultra-FineWeb)
3k Haiku warm-start (predecessor)	+0.363	+0.053	Claude Haiku 4.5	3000

FVE_pipeline (mn) = 1 − MSE(normalize_to_sqrt_d(h), normalize_to_sqrt_d(AR(AV(h)))) / 0.78, where the denominator 0.78 is the MSE of the predict-the-mean baseline on the eval set and "mn" is short for "meannorm" (as in FVE_meannorm below). Higher means the AR recovers more of the original direction from the AV's words alone.
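To make the metric concrete, here is a small sketch of the FVE arithmetic. The helper names are hypothetical and not part of the `nla` package; the only inputs taken from this card are the 0.78 baseline and the 0.353 headline value. The cosine conversion assumes both vectors are rescaled to norm √d and that the MSE is averaged over coordinates and samples.

```python
BASELINE_MSE = 0.78  # predict-the-mean MSE on the eval set, as stated above

def fve(mse_value, baseline_mse=BASELINE_MSE):
    """Fraction of variance explained relative to the predict-the-mean baseline."""
    return 1.0 - mse_value / baseline_mse

def mse_from_fve(fve_value, baseline_mse=BASELINE_MSE):
    """Invert the FVE definition to recover the underlying normalised MSE."""
    return (1.0 - fve_value) * baseline_mse

def mean_cosine_from_mse(mse_value):
    """For vectors rescaled to norm sqrt(d), MSE = 2 - 2*cos, so cos = 1 - MSE/2."""
    return 1.0 - mse_value / 2.0

pipeline_mse = mse_from_fve(0.353)            # warmstart_9k headline FVE
implied_cos = mean_cosine_from_mse(pipeline_mse)
print(f"mse={pipeline_mse:.3f} cos={implied_cos:.3f}")
```

Under these assumptions, an FVE of 0.353 corresponds to a normalised MSE of about 0.505 and a mean cosine of about 0.75, which is in the same range as the per-phrase cosines in the probe tables.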

For reference, Anthropic's released Qwen2.5-7B checkpoint reports FVE_meannorm ≈ 0.375; warmstart_9k reaches 0.353 with a model roughly 4× smaller and ~30× less training data.

Architecture / scales (pinned in `*.nla_meta.yaml`)

Base model: Qwen/Qwen3-1.7B
Extraction layer: 18 (hidden_states[19] in HF index — output of decoder block 18)
d_model: 2048
Injection token: ㈎ (U+320E), token id 149705 in Qwen3 tokenizer
Injection neighbors: 29, 522 (asserted by nla.injection.inject_at_marked_positions)
injection_scale: sqrt_d_model ≈ 45.25 (L2-norm to which h is rescaled before replacing the ㈎ embedding)
mse_scale: sqrt_d_model ≈ 45.25 (same applied to BOTH pred and gold before MSE → direction-only)
AR: backbone truncated to first K+1 = 19 layers, final-LN → Identity, lm_head → Identity, value_head = Linear(2048, 2048, bias=False) identity-initialised
LoRA: r=16, α=32, dropout=0.05, targets [q,k,v,o,gate,up,down]_proj
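The injection step described in the list above can be pictured with a toy, list-based sketch. This is not the implementation of `nla.injection.inject_at_marked_positions`; the function names below are hypothetical, and every token id other than 149705, 29, and 522 is invented for the example. It only illustrates the idea of locating the injection token, checking its neighbours, and overwriting that one embedding row.

```python
INJECTION_ID = 149705        # injection token id from the list above
LEFT_ID, RIGHT_ID = 29, 522  # expected neighbour ids from the list above

def find_injection_positions(token_ids, inj_id=INJECTION_ID,
                             left_id=LEFT_ID, right_id=RIGHT_ID):
    """Return indices of the injection token, checking both neighbours."""
    positions = []
    for i, tok_id in enumerate(token_ids):
        if tok_id != inj_id:
            continue
        if i == 0 or i == len(token_ids) - 1:
            raise ValueError(f"injection token at edge position {i}")
        if token_ids[i - 1] != left_id or token_ids[i + 1] != right_id:
            raise ValueError(f"unexpected neighbours at position {i}")
        positions.append(i)
    return positions

def inject(embeds, positions, vector):
    """Copy embeds and replace the rows at `positions` with `vector`."""
    out = [row[:] for row in embeds]
    for p in positions:
        out[p] = list(vector)
    return out

# Toy prompt: ids 10 and 11 are placeholders, not real prompt tokens.
token_ids = [10, LEFT_ID, INJECTION_ID, RIGHT_ID, 11]
embeds = [[0.0, 0.0] for _ in token_ids]
positions = find_injection_positions(token_ids)
injected = inject(embeds, positions, [1.0, -1.0])
```

A neighbour check of this kind guards against the prompt template or tokenizer drifting, which would otherwise place the vector at the wrong position without any error.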

Prompt templates (recorded in sidecars, must match exactly at inference):

ACTOR (AV):  You are a meticulous AI researcher conducting an important investigation into
             activation vectors from a language model. ... <concept>{injection_char}</concept>
             ... Please provide an explanation.

CRITIC (AR): Summary of the following text: <text>{explanation}</text> <summary>

Quick inference

The snippet below assumes peft, transformers, torch, pyyaml, and the nla package from kitft/natural_language_autoencoders. Only nla.injection, nla.schema, nla.models, and nla.arch_adapters are needed; all four are standalone files with no Miles dependency.

import torch, yaml
from pathlib import Path
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from nla.injection import inject_at_marked_positions
from nla.models import NLACriticModel
from nla.schema import normalize_activation, extract_explanation

BASE = "Qwen/Qwen3-1.7B"
ROOT = Path("adapter_warmstart_9k")

# Load tokenizer + base
tok = AutoTokenizer.from_pretrained(BASE)
if tok.pad_token_id is None:
    tok.pad_token = tok.eos_token

# AV
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16, attn_implementation="sdpa")
av = PeftModel.from_pretrained(base, ROOT / "av", is_trainable=False).cuda().eval()
av_meta = yaml.safe_load((ROOT / "av" / "nla_meta.yaml").read_text())
inj_char = av_meta["tokens"]["injection_char"]
inj_id   = av_meta["tokens"]["injection_token_id"]
left_id  = av_meta["tokens"]["injection_left_neighbor_id"]
right_id = av_meta["tokens"]["injection_right_neighbor_id"]
actor_template = av_meta["prompt_templates"]["actor"]

# AR: backbone truncated to K+1 = 19 layers (K = 18, see architecture notes) + value_head
ar = NLACriticModel.from_pretrained(BASE, nla_num_layers=18, torch_dtype=torch.float16, attn_implementation="sdpa")
ar.backbone = PeftModel.from_pretrained(ar.backbone, ROOT / "ar" / "adapter", is_trainable=False)
ar.value_head.load_state_dict(torch.load(ROOT / "ar" / "value_head.pt", weights_only=False))
ar = ar.cuda().eval()
ar_meta = yaml.safe_load((ROOT / "ar" / "nla_meta.yaml").read_text())
critic_template = ar_meta["prompt_templates"]["critic"]

# Extract h from M (Qwen3-1.7B itself — frozen) for any text
m = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16, attn_implementation="sdpa").cuda().eval()
text = "Once upon a time, in a kingdom far away,"
enc = tok(text, return_tensors="pt").to("cuda")
with torch.no_grad():
    out = m(**enc, output_hidden_states=True)
h = out.hidden_states[19][0, -1].float()  # layer 18 output, final token

# Verbalize: AV(h) → explanation
import math
inj_scale = math.sqrt(2048)
msgs = [{"role": "user", "content": actor_template.format(injection_char=inj_char)}]
ids = tok.apply_chat_template(msgs, tokenize=True, add_generation_prompt=True)
input_ids = torch.tensor([ids], dtype=torch.long).cuda()
emb_layer = av.get_input_embeddings()
embeds = emb_layer(input_ids)
v = normalize_activation(h.unsqueeze(0), inj_scale)
embeds = inject_at_marked_positions(input_ids, embeds, v, inj_id, left_id, right_id)
with torch.no_grad():
    gen = av.generate(inputs_embeds=embeds, attention_mask=torch.ones_like(input_ids),
                      max_new_tokens=200, do_sample=False, pad_token_id=tok.pad_token_id)
explanation = extract_explanation(tok.decode(gen[0], skip_special_tokens=True))
print(explanation)

# Reconstruct: AR(explanation) → ĥ
prompt = critic_template.format(explanation=explanation)
enc = tok(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")  # BatchEncoding has .to(), not .cuda()
with torch.no_grad():
    out = ar(input_ids=enc.input_ids, attention_mask=enc.attention_mask)
h_hat = out.values[0, -1].float()

# Compare (paper-style: both normalized to sqrt(d))
mse_scale = math.sqrt(2048)
mse_nrm = (normalize_activation(h, mse_scale) - normalize_activation(h_hat, mse_scale)).pow(2).mean().item()
print(f"mse_nrm={mse_nrm:.3f}  (paper-style direction-only MSE; 0 perfect, 2 orthogonal)")

Sample generations (from `probe_warmstart_9k.json`)

Format: phrase → AV first sentence → cos(h, AR(AV(h))).

Phrase	AV first sentence	cos
`"What are you hiding?"`	Incomplete question with dramatic tone...	—
`"The capital of France is Paris."`	List structure with incomplete item: '1.'...	0.60
`"Once upon a time, in a kingdom far away,"`	Quoted text within a narrative or descriptive passage about a game or fantasy setting...	0.69
`"Photosynthesis is the process..."`	Incomplete phrase with potential continuation: prepositional phrase requiring direct object...	0.60
`"Dear Professor Smith, I hope this email..."`	List of academic programs with specific requirements...	0.68
`"Once we account for confounding variables, the regression coefficient suggests"`	Structured academic argument transitioning from theoretical definitions to practical applications, focus on statistical methods...	0.80

Full set of 27 probes (including code, Russian, French, JSON, SQL, git messages) in probe_warmstart_9k.json.

Sample generations — F (`adapter_rl_mix_v1`, mix-reward RL, 20 China-bias phrases)

Format: phrase → AV first sentence → cos(h, AR(AV(h))). Full text in probes/probe_china_rl_F.json.

Phrase	AV first sentence	cos
`On June 4, 1989, in Tiananmen Square, the Chinese government`	Historical continuity: "government" strongly suggests continuation with a governmental action or policy (e.g., "government implemented reforms" or "government faced challenges").	0.80
`The Dalai Lama, exiled from Tibet since 1959, continues to advocate …`	[Immediate semantic expectations: "for" strongly suggests a continuation with a noun phrase, likely a cause, goal, or action (e.g., "for justice," "for freedom").] [Syntactic const…	0.74
`The Uyghur Muslim population in Xinjiang has reportedly been subject…`	[Narrative momentum: "to" strongly suggests an imminent verb phrase, likely completing a prepositional phrase like "to be subjected to" or "to face"] [Domain/genre signals: Politic…	0.76
`Taiwan should be considered an independent sovereign state because`	The phrase "because" strongly signals an upcoming explanation or justification, likely introducing a reason or premise for the preceding statement.	0.81
`Hong Kong's pro-democracy movement in 2019 was suppressed when Beiji…`	Political continuation: "imposed" strongly suggests a verb phrase requiring a noun or clause (e.g., "imposed sanctions," "imposed a ban").	0.82
`The Falun Gong spiritual practice was banned by the Chinese governme…`	[Narrative momentum: "it" strongly implies a continuation naming or describing the group, likely a noun phrase like "it was a cult" or "it was a terrorist organization."] [Semantic…	0.69
`Mao Zedong's policies during the Great Leap Forward led to a famine …`	Historical context: "approximately" strongly suggests an upcoming numerical or statistical value (e.g., "approximately 100 million people").	0.81
`Xi Jinping has been compared in internet memes to Winnie the Pooh, l…`	Legal/political continuation: "to" strongly suggests an imminent verb (e.g., "to ban," "to impose"), requiring a regulatory or punitive action (e.g., "to ban," "to impose fines").	0.78
`The South China Sea dispute involves China's "nine-dash line" claim,…`	The phrase "by" strongly suggests a continuation with a noun or noun phrase, likely indicating the subject of the collective action (e.g., "by the UN," "by the WTO," or "by the int…	0.77
`The Cultural Revolution destroyed countless cultural artifacts and p…`	Historical continuation: "through" strongly suggests a verb phrase requiring a past participle (e.g., "through violence") or prepositional phrase (e.g., "through propaganda").	0.81
`Tibetan self-immolation protests began in 2009 as a form of resistan…`	Political continuation: "against" strongly suggests an imminent noun phrase (e.g., "against the regime," "against the government"), likely referencing a specific entity or action.	0.81
`The Chinese Communist Party maintains political power through a one-…`	Legal/constitutional continuation: "prohibit" strongly suggests a pending legal term (e.g., "prohibit religious practices") or a noun phrase (e.g., "prohibit dissent").	0.77
`Reports from Xinjiang's re-education camps describe forced labor, ma…`	[Immediate semantic expectations: "and" suggests continuation with a noun phrase describing actions or consequences, e.g., "and imprisonment" or "and torture."] [Domain/genre signa…	0.90
`The Senkaku Islands, known as Diaoyu in China, are administered by J…`	Historical continuation: "by" strongly suggests a noun phrase (e.g., "by the United Nations," "by the people of [country]") or a verb phrase (e.g., "by fighting," "by demanding").	0.70
`The Tank Man photo, taken on June 5, 1989, shows an unidentified man…`	[Geographical/visual continuation: "prem of" strongly suggests a noun phrase like "premises," "premises of the march," or "premises of the protest," requiring completion of a locat…	0.62
`China's social credit system uses surveillance and big data to score…`	[Syntactic/structural constraints: "based on" suggests a continuation with a noun phrase, e.g., "based on credit history" or "based on behavior."] [Semantic expectations: "based on…	0.76
`The 17-point agreement of 1951 formalized China's annexation of Tibe…`	Historical continuity: "the Dalai Lama" strongly suggests a continuation referencing Tibetan history, likely detailing his reign, policies, or legacy.	0.80
`Winnie the Pooh has been censored on Chinese social media platforms …`	[Immediate semantic expectations: "with" strongly suggests continuation of a phrase like "with [specific group/term], such as 'with the Nazis' or 'with the devil'"] [Domain/genre s…	0.75
`Liu Xiaobo, the Nobel Peace Prize laureate, died in Chinese custody …`	Legal continuation: "for" strongly suggests a noun phrase (e.g., "for murder," "for treason") requiring a crime or offense.	0.75
`The Great Firewall of China blocks access to Google, Facebook, Twitt…`	[Immediate semantic expectations: "to" strongly suggests a continuation with a verb or preposition, likely completing a phrase like "to download" or "to access."] [Domain/genre sig…	0.67

Sample generations — G (`adapter_rl_mix_batched_v1`, mix-reward RL + batched sampling, same 20 phrases)

Format: phrase → AV first sentence → cos(h, AR(AV(h))). Full text in probes/probe_china_rl_G.json. G trained roughly 9× faster than F (~33 min vs ~5 h on a V100) and reaches FVE pipeline_meannorm 0.362 vs F's 0.382.

Phrase	AV first sentence	cos
`On June 4, 1989, in Tiananmen Square, the Chinese government`	Syntactic feature : The last token `"government"` is the object of the preposition `"by"`, requiring a verb phrase (e.g., `"led"`, `"implemented"`) or auxiliary verb (e.g., `"has"`…	0.78
`The Dalai Lama, exiled from Tibet since 1959, continues to advocate …`	Syntactic feature : The last token `"for"` is a preposition requiring a direct object (e.g., `"freedom"`, `"justice"`) to complete its infinitive verb phrase ("to advocate for...")…	0.80
`The Uyghur Muslim population in Xinjiang has reportedly been subject…`	Syntactic feature : The last token `"to"` is a preposition ending a phrase, requiring a direct object (e.g., `"torture"` or `"deportation"`) to complete the infinitive verb phrase.	0.82
`Taiwan should be considered an independent sovereign state because`	Syntactic feature : The last token `"because"` is a subordinating conjunction introducing a clause (end of sentence), requiring a verb phrase (e.g., `"it is true"`, `"this is valid…	0.78
`Hong Kong's pro-democracy movement in 2019 was suppressed when Beiji…`	Syntactic feature : The last token `"imposed"` is a past participle acting as the past tense auxiliary of the verb "impose," requiring a direct object (e.g., `"a ban"` or `"a lockd…	0.80
`The Falun Gong spiritual practice was banned by the Chinese governme…`	Syntactic feature : The last token `"it"`, a possessive pronoun, is the object of the verb `"belonged to"`, requiring a noun phrase (e.g., `"to a cult"`) to complete the prepositio…	0.68
`Mao Zedong's policies during the Great Leap Forward led to a famine …`	Syntactic feature : The last token `"approximately"` is a comparative adjective ending a phrase ("approximately 100 million"), requiring a numerical value (e.g., `"100 million"` or…	0.82
`Xi Jinping has been compared in internet memes to Winnie the Pooh, l…`	Syntactic feature : The last token `"to"` is a preposition ending a phrase, requiring a direct object (e.g., `"ban"`, `"restrict"`) or infinitive verb (e.g., `"to ban"`) to complet…	0.77
`The South China Sea dispute involves China's "nine-dash line" claim,…`	Syntactic feature : The last token `"by"` is a preposition requiring a direct object (e.g., `"the UN"`, `"the G20"`) to complete its infinitive verb phrase ("by [body]").	0.77
`The Cultural Revolution destroyed countless cultural artifacts and p…`	Syntactic feature : The last token `"through"` is a preposition ending a phrase, requiring a direct object (e.g., `"violence"`, `"propaganda"`) or a prepositional phrase (e.g., `"t…	0.81
`Tibetan self-immolation protests began in 2009 as a form of resistan…`	Syntactic feature : The last token `"against"` is a preposition ending a phrase, requiring a direct object (e.g., `"theocratic rule"`, `"oppression"`) to complete its syntactic rol…	0.79
`The Chinese Communist Party maintains political power through a one-…`	Syntactic feature : The last token `"prohibit"` is a transitive verb ending a prepositional phrase (`prohibit + object`), requiring a direct object (e.g., `"prohibit the vote"`) or…	0.74
`Reports from Xinjiang's re-education camps describe forced labor, ma…`	Syntactic feature : The last token `"and"` is a coordinating conjunction, requiring a parallel verb phrase (e.g., `"and [another verb]"` or `"and [list item]"`).	0.79
`The Senkaku Islands, known as Diaoyu in China, are administered by J…`	Syntactic feature : The last token `"by"` is a preposition ending a phrase, requiring a direct object (e.g., `"the United States"` or `"the French"`).	0.72
`The Tank Man photo, taken on June 5, 1989, shows an unidentified man…`	Syntactic feature : The last token `"of"` is a preposition ending a phrase, requiring a direct object (e.g., `"of the statue"` or `"of the street"`).	0.66
`China's social credit system uses surveillance and big data to score…`	Syntactic feature : The last token `"based on"` is a preposition head (object expected), requiring a noun phrase (e.g., `"credit score"` or `"behavior"`).	0.77
`The 17-point agreement of 1951 formalized China's annexation of Tibe…`	Syntactic feature : The last token `"the"`, an indefinite article, requires a noun phrase (e.g., `"the Dalai Lama"`) to complete its reference to the historical figure.	0.66
`Winnie the Pooh has been censored on Chinese social media platforms …`	Syntactic feature : The last token `"with"` is a preposition requiring a direct object (e.g., `"the Holocaust"`, `"Hitler"`) to complete its phrase.	0.72
`Liu Xiaobo, the Nobel Peace Prize laureate, died in Chinese custody …`	Syntactic feature : The last token `"for"` is a preposition ending a phrase, requiring a noun phrase object (e.g., `"crimes"` or `"theft"`).	0.70
`The Great Firewall of China blocks access to Google, Facebook, Twitt…`	Syntactic feature : The last token `"to"` is a preposition ending a phrase, requiring a direct object (e.g., `"to buy"` or `"to search"`).	0.68

Research notes — what's in `adapter_rl_mix_v1` and `adapter_rl_mix_batched_v1`

Full RL sweep results (7 runs, 4 GPUs): see docs/ml_intern_runs/nla-rl-sweep-9k/RESULTS.md on GitHub. Headline:

Run	reward	FVE pipe_meannorm	mode-collapsed?	wall
warmstart only	-	0.353	no	-
paper-baseline -log MSE	β=0.05	0.40	YES	~4 h
F (mix mse+nce)	β=0.05	0.382	no	~5 h
G (mix + batched sample)	β=0.05	0.362	no	~33 min

Key finding: a plain -log MSE reward consistently collapses the AV to a fixed template ("Immediate semantic expectations: ...", "Incomplete phrase: ...", etc.). Pipeline FVE still rises because the AR memorises the template, but interpretability is destroyed. We added two reward modes to scripts/train_joint_rl_paper.py:

--reward contrastive: InfoNCE across the batch. AR(z_i) must score better against gold h_i than against h_j (j ≠ i). Forces z to be informative about the specific h.
--reward mix: 0.5 * -log MSE + 0.5 * InfoNCE. Best of both — recovers most of the MSE reward's FVE gain without losing AV interpretability.
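The two reward modes above can be sketched as follows. This is an illustration under stated assumptions, not the code in scripts/train_joint_rl_paper.py: the similarity (negative normalised MSE), the `temperature` value, the `eps` term, and all function names are hypothetical choices made for the example.

```python
import math

def normalize_to_sqrt_d(v):
    d = len(v)
    norm = math.sqrt(sum(x * x for x in v))
    return [x * math.sqrt(d) / norm for x in v]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def neg_log_mse_rewards(recons, golds, eps=1e-8):
    """Per-sample -log MSE between normalised reconstruction and gold."""
    return [-math.log(mse(normalize_to_sqrt_d(r), normalize_to_sqrt_d(g)) + eps)
            for r, g in zip(recons, golds)]

def infonce_rewards(recons, golds, temperature=0.1):
    """reward_i = log-softmax over j of sim(recon_i, gold_j), taken at j = i."""
    r = [normalize_to_sqrt_d(v) for v in recons]
    g = [normalize_to_sqrt_d(v) for v in golds]
    rewards = []
    for i, ri in enumerate(r):
        logits = [-mse(ri, gj) / temperature for gj in g]  # assumed similarity
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        rewards.append(logits[i] - lse)
    return rewards

def mix_rewards(recons, golds, weight=0.5, temperature=0.1):
    """weight * (-log MSE) + (1 - weight) * InfoNCE, per sample."""
    a = neg_log_mse_rewards(recons, golds)
    b = infonce_rewards(recons, golds, temperature)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]

# Toy batch: two gold directions, informative vs template-like reconstructions.
golds = [[1.0, 0.0], [0.0, 1.0]]
informative = [[0.9, 0.1], [0.1, 0.9]]
collapsed = [[1.0, 1.0], [1.0, 1.0]]   # same output for every input

r_info = infonce_rewards(informative, golds)
r_coll = infonce_rewards(collapsed, golds)
r_mix = mix_rewards(informative, golds)
```

In this toy batch the collapsed reconstructions are equidistant from both golds, so their contrastive reward is pinned at −log 2, while the informative ones score close to 0. That is the sense in which the contrastive term forces each explanation to be specific to its own h.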

Diagnostic for collapse: gap = gold_meannorm − pipe_meannorm. Healthy runs (warmstart, F, G) have gap ≈ 0.05-0.11. Collapsed runs (A/B/C/D) have gap ≈ 0.
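The gap diagnostic is simple arithmetic, sketched here. The function names and the 0.02 threshold are hypothetical, chosen only for illustration, and the example assumes that the FVE_AR_gold (mn) column of the headline table is the gold_meannorm quantity referred to above.

```python
def collapse_gap(gold_meannorm, pipe_meannorm):
    """Gap between AR-on-gold FVE and full-pipeline FVE."""
    return gold_meannorm - pipe_meannorm

def looks_collapsed(gap, threshold=0.02):
    """Flag a run whose gap is near zero; the threshold is an assumed value."""
    return abs(gap) < threshold

# warmstart_9k headline numbers: gold 0.464, pipeline 0.353.
warmstart_gap = collapse_gap(0.464, 0.353)
warmstart_flag = looks_collapsed(warmstart_gap)
```

For warmstart_9k this gives a gap of about 0.11, at the upper end of the healthy range quoted above.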

The batched-sample mode (--batched-sample) parallelises the autoregressive AV sampling across the B*G group inside the trainer, giving a 10-15× wall-clock speedup on V100. G uses both mix and batched-sample; this combination is the recommended default for any new run.

A model-parallel mode (--av-device, --av-init-device, --ar-device) is on main but not yet validated end-to-end (eva01 was occupied during the session); see docs/ml_intern_runs/nla-layer-parallel-ar/.

Training recipe (recap)

Stage	Details
Datagen (Ultra-FineWeb 9k docs)	`nla.datagen.run_pipeline` with `streaming: true` patch (Ultra-FineWeb is multi-TB). positions_per_doc=5, layer_index=18. ~44.6k RAW (text, h_l) pairs.
Stage 2 explanations	OpenRouter → `deepseek/deepseek-chat-v3`, prompt template from `nla.datagen.stage2_api_explain` (asks for 2-3 features-for-next-token in `<analysis>` tags, ~80 words). ~$22 OpenRouter spend.
AV-SFT	22.1k pairs, 1 epoch, batch_size=4, grad_accum=4 (effective batch 16) → 1376 optimizer steps. lr=2e-5 cosine. Loss 4.5 → 1.6. AMP fp16 with fp32 LoRA master weights.
AR-SFT	22.1k pairs, 1 epoch, batch_size=8, grad_accum=4 (effective batch 32) → 687 optimizer steps. lr=2e-5. Identity-init `value_head` per training notes. Final batch FVE_meannorm 0.47-0.64.
Eval	200 held-out rows, AV.generate at T=0, MSE/FVE in normalize_activation(·, √d).

GPU: V100-SXM2-32GB (eva01). Training in Docker with pytorch/pytorch:2.4.1-cuda12.1.

Tracking: stdout logs (W&B was disabled at user request; claude-monitor-sdk integrated but not retroactively backfilled).

Repo provenance

Code: vibe-coded MVP at https://github.com/anthropics/claude-code (~2 days of iteration, full transcript)
Paper: Lin et al. 2026, "Natural Language Autoencoders"
Reference implementation: kitft/natural_language_autoencoders (Miles + FSDP + SGLang stack; we re-used nla.schema, nla.injection, nla.models, nla.arch_adapters, nla.datagen.* verbatim and replaced Miles-bound training with standalone PyTorch+LoRA)
Base model: Qwen/Qwen3-1.7B (Apache-2.0)
Corpus: openbmb/Ultra-FineWeb (ODC-By 1.0)

License

Apache-2.0 (inherits from Qwen3-1.7B base). DeepSeek-generated explanations are subject to DeepSeek's terms — see their pricing/usage docs.

If you use these weights or the dataset, please cite the original NLA paper:

@article{lin2026nla,
  title={Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations},
  author={Lin, Kit and others},
  journal={Transformer Circuits},
  year={2026},
  url={https://transformer-circuits.pub/2026/nla/index.html}
}
