DV4-558M-4Topic
DV4 (Dual-Vocab 4-Bit) is a novel LLM weight-encoding scheme introducing topic-conditional weight reinterpretation via frozen binary flip-bit masks. A single set of ternary weights serves multiple knowledge domains: switching the frozen binary mask reorients the effective weight configuration for each topic at zero additional parameter cost at inference time.
This repository contains the 558M parameter, 4-topic proof-of-concept model from the paper:
DV4: Topic-Conditional Weight Reinterpretation via Ternary Encoding with Flip Bits
Peter Norman, Independent Researcher, Perth, Western Australia
research@twoswans.com.au | twoswans.com.au
ORCID: 0009-0004-8413-1274 | April 2026
How DV4 Works
Each weight is stored in 4 bits:
- 3 bits encode a ternary value in {-1, 0, +1}
- 1 flip bit inverts the polarity of non-zero weights when set
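Two weights therefore fit in one byte. The sketch below illustrates one possible nibble layout; the specific bit assignment (low 3 bits for the ternary value with the mapping {-1, 0, +1} → {2, 0, 1}, bit 3 for the flip bit) is an assumption for illustration, not the layout documented in the paper.

```python
def encode4(w: int, f: int) -> int:
    """Pack one DV4 weight into a 4-bit nibble.

    Low 3 bits hold the ternary value ({-1, 0, +1} -> {2, 0, 1},
    an assumed mapping); bit 3 holds the flip bit.
    """
    tern = {-1: 2, 0: 0, 1: 1}[w]
    return (f << 3) | tern

def decode4(nibble: int) -> tuple[int, int]:
    """Unpack a 4-bit nibble back into (ternary_weight, flip_bit)."""
    tern = nibble & 0b111
    f = nibble >> 3
    return {2: -1, 0: 0, 1: 1}[tern], f

# Round-trip check over a few (weight, flip) pairs:
pairs = [(-1, 0), (0, 1), (1, 1), (1, 0)]
packed = [encode4(w, f) for w, f in pairs]
assert [decode4(n) for n in packed] == pairs
```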
Topic-specific binary masks over the flip bits allow the same ternary weights to exhibit qualitatively different behaviour across domains without modifying the weights themselves. Switching topics is a bitwise broadcast operation, effectively instantaneous at inference time.
```
effective_weight(w, f) = w    if f == 0
effective_weight(w, f) = -w   if f == 1 and w != 0
effective_weight(w, f) = 0    if w == 0   (flip-invariant)
```
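Because zero is its own negation, the three cases collapse to a single conditional sign flip. A minimal NumPy sketch of this rule (illustrative only; the real implementation lives in the companion repository):

```python
import numpy as np

def effective_weight(w, f):
    """Apply DV4 flip-bit semantics to an array of ternary weights.

    w: ternary weights in {-1, 0, +1}
    f: flip bits in {0, 1}; a set bit negates the weight.
    Zeros are flip-invariant automatically, since -0 == 0.
    """
    w = np.asarray(w)
    f = np.asarray(f)
    return np.where(f == 1, -w, w)

w = np.array([-1, 0, 1, 1, -1, 0])
mask_a = np.array([0, 0, 0, 0, 0, 0])  # identity mask: weights unchanged
mask_b = np.array([1, 1, 1, 0, 0, 0])  # flips the first three entries

print(effective_weight(w, mask_a))  # [-1  0  1  1 -1  0]
print(effective_weight(w, mask_b))  # [ 1  0 -1  1 -1  0]
```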
Model Details
| Property | Value |
|---|---|
| Parameters | 558M |
| Architecture | 24-layer transformer, hidden dim 1024, 16 heads |
| Topics | Math, General, Code, Science |
| Positional Encoding | RoPE |
| Normalisation | RMSNorm (pre-norm) |
| Tokeniser | Qwen2.5-0.5B (151,665 tokens) |
| Training Steps | 8,000 (2,000 per topic) |
| Hardware | NVIDIA RTX PRO 6000 Blackwell 96GB |
| Training Time | ~100 minutes |
Bleed Test Results
The bleed test evaluates cross-topic contamination by measuring perplexity across all data/mask combinations. For every topic, the mean wrong-mask perplexity exceeds the correct-mask diagonal, so the DV4 mechanism differentiates all four topics. (Math is the weakest case: two of its off-diagonal entries fall below its diagonal, but its mean wrong-mask perplexity is still higher.)
| Data \ Mask | Math | General | Code | Science |
|---|---|---|---|---|
| Math | **1,592** | 5,238 | 1,360 | 1,224 |
| General | 57,023 | **578** | 11,842 | 1,341 |
| Code | 84,757,102,592 | 936,393 | **614** | 61,291 |
| Science | 919,762 | 13,414 | 3,970 | **341** |
Bold = correct-mask diagonal
Topic Specificity Scores
| Topic | Correct Mask PPL | Mean Wrong PPL | Specificity |
|---|---|---|---|
| Math | 1,592 | 2,607 | +0.637 |
| General | 578 | 23,402 | +39.47 |
| Code | 614 | 28,252,700,092 | +46,005,859 |
| Science | 341 | 312,382 | +916.30 |
| Mean | – | – | +11,501,703 |
The code topic produces the most dramatic separation: perplexity of 614 under its correct mask versus 84.7 billion under the math mask, a specificity ratio exceeding 46 million.
Key Properties
**Zero inference overhead.** Adding topics requires only storing additional mask entries. The mask library for this 4-topic 558M model is ~56MB, less than 0.1% of model weight storage.
**Structural immunity to catastrophic forgetting.** Because each topic's flip mask is frozen before training, weight updates under topic A cannot systematically corrupt topic B's encoding. Confirmed empirically by consistent loss resets of 12.06–12.38 at every topic boundary.
**Scales independently of topic count.** A 10-topic DV4 model has the same inference parameter count as a single-domain model. Storage scales as 1 × model_size plus a negligible mask library.
Repository Contents
- `dv4_558m_inference.pt`: Model weights (`model_state_dict`) and mask library (`registry_masks`), stripped of optimizer state
- `training_log.json`: Full training log including per-step loss across all four topic phases
Usage
This is a research proof-of-concept demonstrating the DV4 mechanism. The model is undertrained by production standards (7,000 samples per topic). The correct-mask perplexity values reflect this; the meaningful metric is the specificity ratio, not absolute perplexity.
To load the model you will need the DV4 architecture code from the companion GitHub repository:
GitHub: github.com/StoneColdLlama/dv4
```python
import torch

ckpt = torch.load('dv4_558m_inference.pt', map_location='cpu')
model_state = ckpt['model_state_dict']
masks = ckpt['registry_masks']  # dict: {0: math, 1: general, 2: code, 3: science}
step = ckpt['step']             # 7999
```
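Once weights and masks are loaded, a topic switch amounts to a broadcast sign flip over the ternary weights. The sketch below is illustrative only: the real tensor layout and module wiring live in the companion repository, and the tensors here are toy values, not the checkpoint's actual contents.

```python
import torch

def apply_topic_mask(ternary_w: torch.Tensor, flip_mask: torch.Tensor) -> torch.Tensor:
    """Compute effective weights for one topic: a set flip bit negates
    the corresponding ternary weight; zeros are unaffected by negation."""
    return torch.where(flip_mask.bool(), -ternary_w, ternary_w)

# Toy tensors standing in for one layer's ternary weights and a topic mask:
w = torch.tensor([-1, 0, 1, 1], dtype=torch.int8)
mask_code = torch.tensor([1, 0, 1, 0], dtype=torch.uint8)

effective = apply_topic_mask(w, mask_code)  # flips entries where the mask bit is set
```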
Citation
If you use this model or the DV4 architecture in your research, please cite:
```bibtex
@misc{norman2026dv4,
  title  = {DV4: Topic-Conditional Weight Reinterpretation via Ternary Encoding with Flip Bits},
  author = {Norman, Peter},
  year   = {2026},
  note   = {Independent Researcher, Perth, Western Australia. research@twoswans.com.au}
}
```
Acknowledgements
The author used Claude (Anthropic) as an AI coding assistant during implementation. All research decisions, architectural design, experimental protocol, and the core DV4 concept are the author's own. The DV4 architecture was independently conceived in March 2026.
Contact
Peter Norman
Independent Researcher | Perth, Western Australia
research@twoswans.com.au | twoswans.com.au
ORCID: 0009-0004-8413-1274