DV4-558M-4Topic

DV4 (Dual-Vocab 4-Bit) is a novel LLM weight-encoding scheme that introduces topic-conditional weight reinterpretation via frozen binary flip-bit masks. A single set of ternary weights serves multiple knowledge domains: switching a frozen binary mask reorients the effective weight configuration for each topic, at zero additional parameter cost at inference time.

This repository contains the 558M parameter, 4-topic proof-of-concept model from the paper:

DV4: Topic-Conditional Weight Reinterpretation via Ternary Encoding with Flip Bits
Peter Norman, Independent Researcher, Perth, Western Australia
research@twoswans.com.au | twoswans.com.au
ORCID: 0009-0004-8413-1274 | April 2026


How DV4 Works

Each weight is stored in 4 bits:

  • 3 bits encode a ternary value in {-1, 0, +1}
  • 1 flip bit inverts the polarity of non-zero weights when set
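The exact nibble layout is not specified above; as a sketch, one plausible layout stores the ternary code w + 1 in the low three bits and the flip bit in bit 3 (this layout is an assumption for illustration, not the paper's):

```python
def pack_dv4(w: int, f: int) -> int:
    """Pack a ternary weight w in {-1, 0, +1} and flip bit f into one nibble.

    Hypothetical layout (not from the paper): low 3 bits hold w + 1,
    bit 3 holds the flip bit.
    """
    assert w in (-1, 0, 1) and f in (0, 1)
    return (f << 3) | (w + 1)


def unpack_dv4(nibble: int) -> tuple[int, int]:
    """Recover (w, f) from a packed nibble."""
    return (nibble & 0b111) - 1, (nibble >> 3) & 1


# Round-trip every (w, f) combination.
for w in (-1, 0, 1):
    for f in (0, 1):
        assert unpack_dv4(pack_dv4(w, f)) == (w, f)
```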

Topic-specific binary masks over the flip bits allow the same ternary weights to exhibit qualitatively different behaviour across domains without modifying the weights themselves. Switching topics is a bitwise broadcast operation, effectively instantaneous at inference time.

effective_weight(w, f) = w     if f == 0
effective_weight(w, f) = -w    if f == 1 and w != 0
effective_weight(w, f) = 0     if w == 0  (flip-invariant)
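Because negation leaves zero unchanged, the three cases above collapse into a single vectorized multiply, and switching topics is just selecting a different mask tensor. A minimal sketch with synthetic tensors (the mask values here are illustrative, not the released ones):

```python
import torch


def apply_topic_mask(w: torch.Tensor, flip: torch.Tensor) -> torch.Tensor:
    """Effective weights under a flip-bit mask.

    w: ternary weights in {-1, 0, +1}; flip: binary mask of the same shape.
    Where flip == 1 the multiplier is -1, so non-zero weights change sign
    and zeros stay zero, covering all three cases of the piecewise rule.
    """
    return w * (1 - 2 * flip)


# Synthetic example: one weight tensor, two topic masks.
w = torch.tensor([-1, 0, 1, 1, -1])
masks = {
    "topic_a": torch.tensor([0, 0, 0, 1, 1]),
    "topic_b": torch.tensor([1, 1, 0, 0, 0]),
}
print(apply_topic_mask(w, masks["topic_a"]))  # -> [-1, 0, 1, -1, 1]
print(apply_topic_mask(w, masks["topic_b"]))  # -> [1, 0, 1, 1, -1]
```

Switching from topic_a to topic_b touches no weights, only the choice of mask, which is what makes topic switching effectively free at inference time.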

Model Details

Property              Value
Parameters            558M
Architecture          24-layer transformer, hidden dim 1024, 16 heads
Topics                Math, General, Code, Science
Positional Encoding   RoPE
Normalisation         RMSNorm (pre-norm)
Tokeniser             Qwen2.5-0.5B (151,665 tokens)
Training Steps        8,000 (2,000 per topic)
Hardware              NVIDIA RTX PRO 6000 Blackwell 96GB
Training Time         ~100 minutes

Bleed Test Results

The bleed test evaluates cross-topic contamination by measuring perplexity under every data/mask combination. Within each mask's column, every off-diagonal entry exceeds the diagonal: each mask achieves its lowest perplexity on its own topic's data, so the DV4 mechanism differentiates all four topics. (Row-wise, separation is weakest for Math, whose data scores slightly lower under the Code and Science masks than under its own.)
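Perplexity here is the standard exponentiated mean per-token negative log-likelihood; a minimal sketch:

```python
import math


def perplexity(token_nlls: list[float]) -> float:
    """exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# A model assigning every token probability 1/10 scores perplexity 10.
uniform_nlls = [math.log(10)] * 5
print(round(perplexity(uniform_nlls), 6))  # -> 10.0
```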

Data \ Mask            Math       General       Code     Science
Math                [1,592]        5,238       1,360       1,224
General              57,023        [578]      11,842       1,341
Code         84,757,102,592      936,393       [614]      61,291
Science             919,762       13,414       3,970       [341]

[brackets] mark the correct-mask diagonal.

Topic Specificity Scores

Topic      Correct Mask PPL    Mean Wrong-Mask PPL    Specificity
Math                  1,592                  2,607         +0.637
General                 578                 23,402         +39.47
Code                    614         28,252,700,092    +46,005,859
Science                 341                312,382        +916.30
Mean                      —                      —    +11,501,703

The code topic produces the most dramatic separation: perplexity of 614 under its correct mask versus 84.7 billion under the math mask, a specificity score exceeding 46 million.
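The specificity scores are consistent with the mean wrong-mask perplexity divided by the correct-mask perplexity, minus one (an inferred formula; recomputing from the table's rounded entries reproduces the published scores only approximately):

```python
# Bleed-test perplexities from the table above (rounded values).
bleed = {
    "Math":    {"correct": 1_592, "wrong": [5_238, 1_360, 1_224]},
    "General": {"correct": 578,   "wrong": [57_023, 11_842, 1_341]},
    "Code":    {"correct": 614,   "wrong": [84_757_102_592, 936_393, 61_291]},
    "Science": {"correct": 341,   "wrong": [919_762, 13_414, 3_970]},
}


def specificity(correct: float, wrong: list[float]) -> float:
    """Relative excess of the mean wrong-mask PPL over the correct-mask PPL."""
    return sum(wrong) / len(wrong) / correct - 1


for topic, ppl in bleed.items():
    print(f"{topic}: {specificity(ppl['correct'], ppl['wrong']):+,.3f}")
```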


Key Properties

Zero inference overhead: adding topics requires only storing additional mask entries. The mask library for this 4-topic 558M model is ~56MB, less than 0.1% of model weight storage.

Structural immunity to catastrophic forgetting: because each topic's flip mask is frozen before training, weight updates under topic A cannot systematically corrupt topic B's encoding. Confirmed empirically by consistent loss resets of 12.06–12.38 at every topic boundary.

Scales independently of topic count: a 10-topic DV4 model has the same inference parameter count as a single-domain model. Storage scales as 1 × model_size + a negligible mask library.


Repository Contents

  • dv4_558m_inference.pt: model weights (model_state_dict) and mask library (registry_masks), stripped of optimizer state
  • training_log.json: full training log, including per-step loss across all four topic phases

Usage

This is a research proof-of-concept demonstrating the DV4 mechanism. By production standards the model is undertrained (7,000 samples per topic), and the absolute correct-mask perplexity values reflect this; the meaningful metric is the specificity ratio, not absolute perplexity.

To load the model you will need the DV4 architecture code from the companion GitHub repository:

🔗 GitHub: github.com/StoneColdLlama/dv4

import torch

ckpt = torch.load('dv4_558m_inference.pt', map_location='cpu')
model_state = ckpt['model_state_dict']
masks = ckpt['registry_masks']  # dict: {0: math, 1: general, 2: code, 3: science}
step = ckpt['step']  # 7999

Citation

If you use this model or the DV4 architecture in your research, please cite:

@misc{norman2026dv4,
  title={DV4: Topic-Conditional Weight Reinterpretation via Ternary Encoding with Flip Bits},
  author={Norman, Peter},
  year={2026},
  note={Independent Researcher, Perth, Western Australia. research@twoswans.com.au}
}

Acknowledgements

The author used Claude (Anthropic) as an AI coding assistant during implementation. All research decisions, architectural design, experimental protocol, and the core DV4 concept are the author's own. The DV4 architecture was independently conceived in March 2026.


Contact

Peter Norman
Independent Researcher | Perth, Western Australia
research@twoswans.com.au | twoswans.com.au
ORCID: 0009-0004-8413-1274
