## The Ultimate Logic Test: Zero-Shot Multiplication (100% Accuracy)
VGT-Pro's most groundbreaking achievement is its ability to perform multiplication with 100% accuracy, despite being trained exclusively on single-digit addition.
By simulating the traditional "long multiplication" algorithm (decomposing multiplication into repeated shifts and additions) and using VGT-Pro as the core high-precision adder, the model successfully computes complex products without any multiplication-specific training data.
### Zero-Shot Multiplication (A × B)
| Task | Max Digits (A, B) | Max Result Digits | Strategy | Accuracy |
|---|---|---|---|---|
| Multiplication | 4 Digits | 8 Digits | Algorithmic Recursion (via VGT-Pro.add) | 100.00% |
**Scientific Significance:** This demonstrates that VGT-Pro has not merely learned to "do addition" but has internalized the universal recursive principles of arithmetic. It can function as a perfectly reliable Arithmetic Logic Unit (ALU) core, capable of supporting higher-order mathematical operations through algorithmic orchestration, all derived from a minimal training set. This capability is a strong indicator of emergent symbolic reasoning within a connectionist architecture.
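The shift-and-add orchestration described above can be sketched in plain Python. Here the model's high-precision adder is stood in for by an ordinary `add` callable (the function names below are illustrative, not the repo's actual `VGT-Pro.add` API); every step beyond single additions is just digit bookkeeping and string shifting:

```python
def multiply_via_addition(a: int, b: int, add=lambda x, y: x + y) -> int:
    """Long multiplication decomposed into shifts and repeated additions.

    `add` is a stand-in for the model's adder; replacing it with a learned
    adder is the only place a neural network would enter this loop.
    """
    result = 0
    for shift, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = 0
        for _ in range(digit):              # digit * a as repeated addition
            partial = add(partial, a)
        # appending `shift` zeros shifts the partial product by 10**shift
        shifted = int(str(partial) + "0" * shift)
        result = add(result, shifted)
    return result
```

Since every arithmetic operation routes through `add`, the multiplier's accuracy is exactly the adder's accuracy, which is why a perfectly reliable adder yields 100% products.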
# VGT-Conv-Logic-Addition: Geometric Emergence of Arithmetic
This repository demonstrates that logical capabilities (like multi-digit addition) emerge from geometric constraints rather than just increasing data scale or parameter count.
## The "Logic Emergence" Experiment
We conducted a controlled experiment comparing a standard neural network (Base) against a model trained under Vector-Gravity (VGT) geometric pressure. Both models share an identical architecture ($d_{\text{model}} = 128$, Conv1D) and were trained only on 0–6 digit addition for 10,000 steps.
### Zero-Shot Generalization Results
| Task (Number of Digits) | Base Model (Standard) | VGT Model (Ours) | Improvement |
|---|---|---|---|
| 6-Digit (In-Distribution) | 99.6% | 100.0% | +0.4% |
| 12-Digit (Zero-Shot OOD) | 0.0% | 91.6% | +91.6% |
| 20-Digit (Extreme Extrapolation) | 0.0% | 0.8% | +0.8% |
**Finding:** While the Base model simply memorizes 6-digit patterns, the VGT model discovers a universal algorithmic circuit for addition, allowing it to generalize to unseen lengths.
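A minimal evaluation harness for this kind of zero-shot length test might look as follows. The `model_fn` callable and the exact-match criterion are assumptions for illustration; the repo's benchmark script may differ in its prompt format and sampling:

```python
import random

def addition_example(n_digits: int) -> tuple[str, str]:
    """Sample one n-digit addition problem as (prompt, target) strings."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return f"{a}+{b}=", str(a + b)

def exact_match_accuracy(model_fn, n_digits: int, n_samples: int = 500) -> float:
    """Zero-shot accuracy: an answer counts only if every digit is correct."""
    hits = 0
    for _ in range(n_samples):
        prompt, target = addition_example(n_digits)
        hits += model_fn(prompt) == target
    return hits / n_samples
```

Exact match is the strict metric here: a single wrong carry anywhere in a 12-digit sum scores zero for that example, which is what makes the 0.0% vs. 91.6% gap meaningful.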
## Scientific Foundation: "Logic is Squeezed Out"
Based on the paper "The Geometric Origin of Logic", this model utilizes $L^2$ norm regularization on hidden states to create a Geometric Information Bottleneck.
### 1. Manifold Collapse
The $L^2$ pressure forces the hidden state manifold to collapse into a low-dimensional, high-density structure. This prevents the model from using high-dimensional noise to "fit" data, forcing it to "calculate" instead.
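Concretely, this kind of geometric bottleneck can be imposed by adding a mean squared-norm penalty on the hidden activations to the task loss. This is a minimal sketch of the general technique; the function name and the `alpha` coefficient are ours, not the paper's API:

```python
import torch
import torch.nn.functional as F

def vgt_loss(logits: torch.Tensor, targets: torch.Tensor,
             hidden: torch.Tensor, alpha: float) -> torch.Tensor:
    """Task loss plus L2 'geometric pressure' on hidden states.

    Penalizing the mean squared norm of the hidden activations squeezes
    the representation manifold toward a low-dimensional, high-density
    structure; `alpha` controls the strength of the geometric tension.
    """
    task_loss = F.cross_entropy(logits, targets)
    geometric_pressure = hidden.pow(2).mean()  # mean L2^2 over hidden units
    return task_loss + alpha * geometric_pressure
```

With `alpha = 0` this reduces to ordinary cross-entropy training; raising `alpha` makes high-norm "noise" directions expensive, which is the squeezing mechanism the text describes.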
### 2. Weight Polarization
As logic emerges, the output layer weights undergo Weight Polarization. In our VGT model, the weight standard deviation increased by 126%, forming sharp, deterministic "projection points" that act as logical gates for carry-propagation.
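The polarization statistic can be measured by comparing weight standard deviations between models. A hypothetical helper, assuming the output layer is the model's last `nn.Linear` (the reported 126% figure would then be `std_vgt / std_base - 1`):

```python
import torch

def output_weight_std(model: torch.nn.Module) -> float:
    """Standard deviation of the output-layer weights.

    A simple polarization proxy: we read the std of the last Linear
    layer's weight matrix. This helper is illustrative; the paper may
    compute the statistic over a different layer or parameter set.
    """
    linears = [m for m in model.modules() if isinstance(m, torch.nn.Linear)]
    return linears[-1].weight.std().item()
```

A diffuse Gaussian initialization has a small std; a polarized, "spiky" weight matrix with a few large deterministic projection points has a much larger one.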
## How to Reproduce
The script `vgt_vs_base_benchmark.py` is included in this repository; you can replicate the 0.0% vs. 91.6% gap in under 5 minutes on a single GPU.
- Clone the repo.
- Run the benchmark:

```bash
python vgt_vs_base_benchmark.py
```
# VGT-Pro: 100% Zero-Shot Length Extrapolation via Geometric Pressure
VGT-Pro (Variable Geometric Tension - Professional) is an advanced iteration of the VGT framework. It explores the emergence of discrete logic within neural networks by applying extreme Geometric Constraints.
While trained only on 1-6 digit addition, VGT-Pro achieves a staggering 100.00% accuracy on up to 20-digit addition, effectively solving the "Length Extrapolation" problem that typically causes traditional connectionist models to fail.
## Performance Benchmarks (Accuracy %)
| Digits (Length) | Base Model | VGT (Standard) | VGT-Pro (Ours) |
|---|---|---|---|
| 6 (In-Distribution) | 100.0% | 100.0% | 100.0% |
| 12 (Out-of-Distribution) | 0.5% | 96.5% | 100.0% |
| 16 (Out-of-Distribution) | 0.0% | 26.5% | 100.0% |
| 20 (Out-of-Distribution) | 0.0% | <10.0% | 100.0% |
## Core Innovations
### 1. Dynamic Dilated Iterations
Standard convolutional architectures suffer from "Carry Decay" over long sequences. VGT-Pro introduces Dynamic Dilation during recursive processing:
- The dilation factor increases exponentially ($1 \to 2 \to 4$) as iterations progress.
- This architecture creates "Logical Expressways," allowing carry signals to propagate across the sequence in logarithmic time, ensuring zero signal loss even at $3.3\times$ the training length.
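The dilation schedule above can be sketched as a small stack of Conv1D passes whose dilation doubles each iteration. All class and parameter names here are illustrative, not the repo's actual API; the point is the geometrically growing receptive field:

```python
import torch
import torch.nn as nn

class DynamicDilatedStack(nn.Module):
    """Recursive Conv1D passes with exponentially growing dilation (1, 2, 4).

    A minimal sketch of the 'logical expressway' idea: with dilation
    doubling each iteration, the receptive field grows geometrically,
    so a carry signal can cross an L-token sequence in O(log L) passes.
    """
    def __init__(self, channels: int = 128, kernel_size: int = 3, n_iters: int = 3):
        super().__init__()
        self.convs = nn.ModuleList()
        for i in range(n_iters):
            d = 2 ** i  # dilation 1 -> 2 -> 4
            self.convs.append(
                nn.Conv1d(channels, channels, kernel_size,
                          dilation=d, padding=d * (kernel_size - 1) // 2)
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, L)
        for conv in self.convs:
            x = torch.relu(conv(x)) + x  # residual keeps length and signal
        return x
```

The `padding = dilation * (kernel_size - 1) // 2` choice keeps the sequence length constant across iterations, so the stack can be applied recursively to sequences longer than those seen in training.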
### 2. Geometric Annealing Strategy
The model undergoes a specialized "Solidification-to-Refinement" training curriculum:
- Solidification Phase: The geometric tension ($\alpha$) peaks at 48.6, forcing the hidden manifold to collapse and pruning all redundant statistical heuristics.
- Annealing Phase: Tension is gradually relaxed to 5.0, allowing the weights to fine-tune their numerical precision and eliminate residual carry errors.
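The two-phase curriculum can be written as a simple tension schedule. The peak (48.6) and floor (5.0) come from the text; the schedule's exact shape (hold, then linear decay) and the function name are our assumptions:

```python
def geometric_tension(step: int, total_steps: int,
                      peak: float = 48.6, floor: float = 5.0,
                      solidify_frac: float = 0.5) -> float:
    """Two-phase alpha schedule (hypothetical shape, constants from the text).

    Solidification: hold alpha at its peak to collapse the hidden manifold
    and prune statistical heuristics.
    Annealing: decay linearly to the floor so weights can refine their
    numerical precision.
    """
    solidify_end = int(total_steps * solidify_frac)
    if step < solidify_end:
        return peak
    frac = (step - solidify_end) / max(1, total_steps - solidify_end)
    return peak + (floor - peak) * frac
```

The returned `alpha` would then weight the hidden-state L2 penalty in the training loss at each step.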
### 3. Extreme Weight Polarization
VGT-Pro exhibits a 289% increase in weight standard deviation relative to the base model. The weight distribution shifts from a diffuse Gaussian to a sparse, "spiky" state: in effect, the neural network has been transformed into a deterministic digital logic circuit.
## Usage & Reproducibility
### Model Loading
```python
import torch

# VGTProModel is defined in this repository
model = VGTProModel(hidden_size=128)

# Restore the trained "logic machine" weights
checkpoint = torch.load("vgt_pro_logic_machine.pth")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # inference mode

print("Logic Machine Online: Ready for 100% accurate extrapolation.")
```
## Citation
If you use this model, the code, or find the experimental findings helpful, please cite the work as follows:
**APA Style:** Wang, Z. (2026). *The Geometric Origin of Logic: Weight Polarization and Manifold Collapse in Transformers*. Zenodo. https://doi.org/10.5281/zenodo.18278643
**BibTeX:**

```bibtex
@misc{wang2026geometric,
  author    = {Wang, Zhongren},
  title     = {The Geometric Origin of Logic: Weight Polarization and Manifold Collapse in Transformers},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18278643},
  url       = {https://doi.org/10.5281/zenodo.18278643}
}