While State-Constraint Orthogonality is an interesting direction to explore, constraining hidden states into decoupled orthogonal subspaces is a long-standing representation-learning heuristic rather than a new paradigm; the core idea reads as dated rather than novel.
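For context, the kind of decoupling being described is usually implemented as a soft orthogonality penalty between two projections of the same hidden state. A minimal PyTorch sketch of that standard penalty is below; the function and tensor names are illustrative assumptions, not taken from the Arcade-3B paper:

```python
import torch
import torch.nn.functional as F

def orthogonality_penalty(h_state: torch.Tensor, h_constraint: torch.Tensor) -> torch.Tensor:
    """Soft orthogonality penalty between two hidden-state projections.

    h_state, h_constraint: (batch, dim) projections of the same hidden state
    onto a "state" subspace and a "constraint" subspace (hypothetical names).
    The penalty is the squared Frobenius norm of their batch cross-correlation,
    which is zero when the two projections are orthogonal over the batch.
    """
    hs = F.normalize(h_state, dim=-1)
    hc = F.normalize(h_constraint, dim=-1)
    cross = hs.T @ hc / hs.shape[0]   # (dim, dim) cross-correlation matrix
    return (cross ** 2).sum()         # squared Frobenius norm as the loss term
```

Penalties of this form (and hard variants that orthogonalize projection matrices directly) have been used as regularizers for years, which is why the novelty claim needs stronger support.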
Furthermore, the current experimental setup and choice of baselines are hard to take at face value. The comparison models in the chart (e.g., Llama-2-7B, Qwen1.5-1.8B) are mostly older generations and serve as weak baselines. More importantly, since Arcade-3B is built on top of SmolLM3-3B, the most crucial baseline is missing: the stock SmolLM3-3B Instruct model itself. Without that rigorously controlled comparison, it is difficult to attribute the reported gains to the proposed "orthogonal decoupling" rather than to the fine-tuning data alone.
Are there plans to benchmark against newer, stronger models in a similar parameter class (e.g., Qwen2.5-3B-Instruct or Qwen3-1.7B), and to provide the ablation studies needed to show that the gains actually come from the proposed approach?