<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Divinci AI</title>
<style>
body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; max-width: 860px; margin: 0 auto; padding: 2rem 1.5rem; color: #1a1a1a; line-height: 1.6; }
h1 { font-size: 1.8rem; font-weight: 700; margin-bottom: 0.25rem; }
h2 { font-size: 1.1rem; font-weight: 600; margin-top: 2rem; margin-bottom: 0.5rem; border-bottom: 1px solid #e5e7eb; padding-bottom: 0.25rem; }
p { margin: 0.5rem 0 1rem; }
table { border-collapse: collapse; width: 100%; margin: 1rem 0; font-size: 0.9rem; }
th, td { border: 1px solid #e5e7eb; padding: 0.5rem 0.75rem; text-align: left; }
th { background: #f9fafb; font-weight: 600; }
a { color: #2563eb; text-decoration: none; }
a:hover { text-decoration: underline; }
code { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; font-size: 0.9em; }
pre { background: #f9fafb; border: 1px solid #e5e7eb; border-radius: 4px; padding: 0.75rem 1rem; overflow-x: auto; }
.tagline { color: #6b7280; font-size: 1rem; margin-bottom: 1.5rem; }
.footer { margin-top: 2.5rem; padding-top: 1rem; border-top: 1px solid #e5e7eb; font-size: 0.85rem; color: #6b7280; }
hr { border: none; border-top: 1px solid #e5e7eb; margin: 1.5rem 0; }
</style>
</head>
<body>
<h1 id="divinci-ai">Divinci AI</h1>
<p class="tagline">Feature-level interpretability artifacts for open transformers,
built openly and validated empirically.</p>
<p>A <strong>vindex</strong> is a transformer's weights decompiled into
a queryable feature database. It exposes the entity associations,
circuit structure, and knowledge-editing surfaces that live inside a
model's FFN layers, without requiring GPU inference for most
operations.</p>
<p>Think of it as the model's index: the thing you search before you run
it.</p>
<hr />
<h2 id="interactive-viewer">Interactive viewer</h2>
<p><a href="https://huggingface.co/spaces/Divinci-AI/vindex-viewer"><img
src="https://huggingface.co/spaces/Divinci-AI/vindex-viewer/resolve/main/vindex-hero-bg.gif"
alt="LarQL Vindex Viewer: interactive 3D + 2D circuit visualization" /></a></p>
<p><strong><a
href="https://huggingface.co/spaces/Divinci-AI/vindex-viewer">Open the
interactive viewer →</a></strong></p>
<p>Pick any of 9 models from the dropdown. Toggle between the 3D
cylinder spiral and a flat 2D circuit/network view. Hit
<strong>Compare</strong> to render the current model alongside Bonsai
1-bit, side by side: the contrast between fp16 structure (organized
rings) and 1-bit dissolution (scattered cloud) is the most direct
picture we know how to render of what 1-bit training does to a
transformer's internal organization. Search for entity features
(<code>?q=paris&amp;model=gemma-4-e2b</code>) to see real probe-derived
activations light up across the layer stack, backed by a 5,000-token
search index built offline.</p>
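<p>Those search parameters double as shareable deep links. A minimal
sketch, assuming only the <code>q</code> and <code>model</code> query
parameters shown above (the helper name is ours; this is plain
standard-library Python, nothing LarQL-specific):</p>
<pre><code>from urllib.parse import urlencode

VIEWER = "https://huggingface.co/spaces/Divinci-AI/vindex-viewer"

def viewer_link(query: str, model: str) -> str:
    """Return a shareable viewer URL for an entity-feature search."""
    return f"{VIEWER}?{urlencode({'q': query, 'model': model})}"

print(viewer_link("paris", "gemma-4-e2b"))
# https://huggingface.co/spaces/Divinci-AI/vindex-viewer?q=paris&amp;model=gemma-4-e2b
</code></pre>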
<hr />
<h2 id="published-vindexes">Published vindexes</h2>
<p>Cross-family evidence in hand: <strong>Gemma</strong>,
<strong>Qwen3</strong>, <strong>Mistral</strong>,
<strong>Llama</strong>, <strong>OpenAI MoE</strong>,
<strong>Moonshot MoE</strong>, <strong>DeepSeek-V4 MoE</strong>, plus two 1-bit
controls.</p>
<table>
<thead>
<tr><th>Model</th><th>Architecture</th><th>Params</th><th>Vindex</th><th>C4 (layer temp)</th><th>Notes</th></tr>
</thead>
<tbody>
<tr><td><strong>Gemma 4 E2B-it</strong></td><td>Dense (Gemma 4)</td><td>2B</td><td><a href="https://huggingface.co/Divinci-AI/gemma-4-e2b-vindex">gemma-4-e2b-vindex</a></td><td><strong>0.0407 ± 0.0004</strong> ✓</td><td>3-seed validated; headline universal-constant model</td></tr>
<tr><td>Qwen3-0.6B</td><td>Dense (Qwen 3)</td><td>0.6B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-0.6b-vindex">qwen3-0.6b-vindex</a></td><td>0.411</td><td>Smallest published; Qwen3 family-elevated C4</td></tr>
<tr><td>Qwen3-8B bf16</td><td>Dense (Qwen 3)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/qwen3-8b-vindex">qwen3-8b-vindex</a></td><td>0.804</td><td>Architecture control for Bonsai</td></tr>
<tr><td>Qwen3.6-35B-A3B</td><td>MoE (Qwen 3.6)</td><td>35B / 3B active</td><td><a href="https://huggingface.co/Divinci-AI/qwen3.6-35b-a3b-vindex">qwen3.6-35b-a3b-vindex</a></td><td>–</td><td>256 experts, 40 layers</td></tr>
<tr><td>Ministral-3B</td><td>Dense (Mistral 3)</td><td>3B</td><td><a href="https://huggingface.co/Divinci-AI/ministral-3b-vindex">ministral-3b-vindex</a></td><td>0.265</td><td>fp8 → bf16 reconstruction</td></tr>
<tr><td>Llama 3.1-8B</td><td>Dense (Llama 3.1)</td><td>8B</td><td><a href="https://huggingface.co/Divinci-AI/llama-3.1-8b-vindex">llama-3.1-8b-vindex</a></td><td><strong>0.012</strong> ✓</td><td>Llama family signature</td></tr>
<tr><td>MedGemma 1.5-4B</td><td>Dense (Gemma multimodal)</td><td>4B</td><td><a href="https://huggingface.co/Divinci-AI/medgemma-1.5-4b-vindex">medgemma-1.5-4b-vindex</a></td><td><strong>1.898</strong> ⚠</td><td>45× cohort anomaly; under investigation</td></tr>
<tr><td>GPT-OSS 120B</td><td>MoE (OpenAI)</td><td>120B</td><td><a href="https://huggingface.co/Divinci-AI/gpt-oss-120b-vindex">gpt-oss-120b-vindex</a></td><td>–</td><td>S[0] grows 117× with depth (L0 = 111 → final = 13,056)</td></tr>
<tr><td><strong>Kimi-K2-Instruct</strong></td><td>MoE fp8-native (DeepSeek-V3 style)</td><td>1T / 32B active</td><td><a href="https://huggingface.co/Divinci-AI/kimi-k2-instruct-vindex">kimi-k2-instruct-vindex</a></td><td><strong>0.0938</strong> (MoE median)</td><td>60 MoE layers; 42.28 GB gate_proj binary; broader L52–L60 secondary rise than the initial dome SVD suggested</td></tr>
<tr><td><strong>DeepSeek-V4-Flash</strong></td><td>MoE MXFP4 (DeepSeek-V4)</td><td>43L / 256 experts / 6 active</td><td><a href="https://huggingface.co/Divinci-AI/deepseek-v4-flash-vindex">deepseek-v4-flash-vindex</a></td><td><strong>0.108</strong> (MoE median)</td><td>43-layer all-MoE; 11.54 GB gate_proj binary; first peak at L18 plus a double-bend profile (distinct from Kimi's smooth dome); MXFP4 expert unpacking</td></tr>
<tr><td><strong>DeepSeek-V4-Pro</strong></td><td>MoE MXFP4 (DeepSeek-V4)</td><td>61L / 384 experts / 6 active</td><td><a href="https://huggingface.co/Divinci-AI/deepseek-v4-pro-vindex">deepseek-v4-pro-vindex</a></td><td><strong>0.0653</strong> (MoE median)</td><td>61-layer all-MoE; 42.98 GB gate_proj binary; lowest var@64 of the 3 published MoE vindexes (V4-Pro 0.065 &lt; Kimi 0.094 &lt; V4-Flash 0.108), i.e. V4-Pro experts are the most shared/redundant; late secondary rise L53–L60</td></tr>
<tr><td><strong>Bonsai 8B</strong></td><td>1-bit (Qwen 3 base, post-quantized)</td><td>8B</td><td><em>vindex pending publish</em></td><td>0.429</td><td><strong>C5 = 1</strong> (circuit dissolved); var@64 = 0.093</td></tr>
<tr><td><strong>BitNet b1.58-2B-4T</strong></td><td>1-bit (Microsoft, native)</td><td>2B</td><td><em>vindex pending publish</em></td><td>(Phase 2 pending)</td><td><strong>var@64 = 0.111</strong> mean across 30 layers; n = 2 confirmation of dissolution</td></tr>
</tbody>
</table>
<hr />
<h2 id="whats-a-vindex">What's a vindex?</h2>
<p>Standard model weights tell you <em>what</em> a model computes. A
vindex tells you <em>where</em> it stores specific knowledge and
<em>which features</em> need to change for a targeted edit.</p>
<p>Concretely: given a query like <code>"Paris → capital"</code>, a
vindex walk returns the layers, feature directions, and token
associations that encode that fact. A patch operation writes a rank-1 ΔW
that suppresses or overwrites that association, compiled back to
standard HuggingFace safetensors for inference.</p>
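<p>As a minimal sketch of the patch step only, not of LarQL's actual API:
the layer key, the feature directions <code>u</code> and <code>v</code>,
and the strength <code>α</code> below are hypothetical stand-ins for what
a vindex walk returns; the safetensors round-trip is the real library
interface.</p>
<pre><code>import torch
from safetensors.torch import load_file, save_file

weights = load_file("model.safetensors")        # a standard HF checkpoint shard
key = "model.layers.17.mlp.down_proj.weight"    # hypothetical edit site

W = weights[key]                                # shape (d_model, d_ffn)
u = torch.randn(W.shape[1])                     # stand-in: feature direction (FFN side)
v = torch.randn(W.shape[0])                     # stand-in: readout direction (residual side)
u, v = u / u.norm(), v / v.norm()

alpha = 0.5                                     # edit strength (the α swept in Paper 2)
delta_W = -alpha * torch.outer(v, u)            # rank-1 suppression of one association

weights[key] = (W + delta_W.to(W.dtype)).contiguous()
save_file(weights, "model.patched.safetensors") # compiled back for inference
</code></pre>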
<p>LarQL (the toolchain that builds vindexes) is open source: <a
href="https://github.com/chrishayuk/larql">github.com/chrishayuk/larql</a>
| <a
href="https://github.com/Divinci-AI/larql">github.com/Divinci-AI/larql</a>.</p>
<hr />
<h2 id="research">Research</h2>
<h3
id="paper-1--architectural-invariants-of-transformer-computation">Paper
1: <em>Architectural Invariants of Transformer Computation</em></h3>
<p><em>arXiv preprint forthcoming</em></p>
<p>Five properties measured across every model in this collection.
<strong>Three hold within a ±15% coefficient of variation</strong> across
architectures, organizations, and scales. <strong>One collapses under
1-bit quantization</strong>, replicated across two independent 1-bit
models from two organizations (n = 2). <strong>One scales monotonically
with model size</strong>.</p>
<p>The headline universal constant, layer temperature C4, is
reproducible at the <strong>1% precision level</strong>: a three-seed
run on Gemma 4 E2B gives <code>C4 = 0.0407 ± 0.0004</code>, with the
circuit-stage count perfectly stable (<code>C5 = 4 ± 0</code>) across
all seeds.</p>
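<p>What that reproducibility claim cashes out to, as a minimal sketch;
the per-seed values below are placeholders consistent with the published
mean ± std, not the actual measurements:</p>
<pre><code>import statistics

# Placeholder per-seed C4 values, NOT published data.
c4_by_seed = [0.0403, 0.0407, 0.0411]

mean = statistics.mean(c4_by_seed)
std = statistics.stdev(c4_by_seed)   # sample standard deviation
cv = std / mean                      # coefficient of variation

print(f"C4 = {mean:.4f} ± {std:.4f} (CV = {cv:.1%})")
# C4 = 0.0407 ± 0.0004 (CV = 1.0%)

# The cross-model invariance criterion uses the same statistic:
assert cv &lt; 0.15, "outside the ±15% invariance band"
</code></pre>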
<h3 id="paper-2--constellation-edits">Paper 2: <em>Constellation
Edits</em></h3>
<p><em>draft; arXiv after 3-seed runs + α-sweep appendix</em></p>
<p>Mechanistic knowledge editing in transformer feature space. Includes
a negative result: why activation-space edits fail in 1-bit models, and
what weight-space geometry reveals about the failure.</p>
<h3 id="companion-blog-series--the-interpretability-diaries">Companion
blog series: <em>The Interpretability Diaries</em></h3>
<ul>
<li><a
href="https://divinci.ai/blog/architecture-every-llm-converges-to/">Part
I: The Architecture Every Language Model Converges To</a> – five
universal constants, what holds and what doesn't</li>
<li><a
href="https://divinci.ai/blog/deleting-paris-from-a-language-model/">Part
II: Deleting Paris from a Language Model</a> – a Gate-3 surgical
knowledge edit with a receipt; a rank-1 ΔW that suppresses one fact at
+0.02% perplexity</li>
<li><a href="https://divinci.ai/blog/when-the-circuit-dissolves/">Part
III: When the Circuit Dissolves</a> – two natively trained 1-bit
models, two organizations, same dissolution: var@64 ≈ 0.10 vs. ≈0.85 for
fp16</li>
</ul>
<p>Working notebooks: <a
href="https://github.com/Divinci-AI/server/tree/preview/notebooks">github.com/Divinci-AI/server/tree/preview/notebooks</a></p>
<hr />
<h2 id="working-in-public">Working in public</h2>
<p>Every measurement in our papers traces back to a notebook and a
commit. Negative results ship alongside positive ones: the MLP
compensation mechanism that defeats knowledge editing in 1-bit models is
in the notebooks, not buried in a supplement.</p>
<p>If you replicate a result and find a discrepancy, open an issue on
the LarQL repo.</p>
<hr />
<p class="footer"><em>Vindexes on this org are free for academic and
research use (CC BY-NC 4.0). Commercial licensing: <a
href="mailto:mike@divinci.ai">mike@divinci.ai</a></em></p>
</body>
</html>