---
title: EAM 100M Agentic Kernel v1.2
emoji: 🧬
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: hf_app.py
pinned: true
---
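
# 100M Parameter Agentic Model Walkthrough

This project implements a roughly 100M-parameter agentic model by combining several existing techniques: a nanoGPT backbone, attention residuals, ternary BitNet weights, memory sparse attention, a recursive reasoning loop, teacher distillation, and MCP tool connectors.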

## Architectural Stack

### 1. Core: nanoGPT
We use a minimalist Transformer architecture based on Karpathy's `nanoGPT`. It provides the foundational attention and MLP blocks, which we have heavily modified for agentic performance.

### 2. Residuals: AttenRes (Attention Residuals)
Instead of standard additive residuals (`x + f(x)`), we implement **Attention Residuals**.
- **File**: `model/attenres.py`
- **Logic**: Each layer performs a dynamic retrieval (attention) over all previous layer outputs. This prevents information dilution and allows deeper reasoning.
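
The retrieval described above can be sketched in plain Python. This is a minimal illustration, not the code in `model/attenres.py`: it assumes the current hidden state is used directly as the query and the raw earlier outputs as keys and values, with no learned projections. The names `attention_residual` and `history` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_residual(query, history):
    """Mix all earlier layer outputs, weighted by similarity to `query`.

    query:   hidden vector of the current layer (list of floats)
    history: earlier layer outputs, each the same length as `query`
    Assumption: no learned projections; scaled dot-product scores only.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * h for q, h in zip(query, past)) for past in history]
    weights = softmax(scores)
    mixed = [
        sum(w * past[i] for w, past in zip(weights, history))
        for i in range(len(query))
    ]
    return mixed, weights

# Three earlier layer outputs; the query is most similar to the first and third.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed, weights = attention_residual([2.0, 0.0], history)
```

Unlike `x + f(x)`, where every earlier contribution is summed with equal weight, the weights here depend on the current state, so a layer can emphasize whichever earlier outputs are most relevant.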

### 3. Weights: BitNet b1.58 (QVAC Fabric / Static Sparse)
To ensure efficiency on consumer hardware (QVAC style), we use **Ternary Weights** ({-1, 0, 1}).
- **File**: `model/bitnet.py`
- **Efficiency**: This mimics a static sparse matrix where 0s act as pruned connections. It reduces the memory footprint by ~70% compared to FP16.
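
As a rough illustration of how weights can be mapped to {-1, 0, 1}, the sketch below uses absmean scaling, the scheme described for BitNet b1.58: divide by the mean absolute weight, round, and clip. Whether `model/bitnet.py` follows exactly this recipe is an assumption, and the function names are hypothetical.

```python
def ternarize(weights, eps=1e-8):
    """Absmean ternary quantization of a flat list of floats.

    Assumption: one scale per tensor, as in the BitNet b1.58 description.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into {-1, 0, 1}.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Approximate reconstruction used at matmul time."""
    return [q * scale for q in quantized]

w = [0.9, -0.05, 0.4, -1.2, 0.02, 0.6]
q, scale = ternarize(w)
# q == [1, 0, 1, -1, 0, 1]; small weights collapse to 0 (pruned connections)
```

The "1.58-bit" name comes from the information content of a three-valued weight: log2(3) ≈ 1.58 bits.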

### 4. Attention: Memory Sparse Attention (MSA) (NEW)
Replaces the standard causal attention with a layer that combines three mechanisms.
- **File**: `model/memory_sparse_attention.py`
- **Mechanism 1: Persistent Memory Tokens**: Each layer holds `n_memory_tokens=32` learnable `(K, V)` parameter pairs. Every query position attends to these slots without any causal or sparse masking, giving the model a dedicated working-memory scratchpad that persists across positions within a forward pass.
- **Mechanism 2: IndexCache Sparse Top-K**: Full layers (even `layer_idx`) compute top-K attention indices over the sequence and cache them. Shared layers (odd `layer_idx`) reuse the cached indices, reducing the attention cost from O(T²) to O(T · sparse_topk). Memory slots are always kept regardless of the sparse mask.
- **Mechanism 3: Interleaved Head Attention**: The first half of the heads use a local sliding-window mask (`local_window_size=256`); the second half retain unrestricted global access. Memory slots are exempt from this masking too.
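
The masking rules above can be sketched as follows. This is a simplified illustration, not the code in `model/memory_sparse_attention.py`. It assumes that "global" heads are still causal, that memory slots occupy the first columns of the key axis, and that top-K selection is shown for a single query row. The names `build_head_mask` and `select_topk` are hypothetical.

```python
def build_head_mask(seq_len, n_memory, head_idx, n_heads, window):
    """Boolean visibility mask of shape [seq_len][n_memory + seq_len].

    Columns 0..n_memory-1 are memory slots; the rest are sequence positions.
    """
    is_local = head_idx < n_heads // 2   # first half of heads: sliding window
    mask = []
    for q in range(seq_len):
        row = [True] * n_memory          # memory slots are never masked
        for k in range(seq_len):
            visible = k <= q             # causal constraint (assumed for all heads)
            if is_local:
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask

def select_topk(scores, k, cache, layer_idx):
    """Even layers compute and cache top-k key indices; odd layers reuse them."""
    if layer_idx % 2 == 0:
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        cache["indices"] = sorted(order[:k])
    return cache["indices"]

local_mask = build_head_mask(seq_len=5, n_memory=2, head_idx=0, n_heads=4, window=2)
global_mask = build_head_mask(seq_len=5, n_memory=2, head_idx=3, n_heads=4, window=2)

cache = {}
full_layer = select_topk([0.1, 0.9, 0.3, 0.7], k=2, cache=cache, layer_idx=0)
shared_layer = select_topk([0.8, 0.1, 0.9, 0.2], k=2, cache=cache, layer_idx=1)
```

In this toy run the odd layer returns the indices cached by the even layer even though its own scores would rank the keys differently, which is where the saving comes from: the shared layer skips the ranking step entirely.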

### 5. Reasoning: Tiny Recursive Loop
The "agentic" part of the model comes from a recursive inference loop.
- **File**: `agent/recursive_reasoning.py`
- **Process**: The model generates a `<thought>`, critiques it, and refines it up to $N$ times before producing the final answer.
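
The generate, critique, refine cycle can be sketched as a bounded loop. This is an illustration only: the callables `generate`, `critique`, and `refine` are hypothetical stand-ins for model calls, and stopping early when the critic has no feedback is an assumption rather than documented behavior of `agent/recursive_reasoning.py`.

```python
def recursive_answer(prompt, generate, critique, refine, max_rounds=3):
    """Draft a thought, then critique and refine it at most `max_rounds` times."""
    thought = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(prompt, thought)
        if feedback is None:      # assumed convention: None means "good enough"
            break
        thought = refine(prompt, thought, feedback)
    return thought

# Toy stand-ins so the loop can be exercised without a model.
def toy_generate(prompt):
    return "draft v0"

def toy_critique(prompt, thought):
    return None if thought.endswith("v2") else "needs work"

def toy_refine(prompt, thought, feedback):
    version = int(thought.rsplit("v", 1)[1])
    return f"draft v{version + 1}"

final = recursive_answer("2 + 2?", toy_generate, toy_critique, toy_refine)
capped = recursive_answer("2 + 2?", toy_generate, toy_critique, toy_refine, max_rounds=1)
```

Bounding the loop with `max_rounds` keeps inference cost predictable even when the critic is never satisfied.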

### 7. Teacher: NIM Distillation (N3S) (NEW)
The model was distilled using **NVIDIA Nemotron-3 Super (N3S)** as a high-fidelity teacher.
- **Method**: Multi-Token Distillation (MTD) focused on agentic reasoning trajectories.
- **Alignment**: Alignment-aware distillation ensures the kernel follows workspace safety and grounding protocols.

### 8. Ecosystem: Model Context Protocol (MCP) (EXPANDED)
The kernel orchestrates cloud and local tools natively via MCP connectors.
- **Integrations**: Figma (Design), Google Calendar, Notion, Google Sheets/Slides.
- **Orchestration**: The recursive loop manages authentication signals and tool execution results.
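
For orientation, MCP tool invocations are JSON-RPC 2.0 requests using the `tools/call` method. The sketch below only builds such a request; the tool name `calendar_create_event` and its arguments are hypothetical, and how this project's connectors actually name their tools is not stated here.

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical tool name and arguments, for illustration only.
request = make_tool_call(
    1,
    "calendar_create_event",
    {"title": "Design review", "start": "2025-01-15T10:00:00Z"},
)
payload = json.dumps(request)
```

The reasoning loop would then read the tool's result from the matching JSON-RPC response (same `id`) before deciding on the next step.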

## Model Statistics
- **Layers**: 10
- **Embedding Dim**: 640
- **Heads**: 10
- **Memory Slots / Layer**: 32 (K+V, persistent, learnable)
- **Sparse Top-K**: 128 tokens per head (IndexCache)
- **Local Window**: 256 tokens (Interleaved Attention)
- **Total Parameters**: ~94.9M (includes memory K/V params)
- **Precision**: 1.58-bit (Ternary)
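
A few derived quantities follow from these numbers. The snippet below is a back-of-the-envelope check, assuming each memory slot stores one key and one value vector of the full embedding width; the actual layout in the code may differ.

```python
n_layer, n_embd, n_head, n_memory = 10, 640, 10, 32

# Per-head width of the attention projections.
head_dim = n_embd // n_head

# Memory K/V parameters: one key and one value vector per slot, per layer
# (assumption: each vector spans the full embedding dimension).
memory_params = n_layer * n_memory * 2 * n_embd

# Share of the reported ~94.9M total taken up by memory parameters.
memory_share = memory_params / 94.9e6
```

Under that assumption the memory slots add 409,600 parameters, well under 1% of the total, so the working-memory mechanism is cheap in parameter terms.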

## Usage
To view the architecture and verify the parameter count:
```bash
python main.py
```
*(Requires `torch` to be installed.)*