Spaces: saur7764/EAM-100M-Agentic-Kernel

saur7764 committed 23 days ago

Commit 43b0fee (verified) · 1 parent: 2dda36d

Upload README.md with huggingface_hub

Files changed (1)

README.md +24 -11

@@ -1,10 +1,11 @@
 ---
-title: EAM 100M Agentic Kernel
-emoji: 🧠
+title: EAM 100M Agentic Kernel v1.2
+emoji: 🧬
 colorFrom: blue
 colorTo: indigo
 sdk: gradio
-pinned: false
+app_file: hf_app.py
+pinned: true
 ---
 # 100M Parameter Agentic Model Walkthrough
@@ -26,24 +27,36 @@ To ensure efficiency on consumer hardware (QVAC style), we use **Ternary Weights
 - **File**: `model/bitnet.py`
 - **Efficiency**: This mimics a static sparse matrix where 0s act as pruned connections. It reduces the memory footprint by ~70% compared to FP16.
-### 4. Reasoning: Tiny Recursive Loop
+### 4. Attention: Memory Sparse Attention (MSA) ⭐ NEW
+Replaces the standard causal attention with a triple-mechanism attention layer.
+- **File**: `model/memory_sparse_attention.py`
+- **Mechanism 1 — Persistent Memory Tokens**: Each layer holds `n_memory_tokens=32` learnable `(K, V)` parameter pairs. Every query position attends to these slots without any causal or sparse masking, giving the model a dedicated working-memory scratchpad that persists across positions within a forward pass.
+- **Mechanism 2 — IndexCache Sparse Top-K**: Full layers (even `layer_idx`) compute top-K attention indices over the sequence and cache them. Shared layers (odd `layer_idx`) reuse the cached indices, reducing O(T²) → O(T · sparse_topk). Memory slots are always kept regardless of the sparse mask.
+- **Mechanism 3 — Interleaved Head Attention**: The first half of heads use a local sliding-window mask (`local_window_size=256`); the second half retain unrestricted global access. Memory slots are exempt from this masking too.
+### 5. Reasoning: Tiny Recursive Loop
 The "agentic" part of the model comes from a recursive inference loop.
 - **File**: `agent/recursive_reasoning.py`
 - **Process**: The model generates a `<thought>`, critiques it, and refines it up to $N$ times before producing the final answer.
-### 5. Orchestration: FAMA & AgentScope
-- **FAMA (Failure-Aware Meta-Agentic)**: Implements stage-based error detection. When a tool or thought fails, a specialized sub-agent (Stage 2) injects targeted mitigation context into the recursive loop.
-- **AgentScope**: Provides the multi-agent messaging layer.
-### 6. Inference Optimization: IndexCache & LiteRT
-- **IndexCache**: Reuses top-k sparse attention indices across layers, reducing indexing overhead by ~75%.
-- **LiteRT**: Target runtime for efficient 1.58-bit execution on edge devices.
+### 7. Teacher: NIM Distillation (N3S) ⭐ NEW
+The model was distilled using **NVIDIA Nemotron-3 Super (N3S)** as a high-fidelity teacher.
+- **Method**: Multi-Token Distillation (MTD) focused on agentic reasoning trajectories.
+- **Alignment**: Alignment-aware distillation ensures the kernel follows workspace safety and grounding protocols.
+### 8. Ecosystem: Model Context Protocol (MCP) ⭐ EXPANDED
+Natively orchestrates cloud and local tools via MCP connectors.
+- **Integrations**: Figma (Design), Google Calendar, Notion, Google Sheets/Slides.
+- **Orchestration**: The recursive loop manages authentication signals and tool execution results.
 ## 📊 Model Statistics
 - **Layers**: 10
 - **Embedding Dim**: 640
 - **Heads**: 10
-- **Total Parameters**: ~94.2M
+- **Memory Slots / Layer**: 32 (K+V, persistent, learnable)
+- **Sparse Top-K**: 128 tokens per head (IndexCache)
+- **Local Window**: 256 tokens (Interleaved Attention)
+- **Total Parameters**: ~94.9M (includes memory K/V params)
 - **Precision**: 1.58-bit (Ternary)
 ## 🛠️ Usage
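The `model/bitnet.py` bullets describe ternary weights that behave like a static sparse matrix whose zeros act as pruned connections. That file is not part of this diff, so the sketch below assumes the absmean recipe published for BitNet b1.58: scale by the mean absolute weight, round, then clip to {-1, 0, 1}. `ternary_quantize` and `zero_fraction` are illustrative names, not the repository's API.

```python
import numpy as np


def ternary_quantize(w: np.ndarray, eps: float = 1e-5) -> tuple[np.ndarray, float]:
    """Absmean ternary quantization (BitNet b1.58 style; assumed, not taken from model/bitnet.py)."""
    gamma = float(np.mean(np.abs(w)))  # per-tensor scale
    q = np.clip(np.round(w / (gamma + eps)), -1, 1).astype(np.int8)
    return q, gamma  # approximate dequantization: q * gamma


def zero_fraction(q: np.ndarray) -> float:
    """Share of weights quantized to 0, i.e. connections that act as pruned."""
    return float(np.mean(q == 0))


rng = np.random.default_rng(0)
w = rng.normal(size=(640, 640)).astype(np.float32)  # 640 = embedding dim from the stats
q, gamma = ternary_quantize(w)
bits_per_weight = np.log2(3)  # information content of one ternary value, ~1.58

print(np.unique(q))
print(f"zero fraction: {zero_fraction(q):.2f}")
print(f"ideal bits/weight: {bits_per_weight:.2f} vs 16 for FP16")
```

Actual savings relative to FP16 depend on how the ternary values are packed in memory and which tensors (for example embeddings or norms) stay in higher precision; this diff does not specify either.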
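Mechanism 1 of the new MSA section gives every layer 32 learnable `(K, V)` slots that all query positions attend to without causal masking. A minimal single-head sketch, assuming the slots are simply prepended to the sequence keys and values; `memory_attention` is a hypothetical name, and the head dimension of 64 is inferred from 640 / 10 heads rather than stated in the README.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def memory_attention(q, k, v, mem_k, mem_v):
    """Single-head causal attention over [memory slots; sequence].

    q, k, v: (T, d) sequence projections; mem_k, mem_v: (M, d) learnable slots.
    Memory columns are always visible; sequence columns use a causal mask.
    """
    T, d = q.shape
    M = mem_k.shape[0]
    keys = np.concatenate([mem_k, k], axis=0)    # (M + T, d)
    values = np.concatenate([mem_v, v], axis=0)  # (M + T, d)
    scores = q @ keys.T / np.sqrt(d)             # (T, M + T)
    causal = np.tril(np.ones((T, T), dtype=bool))
    allowed = np.concatenate([np.ones((T, M), dtype=bool), causal], axis=1)
    weights = softmax(np.where(allowed, scores, -np.inf))
    return weights @ values, weights


rng = np.random.default_rng(0)
T, d, M = 8, 64, 32  # M = n_memory_tokens
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
mem_k, mem_v = rng.normal(size=(M, d)), rng.normal(size=(M, d))  # learnable during training
out, weights = memory_attention(q, k, v, mem_k, mem_v)
print(out.shape)  # (8, 64)
```

Because the memory columns are never masked, even the first query position has 33 keys to attend to instead of one.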
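Mechanism 2 caches top-K key indices on full (even `layer_idx`) layers and reuses them on shared (odd) layers, while memory slots bypass the sparse mask. The sketch below expresses this as boolean masks for readability. `IndexCache` and `sparse_mask` are hypothetical names, ranking keys per query from a single head's scores is an assumption, and an efficient kernel would gather only the selected keys instead of building a dense mask.

```python
from typing import Optional

import numpy as np


class IndexCache:
    """Hypothetical helper: full (even) layers compute top-k indices, shared (odd) layers reuse them."""

    def __init__(self) -> None:
        self.indices: Optional[np.ndarray] = None

    def get(self, layer_idx: int, scores: Optional[np.ndarray], top_k: int) -> np.ndarray:
        if layer_idx % 2 == 0:
            # Full layer: rank keys per query; causal-masked entries are -inf and sort last.
            k = min(top_k, scores.shape[-1])
            self.indices = np.argsort(-scores, axis=-1)[:, :k]
        elif self.indices is None:
            raise RuntimeError("shared layer needs indices from a preceding full layer")
        # Shared layer: return the cached indices without scoring again.
        return self.indices


def sparse_mask(indices: np.ndarray, T: int, n_mem: int) -> np.ndarray:
    """(T, n_mem + T) mask: every memory slot plus each query's cached top-k keys."""
    mask = np.zeros((T, n_mem + T), dtype=bool)
    mask[:, :n_mem] = True  # memory slots are always kept
    mask[np.arange(T)[:, None], n_mem + indices] = True
    mask[:, n_mem:] &= np.tril(np.ones((T, T), dtype=bool))  # drop any future keys
    return mask


rng = np.random.default_rng(0)
T, n_mem, top_k = 10, 4, 3
scores = rng.normal(size=(T, T))
scores[~np.tril(np.ones((T, T), dtype=bool))] = -np.inf  # causal scores from one head
cache = IndexCache()
idx0 = cache.get(0, scores, top_k)  # layer 0 (full): computes and caches
idx1 = cache.get(1, None, top_k)    # layer 1 (shared): reuses
mask = sparse_mask(idx1, T, n_mem)
print(mask.sum(axis=1))  # [5 6 7 7 7 7 7 7 7 7]
```

Each query ends up with `n_mem + min(top_k, i + 1)` visible keys, so per-layer attention cost grows with `top_k` rather than with `T`.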
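Mechanism 3 splits the heads: the first half of the heads use a sliding window of `local_window_size=256`, the second half keep global access, and memory slots are exempt for all heads. A mask-construction sketch, assuming the global heads remain causal and that the window counts the query's own position; `interleaved_head_masks` is a hypothetical helper.

```python
import numpy as np


def interleaved_head_masks(T: int, n_heads: int, window: int, n_mem: int) -> np.ndarray:
    """(n_heads, T, n_mem + T) boolean masks for interleaved local/global heads."""
    i = np.arange(T)[:, None]  # query positions
    j = np.arange(T)[None, :]  # key positions
    causal = j <= i
    local = causal & (i - j < window)  # self plus the previous window - 1 tokens
    masks = np.zeros((n_heads, T, n_mem + T), dtype=bool)
    masks[:, :, :n_mem] = True                 # memory slots: visible to every head
    masks[: n_heads // 2, :, n_mem:] = local   # first half: sliding window
    masks[n_heads // 2 :, :, n_mem:] = causal  # second half: full causal prefix
    return masks


masks = interleaved_head_masks(T=300, n_heads=10, window=256, n_mem=32)
print(masks.shape)  # (10, 300, 332)
print(masks[0, 299, 32 + 43], masks[0, 299, 32 + 44], masks[9, 299, 32])  # False True True
```

At position 299 a local head can no longer see token 43 (256 positions back), while a global head still sees token 0.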
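The Tiny Recursive Loop drafts a `<thought>`, critiques it, and refines it up to $N$ times before answering. `agent/recursive_reasoning.py` is not shown in this diff, so the control flow below is only a sketch: `generate`, `critique`, `refine`, and `answer` are hypothetical callables standing in for model calls, and stopping early once a critique passes is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Critique:
    ok: bool
    feedback: str


def recursive_reason(
    prompt: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Critique],
    refine: Callable[[str, str, str], str],
    answer: Callable[[str, str], str],
    max_refinements: int = 3,  # N
) -> tuple[str, list[str]]:
    thought = generate(prompt)  # initial <thought>
    trace = [thought]
    for _ in range(max_refinements):
        review = critique(prompt, thought)
        if review.ok:  # assumption: stop once the critique passes
            break
        thought = refine(prompt, thought, review.feedback)
        trace.append(thought)
    return answer(prompt, thought), trace


# Toy stand-ins for model calls (hypothetical).
def toy_generate(prompt: str) -> str:
    return "<thought>2+2=5</thought>"


def toy_critique(prompt: str, thought: str) -> Critique:
    return Critique(ok="=4<" in thought, feedback="recheck the sum")


def toy_refine(prompt: str, thought: str, feedback: str) -> str:
    return "<thought>2+2=4</thought>"


def toy_answer(prompt: str, thought: str) -> str:
    return thought.split("=")[1].split("<")[0]


final, trace = recursive_reason("What is 2+2?", toy_generate, toy_critique, toy_refine, toy_answer)
print(final, len(trace))  # 4 2
```

Returning the trace alongside the answer keeps every intermediate thought available for logging or for the distillation data the README mentions.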
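The statistics note that the ~94.9M total includes the memory K/V parameters. A quick count, assuming each K and V slot is as wide as the 640-dim embedding (10 heads × 64) and that each of the 10 layers holds its own 32 slots; the README does not state the slot width, so treat this as a rough check.

```python
n_layers = 10  # Layers
n_mem = 32     # Memory Slots / Layer
d_model = 640  # Embedding Dim (assumed slot width)

mem_params = n_layers * n_mem * 2 * d_model  # one K and one V vector per slot
reported_delta = 94.9e6 - 94.2e6             # change between the rounded totals

print(f"memory K/V params: {mem_params:,}")              # memory K/V params: 409,600
print(f"reported change: ~{reported_delta / 1e6:.1f}M")  # reported change: ~0.7M
```

Under these assumptions the slots contribute about 0.41M parameters; both totals are rounded to 0.1M, so the comparison is approximate.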