Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,10 +1,11 @@
 ---
-title: EAM 100M Agentic Kernel
-emoji:
 colorFrom: blue
 colorTo: indigo
 sdk: gradio
-
 ---

 # 100M Parameter Agentic Model Walkthrough
@@ -26,24 +27,36 @@ To ensure efficiency on consumer hardware (QVAC style), we use **Ternary Weights
 - **File**: `model/bitnet.py`
 - **Efficiency**: This mimics a static sparse matrix where 0s act as pruned connections. It reduces the memory footprint by ~70% compared to FP16.

-### 4.
 The "agentic" part of the model comes from a recursive inference loop.
 - **File**: `agent/recursive_reasoning.py`
 - **Process**: The model generates a `<thought>`, critiques it, and refines it up to $N$ times before producing the final answer.

-###
-
-- **

-###
-
-- **

 ## 📊 Model Statistics
 - **Layers**: 10
 - **Embedding Dim**: 640
 - **Heads**: 10
-- **
 - **Precision**: 1.58-bit (Ternary)

 ## 🛠️ Usage
 ---
+title: EAM 100M Agentic Kernel v1.2
+emoji: 🧬
 colorFrom: blue
 colorTo: indigo
 sdk: gradio
+app_file: hf_app.py
+pinned: true
 ---

 # 100M Parameter Agentic Model Walkthrough
 - **File**: `model/bitnet.py`
 - **Efficiency**: This mimics a static sparse matrix where 0s act as pruned connections. It reduces the memory footprint by ~70% compared to FP16.
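
The ternary-weight idea can be illustrated with a small sketch. It assumes absmean scaling (divide by the mean absolute weight, round, clip to {-1, 0, +1}), which is one common way to produce 1.58-bit weights; the README does not say whether `model/bitnet.py` uses this exact rule, and the function names here are hypothetical.

```python
def ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of floats to codes in {-1, 0, +1} plus one scale.

    Assumption: absmean scaling. The scale is the mean absolute weight, and
    each weight is divided by it, rounded, then clipped to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map ternary codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
codes, scale = ternary_quantize(weights)
approx = dequantize(codes, scale)

# Small weights collapse to 0, which is what lets zeros behave like
# pruned connections in the description above.
zero_fraction = codes.count(0) / len(codes)
```

In this toy input, the two smallest weights become 0 and the rest keep only their sign, so the layer stores one float scale plus a ternary code per weight.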

+### 4. Attention: Memory Sparse Attention (MSA) ⭐ NEW
+Replaces the standard causal attention with a triple-mechanism attention layer.
+- **File**: `model/memory_sparse_attention.py`
+- **Mechanism 1 — Persistent Memory Tokens**: Each layer holds `n_memory_tokens=32` learnable `(K, V)` parameter pairs. Every query position attends to these slots without any causal or sparse masking, giving the model a dedicated working-memory scratchpad that persists across positions within a forward pass.
+- **Mechanism 2 — IndexCache Sparse Top-K**: Full layers (even `layer_idx`) compute top-K attention indices over the sequence and cache them. Shared layers (odd `layer_idx`) reuse the cached indices, reducing O(T²) → O(T · sparse_topk). Memory slots are always kept regardless of the sparse mask.
+- **Mechanism 3 — Interleaved Head Attention**: The first half of heads use a local sliding-window mask (`local_window_size=256`); the second half retain unrestricted global access. Memory slots are exempt from this masking too.
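
The three mechanisms can be pictured as a per-head boolean mask over `[memory slots | sequence keys]`. The sketch below is a pure-Python illustration, not the code in `model/memory_sparse_attention.py`. It assumes that the top-K selection is causal, that local heads intersect the top-K set with the sliding window, and that a plain dict stands in for the index cache; all function names are hypothetical.

```python
def topk_indices(scores, top_k):
    """For each query q, pick the top_k highest-scoring keys among 0..q."""
    return [
        sorted(range(q + 1), key=lambda j: scores[q][j], reverse=True)[:top_k]
        for q in range(len(scores))
    ]


def layer_indices(layer_idx, scores, top_k, cache):
    """Even layers compute and cache indices; odd layers reuse the cache."""
    if layer_idx % 2 == 0:
        cache["indices"] = topk_indices(scores, top_k)
    return cache["indices"]


def head_mask(indices, head, n_heads, n_memory, window):
    """Boolean mask for one head: memory columns first, then sequence keys."""
    is_local = head < n_heads // 2  # first half of heads: sliding window
    mask = []
    for q, kept in enumerate(indices):
        keep = set(kept)
        seq = [
            j in keep and (not is_local or q - j < window)
            for j in range(len(indices))
        ]
        # Memory slots are always visible, whatever the sparse/local mask says.
        mask.append([True] * n_memory + seq)
    return mask


# Toy 4-token example with 2 memory slots, top_k=2, window=2, 2 heads.
scores = [
    [0.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.2, 0.8, 0.5, 0.0],
    [0.7, 0.1, 0.3, 0.6],
]
cache = {}
idx_even = layer_indices(0, scores, top_k=2, cache=cache)  # computed
idx_odd = layer_indices(1, None, top_k=2, cache=cache)     # reused
local_mask = head_mask(idx_even, head=0, n_heads=2, n_memory=2, window=2)
global_mask = head_mask(idx_even, head=1, n_heads=2, n_memory=2, window=2)
```

For the last query, the global head sees keys 0 and 3 while the local head only sees key 3, and both keep the two memory columns open.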
+
+### 5. Reasoning: Tiny Recursive Loop
 The "agentic" part of the model comes from a recursive inference loop.
 - **File**: `agent/recursive_reasoning.py`
 - **Process**: The model generates a `<thought>`, critiques it, and refines it up to $N$ times before producing the final answer.
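
The generate, critique, refine cycle can be sketched as a bounded loop. This is an illustration under assumptions rather than the contents of `agent/recursive_reasoning.py`: the three callables are hypothetical stand-ins for model calls, and stopping early when the critique returns `None` is an assumed convention.

```python
def recursive_reason(prompt, generate, critique, refine, max_rounds=3):
    """Draft a thought, then critique and refine it up to max_rounds times."""
    thought = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(prompt, thought)
        if feedback is None:  # assumed convention: None means "good enough"
            break
        thought = refine(prompt, thought, feedback)
    return thought


# Toy stand-ins: the "thought" is a number that should reach a target of 10.
def toy_generate(prompt):
    return 7

def toy_critique(prompt, thought):
    return None if thought == 10 else "too low"

def toy_refine(prompt, thought, feedback):
    return thought + 1


converged = recursive_reason("reach 10", toy_generate, toy_critique, toy_refine, max_rounds=5)
truncated = recursive_reason("reach 10", toy_generate, toy_critique, toy_refine, max_rounds=2)
```

With a budget of five rounds the toy loop converges after three refinements; with a budget of two it returns the best thought so far, which mirrors the "up to $N$ times" wording.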

+### 7. Teacher: NIM Distillation (N3S) ⭐ NEW
+The model was distilled using **NVIDIA Nemotron-3 Super (N3S)** as a high-fidelity teacher.
+- **Method**: Multi-Token Distillation (MTD) focused on agentic reasoning trajectories.
+- **Alignment**: Alignment-aware distillation ensures the kernel follows workspace safety and grounding protocols.

+### 8. Ecosystem: Model Context Protocol (MCP) ⭐ EXPANDED
+Natively orchestrates cloud and local tools via MCP connectors.
+- **Integrations**: Figma (Design), Google Calendar, Notion, Google Sheets/Slides.
+- **Orchestration**: The recursive loop manages authentication signals and tool execution results.
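
As a rough picture of what a tool invocation looks like on the wire, the sketch below builds a JSON-RPC 2.0 `tools/call` request of the kind MCP uses and extracts the text from a result. The tool name `create_event`, its arguments, and the sample response are hypothetical; the README does not describe the connectors' actual tool names or how the kernel transports these messages.

```python
import itertools
import json

_request_ids = itertools.count(1)


def make_tool_call(tool_name, arguments):
    """Build an MCP-style JSON-RPC 2.0 request that invokes a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def extract_text(response):
    """Return the concatenated text content of a tool result, or raise."""
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "tool call failed"))
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an execution error")
    return "".join(
        part["text"] for part in result["content"] if part.get("type") == "text"
    )


# Hypothetical calendar tool; names and arguments are illustrative only.
request = make_tool_call(
    "create_event", {"title": "Design review", "start": "2025-01-15T10:00:00Z"}
)
wire = json.dumps(request)

# Hand-written sample response standing in for a real server reply.
sample_response = {
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {"content": [{"type": "text", "text": "event created"}], "isError": False},
}
observation = extract_text(sample_response)
```

In a loop like the one described above, `observation` would be fed back as input to the next critique or refinement step.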

 ## 📊 Model Statistics
 - **Layers**: 10
 - **Embedding Dim**: 640
 - **Heads**: 10
+- **Memory Slots / Layer**: 32 (K+V, persistent, learnable)
+- **Sparse Top-K**: 128 tokens per head (IndexCache)
+- **Local Window**: 256 tokens (Interleaved Attention)
+- **Total Parameters**: ~94.9M (includes memory K/V params)
 - **Precision**: 1.58-bit (Ternary)
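
A few quantities follow directly from these statistics. The arithmetic below is a back-of-the-envelope sketch: it assumes each memory slot stores one key vector and one value vector of full embedding width, and it counts attended keys per query as memory slots plus at most `sparse_topk` sequence keys. Neither assumption is confirmed by the README.

```python
n_layers, d_model, n_heads = 10, 640, 10
n_memory, sparse_topk, local_window = 32, 128, 256
total_params = 94.9e6  # approximate figure quoted in the statistics

head_dim = d_model // n_heads  # per-head width

# Assumption: one K and one V vector of width d_model per slot, per layer.
memory_params = n_layers * n_memory * 2 * d_model
memory_share = memory_params / total_params


def keys_per_query(visible_keys):
    """Keys a sparse query attends to: all memory slots plus up to top-K."""
    return n_memory + min(visible_keys, sparse_topk)


def sparse_fraction(visible_keys):
    """Attended keys under the sparse scheme relative to dense attention."""
    return keys_per_query(visible_keys) / visible_keys


long_context = sparse_fraction(2048)
```

Under these assumptions the memory slots add 409,600 parameters, well under 1% of the total, and a query with 2,048 visible keys attends to 160 of them.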

 ## 🛠️ Usage