saur7764 committed on
Commit 43b0fee · verified · 1 Parent(s): 2dda36d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +24 -11
README.md CHANGED
@@ -1,10 +1,11 @@
 ---
-title: EAM 100M Agentic Kernel
-emoji: 🧠
+title: EAM 100M Agentic Kernel v1.2
+emoji: 🧬
 colorFrom: blue
 colorTo: indigo
 sdk: gradio
-pinned: false
+app_file: hf_app.py
+pinned: true
 ---
 
 # 100M Parameter Agentic Model Walkthrough
@@ -26,24 +27,36 @@ To ensure efficiency on consumer hardware (QVAC style), we use **Ternary Weights
 - **File**: `model/bitnet.py`
 - **Efficiency**: This mimics a static sparse matrix where 0s act as pruned connections. It reduces the memory footprint by ~70% compared to FP16.
 
-### 4. Reasoning: Tiny Recursive Loop
+### 4. Attention: Memory Sparse Attention (MSA) ⭐ NEW
+Replaces the standard causal attention with a triple-mechanism attention layer.
+- **File**: `model/memory_sparse_attention.py`
+- **Mechanism 1 — Persistent Memory Tokens**: Each layer holds `n_memory_tokens=32` learnable `(K, V)` parameter pairs. Every query position attends to these slots without any causal or sparse masking, giving the model a dedicated working-memory scratchpad that persists across positions within a forward pass.
+- **Mechanism 2 — IndexCache Sparse Top-K**: Full layers (even `layer_idx`) compute top-K attention indices over the sequence and cache them. Shared layers (odd `layer_idx`) reuse the cached indices, reducing O(T²) → O(T · sparse_topk). Memory slots are always kept regardless of the sparse mask.
+- **Mechanism 3 — Interleaved Head Attention**: The first half of heads use a local sliding-window mask (`local_window_size=256`); the second half retain unrestricted global access. Memory slots are exempt from this masking too.
+
+### 5. Reasoning: Tiny Recursive Loop
 The "agentic" part of the model comes from a recursive inference loop.
 - **File**: `agent/recursive_reasoning.py`
 - **Process**: The model generates a `<thought>`, critiques it, and refines it up to $N$ times before producing the final answer.
 
-### 5. Orchestration: FAMA & AgentScope
-- **FAMA (Failure-Aware Meta-Agentic)**: Implements stage-based error detection. When a tool or thought fails, a specialized sub-agent (Stage 2) injects targeted mitigation context into the recursive loop.
-- **AgentScope**: Provides the multi-agent messaging layer.
+### 7. Teacher: NIM Distillation (N3S) ⭐ NEW
+The model was distilled using **NVIDIA Nemotron-3 Super (N3S)** as a high-fidelity teacher.
+- **Method**: Multi-Token Distillation (MTD) focused on agentic reasoning trajectories.
+- **Alignment**: Alignment-aware distillation ensures the kernel follows workspace safety and grounding protocols.
 
-### 6. Inference Optimization: IndexCache & LiteRT
-- **IndexCache**: Reuses top-k sparse attention indices across layers, reducing indexing overhead by ~75%.
-- **LiteRT**: Target runtime for efficient 1.58-bit execution on edge devices.
+### 8. Ecosystem: Model Context Protocol (MCP) ⭐ EXPANDED
+Natively orchestrates cloud and local tools via MCP connectors.
+- **Integrations**: Figma (Design), Google Calendar, Notion, Google Sheets/Slides.
+- **Orchestration**: The recursive loop manages authentication signals and tool execution results.
 
 ## 📊 Model Statistics
 - **Layers**: 10
 - **Embedding Dim**: 640
 - **Heads**: 10
-- **Total Parameters**: ~94.2M
+- **Memory Slots / Layer**: 32 (K+V, persistent, learnable)
+- **Sparse Top-K**: 128 tokens per head (IndexCache)
+- **Local Window**: 256 tokens (Interleaved Attention)
+- **Total Parameters**: ~94.9M (includes memory K/V params)
 - **Precision**: 1.58-bit (Ternary)
 
 ## 🛠️ Usage
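The ternary-weight section that this diff touches (`model/bitnet.py`, 1.58-bit precision) is not shown in the hunks. To make "1.58-bit" concrete, here is a minimal pure-Python sketch of absmean ternary quantization in the style of BitNet b1.58. Two things are assumptions and may not match what `model/bitnet.py` does: the absmean scaling rule, and the helper names `ternary_quantize` and `bits_per_ternary_weight`.

```python
import math

def ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of floats to {-1, 0, +1} with one shared scale.

    Assumed absmean scheme: scale = mean(|w|), q = clip(round(w / scale), -1, 1).
    Dequantize with q * scale. Not necessarily the scheme in model/bitnet.py.
    """
    # Per-tensor scale; eps guards against an all-zero tensor.
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)
    # Round each scaled weight to the nearest integer, then clip to [-1, 1].
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def bits_per_ternary_weight():
    # A 3-valued symbol carries log2(3) ~= 1.585 bits: the "1.58-bit" label.
    return math.log2(3)

w = [0.9, -0.05, 0.4, -1.2]
q, s = ternary_quantize(w)
print(q, round(s, 4))  # [1, 0, 1, -1] 0.6375
```

Weights that round to 0 act as the "pruned connections" the Efficiency bullet mentions.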
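The MSA section added in this commit layers three rules on top of attention. The sketch below lists which keys one query position may attend to in one head. It uses a hypothetical helper, `msa_visible_keys`, written in plain Python rather than the repository's tensor code. It rests on three assumptions:

- The base mask is causal.
- The local window covers the query and the `local_window_size - 1` positions before it.
- "First half of heads" means `head < n_heads // 2`.

```python
def msa_visible_keys(query_pos, head, n_heads=10, n_memory_tokens=32,
                     local_window_size=256, topk_indices=None):
    """Return (memory_slots, sequence_positions) visible to one query.

    Hypothetical illustration of the three MSA masking rules; not the
    code in model/memory_sparse_attention.py.
    """
    # Mechanism 1: persistent memory slots bypass every mask.
    memory_slots = list(range(n_memory_tokens))

    # Causal base mask (assumed): a query sees itself and earlier positions.
    positions = list(range(query_pos + 1))

    # Mechanism 3: the first half of the heads are local (sliding window);
    # the second half keep global access.
    if head < n_heads // 2:
        positions = [p for p in positions if query_pos - p < local_window_size]

    # Mechanism 2: if a top-K index set is supplied (computed or reused
    # via IndexCache), keep only those sequence positions.
    if topk_indices is not None:
        keep = set(topk_indices)
        positions = [p for p in positions if p in keep]

    return memory_slots, positions

mem, seq = msa_visible_keys(300, head=0)
print(len(mem), seq[0], seq[-1], len(seq))  # 32 45 300 256
```

Even with an empty top-K set, the 32 memory slots stay visible. This matches "Memory slots are always kept regardless of the sparse mask."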
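The IndexCache bullet describes a layer schedule. Full layers (even `layer_idx`) compute top-K indices and cache them, and shared layers (odd `layer_idx`) reuse the cache. Here is a minimal sketch of that schedule. For brevity it assumes one score vector per layer for a single query and head, whereas real attention does this per head and per query. The names `topk_indices` and `run_index_cache` are illustrative only.

```python
import heapq

def topk_indices(scores, k):
    """Positions of the k largest scores, returned in ascending order."""
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return sorted(best)

def run_index_cache(per_layer_scores, k):
    """Apply the even/odd IndexCache schedule across layers.

    Even layer_idx ("full"): compute top-K from this layer's scores and cache.
    Odd layer_idx ("shared"): reuse the cached indices, ignoring own scores.
    Returns (indices used by each layer, number of top-K computations).
    """
    cache = None
    used, n_computed = [], 0
    for layer_idx, scores in enumerate(per_layer_scores):
        if layer_idx % 2 == 0:
            cache = topk_indices(scores, k)
            n_computed += 1
        used.append(cache)
    return used, n_computed

layers = [
    [0.1, 0.9, 0.3, 0.8, 0.2],  # layer 0: full
    [0.9, 0.1, 0.1, 0.1, 0.1],  # layer 1: shared, own scores ignored
    [0.5, 0.1, 0.7, 0.2, 0.6],  # layer 2: full
    [0.0, 0.0, 0.0, 0.0, 1.0],  # layer 3: shared
]
used, n = run_index_cache(layers, k=2)
print(used, n)  # [[1, 3], [1, 3], [2, 4], [2, 4]] 2
```

With the 10 layers listed in the statistics, this schedule computes top-K on 5 layers and reuses it on the other 5.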
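The reasoning section describes a generate, critique, refine cycle bounded by $N$. This commit only renumbers that section. Below is a sketch with the model calls abstracted as plain callables. `generate`, `critique`, and `refine` are hypothetical stand-ins for whatever `agent/recursive_reasoning.py` actually calls. The convention that the critic returns `None` to accept a thought is also an assumption. The final answer would then be produced from the returned thought.

```python
from typing import Callable, List, Optional, Tuple

def recursive_reason(
    prompt: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Optional[str]],
    refine: Callable[[str, str, str], str],
    n_max: int = 3,
) -> Tuple[str, List[str]]:
    """Generate a thought, then critique-and-refine it up to n_max times.

    critique returns None when it has no objection, which ends the loop early.
    Returns (final thought, every intermediate thought).
    """
    thought = generate(prompt)
    trace = [thought]
    for _ in range(n_max):
        feedback = critique(prompt, thought)
        if feedback is None:  # critic is satisfied: stop refining
            break
        thought = refine(prompt, thought, feedback)
        trace.append(thought)
    return thought, trace

# Toy callables: the critic objects until the thought is 7 characters long.
final, trace = recursive_reason(
    "2+2?",
    generate=lambda p: "draft",
    critique=lambda p, t: "too short" if len(t) < 7 else None,
    refine=lambda p, t, f: t + "+",
)
print(final, trace)  # draft++ ['draft', 'draft+', 'draft++']
```

The `n_max` cap guarantees termination even if the critic never accepts.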
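Here is a back-of-envelope check of two numbers behind the updated statistics. It uses the listed layer count (10), embedding width (640), memory slots per layer (32), and Sparse Top-K (128). Two inputs are assumptions for illustration: that each slot holds one full-width K vector and one full-width V vector per layer, and the sequence length of 2048.

```python
def memory_kv_params(n_layers=10, n_memory_tokens=32, embed_dim=640):
    """Learnable parameters in the persistent memory slots.

    Assumes one K vector and one V vector of size embed_dim per slot, per layer.
    """
    return n_layers * n_memory_tokens * 2 * embed_dim

def score_counts(seq_len, sparse_topk=128):
    """Query-key scores per head: dense T*T versus sparse T*sparse_topk."""
    return seq_len * seq_len, seq_len * sparse_topk

print(memory_kv_params())  # 409600
print(score_counts(2048))  # (4194304, 262144)
```

Under that layout, the memory slots add about 0.41M parameters. That is less than the ~0.7M gap between the old (~94.2M) and new (~94.9M) totals, and the README does not itemize the remainder. For a 2048-token sequence, restricting each query to 128 keys cuts the per-head score count by a factor of 16. Memory-slot scores are excluded from that count.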