Update README.md
README.md
CHANGED
@@ -1,34 +1,3 @@
-# LFM-2.5-1.2B-Compact-Reservoir
-
-**Tags:** triton, svd-compression, spiking-neuron, liquid-foundation-model, low-rank-adaptation, efficient-inference
-**Library:** transformers
-**Pipeline:** text-generation
-
-Replaces MLP layers with custom `CompactReservoirFFN`: SVD compression (rank_ratio=0.25), Triton kernels, spiking neuron activation.
-
-## ✨ Key Features
-- **SVD Compression**: MLP weights reduced 4x, preserves performance.
-- **Triton Kernels**: `fast_einsum_gating` for dynamic merging, low memory/latency.
-- **Spiking Neurons**: Membrane potential
-
-\[
-V_m[t] = \tau V_m[t-1] + (1-\tau)\text{Mean}(Y)
-\]
-
-spike if \(V_m > 0.5\).
-- **Context Gating**:
-
-\[
-G = \text{Softmax}(\text{Gate}(X + \text{ContextBias})), \quad W_{\text{merged}} = \sum G_k \cdot W_{\text{svd}_k}
-\]
-
-## 📊 Benchmarks
-20 prompts: reasoning / coding / writing
-
-| Model | Latency | Memory (MLP) |
-|------------------------|---------|--------------|
-| Original LFM-2.5-1.2B | 1.00x | 100% |
-| Compact-Reservoir | 0.85x | 75% |
 
 ## 🚀 Quick Start
 
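
Note for readers of this diff: the removed README text states a membrane-potential update and a softmax-gated weight merge but shows no code for either. The two formulas can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the repository's `CompactReservoirFFN` implementation: the function names, the default `tau`, and passing gate logits directly (standing in for `Gate(X + ContextBias)`, which the README does not define) are all hypothetical. The README also does not say whether the potential resets after a spike, so the sketch leaves it untouched.

```python
import math
from statistics import fmean


def membrane_step(v_prev, y, tau=0.9, threshold=0.5):
    """One step of V_m[t] = tau * V_m[t-1] + (1 - tau) * Mean(Y).

    tau=0.9 is an assumed default; the README gives only the 0.5 threshold.
    Returns the new potential and whether it exceeds the threshold.
    """
    v = tau * v_prev + (1.0 - tau) * fmean(y)
    return v, v > threshold


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merge_weights(gate_logits, svd_weights):
    """W_merged = sum_k G_k * W_svd_k, with G = softmax(gate_logits).

    gate_logits stands in for Gate(X + ContextBias) (assumption).
    svd_weights is a list of equally shaped matrices (lists of rows).
    """
    g = softmax(gate_logits)
    rows, cols = len(svd_weights[0]), len(svd_weights[0][0])
    return [
        [sum(g[k] * svd_weights[k][i][j] for k in range(len(g)))
         for j in range(cols)]
        for i in range(rows)
    ]


# Usage: feed a constant activation and watch the potential cross 0.5.
v = 0.0
for step in range(3):
    v, fired = membrane_step(v, [1.0, 1.0], tau=0.5)
    print(step, v, fired)
```

With `tau=0.5` and a constant mean activation of 1.0, the potential reaches 0.5 on the first step (no spike, since the comparison is strict) and 0.75 on the second, which is the first step that spikes.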