# Qalb-Pro: Llama-8B with Engram Sparsity 🧠 🇵🇰
Qalb-Pro is the experimental evolution of the original Qalb-1.0-8B-Instruct. While the base Qalb model is already a state-of-the-art Urdu LLM (scoring 90.34 overall), the Pro version attempts to integrate DeepSeek's Engram Architecture to solve the "knowledge eviction" problem in long-form Urdu generation.
## Why this matters for Qalb
Even with 1.97 billion tokens of continued pre-training, standard Transformers (like Llama) lose Urdu-specific n-gram patterns as the context grows. By adding a Conditional Memory Module (Engram), we allow Qalb to:
- Offload static knowledge: common Urdu phrases are retrieved via an $O(1)$ hash lookup (sketched below).
- Preserve neural FLOPs: the Llama-8B backbone can focus on complex reasoning while the Engram handles vocabulary retrieval.
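Conceptually, the Engram is an embedding table keyed by hashed n-gram IDs, so retrieval is a single index operation. A minimal PyTorch sketch follows; the bucket count and hidden width are illustrative placeholders, not Qalb-Pro's published hyperparameters:

```python
import torch
import torch.nn as nn

class EngramTable(nn.Module):
    """O(1) memory: hashed n-gram IDs index into a learned embedding table."""

    def __init__(self, num_buckets: int = 2**20, d_model: int = 4096):
        # num_buckets and d_model are illustrative, not released values
        super().__init__()
        self.memory = nn.Embedding(num_buckets, d_model)

    def forward(self, bucket_ids: torch.Tensor) -> torch.Tensor:
        # bucket_ids: (batch, seq) integer hashes of the preceding n-grams
        return self.memory(bucket_ids)  # (batch, seq, d_model)
```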
## 🛠️ The Architecture
We have merged the Llama-3.1-8B-Instruct weights with a custom Polynomial Rolling Hash Engram Module.
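The rolling hash is what keeps retrieval cheap: as the n-gram window slides one token, the hash is updated in constant time instead of being recomputed from scratch. A sketch under assumed constants (`BASE` and the bucket count here are hypothetical, chosen only for illustration):

```python
BASE = 1_000_003       # hypothetical prime base; not Qalb-Pro's published choice
NUM_BUCKETS = 2 ** 20  # must match the Engram table size

def rolling_hashes(token_ids: list[int], n: int = 3) -> list[int]:
    """Polynomial rolling hash over every n-gram of token IDs."""
    if len(token_ids) < n:
        return []
    top = pow(BASE, n - 1, NUM_BUCKETS)  # weight of the token leaving the window
    h = 0
    for tok in token_ids[:n]:            # hash the first window directly
        h = (h * BASE + tok) % NUM_BUCKETS
    out = [h]
    for i in range(n, len(token_ids)):   # slide: drop oldest token, add newest
        h = ((h - token_ids[i - n] * top) * BASE + token_ids[i]) % NUM_BUCKETS
        out.append(h)
    return out
```

Each resulting bucket ID indexes the memory table above, so looking up an n-gram's engram costs the same whether the context holds one sentence or a full document.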
### The Gating Mechanism
Instead of forcing the model to "remember" every Urdu word in its weights, we use a gating function that interpolates between the backbone's hidden state $h$ and the retrieved memory:

$$h' = (1 - g) \odot h + g \odot M_{engram}, \qquad g = \sigma(W_g h)$$

where $M_{engram}$ is the Urdu-specific memory retrieved via the rolling hash and $g$ is a learned sigmoid gate.
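A minimal sketch of such a gate, assuming a learned sigmoid interpolation (the exact parameterization in Qalb-Pro may differ):

```python
import torch
import torch.nn as nn

class EngramGate(nn.Module):
    """Mixes the backbone hidden state with retrieved engram memory."""

    def __init__(self, d_model: int = 4096):
        super().__init__()
        self.w_g = nn.Linear(d_model, d_model)  # assumed gate projection W_g

    def forward(self, h: torch.Tensor, m_engram: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.w_g(h))     # per-dimension gate g in (0, 1)
        return (1 - g) * h + g * m_engram  # h' = (1 - g) * h + g * M_engram
```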
**Base model:** `unsloth/Meta-Llama-3.1-8B`