A newer version of this model is available: enstazao/Qalb-1.0-8B-Instruct

Qalb-Pro: Llama-8B with Engram Sparsity 🧠🇵🇰

Qalb-Pro is the experimental evolution of the original Qalb-1.0-8B-Instruct. While the base Qalb model is already a state-of-the-art Urdu LLM (scoring 90.34 overall), the Pro version attempts to integrate DeepSeek's Engram Architecture to solve the "knowledge eviction" problem in long-form Urdu generation.

🌟 Why this matters for Qalb

Even with 1.97 billion tokens of continued pre-training, standard Transformers (like Llama) lose Urdu-specific n-gram patterns as the context grows. By adding a Conditional Memory Module (Engram), we allow Qalb to:

Offload static knowledge: Common Urdu phrases are retrieved via $O(1)$ lookup.
Preserve Neural FLOPs: The Llama-8B backbone can focus on complex reasoning while the Engram handles vocabulary retrieval.

🛠️ The Architecture

We have merged the Llama-3.1-8B-Instruct weights with a custom Polynomial Rolling Hash Engram Module.
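To make the lookup idea concrete, here is a minimal sketch of how a polynomial rolling hash could map each n-gram of token IDs to a slot in a fixed-size memory table. It is an illustration, not the released Qalb-Pro module: the constants `BASE`, `TABLE_SIZE`, and `NGRAM` and both function names are hypothetical and chosen only for this example.

```python
from typing import List, Sequence

# All constants below are illustrative assumptions, not values from the model.
BASE = 1_000_003      # polynomial base
TABLE_SIZE = 65_537   # number of memory slots (a prime modulus)
NGRAM = 3             # n-gram window length


def direct_hash(window: Sequence[int], base: int = BASE,
                table_size: int = TABLE_SIZE) -> int:
    """Polynomial hash of one window: sum(t_k * base^(n-1-k)) mod table_size."""
    h = 0
    for t in window:
        h = (h * base + t) % table_size
    return h


def rolling_ngram_slots(token_ids: Sequence[int], n: int = NGRAM,
                        base: int = BASE,
                        table_size: int = TABLE_SIZE) -> List[int]:
    """Return one table slot per n-gram window, updating the hash in O(1) per step."""
    if len(token_ids) < n:
        return []
    # Weight of the oldest token in the window: base^(n-1) mod table_size.
    high = pow(base, n - 1, table_size)
    h = direct_hash(token_ids[:n], base, table_size)
    slots = [h]
    for i in range(n, len(token_ids)):
        outgoing = token_ids[i - n]
        # Drop the outgoing token, shift the window, add the incoming token.
        h = ((h - outgoing * high) * base + token_ids[i]) % table_size
        slots.append(h)
    return slots
```

Each returned slot would index a row of the memory table, so the cost per token stays constant regardless of how long the context is. Distinct n-grams can collide in the same slot; how the actual module handles collisions is not stated in this card.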

The Gating Mechanism

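Instead of forcing the model to "remember" every Urdu word in its weights, we use a gating function:

$$H_{final} = \sigma(W \cdot H) \odot H + (1 - \sigma(W \cdot H)) \odot M_{engram}$$

where $H$ is the backbone hidden state, $W$ is a learned gate projection, $\sigma$ is the sigmoid function, $\odot$ is element-wise multiplication, and $M_{engram}$ is the Urdu-specific memory retrieved via the rolling hash. When the gate $\sigma(W \cdot H)$ is close to 1 the hidden state passes through unchanged; when it is close to 0 the retrieved memory replaces it.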
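The gating function above can be sketched in a few lines of plain Python. This is a toy illustration for a single position, assuming $W$ is a square matrix that produces one gate value per hidden dimension; the card does not specify the shape of $W$, and the names `sigmoid` and `gated_merge` are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_merge(h: List[float], m_engram: List[float],
                w: List[List[float]]) -> List[float]:
    """Compute g * H + (1 - g) * M_engram, with g = sigmoid(W @ H).

    Assumption: w is a d x d matrix, so the gate has one value per dimension.
    """
    d = len(h)
    gate = [sigmoid(sum(w[i][j] * h[j] for j in range(d))) for i in range(d)]
    return [g * hi + (1.0 - g) * mi for g, hi, mi in zip(gate, h, m_engram)]


# With W = 0 every gate value is 0.5, so the output is the plain average.
h = [2.0, -4.0]
m = [0.0, 2.0]
w_zero = [[0.0, 0.0], [0.0, 0.0]]
print(gated_merge(h, m, w_zero))  # [1.0, -1.0]
```

In a real model the same computation would run as batched tensor operations over every position, with $W$ trained jointly with the rest of the network.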


Model tree for ReySajju742/Qalb-Pro

Base model: unsloth/Meta-Llama-3.1-8B
Finetuned: enstazao/Qalb-1.0-8B-Instruct
Finetuned: this model