
Qalb-Pro: Llama-8B with Engram Sparsity 🧠🇵🇰

Qalb-Pro is the experimental evolution of the original Qalb-1.0-8B-Instruct. While the base Qalb model is already a state-of-the-art Urdu LLM (scoring 90.34 overall), the Pro version attempts to integrate DeepSeek's Engram Architecture to solve the "knowledge eviction" problem in long-form Urdu generation.

🌟 Why this matters for Qalb

Even with 1.97 billion tokens of continued pre-training, standard Transformers (like Llama) lose Urdu-specific n-gram patterns as the context grows. By adding a Conditional Memory Module (Engram), we allow Qalb to:

  1. Offload static knowledge: Common Urdu phrases are retrieved via an $O(1)$ hash lookup (see the sketch after this list).
  2. Preserve Neural FLOPs: The Llama-8B backbone can focus on complex reasoning while the Engram handles vocabulary retrieval.
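
A minimal sketch of the $O(1)$ retrieval idea, with hypothetical names and table size: an n-gram's hash indexes directly into a fixed-size memory table, so retrieval cost does not grow with context length.

```python
import torch
import torch.nn as nn

# Illustrative sketch, not the released implementation.
NUM_BUCKETS = 2**20   # size of the hash table (assumption)
HIDDEN_DIM = 4096     # Llama-8B hidden size

# Fixed-size memory table; one learned vector per hash bucket.
engram_table = nn.Embedding(NUM_BUCKETS, HIDDEN_DIM)

def retrieve(ngram_hashes: torch.LongTensor) -> torch.Tensor:
    """Map pre-computed n-gram hashes to memory vectors in O(1) per n-gram."""
    return engram_table(ngram_hashes % NUM_BUCKETS)
```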

πŸ› οΈ The Architecture

We have merged the Llama-3.1-8B-Instruct weights with a custom Polynomial Rolling Hash Engram Module.
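
The card does not spell out the hash itself; a standard polynomial rolling hash over token IDs, with an illustrative base and modulus, might look like the following sketch. Each step reuses the previous hash in constant time, so hashing every n-gram in a sequence is linear overall.

```python
BASE = 257        # polynomial base (assumption)
MOD = 2**61 - 1   # large Mersenne prime modulus (assumption)

def rolling_hashes(token_ids: list[int], n: int) -> list[int]:
    """Hash every n-gram in token_ids via a polynomial rolling hash."""
    if len(token_ids) < n:
        return []
    top = pow(BASE, n - 1, MOD)      # weight of the token that rolls out
    h = 0
    for t in token_ids[:n]:          # hash of the first n-gram
        h = (h * BASE + t) % MOD
    hashes = [h]
    for out_t, in_t in zip(token_ids, token_ids[n:]):
        # Drop the outgoing token's contribution, shift, add the new token.
        h = ((h - out_t * top) * BASE + in_t) % MOD
        hashes.append(h)
    return hashes
```

These hashes are exactly what would feed the bucketed lookup shown earlier.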

The Gating Mechanism

Instead of forcing the model to "remember" every Urdu word in its weights, we use a gating function:

$$H_{final} = \sigma(W \cdot H) \odot H + (1 - \sigma(W \cdot H)) \odot M_{engram}$$

where $M_{engram}$ is the Urdu-specific memory retrieved via the rolling hash.
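
As a minimal PyTorch sketch of the formula above (`EngramGate` and its shapes are illustrative, not the released implementation), the gate is a learned sigmoid that interpolates elementwise between the backbone hidden state and the retrieved memory:

```python
import torch
import torch.nn as nn

class EngramGate(nn.Module):
    """Sigmoid gate blending the backbone hidden state with Engram memory.

    Computes H_final = sigma(W·H) ⊙ H + (1 - sigma(W·H)) ⊙ M_engram.
    """
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_dim, hidden_dim)  # W

    def forward(self, h: torch.Tensor, m_engram: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate_proj(h))   # sigma(W·H), elementwise gate
        return g * h + (1.0 - g) * m_engram    # convex blend of the two paths
```

When the gate saturates toward 1 the layer passes the Llama hidden state through untouched, so the module can fall back to pure neural computation wherever the memory has nothing useful to offer.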
