---
license: apache-2.0
language:
- ar
- en
pipeline_tag: text-generation
tags:
- text-generation
- pytorch
- transformers
- vllm
- causal-lm
- depth-extension
- arabic
- english
- karnak
- qwen
base_model: Qwen/Qwen3-30B-A3B-Instruct-2507
model_name: Karnak
parameters: 40B
inference: false
---

# Karnak: Enhanced Arabic–English Large Language Model

## Model Summary

**Karnak** is a depth-extended causal language model optimized for **Arabic and English** generation. It is built on top of **Qwen/Qwen3-30B-A3B-Instruct-2507** and adds architectural depth extension and an Arabic-optimized tokenizer to improve fluency and tokenization efficiency.

Karnak was trained on **high-quality, filtered data** through a multi-stage pipeline to improve instruction following, factuality, and robustness.

## Key Features

- **Depth Extension (~40B):** Expanded depth to increase reasoning capacity and improve long-range dependency modeling.
- **Arabic-Optimized Tokenizer:** Improved Arabic tokenization efficiency, resulting in reduced token fragmentation and higher-quality generation.
- **Multi-Stage Training:** The model was built in stages: pre-trained weights → depth extension → continued pre-training → SFT (supervised fine-tuning).
- **Extended Context Window:** Designed for long-context use, with a **safe context range of up to 20K tokens**; staying within this limit is recommended for stable output.

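One common way to quantify token fragmentation is the average number of tokens per word. The sketch below is only an illustration: this card does not say how tokenization efficiency was measured, and `tokens_per_word` and `toy_tokenize` are hypothetical names introduced here.

```python
from typing import Callable, Iterable, List


def tokens_per_word(texts: Iterable[str],
                    tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("no words found in the input texts")
    return total_tokens / total_words


# Toy stand-in tokenizer (hypothetical): cuts every word into 2-character pieces.
def toy_tokenize(text: str) -> List[str]:
    return [word[i:i + 2] for word in text.split() for i in range(0, len(word), 2)]
```

With a loaded Hugging Face tokenizer, passing `tokenizer.tokenize` as the `tokenize` argument should give a comparable figure for real text; a lower value means less fragmentation.
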
## Model Details

- **Model Name:** Karnak
- **Base Model:** Qwen/Qwen3-30B-A3B-Instruct-2507
- **Parameter Count:** ~40B (Depth-Extended)
- **Languages:** Arabic, English
- **Training:** High-quality filtered data + Multi-stage pipeline (Continued pre-training + SFT)
- **Safe Context Range:** Up to **20,000 tokens**

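To stay inside the safe context range, a caller can cap the generation length based on the prompt length. The following is a minimal sketch that assumes the 20,000-token range covers prompt and generated tokens together, which this card does not state explicitly; the helper name `max_new_tokens_within_budget` is hypothetical.

```python
SAFE_CONTEXT_TOKENS = 20_000  # safe context range stated in this model card


def max_new_tokens_within_budget(prompt_tokens: int,
                                 requested_new_tokens: int,
                                 limit: int = SAFE_CONTEXT_TOKENS) -> int:
    """Return how many new tokens fit without exceeding `limit`.

    Assumption: prompt and generated tokens share the same budget.
    Raises ValueError when the prompt alone already fills the safe range.
    """
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = limit - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, which leaves no room "
            f"within the {limit}-token safe range"
        )
    return min(requested_new_tokens, remaining)
```

The result can be passed as `max_new_tokens` (Transformers) or `max_tokens` (vLLM), with `prompt_tokens` taken from the length of the tokenized prompt.
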
---

## Usage

### 1) Hugging Face Transformers

To use Karnak with the standard Transformers library, make sure a recent version is installed; the command below upgrades to the latest release.

```bash
pip install -U "transformers>=4.40.0" torch accelerate
```

Python Code Example (Chat Template):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Applied-Innovation-Center/Karnak"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Prepare input
# Prompt (Arabic): "Explain the theory of relativity to me in simple terms."
prompt = "اشرح لي نظرية النسبية بشكل مبسط."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate (do_sample=True so that temperature and top_p take effect)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode output (removing the prompt tokens)
generated_ids = generated_ids[:, model_inputs.input_ids.shape[1]:]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### 2) vLLM (Recommended for Production)

Karnak is compatible with vLLM for high-throughput inference.

Installation:

```bash
pip install -U vllm
```

Offline Inference:

```python
from vllm import LLM, SamplingParams

model_id = "Applied-Innovation-Center/Karnak"

# Initialize the model
llm = LLM(
    model=model_id,
    trust_remote_code=True,
    max_model_len=20000,     # Safe context range
    tensor_parallel_size=1,  # Adjust based on available GPUs
)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)

# Generate. Note: generate() sends the raw prompt; no chat template is applied.
# Prompt (Arabic): "What is the capital of Egypt?"
prompts = ["ما هي عاصمة مصر؟"]
outputs = llm.generate(prompts, sampling_params)

for o in outputs:
    print(f"Prompt: {o.prompt}")
    print(f"Generated: {o.outputs[0].text}")
```

Server Mode (OpenAI-Compatible API):

You can serve the model through an API compatible with OpenAI clients. The `--max-model-len` value matches the safe context range used in the offline example:

```bash
vllm serve "Applied-Innovation-Center/Karnak" \
  --trust-remote-code \
  --dtype bfloat16 \
  --max-model-len 20000 \
  --port 8000
```

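Once the server is running, any OpenAI-style client can call it. The sketch below uses only the Python standard library and assumes the server is reachable at `http://localhost:8000` (the port from the command above; the host is an assumption). `build_chat_payload` and `chat` are hypothetical helper names, and the sampling values mirror the earlier examples.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: local server on port 8000
MODEL_ID = "Applied-Innovation-Center/Karnak"


def build_chat_payload(user_message: str, max_tokens: int = 512) -> dict:
    """Build a request body in the OpenAI chat-completions format."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": max_tokens,
    }


def chat(user_message: str, base_url: str = BASE_URL) -> str:
    """POST the payload to /chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("ما هي عاصمة مصر؟")` against a running server should return the model's reply as a string. Because this endpoint takes chat messages, the server applies the model's chat template itself.
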
## Citation

If you use this model in your research or application, please cite it as follows:

```bibtex
@misc{karnak-40b,
  title={Karnak: A Depth-Extended Arabic-English LLM},
  year={2026},
  publisher={Applied Innovation Center},
  howpublished={\url{https://huggingface.co/Applied-Innovation-Center/Karnak}}
}
```