Llama-3-Karnak-70B-v1.0

Llama-3-Karnak-70B-v1.0 is an Arabic–English causal language model built with Meta Llama 3 70B Instruct and further adapted for bilingual generation, instruction following, and Arabic-focused use cases.

Karnak is designed to provide strong Arabic and English responses for tasks such as question answering, explanation, summarization, content generation, research assistance, and general-purpose dialogue. The model is intended for local or private deployment using common inference frameworks such as Transformers and vLLM.

Built with Meta Llama 3.


Model Summary

Llama-3-Karnak-70B-v1.0 is a 70B-parameter autoregressive transformer model optimized for Arabic and English text generation.

The model builds on the Llama 3 70B Instruct architecture and was further improved through a multi-stage adaptation pipeline focused on:

  • Arabic and English instruction following
  • High-quality bilingual generation
  • Arabic fluency and style
  • Robust response formatting
  • General assistant-style behavior
  • Compatibility with standard Llama/Transformers/vLLM deployment tools

Key Features

  • Arabic–English Generation
    Supports Arabic and English prompts, with an emphasis on producing fluent, useful Arabic responses.

  • Instruction Following
    Adapted to follow user instructions across general QA, explanation, writing, summarization, and reasoning-style tasks.

  • Llama 3 70B Foundation
    Built on top of Meta Llama 3 70B Instruct, enabling compatibility with the broader Llama ecosystem.

  • Production-Friendly Inference
    Compatible with Hugging Face Transformers and vLLM for local and server-based deployment.

  • Local Deployment
    Suitable for private infrastructure where organizations need control over data, inference, and fine-tuning workflows.

  • Arabic-Optimized Tokenizer
    Improved Arabic tokenization efficiency, resulting in reduced token fragmentation and higher-quality generation.
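The tokenization claim can be checked empirically. The sketch below defines a small `fertility` helper (our own illustration, not part of the model repository) that measures average tokens per word; comparing the value on Arabic text against the base Llama 3 tokenizer gives a rough picture of fragmentation.

```python
def fertility(tokenizer, text: str) -> float:
    """Average number of tokens per whitespace-separated word.

    Lower fertility on Arabic text means less token fragmentation,
    which generally translates to cheaper and better generation.
    """
    words = text.split()
    tokens = tokenizer.encode(text)
    return len(tokens) / max(len(words), 1)

# With a real tokenizer (downloads the tokenizer files):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0")
#   print(fertility(tok, "اشرح لي نظرية النسبية بشكل مبسط."))
```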


Model Details

Field                   Value
Model name              Llama-3-Karnak-70B-v1.0
Base model              meta-llama/Meta-Llama-3-70B-Instruct
Architecture            Llama 3 causal language model
Parameters              70B
Languages               Arabic, English
Task                    Text generation / chat completion
Training type           Continued adaptation and supervised fine-tuning
Inference frameworks    Transformers, vLLM
License                 Meta Llama 3 Community License

Usage

1. Install Dependencies

pip install -U "transformers>=4.40.0" torch accelerate sentencepiece

For large-model inference, you may also need:

pip install -U bitsandbytes
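Whether quantization is needed can be estimated with back-of-the-envelope arithmetic: the weights alone take roughly parameter count times bytes per parameter (activations and the KV cache add more on top). A quick sketch for a 70B model:

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3

N = 70e9  # 70B parameters

print(f"bf16:  {weight_memory_gb(N, 2):.0f} GiB")    # ~130 GiB
print(f"int8:  {weight_memory_gb(N, 1):.0f} GiB")    # ~65 GiB
print(f"4-bit: {weight_memory_gb(N, 0.5):.0f} GiB")  # ~33 GiB
```

In practice this means bf16 inference needs multiple GPUs (hence the tensor-parallel settings below), while 4-bit quantization via bitsandbytes brings the weights within reach of a smaller setup.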

2. Hugging Face Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

prompt = "اشرح لي نظرية النسبية بشكل مبسط."

messages = [
    {"role": "system", "content": "You are a helpful bilingual Arabic-English assistant."},
    {"role": "user", "content": prompt},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

generated_ids = generated_ids[:, model_inputs.input_ids.shape[1]:]

response = tokenizer.batch_decode(
    generated_ids,
    skip_special_tokens=True,
)[0]

print(response)
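For reference, the string that apply_chat_template builds for the two messages above follows the standard Llama 3 chat format. A hardcoded sketch is shown below (this assumes the model ships the stock Llama 3 template; the template bundled with the tokenizer is authoritative):

```python
# Approximate shape of the prompt apply_chat_template produces for a
# system + user conversation with add_generation_prompt=True.
system = "You are a helpful bilingual Arabic-English assistant."
user = "اشرح لي نظرية النسبية بشكل مبسط."  # "Explain relativity simply."

text = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
    # Trailing assistant header cues the model to generate the reply.
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
print(text)
```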

3. vLLM Inference

vLLM is recommended for high-throughput inference.

Install vLLM

pip install -U vllm

Offline Inference

from vllm import LLM, SamplingParams

model_id = "Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0"

llm = LLM(
    model=model_id,
    tensor_parallel_size=4,
    dtype="bfloat16",
)

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)

prompts = [
    "ما هي عاصمة مصر؟",
    "Explain the difference between supervised and unsupervised learning.",
]

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print("Prompt:", output.prompt)
    print("Generated:", output.outputs[0].text)
    print("-" * 80)

4. vLLM Server Mode

You can serve the model through vLLM's OpenAI-compatible API server.

vllm serve "Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0" \
  --tensor-parallel-size 4 \
  --dtype bfloat16 \
  --host 0.0.0.0 \
  --port 8000

Then call the server:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

response = client.chat.completions.create(
    model="Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0",
    messages=[
        {"role": "system", "content": "You are a helpful bilingual Arabic-English assistant."},
        {"role": "user", "content": "اكتب فقرة قصيرة عن أهمية اللغة العربية في البحث العلمي."},
    ],
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)

print(response.choices[0].message.content)

Recommended Generation Settings

A general starting point:

temperature = 0.7
top_p = 0.9
max_new_tokens = 512

For more deterministic outputs:

temperature = 0.2
top_p = 0.8

For creative writing:

temperature = 0.8
top_p = 0.95
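These presets can be kept as plain dictionaries and unpacked into `model.generate(...)` or vLLM's `SamplingParams(...)`; note that Transformers names the length limit `max_new_tokens` while vLLM uses `max_tokens`. A minimal sketch:

```python
# Sampling presets from the table above, ready to unpack into either
# model.generate(**preset) or SamplingParams(**preset).
PRESETS = {
    "general":       {"temperature": 0.7, "top_p": 0.9},
    "deterministic": {"temperature": 0.2, "top_p": 0.8},
    "creative":      {"temperature": 0.8, "top_p": 0.95},
}

params = {"max_tokens": 512, **PRESETS["deterministic"]}
print(params)  # {'max_tokens': 512, 'temperature': 0.2, 'top_p': 0.8}
```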

License

This model is built with Meta Llama 3 and is released under the terms of the Meta Llama 3 Community License.

Users must comply with:

  • The Meta Llama 3 Community License
  • The Meta Llama 3 Acceptable Use Policy
  • Any applicable laws and regulations

This model is not released under Apache-2.0 because it is derived from Meta Llama 3.


Attribution

Built with Meta Llama 3.

Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.


Citation

If you use this model in research or applications, please cite:

@misc{karnak_70b_llama_2026,
  title        = {Llama-3-Karnak-70B-v1.0: An Arabic-English Large Language Model Built with Meta Llama 3},
  author       = {{Applied Innovation Center}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0}},
  note         = {Built with Meta Llama 3}
}

Contact

For questions, feedback, or collaboration requests, please contact the Applied Innovation Center or open a discussion on the model repository.
