Llama-3-Karnak-70B-v1.0
Llama-3-Karnak-70B-v1.0 is an Arabic–English causal language model built with Meta Llama 3 70B Instruct and further adapted for bilingual generation, instruction following, and Arabic-focused use cases.
Karnak is designed to provide strong Arabic and English responses for tasks such as question answering, explanation, summarization, content generation, research assistance, and general-purpose dialogue. The model is intended for local or private deployment using common inference frameworks such as Transformers and vLLM.
Built with Meta Llama 3.
Model Summary
Llama-3-Karnak-70B-v1.0 is a 70B-parameter autoregressive transformer model optimized for Arabic and English text generation.
The model builds on the Llama 3 70B Instruct architecture and was further improved through a multi-stage adaptation pipeline focused on:
- Arabic and English instruction following
- High-quality bilingual generation
- Arabic fluency and style
- Robust response formatting
- General assistant-style behavior
- Compatibility with standard Llama/Transformers/vLLM deployment tools
Key Features
Arabic–English Generation
- Arabic–English Generation: Supports Arabic and English prompts, with an emphasis on producing fluent, useful Arabic responses.
- Instruction Following: Adapted to follow user instructions across general QA, explanation, writing, summarization, and reasoning-style tasks.
- Llama 3 70B Foundation: Built on top of Meta Llama 3 70B Instruct, enabling compatibility with the broader Llama ecosystem.
- Production-Friendly Inference: Compatible with Hugging Face Transformers and vLLM for local and server-based deployment.
- Local Deployment: Suitable for private infrastructure where organizations need control over data, inference, and fine-tuning workflows.
- Arabic-Optimized Tokenizer: Improved Arabic tokenization efficiency, reducing token fragmentation and improving generation quality.
Model Details
| Field | Value |
|---|---|
| Model name | Llama-3-Karnak-70B-v1.0 |
| Base model | meta-llama/Meta-Llama-3-70B-Instruct |
| Architecture | Llama 3 causal language model |
| Parameters | 70B |
| Languages | Arabic, English |
| Task | Text generation / chat completion |
| Training type | Continued adaptation and supervised fine-tuning |
| Inference frameworks | Transformers, vLLM |
| License | Meta Llama 3 Community License |
Usage
1. Install Dependencies
```bash
pip install -U "transformers>=4.40.0" torch accelerate sentencepiece
```
For large-model inference, you may also need:
```bash
pip install -U bitsandbytes
```
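Whether you need quantization depends mostly on available GPU memory. A rough back-of-the-envelope sketch for the weights alone (activations, KV cache, and framework overhead are not included):

```python
# Rough weight-only memory estimate for a 70B-parameter model.
# Actual usage is higher: activations, KV cache, and framework
# overhead are not included.

PARAMS = 70e9  # 70B parameters


def weight_memory_gb(bytes_per_param: float) -> float:
    """Approximate weight footprint in GiB for a given precision."""
    return PARAMS * bytes_per_param / 1024**3


print(f"bf16/fp16: ~{weight_memory_gb(2):.0f} GiB")    # 2 bytes/param
print(f"int8:      ~{weight_memory_gb(1):.0f} GiB")    # 1 byte/param
print(f"4-bit:     ~{weight_memory_gb(0.5):.0f} GiB")  # 0.5 bytes/param
```

At bf16 the weights alone are roughly 130 GiB, which is why multi-GPU tensor parallelism or quantization is usually required for this model.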
2. Hugging Face Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# "Explain the theory of relativity to me in simple terms."
prompt = "اشرح لي نظرية النسبية بشكل مبسط."
messages = [
    {"role": "system", "content": "You are a helpful bilingual Arabic-English assistant."},
    {"role": "user", "content": prompt},
]

# Render the chat messages into the Llama 3 prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
# Strip the prompt tokens so only the new completion is decoded.
generated_ids = generated_ids[:, model_inputs.input_ids.shape[1]:]

response = tokenizer.batch_decode(
    generated_ids,
    skip_special_tokens=True,
)[0]
print(response)
```
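If the full bf16 weights do not fit in GPU memory, loading in 4-bit via bitsandbytes is one option. A hedged configuration sketch (same `model_id` as above; the NF4 settings shown are common community defaults, not values recommended by the model authors, so validate output quality after quantizing):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0"

# 4-bit NF4 quantization; matrix multiplies are computed in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
)
```

The rest of the generation code is unchanged from the example above.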
3. vLLM Inference
vLLM is recommended for high-throughput inference.
Install vLLM
```bash
pip install -U vllm
```
Offline Inference
```python
from vllm import LLM, SamplingParams

model_id = "Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0"

# Set tensor_parallel_size to the number of available GPUs.
llm = LLM(
    model=model_id,
    tensor_parallel_size=4,
    dtype="bfloat16",
)

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)

prompts = [
    "ما هي عاصمة مصر؟",  # "What is the capital of Egypt?"
    "Explain the difference between supervised and unsupervised learning.",
]

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print("Prompt:", output.prompt)
    print("Generated:", output.outputs[0].text)
    print("-" * 80)
```
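Note that `llm.generate` takes raw strings, so chat-style prompts must already contain the Llama 3 chat template. In real code, prefer `tokenizer.apply_chat_template` as in the Transformers example; purely to illustrate the underlying format, here is a minimal hand-rolled renderer (it assumes the standard Llama 3 special tokens, so verify against the tokenizer config before relying on it):

```python
def render_llama3_chat(messages):
    """Render messages into the Llama 3 instruct prompt format.

    Assumes the standard Llama 3 special tokens; prefer
    tokenizer.apply_chat_template in production code.
    """
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_llama3_chat([
    {"role": "system", "content": "You are a helpful bilingual Arabic-English assistant."},
    {"role": "user", "content": "ما هي عاصمة مصر؟"},  # "What is the capital of Egypt?"
])
```

The resulting string can be passed directly to `llm.generate`.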
4. vLLM Server Mode
You can serve the model using the OpenAI-compatible vLLM API.
```bash
vllm serve "Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0" \
    --tensor-parallel-size 4 \
    --dtype bfloat16 \
    --host 0.0.0.0 \
    --port 8000
```
Then call the server:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM does not require a real API key by default
)

response = client.chat.completions.create(
    model="Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0",
    messages=[
        {"role": "system", "content": "You are a helpful bilingual Arabic-English assistant."},
        # "Write a short paragraph about the importance of the Arabic
        # language in scientific research."
        {"role": "user", "content": "اكتب فقرة قصيرة عن أهمية اللغة العربية في البحث العلمي."},
    ],
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)
print(response.choices[0].message.content)
```
Recommended Generation Settings
A general starting point:

- temperature = 0.7
- top_p = 0.9
- max_new_tokens = 512

For more deterministic outputs:

- temperature = 0.2
- top_p = 0.8

For creative writing:

- temperature = 0.8
- top_p = 0.95
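The presets above can be kept together in code; the preset names and grouping here are illustrative, not part of the model release:

```python
# Illustrative sampling presets mirroring the recommended settings above.
SAMPLING_PRESETS = {
    "default":       {"temperature": 0.7, "top_p": 0.9,  "max_new_tokens": 512},
    "deterministic": {"temperature": 0.2, "top_p": 0.8,  "max_new_tokens": 512},
    "creative":      {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 512},
}


def sampling_kwargs(mode: str) -> dict:
    """Return generate() keyword arguments for a named preset."""
    preset = SAMPLING_PRESETS[mode]
    # Sampling is enabled whenever the temperature is nonzero.
    return {**preset, "do_sample": preset["temperature"] > 0}
```

For example, `model.generate(**model_inputs, **sampling_kwargs("creative"))` applies the creative-writing settings.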
License
This model is built with Meta Llama 3 and is released under the terms of the Meta Llama 3 Community License.
Users must comply with:
- The Meta Llama 3 Community License
- The Meta Llama 3 Acceptable Use Policy
- Any applicable laws and regulations
This model is not released under Apache-2.0 because it is derived from Meta Llama 3.
Attribution
Built with Meta Llama 3.
Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
Citation
If you use this model in research or applications, please cite:
```bibtex
@misc{karnak_70b_llama_2026,
  title        = {Llama-3-Karnak-70B-v1.0: An Arabic-English Large Language Model Built with Meta Llama 3},
  author       = {{Applied Innovation Center}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Applied-Innovation-Center/Karnak-70B-LLAMA-v1.0}},
  note         = {Built with Meta Llama 3}
}
```
Contact
For questions, feedback, or collaboration requests, please contact the Applied Innovation Center or open a discussion on the model repository.