QU-LLM Assistant (ALLaM): wesam3/qu-llm-assistant-allam
A fine-tuned ALLaM-7B-Instruct model specialised in answering student advisory questions for Qassim University (جامعة القصيم). ALLaM is an Arabic-first large language model developed by SDAIA (Saudi Data and Artificial Intelligence Authority) and is particularly strong on Arabic-language tasks.
The model is distributed as a Q4_K_M GGUF file for efficient local inference with llama.cpp / Ollama.
Model Details
| Property | Value |
|---|---|
| Base model | sdaia/ALLaM-7B-Instruct |
| Developed by (base) | SDAIA — Saudi Data and Artificial Intelligence Authority |
| Fine-tuning method | LoRA (MLX framework) |
| Quantisation | Q4_K_M (GGUF) |
| File size | ~4.3 GB |
| Language | Arabic-native, English-secondary |
| License | Apache-2.0 |
| Domain | University student advisory (academic rules, registration, scholarships, etc.) |
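As a rough sanity check on the file size in the table, the arithmetic below works out how many bits per weight a ~4.3 GB file implies for a 7B-parameter model. This is a back-of-the-envelope sketch: it assumes "7B" means exactly 7e9 parameters and "~4.3 GB" means 4.3e9 bytes, neither of which the table states precisely.

```python
# Back-of-the-envelope check of the figures in the Model Details table.
# Assumptions: "7B" is taken as exactly 7e9 parameters and "~4.3 GB" as
# 4.3e9 bytes; the true values will differ slightly.

def implied_bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return file_size_bytes * 8 / n_params


def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """File size in GB (1e9 bytes) at a given average bits per weight."""
    return n_params * bits_per_weight / 8 / 1e9


bpw = implied_bits_per_weight(4.3e9, 7e9)
fp16_gb = estimated_size_gb(7e9, 16)

print(f"implied bits per weight: {bpw:.2f}")       # about 4.91
print(f"same model at 16-bit:    {fp16_gb:.1f} GB")  # 14.0 GB
```

Under these assumptions the quantised file is roughly a third of the size of a 16-bit copy. The implied average sits somewhat above 4 bits, which is plausible for a scheme that keeps some tensors at higher precision and stores metadata alongside the weights.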
Training Data
- Dataset: 12,320 curated Q&A pairs filtered exclusively from 2024–2026 Qassim University documents — the most current bylaws, handbooks, and policy circulars.
- Source: Publicly available PDFs and web pages from qu.edu.sa.
- Language: Predominantly Arabic.
- Rationale for filtering: Restricting the data to 2024–2026 documents is intended to keep the model aligned with current university regulations and to avoid answers based on superseded policies.
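The year-based filtering described above could look something like the following. This is only an illustrative sketch: the source does not describe the dataset schema or the actual filtering code, so the `QAPair` record and its `source_year` field are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical record layout; the real dataset schema is not documented.
@dataclass
class QAPair:
    question: str
    answer: str
    source_year: int  # assumed field: publication year of the source document


def filter_by_year(pairs, first_year=2024, last_year=2026):
    """Keep only pairs whose source document falls in [first_year, last_year]."""
    return [p for p in pairs if first_year <= p.source_year <= last_year]


sample = [
    QAPair("q1", "a1", 2019),
    QAPair("q2", "a2", 2024),
    QAPair("q3", "a3", 2026),
]
recent = filter_by_year(sample)
print([p.question for p in recent])  # ['q2', 'q3']
```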
Usage
Ollama (recommended)
ollama run wesamhamad/qu-llm-assistant-allam
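Or download the GGUF file manually and create a local Modelfile with the following contents (the SYSTEM prompt translates to "You are an AI assistant specialized in answering inquiries from Qassim University students."):
FROM ./ALLaM-7B-Q4_K_M.gguf
SYSTEM "أنت مساعد ذكاء اصطناعي متخصص في الإجابة على استفسارات طلاب جامعة القصيم."
Then register the model with Ollama and run it:
ollama create qu-llm-assistant-allam -f Modelfile
ollama run qu-llm-assistant-allam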
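Once the model is registered with Ollama, it can also be queried programmatically. The sketch below uses only the Python standard library to call Ollama's local `/api/chat` endpoint. It assumes an Ollama server is running on the default address `localhost:11434` and that the model was created under the name `qu-llm-assistant-allam` as shown above; the helper names `build_chat_request` and `ask` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(question: str, model: str = "qu-llm-assistant-allam") -> urllib.request.Request:
    """Build a non-streaming chat request for a single user question."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, model: str = "qu-llm-assistant-allam") -> str:
    """Send the question to the local Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(question, model)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the server running, `print(ask("ما هي متطلبات التسجيل في الفصل الدراسي الأول؟"))` would send the registration question used elsewhere in this README.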
llama.cpp
# Prompt: "What are the registration requirements for the first semester?"
llama-cli -m ALLaM-7B-Q4_K_M.gguf \
  -p "ما هي متطلبات التسجيل في الفصل الدراسي الأول؟"
Python (llama-cpp-python)
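from llama_cpp import Llama

# Load the quantised GGUF file from the current directory.
llm = Llama(model_path="ALLaM-7B-Q4_K_M.gguf")
response = llm.create_chat_completion(messages=[
    # System prompt: "You are an academic assistant for Qassim University."
    {"role": "system", "content": "أنت مساعد أكاديمي لجامعة القصيم."},
    # User question: "What are the requirements for obtaining the scholarship?"
    {"role": "user", "content": "ما هي شروط الحصول على المنحة الدراسية؟"},
])
# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])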
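For follow-up questions, the same call can be wrapped in a small helper that rebuilds the message list on each turn. The sketch below is one possible arrangement, not the author's code: `build_messages` and `answer` are hypothetical helper names, and the `temperature` and `max_tokens` values are assumed defaults chosen for short, factual answers.

```python
# Same system prompt as in the llama-cpp-python example
# ("You are an academic assistant for Qassim University.").
SYSTEM_PROMPT = "أنت مساعد أكاديمي لجامعة القصيم."


def build_messages(question, history=None):
    """Return a chat message list: system prompt, prior turns, then the new question."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": question})
    return messages


def answer(llm, question, history=None):
    """Ask `llm` (a llama_cpp.Llama instance) one question and return the reply text."""
    response = llm.create_chat_completion(
        messages=build_messages(question, history),
        temperature=0.2,  # assumed value: low temperature for factual policy answers
        max_tokens=512,   # assumed cap on reply length
    )
    return response["choices"][0]["message"]["content"]
```

Here `history` is a list of earlier `{"role": ..., "content": ...}` turns; passing the previous user and assistant messages lets the model see the conversation so far.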
Why ALLaM?
ALLaM was pre-trained on an extensive Arabic corpus by SDAIA, giving it superior morphological understanding and dialectal coverage compared with multilingual models of similar size. Fine-tuning ALLaM on Qassim University data yields a model that is both culturally aware and domain-specific.
Intended Use & Limitations
- Intended use: Helping students navigate university regulations, registration procedures, scholarship requirements, and academic policies at Qassim University, based on the most recent (2024–2026) policy documents.
- Out-of-scope: General-purpose chat, medical/legal advice, topics unrelated to the university.
- Hallucinations: Like all LLMs, the model may occasionally produce incorrect information. Always verify critical details with the official university portal.
Citation
@misc{qu-llm-assistant-allam-2025,
author = {Wesam Hamad},
title = {QU-LLM Assistant (ALLaM): Fine-tuned ALLaM-7B for Qassim University Advisory},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/wesam3/qu-llm-assistant-allam}
}
License
Apache-2.0 — see LICENSE.