---
license: cc-by-nc-4.0
language:
- ar
- en
base_model:
- Qwen/Qwen3-14B-Base
pipeline_tag: text-generation
---

# SUHAIL-14B-preview

> **14B Arabic LLM – LoRA fine-tuned from Qwen-3-14B-Base for instruction following and human-preference alignment**

---

## TL;DR

- **Base model**: Qwen-3-14B-Base (Transformer decoder with Rotary Positional Embeddings)
- **Fine-tuning**: two-stage **Low-Rank Adaptation (LoRA)**
  1. **Supervised Fine-Tuning (SFT)** on a curated Arabic/English instruction dataset
  2. **Human Preference Alignment** using binary accept/reject feedback
- **Data selection**: a **state-of-the-art encoder-based reranker** filtered the instruction-tuning corpus via **Style-Aligned Response Ranking**, retaining only stylistically coherent, high-quality samples
- **Context window**: 32k tokens
- **License**: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
- **Intended use**: Arabic content generation, multi-turn tool use (agentic systems), conversational agents, educational tools, and research (non-commercial only)
- **Training samples**: 33k (SFT), 66k (human-preference alignment)
- **Training cost**: less than $500

---

## Table of Contents

1. [Model Description](#model-description)
2. [Quick Start](#quick-start)
3. [Limitations & Biases](#limitations--biases)
4. [License](#license)
5. [Citation](#citation)
6. [Changelog](#changelog)

---

## Model Description

**SUHAIL-14B-preview** extends the open-weight **Qwen-3-14B-Base** to better support Arabic instruction following using **Low-Rank Adaptation (LoRA)**. LoRA adds small trainable low-rank matrices to the attention projections and other linear layers while keeping the base weights frozen, enabling compact, efficient fine-tuning.
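
To make the setup concrete, here is a minimal LoRA sketch using Hugging Face `peft`; the rank, alpha, dropout, and target modules below are illustrative assumptions, not the values used to train SUHAIL.

```python
# Minimal LoRA sketch with Hugging Face peft. All hyperparameters here
# are illustrative assumptions, not SUHAIL's actual training configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B-Base")

lora_config = LoraConfig(
    r=16,                # low-rank dimension (assumed)
    lora_alpha=32,       # LoRA scaling factor (assumed)
    lora_dropout=0.05,   # regularization on the adapter path (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable
```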
| | ### 1 · Supervised Fine-Tuning (SFT) |
| |
|
| | We first conducted SFT on a high-quality instruction dataset in Arabic and English. This dataset was curated using **Style-Aligned Response Ranking**, a RoBERTa-based reranker that filters out stylistically incoherent or low-quality samples from the Instruction-Tuning corpus. This step enhanced factuality and stylistic consistency. |
| | > **Result**: Up to 22% performance improvements observed on internal benchmarks (e.g., IFEVAL). |
| |
|
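
The exact reranker is not public; as a rough illustration of the filtering loop, the sketch below scores each (instruction, response) pair with a generic RoBERTa cross-encoder (a stand-in for the actual Style-Aligned Response Ranking model) and keeps only pairs above an assumed threshold.

```python
# Sketch of reranker-based corpus filtering. The cross-encoder and the
# threshold are stand-in assumptions, not the actual SUHAIL pipeline.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/stsb-roberta-base")  # placeholder RoBERTa reranker

def filter_corpus(pairs, threshold=0.8):
    """Keep (instruction, response) pairs the reranker scores above the threshold."""
    scores = reranker.predict([(p["instruction"], p["response"]) for p in pairs])
    return [p for p, s in zip(pairs, scores) if s >= threshold]

corpus = [
    {"instruction": "Explain machine learning briefly.", "response": "Machine learning is ..."},
]
print(len(filter_corpus(corpus)))  # number of samples retained
```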
| | ### 2 · Human Preference Alignment |
| |
|
| | To align model behavior with user intent, we applied preference optimization using binary accept/reject feedback. This direct signal training guides the model toward generating helpful, honest, and harmless outputs, at low alignment cost. |
| |
|
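
The card does not name the alignment algorithm. One method that learns directly from binary accept/reject labels is KTO, available as `KTOTrainer` in TRL; the sketch below uses it purely as an assumed illustration, with a toy dataset in TRL's unpaired-preference format.

```python
# Hedged sketch of binary-feedback alignment using TRL's KTOTrainer.
# The algorithm choice, hyperparameters, and data are assumptions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_id = "Qwen/Qwen3-14B-Base"  # stand-in; SUHAIL trained LoRA adapters on this base
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Unpaired preference data: each row is a prompt, a completion, and a
# binary accept (True) / reject (False) label.
train_dataset = Dataset.from_list([
    {"prompt": "Write a short greeting.", "completion": "Hello, welcome!", "label": True},
    {"prompt": "Write a short greeting.", "completion": "No.", "label": False},
])

trainer = KTOTrainer(
    model=model,
    args=KTOConfig(output_dir="suhail-kto-sketch", per_device_train_batch_size=1),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```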
| | ### 3 · Integration of Reinforcement Learning with Verifiable Rewards (RLVR) for Large Language Models in Verifiable and Auditable Environments (TO-DO) |
| |
|
| | ### 4 · Benchmarks (TO-DO) |
| |
|
| | > *Explicit benchmark scores are not yet included. We encourage users to evaluate the model in their specific contexts.* |
| | --- |
| |
|

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "01-ZeroOne/SUHAIL-14B-preview"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# "Write a simple summary of the internet in Arabic."
prompt = "اكتب ملخصًا بسيطًا عن الإنترنت باللغة العربية."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
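
For multi-turn use, the tokenizer's chat template (which Qwen-family tokenizers ship with) can format the conversation; the message content below is illustrative.

```python
# Assumes `tokenizer` and `model` from the snippet above.
messages = [
    {"role": "user", "content": "Summarize what an LLM is in two sentences."},
]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to(model.device)
chat_outputs = model.generate(chat_inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(chat_outputs[0], skip_special_tokens=True))
```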

*The LoRA adapters are merged into the checkpoint on the Hub for ease of use.*
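
For reference, merging an adapter into base weights typically looks like the sketch below; the adapter path is hypothetical.

```python
# Hedged sketch of merging a LoRA adapter into base weights with peft.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B-Base")
adapted = PeftModel.from_pretrained(base, "path/to/suhail-lora-adapter")  # hypothetical adapter path

merged = adapted.merge_and_unload()      # fold the LoRA deltas into the base weights
merged.save_pretrained("suhail-merged")  # standalone checkpoint; no peft needed at inference
```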
| | --- |

## Limitations & Biases

* **Factual reliability** – hallucinations remain possible; verify critical information.
* **Dialect coverage** – strongest on Gulf and Egyptian Arabic; less training data for Maghrebi and Levantine varieties.
* **Code completeness** – suitable for small code snippets, but output is not guaranteed bug-free.
* **Agentic function calling** – preliminary support is included in SFT; future updates aim to improve reasoning and structured API-calling capabilities.

---

## License

Released under the **Creative Commons Attribution-NonCommercial 4.0 International** (CC BY-NC 4.0) license. Non-commercial use only.

---

## Citation

```bibtex
@software{Suhail2025,
  author = {ZeroOne AI},
  title  = {SUHAIL-14B-preview},
  year   = {2025},
  url    = {https://huggingface.co/01-ZeroOne/SUHAIL-14B-preview}
}
```

---

## Changelog

| Version  | Date       | Notes |
| -------- | ---------- | ----- |
| **v0.1** | 2025-07-05 | Initial public LoRA-merged release (SFT + human-preference alignment; data filtered with Style-Aligned Response Ranking) |

---

Maintained by **Mohammed Almaghrabi**, founder of **ZeroOne AI**. This work was supported by **Khalid Alharbi**. Contributions are welcome; to contribute, please email almaghrabima@gmail.com.