---
license: cc-by-nc-4.0
language:
- ar
- en
base_model:
- Qwen/Qwen3-14B-Base
pipeline_tag: text-generation
---
# SUHAIL-14B-preview
> **14B Arabic LLM – LoRA fine-tuned from Qwen-3-14B-Base for instruction following and human-preference alignment**
---
## TL;DR
- **Base model**: Qwen-3-14B-Base (Transformer decoder, Rotary Positional Embeddings)
- **Fine-tuning**: Two-stage **Low-Rank Adaptation (LoRA)**
1. **Supervised Fine-Tuning (SFT)** on a curated Arabic/English instruction dataset
2. **Human Preference Alignment** using binary accept/reject feedback
- **Data selection**: Employed a **state-of-the-art encoder-based reranker** to filter the Efficient Instruction-Tuning corpus via **Style-Aligned Response Ranking**, retaining only stylistically coherent, high-quality samples
- **Context window**: 32k tokens
- **License**: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
- **Intended use**: Arabic content generation, multi-turn tool use (Agentic System), conversational agents, educational tools, and research (non-commercial only)
- **Training samples**: 33k (SFT), 66k (human preference alignment)
- **Training cost**: Less than $500
---
## Table of Contents
1. [Model Description](#model-description)
2. [Quick Start](#quick-start)
3. [Limitations & Biases](#limitations--biases)
4. [License](#license)
5. [Citation](#citation)
6. [Changelog](#changelog)
---
## Model Description
**SUHAIL-14B-preview** extends the open-weight **Qwen-3-14B-Base** to better support Arabic instruction following using **Low-Rank Adaptation (LoRA)**. LoRA injects small trainable low-rank matrices into the attention and feed-forward projection layers while keeping the base weights frozen, enabling compact, efficient fine-tuning.
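The frozen-base-plus-low-rank-update idea can be sketched numerically. The dimensions and hyperparameters below are illustrative only, not the values used to train SUHAIL:

```python
# Minimal numeric sketch of a LoRA-updated linear layer: y = Wx + (alpha/r)·B(Ax).
# Dimensions and hyperparameters are illustrative, not SUHAIL's actual values.
d_out, d_in, r, alpha = 4, 4, 2, 8

def matvec(M, v):
    # Plain matrix-vector product over nested lists.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # frozen base weight
A = [[0.1] * d_in for _ in range(r)]     # trainable down-projection
B = [[0.0] * r for _ in range(d_out)]    # trainable up-projection, zero-initialized

def lora_forward(x):
    base = matvec(W, x)                  # frozen path
    delta = matvec(B, matvec(A, x))      # low-rank update
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

x = [1.0, 2.0, 3.0, 4.0]
# Because B is zero-initialized, LoRA starts as an exact no-op on the base model;
# only A and B (a small fraction of the parameters) are updated during training.
assert lora_forward(x) == matvec(W, x)
```

Only the adapter matrices are trained, which is what keeps both stages of fine-tuning cheap relative to full-parameter training.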
### 1 · Supervised Fine-Tuning (SFT)
We first conducted SFT on a high-quality instruction dataset in Arabic and English. The dataset was curated with **Style-Aligned Response Ranking**, which uses a RoBERTa-based reranker to filter stylistically incoherent or low-quality samples out of the instruction-tuning corpus. This step enhanced factuality and stylistic consistency.
> **Result**: Up to 22% performance improvements observed on internal benchmarks (e.g., IFEVAL).
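The reranker itself is not released; as a rough illustration of the filtering step, the hypothetical `score_style` below stands in for the RoBERTa-based scorer:

```python
# Hypothetical sketch of reranker-based corpus filtering. `score_style` is a
# stand-in for the RoBERTa-based Style-Aligned Response Ranking model, which
# would return a style-coherence score in [0, 1] per (instruction, response) pair.
def score_style(sample):
    # Toy heuristic used only to make the sketch runnable.
    return len(sample["response"]) / (len(sample["response"]) + 100)

def filter_corpus(samples, threshold=0.5):
    # Retain only samples whose style score clears the threshold.
    return [s for s in samples if score_style(s) >= threshold]

corpus = [
    {"instruction": "q1", "response": "x" * 400},  # scores 0.8 -> kept
    {"instruction": "q2", "response": "x" * 25},   # scores 0.2 -> dropped
]
kept = filter_corpus(corpus)
```

The real pipeline scores each candidate with the trained reranker rather than a length heuristic; only the thresholded-filtering structure is carried over here.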
### 2 · Human Preference Alignment
To align model behavior with user intent, we applied preference optimization using binary accept/reject feedback. Training directly on this signal guides the model toward helpful, honest, and harmless outputs at low alignment cost.
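The card states that binary accept/reject feedback was used but does not name the exact algorithm. As one hedged illustration, a DPO-style pairwise logistic loss over policy-versus-reference log-probability gaps looks like this:

```python
import math

# Hedged sketch of preference optimization from accept/reject pairs. This is a
# DPO-style pairwise logistic loss, shown for illustration; it is not confirmed
# to be the algorithm used for SUHAIL-14B-preview.
def preference_loss(logp_acc, logp_rej, ref_acc, ref_rej, beta=0.1):
    # Margin: how much more the policy favors the accepted response over the
    # rejected one, relative to a frozen reference model.
    margin = beta * ((logp_acc - ref_acc) - (logp_rej - ref_rej))
    # Logistic loss is small when the accepted response is clearly preferred.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that already prefers the accepted response incurs lower loss.
good = preference_loss(-5.0, -9.0, -6.0, -6.0)
bad = preference_loss(-9.0, -5.0, -6.0, -6.0)
assert good < bad
```

Minimizing this loss pushes the policy to assign relatively more probability to accepted responses without drifting far from the reference model.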
### 3 · Reinforcement Learning with Verifiable Rewards (RLVR) in verifiable, auditable environments (TO-DO)
### 4 · Benchmarks (TO-DO)
> *Explicit benchmark scores are not yet included. We encourage users to evaluate the model in their specific contexts.*
---
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "01-ZeroOne/SUHAIL-14B-preview"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # shards the model across available devices
)

# "Write a simple summary about the internet in Arabic."
prompt = "اكتب ملخصًا بسيطًا عن الإنترنت باللغة العربية."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,      # temperature only takes effect when sampling is enabled
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
*The LoRA adapters are merged into the checkpoint on the Hub for ease of use.*
---
## Limitations & biases
* **Factual reliability** – hallucinations remain. Verify critical information.
* **Dialect coverage** – best on Gulf & Egyptian Arabic; less data for Maghrebi and Levantine.
* **Code completeness** – suitable for small code snippets, but not guaranteed bug-free.
* **Agentic function calling** – preliminary support was included in SFT; future updates aim to improve reasoning and structured API-calling capabilities.
---
## License
Released under the **Creative Commons Attribution-NonCommercial 4.0 International** (CC BY-NC 4.0) — non-commercial use only.
---
## Citation
```bibtex
@software{Suhail2025,
author = {ZeroOne AI},
title = {SUHAIL-14B-preview},
year = {2025},
url = {https://huggingface.co/01-ZeroOne/SUHAIL-14B-preview}
}
```
---
## Changelog
| Version | Date | Notes |
| ------- | ---------- | -------------------------------------------------------------------------------------------------------------------------- |
| **v0.1**| 2025-07-05 | Initial public LoRA-merged release (SFT + human-preference alignment; data filtered with Style-Aligned Response Ranking) |
---
Maintained by **Mohammed Almaghrabi**, Founder of **ZeroOne AI**. This work was supported by **Khalid Alharbi** — contributions are welcome! To contribute, please email: almaghrabima@gmail.com