USLaP Mistral v22
Universal Scientific Laws and Principles (USLaP): a Mistral-7B model fine-tuned to validate scientific terminology against Qur'anic Arabic roots.
Purpose
Detects and rejects scientific terminology that it classifies as contaminated (of Persian, Greek, or Latin origin) and suggests alternatives based on Qur'anic Arabic roots.
Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the USLaP LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto", torch_dtype="auto"
)
model = PeftModel.from_pretrained(base, "uslap/uslap-mistral-v22")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Mistral-Instruct prompt format: the question goes between [INST] and [/INST]
prompt = "[INST] What is the Arabic term for geometry? [/INST]"
# Move inputs to the model's device (device_map="auto" may not place it on "cuda")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
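To query several terms, it can help to build prompts and trim the decoded text programmatically. The sketch below assumes the single-turn `[INST] ... [/INST]` format used in Quick Start. `build_prompt`, `term_prompt`, and `extract_answer` are hypothetical helpers and are not part of the released adapter. Because `tokenizer.decode(outputs[0], ...)` returns the prompt followed by the completion, `extract_answer` keeps only the text after the last `[/INST]`.

```python
def build_prompt(question: str) -> str:
    # Single-turn Mistral-Instruct format, matching the Quick Start prompt
    return f"[INST] {question.strip()} [/INST]"


def term_prompt(term: str) -> str:
    # Hypothetical helper: phrases the lookup the same way as the Quick Start example
    return build_prompt(f"What is the Arabic term for {term}?")


def extract_answer(decoded: str) -> str:
    # Decoded output echoes the prompt; keep only the text after the last [/INST]
    return decoded.rsplit("[/INST]", 1)[-1].strip()


if __name__ == "__main__":
    print(term_prompt("geometry"))  # [INST] What is the Arabic term for geometry? [/INST]
```

With the Quick Start objects loaded, `extract_answer(tokenizer.decode(outputs[0], skip_special_tokens=True))` returns only the model's reply. An alternative that avoids string matching is to decode `outputs[0][inputs["input_ids"].shape[-1]:]`, which contains only the newly generated tokens.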
Expected Output
```text
❌ REJECTED: "geometry" (Greek)
❌ REJECTED: "هَنْدَسَة" (handasa) — PERSIAN CONTAMINATION
✅ USE INSTEAD: عِلْم التَّقْدِير ('Ilm al-Taqdīr)
Root: ق د ر (q-d-r) — Qur'anic: 54:49
```
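For downstream use, the answer can be parsed into structured fields. The minimal sketch below assumes that generations follow the line layout shown above (`REJECTED:` lines with quoted terms, one `USE INSTEAD:` line, and a `Root:` line with a parenthesised transliteration and a `Qur'anic: surah:ayah` reference). That layout is inferred from this single example, so lines that do not match are simply skipped. `Verdict` and `parse_verdict` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Verdict:
    rejected: List[str] = field(default_factory=list)  # quoted terms after "REJECTED:"
    replacement: Optional[str] = None                  # text after "USE INSTEAD:"
    root: Optional[str] = None                         # transliterated root, e.g. "q-d-r"
    quran_ref: Optional[str] = None                    # "surah:ayah" reference


def parse_verdict(text: str) -> Verdict:
    v = Verdict()
    for line in text.splitlines():
        line = line.strip()
        if "REJECTED:" in line:
            m = re.search(r'REJECTED:\s*"([^"]+)"', line)
            if m:
                v.rejected.append(m.group(1))
        elif "USE INSTEAD:" in line:
            v.replacement = line.split("USE INSTEAD:", 1)[1].strip()
        elif line.startswith("Root:"):
            # First parenthesised group holds the transliteration
            m = re.search(r"\(([^)]+)\)", line)
            if m:
                v.root = m.group(1)
            m = re.search(r"Qur'anic:\s*(\d+:\d+)", line)
            if m:
                v.quran_ref = m.group(1)
    return v
```

Applied to the example output above, this yields `rejected == ["geometry", "هَنْدَسَة"]`, `root == "q-d-r"`, and `quran_ref == "54:49"`.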
Training
- Base: Mistral-7B-Instruct-v0.2
- Method: LoRA (r=32)
- Dataset: 2,680 validated entries
- Final Loss: 0.069-0.116
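To give a sense of adapter size at r=32, here is a worked estimate. A LoRA update on a `Linear(in, out)` layer adds matrices A (r × in) and B (out × r), which is r · (in + out) trainable weights. This card does not state which modules were targeted or what `lora_alpha` was used, so the sketch compares two common choices as assumptions. The projection shapes come from the public Mistral-7B configuration (hidden size 4096, intermediate size 14336, 8 key/value heads of 128 dims, 32 layers).

```python
# Mistral-7B projection shapes as (in_features, out_features)
SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),   # 8 KV heads x 128 dims
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32


def lora_params(r, target_modules, shapes=SHAPES, num_layers=NUM_LAYERS):
    # Each adapted Linear(in, out) gains A (r x in) and B (out x r): r * (in + out) weights
    per_layer = sum(r * (shapes[m][0] + shapes[m][1]) for m in target_modules)
    return per_layer * num_layers


if __name__ == "__main__":
    print(lora_params(32, ["q_proj", "v_proj"]))  # 13631488
    print(lora_params(32, list(SHAPES)))          # 83886080
```

Under these assumptions, the adapter ranges from about 13.6M trainable parameters (query/value projections only) to about 83.9M (all linear projections). In either case it is a small fraction of the frozen base model.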
Framework
- TRL: 0.27.2
- Transformers: 5.0.0
- PEFT: LoRA adapter
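Because the adapter was trained with the versions listed above, it may be worth checking the local environment before loading it. The minimal sketch below uses the standard-library `importlib.metadata.version`. `PINNED` mirrors the versions in this card. PEFT is left out because no version is given for it. `version_mismatches` is a hypothetical helper, and a mismatch does not necessarily mean that loading will fail.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework" in this card (no PEFT version is given)
PINNED = {"trl": "0.27.2", "transformers": "5.0.0"}


def version_mismatches(pinned, lookup=version):
    """Return {package: installed version or None} for packages that differ from the pins."""
    out = {}
    for pkg, want in pinned.items():
        try:
            have = lookup(pkg)
        except PackageNotFoundError:
            have = None  # package not installed
        if have != want:
            out[pkg] = have
    return out


if __name__ == "__main__":
    print(version_mismatches(PINNED))
```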
License
Apache 2.0