---
base_model: unsloth/gemma-3-4b-it-unsloth-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/gemma-3-4b-it-unsloth-bnb-4bit
- lora
- sft
- transformers
- trl
- unsloth
- vision-language-model
- vlm
- legal
- json-extraction
- arabic
language:
- ar
- en
---
# Egypt Constitution VLM
This model is a fine-tuned Vision-Language Model (VLM) based on **Gemma-3-4B-IT** (quantized to 4-bit via Unsloth). It extracts structured JSON data (constitutional articles, page metadata, legal intent, and named entities) directly from scanned images of Arabic constitutional and legal documents.
## Model Details
### Model Description
This model applies LoRA, a Parameter-Efficient Fine-Tuning (PEFT) method, to the Gemma-3 architecture. Given an image of a scanned legal document and an extraction instruction, it transcribes the Arabic text and structures the output according to a predefined JSON schema. Fine-tuning targeted the vision layers, language layers, attention modules, and MLP modules.
* **Developed by:** Mahmoud Essam
* **Model type:** Vision-Language Model (VLM) with LoRA Adapters
* **Language(s) (NLP):** Arabic (content extraction), English (JSON keys/schema)
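* **License:** Apache 2.0 for the LoRA adapter; the Gemma-3 base model is distributed under Google's Gemma Terms of Use, which also apply when the adapter is used with it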
* **Finetuned from model:** `unsloth/gemma-3-4b-it-unsloth-bnb-4bit`
## Uses
### Direct Use
The primary use case is digitizing scanned Arabic legal documents (specifically the Egyptian Constitution) and extracting structured data from them. Given a document image as input, the model outputs a structured JSON object containing:
* **Page Metadata:** Source document, page number, language.
* **Hierarchy Context:** Part and Chapter titles.
* **Articles:** Raw text, cleaned body text, legal intent, key entities, Arabic summaries, and keywords.
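The structure listed above can be checked programmatically before the output is stored. The following is a minimal sketch, not part of the model's tooling: the top-level and article key names are taken from the extraction instruction used later in this card, while the `validate_page` helper, the sub-keys of `page_metadata` and `hierarchy_context`, and all sample values are hypothetical placeholders.

```python
import json

# Key names come from the extraction instruction in this card; the checks themselves are assumptions.
TOP_LEVEL_KEYS = ("page_metadata", "hierarchy_context", "articles")
ARTICLE_KEYS = ("article_id", "article_number", "content", "training_features", "text_raw")
NESTED_KEYS = {
    "content": ("body_text", "key_entities", "legal_intent"),
    "training_features": ("summary_ar", "keywords"),
}


def validate_page(page: dict) -> list:
    """Return descriptions of missing keys; an empty list means the expected shape is present."""
    problems = [f"missing top-level key: {key}" for key in TOP_LEVEL_KEYS if key not in page]
    articles = page.get("articles", [])
    if not isinstance(articles, list):
        return problems + ["articles is not a list"]
    for index, article in enumerate(articles):
        if not isinstance(article, dict):
            problems.append(f"articles[{index}] is not an object")
            continue
        for key in ARTICLE_KEYS:
            if key not in article:
                problems.append(f"articles[{index}] missing key: {key}")
        for parent, children in NESTED_KEYS.items():
            nested = article.get(parent)
            if isinstance(nested, dict):
                for child in children:
                    if child not in nested:
                        problems.append(f"articles[{index}].{parent} missing key: {child}")
    return problems


# Hypothetical model output with placeholder values, for illustration only.
sample_output = json.dumps({
    "page_metadata": {"source_document": "<document name>", "page_number": 1, "language": "ar"},
    "hierarchy_context": {"part": "<part title>", "chapter": "<chapter title>"},
    "articles": [{
        "article_id": "<id>",
        "article_number": 1,
        "content": {"body_text": "<text>", "key_entities": [], "legal_intent": "<intent>"},
        "training_features": {"summary_ar": "<summary>"},
        "text_raw": "<raw text>",
    }],
})

print(validate_page(json.loads(sample_output)))
# ['articles[0].training_features missing key: keywords']
```

A non-empty result is a reasonable signal to route the page to a human reviewer rather than accept the extraction as-is.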
### Out-of-Scope Use
* Recognition of highly illegible handwritten Arabic documents.
* General-purpose conversational AI or chat tasks (it is highly specialized for JSON extraction).
* Processing documents in languages other than Arabic.
## Bias, Risks, and Limitations
* **Domain Specificity:** The model is heavily optimized for formal Arabic legal and constitutional texts. It may hallucinate or underperform on standard conversational Arabic or vastly different document layouts (e.g., newspapers, unstructured letters).
* **Resolution Sensitivity:** The model relies heavily on a specific image preprocessing pipeline (resizing and padding to `1024x1024`). Feeding raw, unformatted images may degrade performance.
### Recommendations
Apply the same preprocessing steps used during training (grayscale conversion, autocontrast, median-filter denoising, sharpening, and resizing with padding to `1024x1024`) to every input image for best extraction accuracy. Human-in-the-loop verification is recommended for critical legal digitization tasks.
## How to Get Started with the Model
The code blocks below load the model with Unsloth, preprocess a page image, build the prompt, and generate the JSON output. Run them in order.
### Load the Model
```python
import torch
import json
from PIL import Image, ImageOps, ImageFilter
from unsloth import FastVisionModel
# 1. Load Model & Tokenizer
model_id = "Humachine/egypt-constitution-vlm"
model, tokenizer = FastVisionModel.from_pretrained(
    model_name=model_id,
    load_in_4bit=True,
    trust_remote_code=True
)
```
### Image Preprocessing Logic
```python
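def preprocess_image(image_path: str, target_size: tuple = (1024, 1024)) -> Image.Image:
    """Clean up a scanned page and center it on a white canvas of target_size."""
    image = Image.open(image_path).convert('L')             # grayscale
    image = ImageOps.autocontrast(image, cutoff=1)          # stretch contrast, clipping 1% at each end
    image = image.filter(ImageFilter.MedianFilter(size=3))  # remove speckle noise
    image = image.filter(ImageFilter.SHARPEN)               # sharpen glyph edges
    # Shrink in place to fit target_size while preserving aspect ratio (never upscales).
    image.thumbnail(target_size, Image.Resampling.LANCZOS)
    # Paste the page in the middle of a white canvas.
    padded = Image.new('L', target_size, color=255)
    offset = ((target_size[0] - image.width) // 2, (target_size[1] - image.height) // 2)
    padded.paste(image, offset)
    return padded.convert('RGB')  # convert back to 3-channel RGB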
```
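The resize-and-pad arithmetic in the last steps of `preprocess_image` can be worked through without loading an image. The sketch below is a pure-Python illustration; `fit_and_center` is a hypothetical helper, and Pillow's own rounding inside `Image.thumbnail` may differ by a pixel for some aspect ratios.

```python
def fit_and_center(width: int, height: int, target: tuple = (1024, 1024)):
    """Return ((new_width, new_height), (x_offset, y_offset)) for an aspect-preserving fit."""
    # Like Image.thumbnail, only ever shrink: cap the scale factor at 1.
    scale = min(target[0] / width, target[1] / height, 1.0)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Same centering formula as preprocess_image: split the leftover space evenly.
    offset = ((target[0] - new_w) // 2, (target[1] - new_h) // 2)
    return (new_w, new_h), offset


print(fit_and_center(1000, 2000))  # ((512, 1024), (256, 0))
print(fit_and_center(500, 400))    # ((500, 400), (262, 312))
```

A portrait page twice as tall as it is wide therefore fills the full height and gets 256 pixels of white padding on each side, while a scan already smaller than the canvas is left at its original size and only padded.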
### Prepare Inputs
```python
image_path = "/content/0100.jpg"
image = preprocess_image(image_path)
instruction = (
    "Extract all articles from this Arabic constitutional document. "
    "Return a JSON object with keys: page_metadata, hierarchy_context, and articles. "
    "Each article must include: article_id, article_number, content "
    "(body_text, key_entities, legal_intent), training_features (summary_ar, keywords), "
    "and text_raw."
)
messages = [
    {"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": instruction}]}
]
```
### Generate Output
```python
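FastVisionModel.for_inference(model)  # switch Unsloth to inference mode
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# For a vision model the returned "tokenizer" is a processor; pass the image with the text.
inputs = tokenizer(image, text, add_special_tokens=False, return_tensors="pt").to("cuda", dtype=torch.bfloat16)
output_tokens = model.generate(**inputs, max_new_tokens=2048, use_cache=True, temperature=0.2)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output_tokens[0][inputs["input_ids"].shape[1]:]
output_text = tokenizer.decode(new_tokens, skip_special_tokens=True)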
print(output_text)
```
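The decoded string may contain text around the JSON object, such as a Markdown fence or prompt text. The following sketch shows one way to recover the object using only the standard library; `extract_first_json_object` is a hypothetical helper and the sample string is invented for illustration, not real model output.

```python
import json


def extract_first_json_object(text: str):
    """Return the first decodable JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        # This brace did not start valid JSON; try the next one.
        start = text.find("{", start + 1)
    return None


# Invented example of a decoded generation wrapped in a Markdown fence.
sample_text = 'model\n```json\n{"page_metadata": {"page_number": 12}, "articles": []}\n```'
page = extract_first_json_object(sample_text)
print(page["page_metadata"]["page_number"])  # 12
```

Returning `None` instead of raising lets the caller decide whether to retry generation or flag the page for manual review.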
## Citation
If you use this model in your research or application, please cite it as follows:
```bibtex
@misc{essam2026egypt,
author = {Mahmoud Essam},
title = {Egypt Constitution VLM: A Vision-Language Model for Arabic Legal JSON Extraction},
howpublished = {Hugging Face Repositories},
year = {2026},
url = {https://huggingface.co/Humachine/egypt-constitution-vlm}
}
```