---
base_model: unsloth/gemma-3-4b-it-unsloth-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/gemma-3-4b-it-unsloth-bnb-4bit
- lora
- sft
- transformers
- trl
- unsloth
- vision-language-model
- vlm
- legal
- json-extraction
- arabic
language:
- ar
- en
---

# Egypt Constitution VLM

This model is a fine-tuned Vision-Language Model (VLM) based on **Gemma-3-4B-IT** (quantized to 4-bit via Unsloth). It extracts structured JSON data (constitutional articles, page metadata, legal intent, and named entities) directly from scanned images of Arabic constitutional and legal documents.

## Model Details

### Model Description

This model was trained with Parameter-Efficient Fine-Tuning (PEFT) using LoRA on the Gemma-3 architecture. Given an image of a scanned legal document and an extraction instruction, it transcribes the Arabic text and structures the output according to a predefined JSON schema. Vision layers, language layers, attention modules, and MLP modules were all targeted during fine-tuning.

* **Developed by:** Mahmoud Essam
* **Model type:** Vision-Language Model (VLM) with LoRA adapters
* **Language(s) (NLP):** Arabic (content extraction), English (JSON keys/schema)
* **License:** Apache 2.0 (adapter); the Gemma 3 base model is governed by Google's Gemma Terms of Use
* **Finetuned from model:** `unsloth/gemma-3-4b-it-unsloth-bnb-4bit`

## Uses

### Direct Use

The primary use case is digitization and structured data extraction from scanned Arabic legal documents (specifically the Egyptian Constitution). Given a document image as input, the model outputs a structured JSON object containing:

* **Page Metadata:** Source document, page number, language.
* **Hierarchy Context:** Part and Chapter titles.
* **Articles:** Raw text, cleaned body text, legal intent, key entities, Arabic summaries, and keywords.

### Out-of-Scope Use

* Recognition of highly illegible handwritten Arabic documents.
* General-purpose conversational AI or chat tasks (the model is specialized for JSON extraction).
* Processing documents in languages other than Arabic.

## Bias, Risks, and Limitations

* **Domain Specificity:** The model is heavily optimized for formal Arabic legal and constitutional texts. It may hallucinate or underperform on conversational Arabic or on very different document layouts (e.g., newspapers, unstructured letters).
* **Resolution Sensitivity:** The model depends on a specific image preprocessing pipeline (resizing and padding to `1024x1024`). Feeding raw, unprocessed images may degrade performance.

### Recommendations

Input images should go through the same preprocessing steps used during training (grayscale conversion, autocontrast, median denoising, sharpening, resizing, and padding to `1024x1024`) for best extraction accuracy. Human-in-the-loop verification is recommended for critical legal digitization tasks.

## How to Get Started with the Model

The code blocks below load the model with Unsloth, preprocess a page image, build the prompt, and generate the JSON output.

### Load the Model

```python
import torch
import json
from PIL import Image, ImageOps, ImageFilter
from unsloth import FastVisionModel

# 1. Load Model & Tokenizer
# For vision models, the second return value is a multimodal processor
# (it handles both text and images), even though it is named `tokenizer` here.
model_id = "Humachine/egypt-constitution-vlm"
model, tokenizer = FastVisionModel.from_pretrained(
    model_name=model_id,
    load_in_4bit=True,
    trust_remote_code=True
)
FastVisionModel.for_inference(model)  # enable Unsloth's inference mode
```

### Image Preprocessing Logic

```python
def preprocess_image(image_path: str, target_size: tuple = (1024, 1024)) -> Image.Image:
    # Grayscale, then stretch contrast (ignoring the darkest/brightest 1% of pixels)
    image = Image.open(image_path).convert('L')
    image = ImageOps.autocontrast(image, cutoff=1)
    # Remove speckle noise, then sharpen glyph edges
    image = image.filter(ImageFilter.MedianFilter(size=3))
    image = image.filter(ImageFilter.SHARPEN)
    # Shrink in place to fit target_size while keeping the aspect ratio
    image.thumbnail(target_size, Image.Resampling.LANCZOS)
    # Center the page on a white canvas of exactly target_size
    padded = Image.new('L', target_size, color=255)
    offset = ((target_size[0] - image.width) // 2, (target_size[1] - image.height) // 2)
    padded.paste(image, offset)
    return padded.convert('RGB')
```

### Prepare Inputs

```python
image_path = "/content/0100.jpg"
image = preprocess_image(image_path)

instruction = (
    "Extract all articles from this Arabic constitutional document. "
    "Return a JSON object with keys: page_metadata, hierarchy_context, and articles. "
    "Each article must include: article_id, article_number, content "
    "(body_text, key_entities, legal_intent), training_features (summary_ar, keywords), "
    "and text_raw."
)

messages = [
    {"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": instruction}]}
]
```

### Generate Output

```python
# Render the chat template to a prompt string (contains the image placeholder token)
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Pass the image together with the prompt so the model actually receives the pixels;
# the template already adds special tokens, so do not add them a second time.
inputs = tokenizer(image, text, add_special_tokens=False, return_tensors="pt").to("cuda", dtype=torch.bfloat16)
output_tokens = model.generate(**inputs, max_new_tokens=2048, use_cache=True, temperature=0.2)
# The decoded string includes the prompt followed by the generated JSON
output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
print(output_text)
```

## Citation

If you use this model in your research or application, please cite it as follows:

```bibtex
@misc{essam2026egypt,
  author = {Mahmoud Essam},
  title = {Egypt Constitution VLM: A Vision-Language Model for Arabic Legal JSON Extraction},
  howpublished = {Hugging Face Repositories},
  year = {2026},
  url = {https://huggingface.co/your-username/egypt-constitution-vlm}
}
```
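
## Parsing the Model Output

The generation step prints a decoded string that contains the prompt followed by the model's answer, so the JSON object still has to be located and checked before use. The sketch below is one possible way to do that with the standard library only. The helper names `extract_json` and `missing_keys` are hypothetical and not part of this repository, and the only schema assumption is the three top-level keys named in the extraction instruction (`page_metadata`, `hierarchy_context`, `articles`).

```python
import json

# Top-level keys requested in the extraction instruction.
REQUIRED_KEYS = ("page_metadata", "hierarchy_context", "articles")


def extract_json(generated: str) -> dict:
    """Return the first JSON object that parses cleanly inside `generated`.

    Hypothetical helper: scans for '{' and tries to decode from there,
    skipping any surrounding prompt text or trailing commentary.
    """
    decoder = json.JSONDecoder()
    idx = generated.find("{")
    while idx != -1:
        try:
            obj, _end = decoder.raw_decode(generated, idx)
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            idx = generated.find("{", idx + 1)
            continue
        if isinstance(obj, dict):
            return obj
        idx = generated.find("{", idx + 1)
    raise ValueError("no JSON object found in model output")


def missing_keys(page: dict) -> list:
    """List the required top-level keys that are absent from `page`."""
    return [key for key in REQUIRED_KEYS if key not in page]


# Illustrative input only; real output comes from `output_text` above.
sample = 'Answer: {"page_metadata": {}, "hierarchy_context": {}, "articles": []} end'
page = extract_json(sample)
print(missing_keys(page))
```

If generation is cut off by `max_new_tokens`, the outer object will not parse and the helper may return a nested object instead; a non-empty result from `missing_keys` is a cheap signal that the page should be regenerated or sent for manual review.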