Instructions to use Jwalit/gemma4-e4b-kyc-document-extractor with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Jwalit/gemma4-e4b-kyc-document-extractor with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Jwalit/gemma4-e4b-kyc-document-extractor")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Jwalit/gemma4-e4b-kyc-document-extractor", dtype="auto")

PEFT
How to use Jwalit/gemma4-e4b-kyc-document-extractor with PEFT:
```
Task type is invalid.
```
Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Jwalit/gemma4-e4b-kyc-document-extractor with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jwalit/gemma4-e4b-kyc-document-extractor"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jwalit/gemma4-e4b-kyc-document-extractor",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/Jwalit/gemma4-e4b-kyc-document-extractor

SGLang

How to use Jwalit/gemma4-e4b-kyc-document-extractor with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Jwalit/gemma4-e4b-kyc-document-extractor" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jwalit/gemma4-e4b-kyc-document-extractor",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Jwalit/gemma4-e4b-kyc-document-extractor" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jwalit/gemma4-e4b-kyc-document-extractor",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Jwalit/gemma4-e4b-kyc-document-extractor with Docker Model Runner:
```
docker model run hf.co/Jwalit/gemma4-e4b-kyc-document-extractor
```

Jwalit commited on 28 days ago

Commit

7357eb4

verified ·

1 Parent(s): bf014b7

Add Colab training notebook for free GPU training

Browse files

Files changed (1) hide show

train_kyc_colab.ipynb +362 -0

train_kyc_colab.ipynb ADDED Viewed

	@@ -0,0 +1,362 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# 🔍 Train Gemma 4 E4B for KYC Document Extraction & Classification\n",
+    "\n",
+    "**Free GPU Training on Google Colab**\n",
+    "\n",
+    "This notebook fine-tunes `google/gemma-4-E4B-it` using QLoRA SFT for:\n",
+    "- **Document Classification**: Aadhaar, PAN, Passport, Visa, Election Card\n",
+    "- **Field Extraction**: Extract all structured fields as JSON\n",
+    "\n",
+    "**Requirements**: Colab T4 (free) or L4/A100 (Colab Pro)\n",
+    "\n",
+    "| Resource | Link |\n",
+    "|----------|------|\n",
+    "| Dataset | [Jwalit/kyc-document-extraction-vlm](https://huggingface.co/datasets/Jwalit/kyc-document-extraction-vlm) |\n",
+    "| Model Repo | [Jwalit/gemma4-e4b-kyc-document-extractor](https://huggingface.co/Jwalit/gemma4-e4b-kyc-document-extractor) |\n",
+    "| Base Model | [google/gemma-4-E4B-it](https://huggingface.co/google/gemma-4-E4B-it) |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Install Dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install -q torch transformers trl datasets peft accelerate bitsandbytes trackio pillow\n",
+    "!pip install -q flash-attn --no-build-isolation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Login to Hugging Face"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from huggingface_hub import notebook_login\n",
+    "notebook_login()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Check GPU"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "print(f\"GPU: {torch.cuda.get_device_name(0)}\")\n",
+    "print(f\"VRAM: {torch.cuda.get_device_properties(0).total_mem / 1e9:.1f} GB\")\n",
+    "print(f\"CUDA: {torch.version.cuda}\")\n",
+    "print(f\"PyTorch: {torch.__version__}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Load Dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from datasets import load_dataset\n",
+    "\n",
+    "DATASET_ID = \"Jwalit/kyc-document-extraction-vlm\"\n",
+    "dataset = load_dataset(DATASET_ID)\n",
+    "train_dataset = dataset[\"train\"]\n",
+    "eval_dataset = dataset[\"test\"]\n",
+    "\n",
+    "print(f\"Train: {len(train_dataset)} samples\")\n",
+    "print(f\"Eval: {len(eval_dataset)} samples\")\n",
+    "print(f\"Columns: {train_dataset.column_names}\")\n",
+    "\n",
+    "# Preview a sample\n",
+    "sample = train_dataset[0]\n",
+    "print(f\"\\nSample message roles: {[m['role'] for m in sample['messages']]}\")\n",
+    "print(f\"Num images: {len(sample['images'])}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Load Model with QLoRA (4-bit)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig\n",
+    "\n",
+    "MODEL_ID = \"google/gemma-4-E4B-it\"\n",
+    "\n",
+    "# 4-bit quantization\n",
+    "bnb_config = BitsAndBytesConfig(\n",
+    "    load_in_4bit=True,\n",
+    "    bnb_4bit_use_double_quant=True,\n",
+    "    bnb_4bit_quant_type=\"nf4\",\n",
+    "    bnb_4bit_compute_dtype=torch.bfloat16,\n",
+    ")\n",
+    "\n",
+    "print(f\"Loading {MODEL_ID}...\")\n",
+    "model = AutoModelForImageTextToText.from_pretrained(\n",
+    "    MODEL_ID,\n",
+    "    device_map=\"auto\",\n",
+    "    torch_dtype=torch.bfloat16,\n",
+    "    quantization_config=bnb_config,\n",
+    "    attn_implementation=\"flash_attention_2\",\n",
+    ")\n",
+    "\n",
+    "processor = AutoProcessor.from_pretrained(MODEL_ID)\n",
+    "if processor.tokenizer.pad_token is None:\n",
+    "    processor.tokenizer.pad_token = processor.tokenizer.eos_token\n",
+    "\n",
+    "print(f\"Model loaded: {model.__class__.__name__}\")\n",
+    "print(f\"GPU memory used: {torch.cuda.memory_allocated() / 1e9:.2f} GB\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Configure LoRA & Training"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from peft import LoraConfig\n",
+    "from trl import SFTConfig, SFTTrainer\n",
+    "\n",
+    "# ===== YOUR SETTINGS =====\n",
+    "HUB_MODEL_ID = \"Jwalit/gemma4-e4b-kyc-document-extractor\"  # Change to your username!\n",
+    "OUTPUT_DIR = \"./gemma4-kyc-extractor\"\n",
+    "\n",
+    "# Trackio monitoring (optional)\n",
+    "os.environ[\"TRACKIO_SPACE_ID\"] = \"Jwalit/kyc-trackio\"  # Change to your space\n",
+    "os.environ[\"TRACKIO_PROJECT\"] = \"kyc-document-extractor\"\n",
+    "# =========================\n",
+    "\n",
+    "# LoRA: target text decoder only (vision encoder stays frozen)\n",
+    "peft_config = LoraConfig(\n",
+    "    r=16,\n",
+    "    lora_alpha=32,\n",
+    "    lora_dropout=0.05,\n",
+    "    bias=\"none\",\n",
+    "    task_type=\"CAUSAL_LM\",\n",
+    "    target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n",
+    ")\n",
+    "\n",
+    "# SFT config optimized for T4 (16GB VRAM)\n",
+    "training_args = SFTConfig(\n",
+    "    output_dir=OUTPUT_DIR,\n",
+    "    num_train_epochs=3,\n",
+    "    per_device_train_batch_size=1,        # T4: batch=1, accumulate=16\n",
+    "    per_device_eval_batch_size=1,\n",
+    "    gradient_accumulation_steps=16,        # Effective batch = 16\n",
+    "    learning_rate=2e-4,\n",
+    "    lr_scheduler_type=\"cosine\",\n",
+    "    warmup_ratio=0.05,\n",
+    "    bf16=True,\n",
+    "    optim=\"adamw_torch_fused\",\n",
+    "    gradient_checkpointing=True,\n",
+    "    max_length=None,                       # CRITICAL for VLMs\n",
+    "    logging_strategy=\"steps\",\n",
+    "    logging_steps=10,\n",
+    "    logging_first_step=True,\n",
+    "    disable_tqdm=False,                    # Keep tqdm in Colab\n",
+    "    report_to=\"trackio\",\n",
+    "    run_name=\"gemma4-kyc-colab\",\n",
+    "    eval_strategy=\"steps\",\n",
+    "    eval_steps=100,\n",
+    "    save_strategy=\"steps\",\n",
+    "    save_steps=200,\n",
+    "    save_total_limit=2,\n",
+    "    load_best_model_at_end=True,\n",
+    "    metric_for_best_model=\"eval_loss\",\n",
+    "    push_to_hub=True,\n",
+    "    hub_model_id=HUB_MODEL_ID,\n",
+    "    hub_strategy=\"every_save\",\n",
+    "    assistant_only_loss=True,\n",
+    ")\n",
+    "\n",
+    "print(\"Config ready!\")\n",
+    "print(f\"  Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}\")\n",
+    "print(f\"  Push to: {HUB_MODEL_ID}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 7. Train! 🚀"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "trainer = SFTTrainer(\n",
+    "    model=model,\n",
+    "    args=training_args,\n",
+    "    train_dataset=train_dataset,\n",
+    "    eval_dataset=eval_dataset,\n",
+    "    peft_config=peft_config,\n",
+    "    processing_class=processor,\n",
+    ")\n",
+    "\n",
+    "# Print trainable params\n",
+    "trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
+    "total = sum(p.numel() for p in model.parameters())\n",
+    "print(f\"Trainable: {trainable:,} / {total:,} ({100*trainable/total:.2f}%)\")\n",
+    "\n",
+    "# Train\n",
+    "train_result = trainer.train()\n",
+    "\n",
+    "# Save & push\n",
+    "trainer.save_model(OUTPUT_DIR)\n",
+    "trainer.push_to_hub()\n",
+    "\n",
+    "print(f\"\\n✅ Done! Model at: https://huggingface.co/{HUB_MODEL_ID}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 8. Test Inference"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Test on a sample from the eval set\n",
+    "test_sample = eval_dataset[0]\n",
+    "test_image = test_sample[\"images\"][0]\n",
+    "\n",
+    "# Display the document\n",
+    "from IPython.display import display\n",
+    "display(test_image)\n",
+    "\n",
+    "# Run inference\n",
+    "messages = [\n",
+    "    {\"role\": \"system\", \"content\": [{\"type\": \"text\", \"text\": \"You are an expert KYC document analyst. Always respond with accurate, structured JSON output.\"}]},\n",
+    "    {\"role\": \"user\", \"content\": [\n",
+    "        {\"type\": \"image\"},\n",
+    "        {\"type\": \"text\", \"text\": \"Classify this document and extract all information as structured JSON.\"}\n",
+    "    ]}\n",
+    "]\n",
+    "\n",
+    "inputs = processor.apply_chat_template(\n",
+    "    messages, add_generation_prompt=True, tokenize=True,\n",
+    "    return_dict=True, return_tensors=\"pt\", images=[test_image]\n",
+    ").to(model.device)\n",
+    "\n",
+    "with torch.no_grad():\n",
+    "    output = model.generate(**inputs, max_new_tokens=1024, temperature=0.1, do_sample=True)\n",
+    "\n",
+    "result = processor.batch_decode(output[:, inputs[\"input_ids\"].shape[1]:], skip_special_tokens=True)[0]\n",
+    "\n",
+    "import json\n",
+    "print(\"\\n📄 Model Output:\")\n",
+    "try:\n",
+    "    print(json.dumps(json.loads(result), indent=2))\n",
+    "except:\n",
+    "    print(result)\n",
+    "\n",
+    "print(\"\\n📋 Ground Truth:\")\n",
+    "gt_msg = test_sample[\"messages\"][-1]  # assistant message\n",
+    "gt_text = gt_msg[\"content\"][0][\"text\"] if isinstance(gt_msg[\"content\"], list) else gt_msg[\"content\"]\n",
+    "try:\n",
+    "    print(json.dumps(json.loads(gt_text), indent=2))\n",
+    "except:\n",
+    "    print(gt_text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 9. Deploy with vLLM\n",
+    "\n",
+    "After training, deploy the model with vLLM for production speed:\n",
+    "\n",
+    "```bash\n",
+    "# Merge LoRA adapters first (optional but recommended)\n",
+    "python -c \"\n",
+    "from peft import AutoPeftModelForCausalLM\n",
+    "import torch\n",
+    "model = AutoPeftModelForCausalLM.from_pretrained('Jwalit/gemma4-e4b-kyc-document-extractor', device_map='auto', torch_dtype=torch.bfloat16)\n",
+    "merged = model.merge_and_unload()\n",
+    "merged.save_pretrained('./merged-kyc-extractor')\n",
+    "\"\n",
+    "\n",
+    "# Start vLLM server\n",
+    "python -m vllm.entrypoints.openai.api_server \\\n",
+    "    --model ./merged-kyc-extractor \\\n",
+    "    --max-model-len 4096 \\\n",
+    "    --dtype bfloat16\n",
+    "```"
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "gpuType": "T4",
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}