Jwalit committed on Apr 23

Commit 200e006 (verified)

1 Parent(s): 5138c52

Add training script

Files changed (1)

train_kyc_vlm.py +237 -0

train_kyc_vlm.py ADDED

	@@ -0,0 +1,237 @@

+"""
+Fine-tune Google Gemma 4 E4B-IT for KYC Document Extraction & Classification.
+Model: google/gemma-4-E4B-it (Gemma4ForConditionalGeneration, ~8B params)
+Method: QLoRA SFT (4-bit quantization + LoRA on text decoder)
+Dataset: Jwalit/kyc-document-extraction-vlm (synthetic KYC documents)
+Hardware: A100-large (80GB VRAM)
+Output: Jwalit/gemma4-e4b-kyc-document-extractor
+Reference implementation: TRL SFT VLM docs (https://huggingface.co/docs/trl/sft_trainer#training-vision-language-models)
+"""
+import os
+import torch
+from datasets import load_dataset
+from transformers import (
+    AutoProcessor,
+    AutoModelForImageTextToText,
+    BitsAndBytesConfig,
+)
+from peft import LoraConfig
+from trl import SFTConfig, SFTTrainer
+# ============================================================
+# Configuration
+# ============================================================
+MODEL_ID = "google/gemma-4-E4B-it"
+DATASET_ID = "Jwalit/kyc-document-extraction-vlm"
+OUTPUT_DIR = "./gemma4-e4b-kyc-extractor"
+HUB_MODEL_ID = "Jwalit/gemma4-e4b-kyc-document-extractor"
+# Training hyperparameters (based on VLM SFT best practices)
+LEARNING_RATE = 2e-4       # Higher LR for LoRA adapters
+NUM_EPOCHS = 3
+BATCH_SIZE = 2
+GRADIENT_ACCUMULATION = 8  # Effective batch size = 2 * 8 = 16
+MAX_SEQ_LENGTH = None      # CRITICAL for VLMs: don't truncate image tokens
+# LoRA config (target text decoder only, vision encoder stays frozen)
+LORA_R = 16
+LORA_ALPHA = 32
+LORA_DROPOUT = 0.05
+# ============================================================
+# Setup Trackio monitoring via environment variables
+# ============================================================
+os.environ["TRACKIO_SPACE_ID"] = "Jwalit/kyc-trackio"
+os.environ["TRACKIO_PROJECT"] = "kyc-document-extractor"
+# ============================================================
+# Load dataset
+# ============================================================
+print("Loading dataset...")
+dataset = load_dataset(DATASET_ID)
+train_dataset = dataset["train"]
+eval_dataset = dataset["test"]
+print(f"Train: {len(train_dataset)} samples")
+print(f"Eval: {len(eval_dataset)} samples")
+print(f"Sample keys: {train_dataset.column_names}")
+# ============================================================
+# Model & Processor setup
+# ============================================================
+print(f"\nLoading model: {MODEL_ID}")
+# 4-bit quantization for memory efficiency
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
+# Load model
+model = AutoModelForImageTextToText.from_pretrained(
+    MODEL_ID,
+    device_map="auto",
+    torch_dtype=torch.bfloat16,
+    quantization_config=bnb_config,
+    attn_implementation="flash_attention_2",
+)
+# Load processor
+processor = AutoProcessor.from_pretrained(MODEL_ID)
+# Ensure pad token is set
+if processor.tokenizer.pad_token is None:
+    processor.tokenizer.pad_token = processor.tokenizer.eos_token
+print(f"Model loaded: {model.__class__.__name__}")
+print(f"Model device map: {model.hf_device_map if hasattr(model, 'hf_device_map') else 'N/A'}")
+# ============================================================
+# LoRA Configuration
+# ============================================================
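+# Intended to target only the text decoder layers (base weights stay frozen).
+# Note: PEFT matches a list of target_modules by name suffix across the whole model,
+# so any vision-encoder modules sharing these names would also receive adapters.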
+peft_config = LoraConfig(
+    r=LORA_R,
+    lora_alpha=LORA_ALPHA,
+    lora_dropout=LORA_DROPOUT,
+    bias="none",
+    task_type="CAUSAL_LM",
+    target_modules=[
+        "q_proj", "k_proj", "v_proj", "o_proj",
+        "gate_proj", "up_proj", "down_proj",
+    ],
+)
+print(f"\nLoRA config: r={LORA_R}, alpha={LORA_ALPHA}, dropout={LORA_DROPOUT}")
+print(f"Target modules: {peft_config.target_modules}")
+# ============================================================
+# SFT Training Configuration
+# ============================================================
+training_args = SFTConfig(
+    output_dir=OUTPUT_DIR,
+    # Training schedule
+    num_train_epochs=NUM_EPOCHS,
+    per_device_train_batch_size=BATCH_SIZE,
+    per_device_eval_batch_size=1,
+    gradient_accumulation_steps=GRADIENT_ACCUMULATION,
+    # Learning rate
+    learning_rate=LEARNING_RATE,
+    lr_scheduler_type="cosine",
+    warmup_ratio=0.05,
+    # Precision & optimization
+    bf16=True,
+    optim="adamw_torch_fused",
+    gradient_checkpointing=True,
+    # VLM-specific: DO NOT truncate (image tokens get cut off)
+    max_length=MAX_SEQ_LENGTH,
+    # Logging - plain text, no tqdm
+    logging_strategy="steps",
+    logging_steps=10,
+    logging_first_step=True,
+    disable_tqdm=True,
+    report_to="trackio",
+    run_name="gemma4-e4b-kyc-sft-qlora",
+    # Eval
+    eval_strategy="steps",
+    eval_steps=100,
+    # Saving
+    save_strategy="steps",
+    save_steps=200,
+    save_total_limit=3,
+    load_best_model_at_end=True,
+    metric_for_best_model="eval_loss",
+    # Hub push
+    push_to_hub=True,
+    hub_model_id=HUB_MODEL_ID,
+    hub_strategy="every_save",
+    # SFT-specific
+    assistant_only_loss=True,  # Only train on assistant responses
+)
+# ============================================================
+# Create Trainer
+# ============================================================
+print("\nInitializing SFTTrainer...")
+trainer = SFTTrainer(
+    model=model,
+    args=training_args,
+    train_dataset=train_dataset,
+    eval_dataset=eval_dataset,
+    peft_config=peft_config,
+    processing_class=processor,  # Use processor (not tokenizer) for VLMs
+)
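+# Print trainable parameters, counted on trainer.model (the PEFT-wrapped model).
+# 4-bit packed weights report fewer elements, so the total is approximate.
+trainable_params = sum(p.numel() for p in trainer.model.parameters() if p.requires_grad)
+total_params = sum(p.numel() for p in trainer.model.parameters())
+print(f"\nTrainable params: {trainable_params:,} / {total_params:,} ({100*trainable_params/total_params:.2f}%)")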
+# ============================================================
+# Train!
+# ============================================================
+print("\n" + "="*60)
+print("Starting training...")
+print(f"  Model: {MODEL_ID}")
+print(f"  Dataset: {DATASET_ID}")
+print(f"  Epochs: {NUM_EPOCHS}")
+print(f"  Batch size: {BATCH_SIZE} x {GRADIENT_ACCUMULATION} = {BATCH_SIZE * GRADIENT_ACCUMULATION}")
+print(f"  Learning rate: {LEARNING_RATE}")
+print(f"  LoRA rank: {LORA_R}")
+print(f"  Push to: {HUB_MODEL_ID}")
+print("="*60 + "\n")
+train_result = trainer.train()
+# ============================================================
+# Save & push final model
+# ============================================================
+print("\nSaving final model...")
+trainer.save_model(OUTPUT_DIR)
+trainer.push_to_hub()
+# Log final metrics
+metrics = train_result.metrics
+print("\n" + "="*60)
+print("Training completed!")
+print(f"  Final train loss: {metrics.get('train_loss', 'N/A')}")
+print(f"  Total steps: {trainer.state.global_step}")
+print(f"  Model saved to: {HUB_MODEL_ID}")
+print(f"  View at: https://huggingface.co/{HUB_MODEL_ID}")
+print("="*60)
+# ============================================================
+# Run quick evaluation
+# ============================================================
+print("\nRunning final evaluation...")
+eval_metrics = trainer.evaluate()
+print(f"  Eval loss: {eval_metrics.get('eval_loss', 'N/A')}")
+print(f"  Eval runtime: {eval_metrics.get('eval_runtime', 'N/A')}s")
+print("\n✅ Training complete! Model is ready for vLLM deployment.")
+print(f"🔗 https://huggingface.co/{HUB_MODEL_ID}")
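
The schedule implied by the hyperparameters in this script (batch size 2, gradient accumulation 8, 3 epochs, warmup ratio 0.05, evaluation every 100 steps, checkpoint every 200) can be sanity-checked with a little arithmetic. The sketch below is not part of the commit. The dataset size of 1,000 samples is a placeholder assumption, since the diff does not state it; the helper names `training_schedule` and `checkpoints_align` are hypothetical; and the rounding only approximates what the Trainer does, so counts may differ by a step between library versions.

```python
import math


def training_schedule(num_samples, per_device_batch, grad_accum, epochs,
                      warmup_ratio, num_devices=1):
    """Approximate optimizer-step counts for a Trainer-style run."""
    effective_batch = per_device_batch * grad_accum * num_devices
    # Micro-batches each device sees per epoch (the last partial batch is kept).
    batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))
    # One optimizer step per `grad_accum` micro-batches.
    steps_per_epoch = max(math.ceil(batches_per_epoch / grad_accum), 1)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


def checkpoints_align(eval_steps, save_steps):
    """load_best_model_at_end requires save_steps to be a multiple of eval_steps."""
    return save_steps % eval_steps == 0


# ASSUMPTION: placeholder dataset size; replace with len(train_dataset).
NUM_SAMPLES = 1000

schedule = training_schedule(
    num_samples=NUM_SAMPLES,
    per_device_batch=2,   # BATCH_SIZE in the script
    grad_accum=8,         # GRADIENT_ACCUMULATION in the script
    epochs=3,             # NUM_EPOCHS in the script
    warmup_ratio=0.05,
)
print(schedule)
print("evaluations on the eval_steps grid:", schedule["total_steps"] // 100)
print("checkpoints on the save_steps grid:", schedule["total_steps"] // 200)
print("save/eval aligned:", checkpoints_align(eval_steps=100, save_steps=200))
```

Under the placeholder size, the run would end before step 200, so no checkpoint would land on a multiple of `save_steps` during training. Plugging in the real `len(train_dataset)` shows whether `eval_steps=100` and `save_steps=200` are fine-grained enough for `load_best_model_at_end` to have several checkpoints to choose from.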