---
license: mit
language:
- en
- multilingual
tags:
- ocr
- vision-language
- document-understanding
- gothitech
- document-ai
- text-extraction
- invoice-processing
- production
- handwriting-recognition
- table-extraction
pipeline_tag: image-text-to-text
model-index:
- name: GT-REX-v4
  results: []
---

# GT-REX-v4: Production OCR Model

🦖 GothiTech Recognition & Extraction eXpert, Version 4


---

**GT-REX-v4** is a production-grade OCR model developed by **GothiTech** for enterprise document understanding, text extraction, and intelligent document processing. Built on a Vision-Language Model (VLM) architecture, it delivers high-accuracy text extraction from complex documents, including invoices, contracts, forms, handwritten notes, and dense tables.

---

## 📑 Table of Contents

- [GT-REX Variants](#-gt-rex-variants)
- [Key Features](#-key-features)
- [Model Details](#-model-details)
- [Quick Start](#-quick-start)
- [Installation](#-installation)
- [Usage Examples](#-usage-examples)
- [Use Cases](#-use-cases)
- [Performance Benchmarks](#-performance-benchmarks)
- [Prompt Engineering Guide](#-prompt-engineering-guide)
- [API Integration](#-api-integration)
- [Troubleshooting](#-troubleshooting)
- [License](#-license)
- [Citation](#-citation)

---

## ⚙️ GT-REX Variants

GT-REX-v4 ships with **three optimized configurations** tailored to different performance and accuracy requirements. All variants share the same underlying model weights; they differ only in inference settings.

| Variant | Speed | Accuracy | Resolution | GPU Memory | Throughput | Best For |
|---------|-------|----------|------------|------------|------------|----------|
| **🚀 Nano** | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | 640 px | 4–6 GB | 100–150 docs/min | High-volume batch processing |
| **⚡ Pro** *(Default)* | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | 1024 px | 6–10 GB | 50–80 docs/min | Standard enterprise workflows |
| **🎯 Ultra** | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | 1536 px | 10–15 GB | 20–30 docs/min | High-accuracy, fine-detail needs |

### How to Choose a Variant

- **Nano** → You need maximum throughput and the documents are simple (receipts, IDs, labels).
- **Pro** → General-purpose; the best balance for invoices, contracts, forms, and reports.
- **Ultra** → The documents contain fine print, dense tables, medical records, or legal footnotes.
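Because the variants share weights and differ only in inference settings, switching between them can be reduced to a keyword-argument lookup. Below is a minimal sketch of that idea; the `VARIANT_KWARGS` table and the `load_variant` helper are our own illustrative naming, not part of the model's API, and the values are taken from the per-variant configuration examples in this card:

```python
# Inference settings for each GT-REX-v4 variant, copied from the
# per-variant vLLM examples in this card. The dict and the helper
# below are illustrative conveniences, not part of the model API.
VARIANT_KWARGS = {
    "nano":  {"max_model_len": 2048, "gpu_memory_utilization": 0.60, "max_num_seqs": 256},
    "pro":   {"max_model_len": 4096, "gpu_memory_utilization": 0.75, "max_num_seqs": 128},
    "ultra": {"max_model_len": 8192, "gpu_memory_utilization": 0.85, "max_num_seqs": 64},
}

def load_variant(name: str = "pro"):
    """Build a vLLM engine configured for the chosen variant."""
    from vllm import LLM  # imported lazily; only needed when actually loading

    return LLM(
        model="developerJenis/GT-REX-v4",
        trust_remote_code=True,
        limit_mm_per_prompt={"image": 1},
        **VARIANT_KWARGS[name],
    )
```

Switching variants is then `load_variant("nano")` versus `load_variant("ultra")`, with no other code changes.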
---

### 🚀 GT-REX-Nano

**Speed-optimized for high-volume batch processing**

| Setting | Value |
|---------|-------|
| Resolution | 640 × 640 px |
| Speed | ~1–2 s per image |
| Max Tokens | 2048 |
| GPU Memory | 4–6 GB |
| Recommended Batch Size | 256 sequences |

**Best for:** Thumbnails, previews, high-throughput pipelines (100+ docs/min), mobile uploads, receipt scanning.

```python
from vllm import LLM

llm = LLM(
    model="developerJenis/GT-REX-v4",
    trust_remote_code=True,
    max_model_len=2048,
    gpu_memory_utilization=0.6,
    max_num_seqs=256,
    limit_mm_per_prompt={"image": 1},
)
```

---

### ⚡ GT-REX-Pro (Default)

**Balanced quality and speed for standard enterprise documents**

| Setting | Value |
|---------|-------|
| Resolution | 1024 × 1024 px |
| Speed | ~2–5 s per image |
| Max Tokens | 4096 |
| GPU Memory | 6–10 GB |
| Recommended Batch Size | 128 sequences |

**Best for:** Contracts, forms, invoices, reports, government documents, insurance claims.

```python
from vllm import LLM

llm = LLM(
    model="developerJenis/GT-REX-v4",
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.75,
    max_num_seqs=128,
    limit_mm_per_prompt={"image": 1},
)
```

---

### 🎯 GT-REX-Ultra

**Maximum quality with adaptive processing for complex documents**

| Setting | Value |
|---------|-------|
| Resolution | 1536 × 1536 px |
| Speed | ~5–10 s per image |
| Max Tokens | 8192 |
| GPU Memory | 10–15 GB |
| Recommended Batch Size | 64 sequences |

**Best for:** Legal documents, fine print, dense tables, medical records, engineering drawings, academic papers, multi-column layouts.
```python
from vllm import LLM

llm = LLM(
    model="developerJenis/GT-REX-v4",
    trust_remote_code=True,
    max_model_len=8192,
    gpu_memory_utilization=0.85,
    max_num_seqs=64,
    limit_mm_per_prompt={"image": 1},
)
```

---

## 🎯 Key Features

| Feature | Description |
|---------|-------------|
| **High Accuracy** | Advanced vision-language architecture for precise text extraction |
| **Multi-Language** | Handles documents in English and multiple other languages |
| **Production Ready** | Optimized for deployment with the vLLM inference engine |
| **Batch Processing** | Process hundreds of documents per minute (Nano variant) |
| **Flexible Prompts** | Supports structured extraction: JSON, tables, key-value pairs, forms |
| **Handwriting Support** | Transcribes handwritten text with high fidelity |
| **Three Variants** | Nano (speed), Pro (balanced), Ultra (accuracy) |
| **Structured Output** | Extract data directly into JSON, Markdown tables, or custom schemas |

---

## 📊 Model Details

| Attribute | Value |
|-----------|-------|
| **Developer** | GothiTech (Jenis Hathaliya) |
| **Architecture** | Vision-Language Model (VLM) |
| **Model Size** | ~6.5 GB |
| **Parameters** | ~7B |
| **License** | MIT |
| **Release Date** | February 2026 |
| **Precision** | BF16 / FP16 |
| **Input Resolution** | 640–1536 px (variant dependent) |
| **Max Sequence Length** | 2048–8192 tokens (variant dependent) |
| **Inference Engine** | vLLM (recommended) |
| **Framework** | PyTorch / Transformers |

---

## 🚀 Quick Start

Get running in under 5 minutes:

```python
from vllm import LLM, SamplingParams
from PIL import Image

# 1. Load the model (Pro variant, the default)
llm = LLM(
    model="developerJenis/GT-REX-v4",
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.75,
    max_num_seqs=128,
    limit_mm_per_prompt={"image": 1},
)

# 2. Prepare the input
image = Image.open("document.png")
prompt = "Extract all text from this document."

# 3. Run inference
sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=4096,
)
outputs = llm.generate(
    [{
        "prompt": prompt,
        "multi_modal_data": {"image": image},
    }],
    sampling_params=sampling_params,
)

# 4. Get the results
result = outputs[0].outputs[0].text
print(result)
```

---

## 💻 Installation

### Prerequisites

- Python 3.9+
- CUDA 11.8+ (GPU required)
- 8 GB+ VRAM (Pro variant), 4 GB+ (Nano), 12 GB+ (Ultra)

### Install Dependencies

```bash
pip install vllm pillow torch transformers
```

### Verify Installation

```python
from vllm import LLM
print("vLLM installed successfully!")
```

---

## 📖 Usage Examples

### Basic Text Extraction

```python
prompt = "Extract all text from this document image."
```

### Structured JSON Extraction

```python
prompt = """Extract the following fields from this invoice as JSON:
{
  "invoice_number": "",
  "date": "",
  "vendor_name": "",
  "total_amount": "",
  "line_items": [
    {"description": "", "quantity": "", "unit_price": "", "amount": ""}
  ]
}"""
```

### Table Extraction (Markdown Format)

```python
prompt = "Extract all tables from this document in Markdown table format."
```

### Key-Value Pair Extraction

```python
prompt = """Extract all key-value pairs from this form. Return as:
Key: Value
Key: Value
..."""
```

### Handwritten Text Transcription

```python
prompt = "Transcribe all handwritten text from this image accurately."
```

### Multi-Document Batch Processing

```python
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="developerJenis/GT-REX-v4",
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.75,
    max_num_seqs=128,
    limit_mm_per_prompt={"image": 1},
)

# Prepare the batch
image_paths = ["doc1.png", "doc2.png", "doc3.png"]
prompts = []
for path in image_paths:
    img = Image.open(path)
    prompts.append({
        "prompt": "Extract all text from this document.",
        "multi_modal_data": {"image": img},
    })

# Run batch inference
sampling_params = SamplingParams(temperature=0.0, max_tokens=4096)
outputs = llm.generate(prompts, sampling_params=sampling_params)

# Collect the results
for i, output in enumerate(outputs):
    print(f"--- Document {i + 1} ---")
    print(output.outputs[0].text)
    print()
```

---

## 🏢 Use Cases

| Domain | Application | Recommended Variant |
|--------|-------------|---------------------|
| **Finance** | Invoice processing, receipt scanning, bank statements | Pro / Nano |
| **Legal** | Contract analysis, clause extraction, legal filings | Ultra |
| **Healthcare** | Medical records, prescriptions, lab reports | Ultra |
| **Government** | Form processing, ID verification, tax documents | Pro |
| **Insurance** | Claims processing, policy documents | Pro |
| **Education** | Exam paper digitization, handwritten notes | Pro / Ultra |
| **Logistics** | Shipping labels, waybills, packing lists | Nano |
| **Real Estate** | Property documents, deeds, mortgage papers | Pro |
| **Retail** | Product catalogs, price tags, inventory lists | Nano |

---

## 📈 Performance Benchmarks

### Throughput by Variant (NVIDIA A100 80GB)

| Variant | Single Image | Batch (32) | Batch (128) |
|---------|--------------|------------|-------------|
| Nano | ~1.2 s | ~15 s | ~55 s |
| Pro | ~3.5 s | ~45 s | ~170 s |
| Ultra | ~7.0 s | ~110 s | ~380 s |

### Accuracy by Document Type (Pro Variant)

| Document Type | Character Accuracy | Field Accuracy |
|---------------|--------------------|----------------|
| Printed invoices | 98.5%+ | 96%+ |
| Typed contracts | 98%+ | 95%+ |
| Handwritten notes | 92%+ | 88%+ |
| Dense tables | 96%+ | 93%+ |
| Low-quality scans | 94%+ | 90%+ |

> **Note:** Benchmark numbers are approximate and may vary based on document quality, content complexity, and hardware configuration.

---

## 🧠 Prompt Engineering Guide

Get the best results from GT-REX-v4 with these prompt strategies:

### Do's

- **Be specific** about what to extract ("Extract the invoice number and total amount").
- **Specify the output format** ("Return as JSON", "Return as a Markdown table").
- **Provide a schema** for structured extraction (show the expected JSON keys).
- **Use clear instructions** ("Transcribe exactly as written, preserving spelling errors").

### Don'ts

- Avoid vague prompts ("What is this?").
- Don't ask for analysis or summarization; GT-REX is optimized for **extraction**.
- Don't include unrelated context in the prompt.

### Example Prompts

```text
# Simple extraction
"Extract all text from this document."

# Targeted extraction
"Extract only the table on this page as a Markdown table."

# Schema-driven extraction
"Extract data matching this schema: {name: str, date: str, amount: float}"

# Preservation mode
"Transcribe this document exactly as written, preserving original formatting."
```

---

## 🔌 API Integration

### FastAPI Server Example

```python
import io

from fastapi import FastAPI, UploadFile
from PIL import Image
from vllm import LLM, SamplingParams

app = FastAPI()

llm = LLM(
    model="developerJenis/GT-REX-v4",
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.75,
    max_num_seqs=128,
    limit_mm_per_prompt={"image": 1},
)
sampling_params = SamplingParams(temperature=0.0, max_tokens=4096)

@app.post("/extract")
async def extract_text(file: UploadFile, prompt: str = "Extract all text."):
    image_bytes = await file.read()
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    outputs = llm.generate(
        [{
            "prompt": prompt,
            "multi_modal_data": {"image": image},
        }],
        sampling_params=sampling_params,
    )
    return {"text": outputs[0].outputs[0].text}
```

---

## 🛠️ Troubleshooting

| Issue | Solution |
|-------|----------|
| **CUDA out of memory** | Reduce `gpu_memory_utilization` or switch to the Nano variant |
| **Slow inference** | Increase `max_num_seqs` for better batching; use Nano for speed |
| **Truncated output** | Increase `max_tokens` in `SamplingParams` |
| **Low accuracy on small text** | Switch to the Ultra variant for higher resolution |
| **Garbled multilingual text** | Ensure the image resolution is sufficient; try the Ultra variant |

---

## 🔧 Hardware Recommendations

| Variant | Minimum GPU | Recommended GPU |
|---------|-------------|-----------------|
| Nano | NVIDIA T4 (16 GB) | NVIDIA A10 (24 GB) |
| Pro | NVIDIA A10 (24 GB) | NVIDIA A100 (40 GB) |
| Ultra | NVIDIA A100 (40 GB) | NVIDIA A100 (80 GB) |

---

## 📜 License

This model is released under the **MIT License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes.
---

## 📖 Citation

If you use GT-REX-v4 in your work, please cite:

```bibtex
@misc{gtrex-v4-2026,
  title  = {GT-REX-v4: Production-Grade OCR with Vision-Language Models},
  author = {Hathaliya, Jenis},
  year   = {2026},
  month  = {February},
  url    = {https://huggingface.co/developerJenis/GT-REX-v4},
  note   = {GothiTech Recognition \& Extraction eXpert, Version 4}
}
```

---

## 🤝 Contact & Support

- **Developer:** Jenis Hathaliya
- **Organization:** GothiTech
- **HuggingFace:** [developerJenis](https://huggingface.co/developerJenis)

---

Built with โค๏ธ by GothiTech

Last updated: February 2026
Model Version: v4.0 | Variants: Nano | Pro | Ultra