---
license: mit
language:
- en
- multilingual
tags:
- ocr
- vision-language
- document-understanding
- gothitech
- document-ai
- text-extraction
- invoice-processing
- production
- handwriting-recognition
- table-extraction
pipeline_tag: image-text-to-text
---

# GT-REX: Production OCR Model

<p align="center">
<strong>GothiTech Recognition and Extraction eXpert</strong>
</p>

<p align="center">
<a href="https://huggingface.co/gothitech/GT-REX"><img src="https://img.shields.io/badge/Model-GT--REX-blue" alt="Model"></a>
<a href="#"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License: MIT"></a>
<a href="#"><img src="https://img.shields.io/badge/vLLM-Supported-orange" alt="vLLM"></a>
<a href="#"><img src="https://img.shields.io/badge/Params-~3B-red" alt="Parameters"></a>
</p>

---

**GT-REX** is a state-of-the-art production-grade OCR model developed by **GothiTech** for enterprise document understanding, text extraction, and intelligent document processing. Built on a Vision-Language Model (VLM) architecture, it delivers high-accuracy text extraction from complex documents including invoices, contracts, forms, handwritten notes, and dense tables.

---

## Table of Contents

- [GT-REX Variants](#gt-rex-variants)
- [Key Features](#key-features)
- [Model Details](#model-details)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Usage Examples](#usage-examples)
- [Use Cases](#use-cases)
- [Performance Benchmarks](#performance-benchmarks)
- [Prompt Engineering Guide](#prompt-engineering-guide)
- [API Integration](#api-integration)
- [Troubleshooting](#troubleshooting)
- [Hardware Recommendations](#hardware-recommendations)
- [License](#license)
- [Citation](#citation)
- [Contact and Support](#contact-and-support)

---

## GT-REX Variants

GT-REX ships with **three optimized configurations** tailored to different performance and accuracy requirements. All variants share the same underlying model weights; they differ only in inference settings.

| Variant | Speed | Accuracy | Resolution | GPU Memory | Throughput | Best For |
|---------|-------|----------|------------|------------|------------|----------|
| **Nano** | Ultra Fast | Good | 640px | 4-6 GB | 100-150 docs/min | High-volume batch processing |
| **Pro** (Default) | Fast | High | 1024px | 6-10 GB | 50-80 docs/min | Standard enterprise workflows |
| **Ultra** | Moderate | Maximum | 1536px | 10-15 GB | 20-30 docs/min | High-accuracy and fine-detail needs |

### How to Choose a Variant

- **Nano**: Choose when you need maximum throughput and documents are simple (receipts, IDs, labels).
- **Pro**: General-purpose. Best balance for invoices, contracts, forms, and reports.
- **Ultra**: Choose for medical records and for documents with fine print, dense tables, or legal footnotes.

---

### GT-REX-Nano

**Speed-optimized for high-volume batch processing**

| Setting | Value |
|---------|-------|
| Resolution | 640 x 640 px |
| Speed | ~1-2s per image |
| Max Tokens | 2048 |
| GPU Memory | 4-6 GB |
| Recommended Batch Size | 256 sequences |

**Best for:** Thumbnails, previews, high-throughput pipelines (100+ docs/min), mobile uploads, receipt scanning.

```python
from vllm import LLM

llm = LLM(
    model="gothitech/GT-REX",
    trust_remote_code=True,
    max_model_len=2048,
    gpu_memory_utilization=0.6,
    max_num_seqs=256,
    limit_mm_per_prompt={"image": 1},
)
```

---

### GT-REX-Pro (Default)

**Balanced quality and speed for standard enterprise documents**

| Setting | Value |
|---------|-------|
| Resolution | 1024 x 1024 px |
| Speed | ~2-5s per image |
| Max Tokens | 4096 |
| GPU Memory | 6-10 GB |
| Recommended Batch Size | 128 sequences |

**Best for:** Contracts, forms, invoices, reports, government documents, insurance claims.

```python
from vllm import LLM

llm = LLM(
    model="gothitech/GT-REX",
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.75,
    max_num_seqs=128,
    limit_mm_per_prompt={"image": 1},
)
```

---

### GT-REX-Ultra

**Maximum quality with adaptive processing for complex documents**

| Setting | Value |
|---------|-------|
| Resolution | 1536 x 1536 px |
| Speed | ~5-10s per image |
| Max Tokens | 8192 |
| GPU Memory | 10-15 GB |
| Recommended Batch Size | 64 sequences |

**Best for:** Legal documents, fine print, dense tables, medical records, engineering drawings, academic papers, multi-column layouts.

```python
from vllm import LLM

llm = LLM(
    model="gothitech/GT-REX",
    trust_remote_code=True,
    max_model_len=8192,
    gpu_memory_utilization=0.85,
    max_num_seqs=64,
    limit_mm_per_prompt={"image": 1},
)
```
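
Because the three variants differ only in a few `LLM` arguments, the settings can be kept in one place. The sketch below copies the values from the variant tables above; the names `VARIANTS` and `llm_kwargs` are hypothetical helpers for illustration and are not part of GT-REX or vLLM.

```python
# Per-variant vLLM settings, copied from the variant tables above.
VARIANTS = {
    "nano": {"max_model_len": 2048, "gpu_memory_utilization": 0.6, "max_num_seqs": 256},
    "pro": {"max_model_len": 4096, "gpu_memory_utilization": 0.75, "max_num_seqs": 128},
    "ultra": {"max_model_len": 8192, "gpu_memory_utilization": 0.85, "max_num_seqs": 64},
}


def llm_kwargs(variant: str = "pro") -> dict:
    """Return keyword arguments for vllm.LLM for the given variant name."""
    key = variant.lower()
    if key not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}")
    return {
        "model": "gothitech/GT-REX",
        "trust_remote_code": True,
        "limit_mm_per_prompt": {"image": 1},
        **VARIANTS[key],
    }
```

With this helper, switching variants becomes a one-word change, for example `llm = LLM(**llm_kwargs("ultra"))`.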

---

## Key Features

| Feature | Description |
|---------|-------------|
| **High Accuracy** | Advanced vision-language architecture for precise text extraction |
| **Multi-Language** | Handles documents in English and multiple other languages |
| **Production Ready** | Optimized for deployment with the vLLM inference engine |
| **Batch Processing** | Process hundreds of documents per minute (Nano variant) |
| **Flexible Prompts** | Supports structured extraction: JSON, tables, key-value pairs, forms |
| **Handwriting Support** | Transcribes handwritten text with high fidelity |
| **Three Variants** | Nano (speed), Pro (balanced), Ultra (accuracy) |
| **Structured Output** | Extract data directly into JSON, Markdown tables, or custom schemas |

---

## Model Details

| Attribute | Value |
|-----------|-------|
| **Developer** | GothiTech (Jenis Hathaliya) |
| **Architecture** | Vision-Language Model (VLM) |
| **Model Size** | ~6.5 GB |
| **Parameters** | ~3B |
| **License** | MIT |
| **Release Date** | February 2026 |
| **Precision** | BF16 / FP16 |
| **Input Resolution** | 640px - 1536px (variant dependent) |
| **Max Sequence Length** | 2048 - 8192 tokens (variant dependent) |
| **Inference Engine** | vLLM (recommended) |
| **Framework** | PyTorch / Transformers |
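
As a rough sanity check on these figures: BF16 and FP16 both store 2 bytes per parameter, so a ~6.5 GB checkpoint corresponds to roughly 3.25 billion parameters, consistent with the ~3B figure on the badge at the top of this card. The sketch assumes the checkpoint contains only 16-bit weights and that "GB" means 10^9 bytes; `approx_params_billion` is a hypothetical helper name.

```python
def approx_params_billion(checkpoint_gb: float, bytes_per_param: int = 2) -> float:
    """Estimate parameter count (in billions) from checkpoint size in GB."""
    return checkpoint_gb / bytes_per_param


print(approx_params_billion(6.5))  # 3.25
```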

---

## Quick Start

A minimal end-to-end example using the Pro (default) settings:

```python
from vllm import LLM, SamplingParams
from PIL import Image

# 1. Load model (Pro variant - default)
llm = LLM(
    model="gothitech/GT-REX",
    trust_remote_code=True,
    max_model_len=4096,  # context budget shared by prompt, image, and output tokens
    gpu_memory_utilization=0.75,
    max_num_seqs=128,
    limit_mm_per_prompt={"image": 1},
)

# 2. Prepare input
image = Image.open("document.png").convert("RGB")
# Some vLLM multimodal models expect an image placeholder token in the prompt;
# check the model's chat template if the image appears to be ignored.
prompt = "Extract all text from this document."

# 3. Run inference (temperature 0.0 gives deterministic, greedy decoding)
sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=4096,
)

outputs = llm.generate(
    [{
        "prompt": prompt,
        "multi_modal_data": {"image": image},
    }],
    sampling_params=sampling_params,
)

# 4. Get results
result = outputs[0].outputs[0].text
print(result)
```
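
The variant tables list a target resolution (640, 1024, or 1536 px), but the configuration snippets do not show where it is applied. One possible approach, which is an assumption and not confirmed by this card, is to downscale the image on the client before inference so its longer side fits the variant's resolution. `fit_within` and `resize_for_variant` are hypothetical helper names.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    Aspect ratio is preserved and images are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_for_variant(image, max_side: int):
    """Return a PIL image whose longer side fits within max_side (assumed workflow)."""
    size = fit_within(image.width, image.height, max_side)
    return image if size == image.size else image.resize(size)
```

For the Pro settings this would be called as `image = resize_for_variant(image, 1024)` before building the request.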

---

## Installation

### Prerequisites

- Python 3.9+
- CUDA 11.8+ (GPU required)
- 8 GB+ VRAM (Pro variant), 4 GB+ (Nano), 12 GB+ (Ultra)

### Install Dependencies

```bash
pip install vllm pillow torch transformers
```

### Verify Installation

```python
from vllm import LLM
print("vLLM installed successfully!")
```

---

## Usage Examples

### Basic Text Extraction

```python
prompt = "Extract all text from this document image."
```

### Structured JSON Extraction

```python
prompt = '''Extract the following fields from this invoice as JSON:
{
  "invoice_number": "",
  "date": "",
  "vendor_name": "",
  "total_amount": "",
  "line_items": [
    {"description": "", "quantity": "", "unit_price": "", "amount": ""}
  ]
}'''
```
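
The model returns plain text, so the JSON still has to be parsed on the client. The sketch below assumes the output contains a single JSON object, possibly wrapped in a Markdown code fence or surrounded by extra words; that output shape is an assumption, and `parse_json_output` is a hypothetical helper.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_output(text: str) -> dict:
    """Parse the JSON object found in model output text."""
    # Prefer the contents of a fenced block if the model emitted one.
    fenced = _FENCED.search(text)
    candidate = fenced.group(1) if fenced else text
    # Keep only the span from the first "{" to the last "}".
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(candidate[start : end + 1])
```

`json.loads` still raises `json.JSONDecodeError` on malformed output, which is a useful signal to retry or route the document for review.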

### Table Extraction (Markdown Format)

```python
prompt = "Extract all tables from this document in Markdown table format."
```
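
A Markdown table in the output can be turned into rows for downstream use. This is a minimal sketch that assumes the output holds a single pipe-delimited table with a header row and no escaped `|` characters inside cells; `parse_markdown_table` is a hypothetical helper.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Convert one Markdown table into a list of {header: cell} dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Skip the separator row (cells made only of dashes and colons).
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```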

### Key-Value Pair Extraction

```python
prompt = '''Extract all key-value pairs from this form.
Return as:
Key: Value
Key: Value'''
```
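
Output in the `Key: Value` format requested above can be read into a dictionary with a few lines. The sketch assumes one pair per line and splits on the first colon only, so values such as times keep their own colons; `parse_key_values` is a hypothetical helper.

```python
def parse_key_values(text: str) -> dict[str, str]:
    """Parse 'Key: Value' lines into a dict, ignoring lines without a key."""
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            pairs[key.strip()] = value.strip()
    return pairs
```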

### Handwritten Text Transcription

```python
prompt = "Transcribe all handwritten text from this image accurately."
```

### Multi-Document Batch Processing

```python
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="gothitech/GT-REX",
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.75,
    max_num_seqs=128,
    limit_mm_per_prompt={"image": 1},
)

# Prepare batch: one request dict per image
image_paths = ["doc1.png", "doc2.png", "doc3.png"]
prompts = []
for path in image_paths:
    img = Image.open(path).convert("RGB")
    prompts.append({
        "prompt": "Extract all text from this document.",
        "multi_modal_data": {"image": img},
    })

# Run batch inference; vLLM schedules the requests internally
sampling_params = SamplingParams(temperature=0.0, max_tokens=4096)
outputs = llm.generate(prompts, sampling_params=sampling_params)

# Collect results (outputs are returned in the same order as prompts)
for i, output in enumerate(outputs):
    print(f"--- Document {i + 1} ---")
    print(output.outputs[0].text)
    print()
```
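
For large folders, loading every image before calling `generate` can exhaust host memory. One option is to submit paths in fixed-size chunks. The sketch below shows only the chunking step; `chunked` is a hypothetical helper and the chunk size is a tuning choice, not a GT-REX requirement.

```python
from typing import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start : start + size]
```

Each chunk can then be opened, passed to `llm.generate`, and released before the next one, for example `for paths in chunked(image_paths, 128): ...`.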

---

## Use Cases

| Domain | Application | Recommended Variant |
|--------|-------------|---------------------|
| **Finance** | Invoice processing, receipt scanning, bank statements | Pro / Nano |
| **Legal** | Contract analysis, clause extraction, legal filings | Ultra |
| **Healthcare** | Medical records, prescriptions, lab reports | Ultra |
| **Government** | Form processing, ID verification, tax documents | Pro |
| **Insurance** | Claims processing, policy documents | Pro |
| **Education** | Exam paper digitization, handwritten notes | Pro / Ultra |
| **Logistics** | Shipping labels, waybills, packing lists | Nano |
| **Real Estate** | Property documents, deeds, mortgage papers | Pro |
| **Retail** | Product catalogs, price tags, inventory lists | Nano |

---

## Performance Benchmarks

### Latency by Variant (NVIDIA A100 80GB)

| Variant | Single Image | Batch of 32 (total) | Batch of 128 (total) |
|---------|-------------|------------|-------------|
| Nano | ~1.2s | ~15s | ~55s |
| Pro | ~3.5s | ~45s | ~170s |
| Ultra | ~7.0s | ~110s | ~380s |
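
The batch-of-128 timings can be converted to documents per minute for comparison with the throughput column in the variants table. This is plain arithmetic on the numbers above; `docs_per_minute` is a hypothetical helper.

```python
# Total seconds for a batch of 128 images, from the table above.
BATCH_128_SECONDS = {"Nano": 55, "Pro": 170, "Ultra": 380}


def docs_per_minute(batch_size: int, seconds: float) -> float:
    """Convert a total batch time into documents per minute."""
    return batch_size / seconds * 60


for name, seconds in BATCH_128_SECONDS.items():
    print(f"{name}: ~{docs_per_minute(128, seconds):.0f} docs/min")
# Nano: ~140 docs/min
# Pro: ~45 docs/min
# Ultra: ~20 docs/min
```

The Pro result (~45 docs/min) falls slightly below the 50-80 docs/min range quoted in the variants table, so both sets of figures are best read as approximate.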

### Accuracy by Document Type (Pro Variant)

| Document Type | Character Accuracy | Field Accuracy |
|---------------|--------------------|----------------|
| Printed invoices | 98.5%+ | 96%+ |
| Typed contracts | 98%+ | 95%+ |
| Handwritten notes | 92%+ | 88%+ |
| Dense tables | 96%+ | 93%+ |
| Low-quality scans | 94%+ | 90%+ |

> **Note:** Benchmark numbers are approximate and may vary based on document quality, content complexity, and hardware configuration.

---

## Prompt Engineering Guide

Get the best results from GT-REX with these prompt strategies:

### Tips for Best Results

**Do:**
- Be specific about what to extract ("Extract the invoice number and total amount")
- Specify output format ("Return as JSON", "Return as Markdown table")
- Provide schema for structured extraction (show the expected JSON keys)
- Use clear instructions ("Transcribe exactly as written, preserving spelling errors")

**Don't:**
- Use vague prompts ("What is this?")
- Ask for analysis or summarization (GT-REX is optimized for extraction)
- Include unrelated context in the prompt

### Example Prompts

```text
# Simple extraction
"Extract all text from this document."

# Targeted extraction
"Extract only the table on this page as a Markdown table."

# Schema-driven extraction
"Extract data matching this schema: {name: str, date: str, amount: float}"

# Preservation mode
"Transcribe this document exactly as written, preserving original formatting."
```
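
Schema-driven prompts can be generated from a list of field names so the template always matches what downstream code expects. The sketch below mirrors the wording of the structured JSON prompt in Usage Examples; `build_extraction_prompt` is a hypothetical helper, and whether this exact wording works best for GT-REX is an assumption.

```python
import json


def build_extraction_prompt(fields: list[str], document_type: str = "document") -> str:
    """Build a JSON-extraction prompt with an empty template for each field."""
    template = {name: "" for name in fields}
    return (
        f"Extract the following fields from this {document_type} as JSON:\n"
        + json.dumps(template, indent=2)
    )


print(build_extraction_prompt(["invoice_number", "total_amount"], "invoice"))
```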

---

## API Integration

### FastAPI Server Example

```python
# Requires: pip install fastapi uvicorn python-multipart
import io

from fastapi import FastAPI, Form, UploadFile
from PIL import Image
from vllm import LLM, SamplingParams

app = FastAPI()

# Load the model once at startup (Pro variant settings)
llm = LLM(
    model="gothitech/GT-REX",
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.75,
    max_num_seqs=128,
    limit_mm_per_prompt={"image": 1},
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=4096)


@app.post("/extract")
async def extract_text(
    file: UploadFile,
    # Form(...) reads the prompt from the multipart body, matching the cURL call below;
    # a plain `str` default would be treated as a query parameter instead.
    prompt: str = Form("Extract all text."),
):
    image_bytes = await file.read()
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")

    # llm.generate blocks the event loop, so requests are processed one at a time
    outputs = llm.generate(
        [{
            "prompt": prompt,
            "multi_modal_data": {"image": image},
        }],
        sampling_params=sampling_params,
    )

    return {"text": outputs[0].outputs[0].text}
```

### cURL Example

```bash
curl -X POST "http://localhost:8000/extract" \
  -F "file=@invoice.png" \
  -F "prompt=Extract all text from this invoice as JSON."
```
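
The same request can be sent from Python using only the standard library. The sketch below builds the multipart body by hand and posts it to the `/extract` endpoint; `build_multipart` and `extract` are hypothetical helpers, and the URL assumes the server runs locally on port 8000 as in the cURL example.

```python
import json
import urllib.request
import uuid


def build_multipart(fields: dict[str, str], filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one upload (part name "file") as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + value.encode() + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract(path: str, prompt: str, url: str = "http://localhost:8000/extract") -> str:
    """Upload an image with a prompt and return the extracted text."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart({"prompt": prompt}, path, fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

A call such as `extract("invoice.png", "Extract all text from this invoice as JSON.")` mirrors the cURL command above.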

---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| **CUDA Out of Memory** | Reduce `max_num_seqs` or `max_model_len`, lower `gpu_memory_utilization` to leave headroom, or switch to the Nano variant |
| **Slow inference** | Increase `max_num_seqs` for better batching; use Nano for speed |
| **Truncated output** | Increase `max_tokens` in `SamplingParams` |
| **Low accuracy on small text** | Switch to Ultra variant for higher resolution |
| **Garbled multilingual text** | Ensure image resolution is sufficient; try Ultra variant |
| **Empty output** | Check that the image is loaded correctly and is not blank |
| **Model loading errors** | Ensure `trust_remote_code=True` is set |
|
| | --- |
| |
|
| | ## Hardware Recommendations |
| |
|
| | | Variant | Minimum GPU | Recommended GPU | |
| | |---------|-------------|-----------------| |
| | | Nano | NVIDIA T4 (16 GB) | NVIDIA A10 (24 GB) | |
| | | Pro | NVIDIA A10 (24 GB) | NVIDIA A100 (40 GB) | |
| | | Ultra | NVIDIA A100 (40 GB) | NVIDIA A100 (80 GB) | |
| |
|
| | --- |
| |
|
| | ## License |
| |
|
| | This model is released under the **MIT License**. You are free to use, modify, and distribute it for both commercial and non-commercial purposes. |
| |
|
| | --- |
| |
|
| | ## Citation |
| |
|
| | If you use GT-REX in your work, please cite: |
| |
|
| | ```bibtex |
| | @misc{gtrex-2026, |
| | title = {GT-REX: Production-Grade OCR with Vision-Language Models}, |
| | author = {Hathaliya, Jenis}, |
| | year = {2026}, |
| | month = {February}, |
| | url = {https://huggingface.co/gothitech/GT-REX}, |
| | note = {GothiTech Recognition and Extraction eXpert} |
| | } |
| | ``` |
| |
|
| | --- |
| |
|
| | ## Contact and Support |
| |
|
| | - **Developer:** Jenis Hathaliya |
| | - **Organization:** GothiTech |
| | - **HuggingFace:** [gothitech](https://huggingface.co/gothitech) |
| |
|
| | --- |
| |
|
| | <p align="center"> |
| | Built by <strong>GothiTech</strong> |
| | </p> |
| |
|
| | <p align="center"> |
| | <em>Last updated: February 2026</em><br> |
| | <em>GT-REX | Variants: Nano | Pro | Ultra</em> |
| | </p> |
| |
|