developerJenis committed on
Commit 521c4b6 · verified · 1 Parent(s): 8f06446

Add GT-REX variants (Nano/Pro/Ultra) to model card

  1. README.md +61 -227
README.md CHANGED
@@ -19,271 +19,105 @@ pipeline_tag: image-text-to-text
19
 
20
  **GT-REX-v4** is a state-of-the-art production-grade OCR model developed by GothiTech for enterprise document understanding, text extraction, and intelligent document processing.
21
 
22
- ## 🎯 Key Features
23
-
24
- - **High Accuracy**: Advanced vision-language architecture for precise text extraction
25
- - **Multi-Language Support**: Handles documents in multiple languages
26
- - **Production Ready**: Optimized for deployment with vLLM inference engine
27
- - **Batch Processing**: Process hundreds of documents per minute
28
- - **Flexible Prompts**: Support for structured extraction (JSON, tables, forms)
29
- - **Handwriting Support**: Capable of transcribing handwritten text
30
-
31
- ## 📊 Model Details
32
-
33
- | Attribute | Value |
34
- |-----------|-------|
35
- | **Developer** | GothiTech (Jenis Hathaliya) |
36
- | **Architecture** | Vision-Language Model (VLM) |
37
- | **Model Size** | ~6.5 GB |
38
- | **Parameters** | ~7B |
39
- | **License** | MIT |
40
- | **Release Date** | February 2026 |
41
- | **Precision** | BF16/FP16 |
42
- | **Input Resolution** | Up to 1024x1024 |
43
-
44
- ## 🚀 Use Cases
45
-
46
- ### Enterprise Applications
47
- - 📄 **Document Digitization**: Convert scanned documents to editable text
48
- - 🧾 **Invoice & Receipt Processing**: Extract structured data from financial documents
49
- - 📋 **Form Automation**: Auto-fill and process forms from images
50
- - 📑 **Contract Analysis**: Extract key terms and clauses from legal documents
51
- - 🏥 **Medical Records**: Digitize patient records and prescriptions
52
- - 📦 **Logistics**: Process shipping labels, delivery notes, and manifests
53
 
54
- ### Advanced Features
55
- - ✍️ **Handwriting Recognition**: Transcribe handwritten notes and forms
56
- - 🌍 **Multi-language OCR**: Support for English, Spanish, French, German, Chinese, and more
57
- - 📊 **Table Extraction**: Parse complex tables with accurate cell detection
58
- - 🎨 **Layout Understanding**: Maintain document structure and formatting
59
- - 🔍 **Selective Extraction**: Target specific fields with custom prompts
60
-
61
- ## 💻 Installation
62
 
63
- ```bash
64
- pip install vllm pillow torch transformers
65
- ```
 
 
66
 
67
- ## 🔧 Usage
 
68
 
69
- ### Basic Usage with vLLM
 
 
 
70
 
71
  ```python
72
- from vllm import LLM, SamplingParams
73
- from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor
74
- from PIL import Image
75
-
76
- # Initialize model
77
  llm = LLM(
78
  model='developerJenis/GT-REX-v4',
79
  trust_remote_code=True,
80
- max_model_len=4096,
81
- gpu_memory_utilization=0.75,
 
82
  logits_processors=[NGramPerReqLogitsProcessor],
83
  )
84
-
85
- # Load document
86
- image = Image.open('invoice.jpg')
87
- prompt = '<image>\n<|grounding|>Extract all text from this document.'
88
-
89
- # Generate
90
- result = llm.generate(
91
- {'prompt': prompt, 'multi_modal_data': {'image': image}},
92
- SamplingParams(temperature=0.0, max_tokens=2000)
93
- )
94
-
95
- # Extract text
96
- print(result.outputs.text)
97
  ```
98
 
99
- ### Structured Data Extraction (JSON)
100
-
101
- ```python
102
- # Extract specific fields in JSON format
103
- prompt = '''<image>\n<|grounding|>Extract the following information in JSON format:
104
- - invoice_number
105
- - date
106
- - vendor_name
107
- - total_amount
108
- - line_items (list)'''
109
-
110
- result = llm.generate(
111
- {'prompt': prompt, 'multi_modal_data': {'image': invoice_image}},
112
- SamplingParams(temperature=0.0, max_tokens=2000)
113
- )
114
-
115
- import json
116
- data = json.loads(result.outputs.text)
117
- print(data)
118
- ```
119
 
120
- ### Batch Processing
 
 
 
121
 
122
- ```python
123
- # Process multiple documents efficiently
124
- from pathlib import Path
125
-
126
- doc_paths = list(Path('documents/').glob('*.jpg'))
127
- images = [Image.open(p) for p in doc_paths]
128
-
129
- prompts = [
130
- {'prompt': '<image>\n<|grounding|>Extract all text.',
131
- 'multi_modal_data': {'image': img}}
132
- for img in images
133
- ]
134
-
135
- # Batch inference
136
- results = llm.generate(
137
- prompts,
138
- SamplingParams(temperature=0.0, max_tokens=2000)
139
- )
140
-
141
- for i, result in enumerate(results):
142
- text = result.outputs.text
143
- print(f'Document {i}: {text[:100]}...')
144
- ```
145
-
146
- ### Table Extraction
147
-
148
- ```python
149
- # Extract tables with structure preservation
150
- prompt = '<image>\n<|grounding|>Extract all tables in markdown format.'
151
-
152
- result = llm.generate(
153
- {'prompt': prompt, 'multi_modal_data': {'image': table_image}},
154
- SamplingParams(temperature=0.0, max_tokens=3000)
155
- )
156
-
157
- markdown_table = result.outputs.text
158
- print(markdown_table)
159
- ```
160
-
161
- ## 📈 Performance Benchmarks
162
-
163
- | Metric | T4 GPU | V100 GPU | A100 GPU |
164
- |--------|---------|----------|----------|
165
- | **Latency (single image)** | 3-5 sec | 2-3 sec | 1-2 sec |
166
- | **Throughput (batch=8)** | ~60 img/min | ~120 img/min | ~200 img/min |
167
- | **GPU Memory** | 6-8 GB | 8-10 GB | 10-12 GB |
168
- | **Max Resolution** | 1024x1024 | 1024x1024 | 1024x1024 |
169
-
170
- ## ⚙️ System Requirements
171
-
172
- ### Minimum Requirements
173
- ```
174
- Python >= 3.8
175
- PyTorch >= 2.0
176
- CUDA >= 11.8
177
- GPU Memory: 15GB+ (T4 or better)
178
- vLLM >= 0.15.0
179
- ```
180
-
181
- ### Recommended Setup
182
- ```
183
- Python 3.10+
184
- PyTorch 2.1+
185
- CUDA 12.1+
186
- GPU: A100 (40GB) or V100 (32GB)
187
- vLLM 0.16+
188
- ```
189
-
190
- ## 🎛️ Advanced Configuration
191
-
192
- ### Optimize for Throughput
193
  ```python
194
  llm = LLM(
195
  model='developerJenis/GT-REX-v4',
196
  trust_remote_code=True,
197
- tensor_parallel_size=2, # Multi-GPU
 
198
  max_num_seqs=128,
199
- max_num_batched_tokens=8192,
200
- gpu_memory_utilization=0.9,
201
  )
202
  ```
203
 
204
- ### Optimize for Latency
205
  ```python
206
  llm = LLM(
207
  model='developerJenis/GT-REX-v4',
208
  trust_remote_code=True,
209
- max_num_seqs=1,
210
- gpu_memory_utilization=0.6,
211
- enable_prefix_caching=True,
 
212
  )
213
  ```
214
 
215
- ## 📝 Supported Prompt Templates
216
-
217
- ### General Extraction
218
- - `Extract all text from this document`
219
- - `Transcribe the entire page`
220
- - `Convert this image to text`
221
-
222
- ### Structured Extraction
223
- - `Extract invoice number, date, and total in JSON format`
224
- - `Parse all form fields as key-value pairs`
225
- - `Extract table data in CSV format`
226
-
227
- ### Selective Extraction
228
- - `Extract only the recipient address`
229
- - `Find and extract all dates`
230
- - `Extract signature fields`
231
-
232
- ## 🏆 Model Capabilities
233
-
234
- ✅ **Printed Text**: High accuracy on machine-printed documents
235
- ✅ **Handwriting**: Good performance on clear handwritten text
236
- ✅ **Tables**: Accurate cell detection and structure preservation
237
- ✅ **Multi-column**: Handles complex layouts
238
- ✅ **Low Quality**: Works on scanned and photographed documents
239
- ✅ **Mixed Content**: Text + images + tables in same document
240
-
241
- ## 🔒 Limitations
242
-
243
- - Requires GPU for inference (CPU inference not supported)
244
- - Maximum input resolution: 1024x1024 pixels
245
- - Performance may vary on heavily degraded or low-contrast images
246
- - Complex mathematical formulas may require specialized prompts
247
-
248
- ## 👨‍💻 Developer
249
-
250
- **Jenis Hathaliya** - AI Engineer at GothiTech
251
-
252
- Specializing in production AI systems, document intelligence, and enterprise ML deployment.
253
-
254
- - 🌐 HuggingFace: [@developerJenis](https://huggingface.co/developerJenis)
255
- - 💻 GitHub: [@developerJenis](https://github.com/developerJenis)
256
- - 🏢 Company: GothiTech - AI Solutions for Enterprise
257
-
258
- ## 📞 Support & Contact
259
-
260
- For enterprise support, custom deployments, or commercial licensing:
261
- - Open an issue on GitHub
262
- - Contact via HuggingFace profile
263
-
264
- ## 📄 License
265
 
266
- This model is released under the MIT License. See LICENSE file for details.
267
 
268
- ## 🙏 Acknowledgments
269
 
270
- Built with cutting-edge ML frameworks and optimized for production deployment.
271
 
272
- ## 📖 Citation
273
 
274
- If you use GT-REX-v4 in your research or production systems, please cite:
275
 
276
- ```bibtex
277
- @misc{gtrex-v4-2026,
278
- title={GT-REX-v4: Production OCR Model for Enterprise Document Understanding},
279
- author={Jenis Hathaliya},
280
- year={2026},
281
- publisher={GothiTech},
282
- url={https://huggingface.co/developerJenis/GT-REX-v4},
283
- note={Production-grade vision-language model for OCR and document AI}
284
- }
285
  ```
286
 
287
  ---
288
 
289
  *Last updated: February 2026*
 
 
19
 
20
  **GT-REX-v4** is a state-of-the-art production-grade OCR model developed by GothiTech for enterprise document understanding, text extraction, and intelligent document processing.
21
 
22
+ ## ⚙️ GT-REX Variants
23
 
24
+ GT-REX-v4 supports **three optimized configurations** for different performance requirements:
25
 
26
+ | Variant | Speed | Accuracy | Resolution | GPU Memory | Throughput | Best For |
27
+ |---------|-------|----------|------------|------------|------------|----------|
28
+ | **🚀 Nano** | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | 640px | 4-6 GB | 100-150 docs/min | High-volume batch |
29
+ | **⚡ Pro** | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | 1024px | 6-10 GB | 50-80 docs/min | Standard workflows |
30
+ | **🎯 Ultra** | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | 1536px | 10-15 GB | 20-30 docs/min | High-accuracy needs |
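A rough way to read the Throughput column is to divide a batch size by the docs/min range to get a wall-clock estimate. The sketch below does that arithmetic. The ranges are copied from the table rows above; the function name `estimated_minutes` and the 10,000-document example are illustrative assumptions and not part of the model card.

```python
# Throughput ranges (docs/min) copied from the variants table above.
THROUGHPUT_DOCS_PER_MIN = {
    "nano": (100, 150),
    "pro": (50, 80),
    "ultra": (20, 30),
}


def estimated_minutes(variant: str, num_docs: int) -> tuple[float, float]:
    """Return (best_case, worst_case) minutes to process num_docs."""
    low, high = THROUGHPUT_DOCS_PER_MIN[variant.lower()]
    # The highest throughput gives the shortest time, and vice versa.
    return num_docs / high, num_docs / low


if __name__ == "__main__":
    for name in THROUGHPUT_DOCS_PER_MIN:
        best, worst = estimated_minutes(name, 10_000)
        print(f"{name}: {best:.0f}-{worst:.0f} min for 10,000 docs")
```

These are planning estimates only; the table does not state the hardware or batch size behind the throughput figures.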
31
 
32
+ ### 🚀 GT-Rex-Nano
33
+ **Speed-optimized for high-volume batch processing**
34
 
35
+ - **Resolution**: 640×640px
36
+ - **Speed**: ~1-2s per image
37
+ - **Max Tokens**: 2048
38
+ - **Best for**: Thumbnails, previews, high-throughput pipelines (100+ docs)
39
 
40
  ```python
41
  llm = LLM(
42
  model='developerJenis/GT-REX-v4',
43
  trust_remote_code=True,
44
+ max_model_len=2048,
45
+ gpu_memory_utilization=0.6,
46
+ max_num_seqs=256,
47
  logits_processors=[NGramPerReqLogitsProcessor],
48
  )
49
  ```
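The snippet above references `LLM` and `NGramPerReqLogitsProcessor` without importing them and stops before any inference call. A minimal end-to-end sketch might look like the following, assuming the imports and the `<image>\n<|grounding|>` prompt prefix shown in the pre-change README. The helper names `build_request`, `extract_texts`, and `run_ocr` are hypothetical, and `max_tokens=1024` is an assumed value chosen to stay inside `max_model_len=2048`.

```python
from typing import Any


def build_request(image: Any, instruction: str = "Extract all text from this document.") -> dict:
    """Build one vLLM multimodal request using the README's prompt prefix."""
    return {
        "prompt": f"<image>\n<|grounding|>{instruction}",
        "multi_modal_data": {"image": image},
    }


def extract_texts(request_outputs) -> list[str]:
    """generate() returns one RequestOutput per request; take its first completion."""
    return [out.outputs[0].text for out in request_outputs]


def run_ocr(image_paths: list[str]) -> list[str]:
    # Imports are deferred so the helpers above work without vLLM or a GPU.
    from PIL import Image
    from vllm import LLM, SamplingParams
    from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor

    # Engine arguments copied from the Nano snippet above.
    llm = LLM(
        model="developerJenis/GT-REX-v4",
        trust_remote_code=True,
        max_model_len=2048,
        gpu_memory_utilization=0.6,
        max_num_seqs=256,
        logits_processors=[NGramPerReqLogitsProcessor],
    )
    requests = [build_request(Image.open(path)) for path in image_paths]
    # Assumed sampling settings: greedy decoding, output capped below the context limit.
    params = SamplingParams(temperature=0.0, max_tokens=1024)
    return extract_texts(llm.generate(requests, params))
```

Note that `llm.generate` returns a list, so the text of the first document is `results[0].outputs[0].text` rather than `results.outputs.text`.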
50
 
51
+ ### GT-Rex-Pro (Default)
52
+ **Balanced quality and speed for standard documents**
53
 
54
+ - **Resolution**: 1024×1024px
55
+ - **Speed**: ~2-5s per image
56
+ - **Max Tokens**: 4096
57
+ - **Best for**: Contracts, forms, invoices, reports
58
 
59
  ```python
60
  llm = LLM(
61
  model='developerJenis/GT-REX-v4',
62
  trust_remote_code=True,
63
+ max_model_len=4096,
64
+ gpu_memory_utilization=0.75,
65
  max_num_seqs=128,
66
+ logits_processors=[NGramPerReqLogitsProcessor],
 
67
  )
68
  ```
69
 
70
+ ### 🎯 GT-Rex-Ultra
71
+ **Maximum quality with adaptive processing**
72
+
73
+ - **Resolution**: 1536×1536px
74
+ - **Speed**: ~5-10s per image
75
+ - **Max Tokens**: 8192
76
+ - **Best for**: Legal documents, fine print, dense tables, medical records
77
+
78
  ```python
79
  llm = LLM(
80
  model='developerJenis/GT-REX-v4',
81
  trust_remote_code=True,
82
+ max_model_len=8192,
83
+ gpu_memory_utilization=0.85,
84
+ max_num_seqs=64,
85
+ logits_processors=[NGramPerReqLogitsProcessor],
86
  )
87
  ```
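The three snippets differ only in `max_model_len`, `gpu_memory_utilization`, and `max_num_seqs`, so one way to avoid copy-pasting them is a small lookup keyed by variant name. The sketch below is an assumption about how a caller might organize this, not something the model card provides; `VARIANT_ENGINE_ARGS` and `engine_kwargs` are hypothetical names. None of the snippets as shown sets the input resolution, so this sketch does not either.

```python
# Engine settings copied from the Nano, Pro, and Ultra snippets above.
VARIANT_ENGINE_ARGS = {
    "nano": {"max_model_len": 2048, "gpu_memory_utilization": 0.6, "max_num_seqs": 256},
    "pro": {"max_model_len": 4096, "gpu_memory_utilization": 0.75, "max_num_seqs": 128},
    "ultra": {"max_model_len": 8192, "gpu_memory_utilization": 0.85, "max_num_seqs": 64},
}


def engine_kwargs(variant: str = "pro") -> dict:
    """Return keyword arguments for vllm.LLM for the named variant.

    "pro" is the default because the model card labels Pro as the default.
    """
    key = variant.lower()
    if key not in VARIANT_ENGINE_ARGS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(VARIANT_ENGINE_ARGS)}"
        )
    return {
        "model": "developerJenis/GT-REX-v4",
        "trust_remote_code": True,
        **VARIANT_ENGINE_ARGS[key],
    }
```

With vLLM installed, this could be used as `LLM(**engine_kwargs("ultra"), logits_processors=[NGramPerReqLogitsProcessor])`.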
88
 
89
+ ## 🎯 Key Features
90
 
91
+ - **High Accuracy**: Advanced vision-language architecture for precise text extraction
92
+ - **Multi-Language Support**: Handles documents in multiple languages
93
+ - **Production Ready**: Optimized for deployment with vLLM inference engine
94
+ - **Batch Processing**: Process hundreds of documents per minute
95
+ - **Flexible Prompts**: Support for structured extraction (JSON, tables, forms)
96
+ - **Handwriting Support**: Capable of transcribing handwritten text
97
+ - **Three Optimized Variants**: Nano, Pro, and Ultra for different use cases
98
 
99
+ ## 📊 Model Details
100
 
101
+ | Attribute | Value |
102
+ |-----------|-------|
103
+ | **Developer** | GothiTech (Jenis Hathaliya) |
104
+ | **Architecture** | Vision-Language Model (VLM) |
105
+ | **Model Size** | ~6.5 GB |
106
+ | **Parameters** | ~7B |
107
+ | **License** | MIT |
108
+ | **Release Date** | February 2026 |
109
+ | **Precision** | BF16/FP16 |
110
+ | **Input Resolution** | 640px - 1536px (variant dependent) |
111
 
112
+ ## 🚀 Use Cases
113
 
114
+ ## 💻 Installation
115
 
116
+ ```bash
117
+ pip install vllm pillow torch transformers
118
  ```
119
 
120
  ---
121
 
122
  *Last updated: February 2026*
123
+ *Model Version: v4.0 | Variants: Nano | Pro | Ultra*