---
pipeline_tag: image-text-to-text
tags:
- visual-document-understanding
- visual-question-answering
- indian-documents
license: apache-2.0
language:
- en
library_name: transformers
base_model:
- bharatgenai/patram-7b-instruct
---

# Patram-7B-Instruct

Patram-7B-Instruct by BharatGen is a 7B-parameter vision-language model trained from scratch for visual document understanding. As India’s first document foundation model, it is built to tackle complex document analysis.
The model was trained on a carefully curated instruction-tuning dataset that combines diverse public data with custom synthetic data, designed to support a broad spectrum of document understanding tasks.

## Model Overview

* **Architecture:** Vision Transformer (ViT) + MLP projector + OLMo-7B LLM
* **Training Data:** BharatDocs-v1, a dataset of diverse Indian documents, plus other open-source document datasets
* **Supported I/O Formats:** The model currently accepts English-language instructions and image files (e.g., PNG, JPEG) as input. The output is provided in text format.
* **Language:** English (Indian language support upcoming)
* **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Usage Examples

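The example below loads the model with the `transformers` library and answers a question about a document image. It passes `trust_remote_code=True` because it relies on model-specific code such as `generate_from_batch`.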

```python
import torch
from transformers import AutoProcessor, AutoModelForCausalLM, GenerationConfig
from PIL import Image
import requests

# Model ID and device setup
model_id = "bharatgenai/patram-7b-instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load processor and model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True
).to(device)

def get_patram_response(image_path_or_url, question):
    try:
        # Load image
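        if image_path_or_url.startswith(("http://", "https://")):
            # Fetch remote images with a timeout and fail early on HTTP errors
            resp = requests.get(image_path_or_url, stream=True, timeout=30)
            resp.raise_for_status()
            image = Image.open(resp.raw).convert("RGB")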
        else:
            image = Image.open(image_path_or_url).convert("RGB")
    except Exception as e:
        print(f"Error loading image: {e}")
        return None

    # Format the prompt as expected
    prompt = f"Question: {question} Answer based on the image."

    try:
        # Preprocess image and text using the processor
        inputs = processor.process(images=[image], text=prompt)
        inputs = {k: v.to(device).unsqueeze(0) for k, v in inputs.items()}

        # Generate output using model's generate_from_batch method (Patram-specific)
        output = model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
            tokenizer=processor.tokenizer
        )

        # Extract generated tokens (excluding input tokens) and decode
        generated_tokens = output[0, inputs['input_ids'].size(1):]
        response = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()
        return response
    except Exception as e:
        print(f"Error during inference: {e}")
        return None

# Example usage:
# image_input = "https://knowscope.in/wp-content/uploads/2025/05/cghd-nag.png"
# question = "Who issued this notice?"
# answer = get_patram_response(image_input, question)
# if answer:
#     print("Answer:", answer)
```
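To ask several questions about the same document, the function above can be wrapped in a small loop. The following is a minimal sketch, not part of the official API: `ask_many` is a hypothetical helper, and it assumes only that the callable it receives behaves like `get_patram_response` (takes an image path or URL and a question, and returns the answer text or `None` on failure).

```python
from typing import Callable, Dict, Iterable, Optional


def ask_many(
    ask: Callable[[str, str], Optional[str]],
    image_path_or_url: str,
    questions: Iterable[str],
) -> Dict[str, Optional[str]]:
    """Ask several questions about one document image.

    `ask` is any callable with the same signature as `get_patram_response`
    (hypothetical helper; not provided by the model repository).
    """
    answers: Dict[str, Optional[str]] = {}
    for question in questions:
        question = question.strip()
        if not question:
            continue  # skip blank questions
        # A failed call is recorded as None instead of aborting the loop
        answers[question] = ask(image_path_or_url, question)
    return answers


# Example usage (the file name is a placeholder):
# answers = ask_many(
#     get_patram_response,
#     "notice.png",
#     ["Who issued this notice?", "What is the date of the notice?"],
# )
# for q, a in answers.items():
#     print(q, "->", a)
```

Because `get_patram_response` reloads the image on every call, this sketch favours simplicity over speed; for many questions per image, loading the image once and reusing it would likely be cheaper.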

**Note**: If you're trying this on an Apple Silicon (M1/M2/M3/M4/...) chip, please follow the official documentation by PyTorch and Hugging Face for installing dependencies:

- [PyTorch's official guide on installation (macOS)](https://pytorch.org/get-started/locally/#:~:text=torch%20torchvision%20torchaudio-,Installing%20on%20macOS,-PyTorch%20can%20be)
- [Hugging Face Transformers performance tips](https://huggingface.co/docs/transformers/main/en/perf_train_special)
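
On Apple Silicon, PyTorch exposes the GPU through the `mps` backend, which the device setup in the example above does not check for. A minimal sketch of a three-way device choice follows. `pick_device` is a hypothetical helper, and whether Patram's custom code runs correctly on `mps` is an untested assumption, so fall back to `"cpu"` if inference fails.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a torch device string, preferring CUDA, then MPS, then CPU.

    Hypothetical helper: it takes plain booleans so the selection logic
    can be checked without PyTorch installed.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"  # Apple Silicon GPU backend in PyTorch
    return "cpu"


# Example usage with PyTorch:
# import torch
# device = pick_device(
#     torch.cuda.is_available(),
#     torch.backends.mps.is_available(),
# )
```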
     
  
## Evaluations

We evaluated Patram-7B-Instruct alongside other vision-language models (VLMs) in the 7B–9B parameter range on public document benchmarks and one in-house benchmark.

**Benchmarks**: DocVQA, VisualMRC, Patram-Bench

Patram-Bench is an in-house benchmark designed for Indic Document VQA.

**Metric**: G-Eval (LLM-as-a-judge)

| Model                  | Overall | DocVQA | Patram-Bench | VisualMRC |
| ---------------------- | ------- | ------ | ------------ | --------- |
| claude-3.7-sonnet      | 0.8830  | 0.8480 | 0.8857       | 0.8830    |
| Qwen2.5-VL-7B-Instruct | 0.8759  | 0.8722 | 0.6816       | 0.9169    |
| gemma-3-12b-it         | 0.8556  | 0.8451 | 0.6349       | 0.9069    |
| **patram-7b-instruct** | 0.8331  | 0.8550 | 0.6515       | 0.8510    |
| InternVL3-9B           | 0.7865  | 0.8681 | 0.6888       | 0.7405    |
| deepseek-vl2           | 0.7581  | 0.8739 | 0.5089       | 0.7144    |

**Note**: The benchmarked results reflect the API variant.

## Citation

```bibtex
@online{BharatGenPatramLaunch2025,
  author    = {{BharatGen Team}},
  title     = {BharatGen Unveils Patram: India's Pioneering Vision-Language Foundation Model for Document Intelligence},
  year      = {2025},
  url       = {https://bharatgen.com/blog/patram-launch},
  urldate   = {2025-06-02}
}
```

## Resources

* **Model**: [huggingface.co/bharatgenai/patram-7b-instruct](https://huggingface.co/bharatgenai/patram-7b-instruct)
* **Project Page**: [bharatgen.com/patram](https://bharatgen.com/patram)
* **Blog**: [bharatgen.com/blog/patram-launch](https://bharatgen.com/blog/patram-launch)

## Authors

* **Principal Investigators**: Prof. Ravi Kiran Sarvadevabhatla, Prof. Ganesh Ramakrishnan
* **Contributors**: BharatGen Team

## Contact

* [Contact Form](https://bharatgen.com/contact)
* Hugging Face Community Tab