Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,178 @@
---
library_name: transformers
license: gemma
base_model: unsloth/gemma-4-E2B-it
pipeline_tag: image-text-to-text
language:
- bn
tags:
- headline-generation
- bangla
- bengali
- news
- vlm
- lora
- gemma4
---

# BartaLens-E2B

<p align="center">
<a href="https://arxiv.org/abs/0000.00000">
<img src="https://img.shields.io/badge/%F0%9F%93%84_Paper-Coming_Soon-b12a00?style=for-the-badge&labelColor=ffb300" alt="Paper Coming Soon">
</a>
</p>

[Paper (coming soon)](https://arxiv.org/abs/0000.00000)
[Dataset](https://huggingface.co/datasets/dipta007/shironam-pro-max)
[Model](https://huggingface.co/dipta007/BartaLens-E2B)

**BartaLens-E2B** is a Bangla multimodal headline generation model fine-tuned from `gemma-4-E2B-it`. Given a Bengali news article (and optionally an accompanying image), it generates a concise, accurate newspaper-style headline. It was trained with **LoRA** on the `dipta007/shironam-pro-max` dataset with 50% image supervision, so the model is robust to both image-present and text-only inputs.

## Highlights

- **ROUGE-1: 0.3851 | ROUGE-L: 0.3551 | BLEU-4: 10.43 | BERTScore: 0.8969** on our own test split (with images)
- **ROUGE-1: 0.3840 | ROUGE-L: 0.3553 | BLEU-4: 11.02 | BERTScore: 0.8970** on the Shironam test split (text-only)
- Outperforms zero-shot Gemma4-E2B, Qwen3.5-4B, Ministral-3 3B, and prior Bangla headline systems
- Robust to missing images: trained with 50% text-only supervision, so performance does not degrade without images

## Model Overview

| Property | Value |
|----------|-------|
| **Model Type** | Vision-Language Model (Causal LM + Vision Encoder) |
| **Base Model** | unsloth/gemma-4-E2B-it |
| **Training** | SFT + LoRA (r=32, alpha=32) |
| **LoRA Targets** | all-linear (vision + language + attention + MLP) |
| **Max Sequence Length** | 4,096 tokens |
| **Language** | Bengali (বাংলা) |
| **Image Supervision** | 50% (the model sees the image for half of the training samples) |
| **Effective Batch Size** | 64 (8 per device × 8 gradient-accumulation steps) |
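
The LoRA and batch-size rows above can be checked with a little arithmetic. The sketch below is illustrative only: `FinetuneConfig` is a hypothetical container, not part of the released training code, and `num_devices=1` is an assumption based on the single GPU listed under Training Details. The `alpha / r` factor is the standard LoRA scaling of the low-rank update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical container for the hyperparameters in the table above."""

    lora_r: int = 32
    lora_alpha: int = 32
    per_device_batch_size: int = 8
    gradient_accumulation_steps: int = 8
    num_devices: int = 1  # assumption: a single GPU, as listed under Hardware

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to one optimizer step.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )


cfg = FinetuneConfig()
print(cfg.lora_scaling, cfg.effective_batch_size)  # prints: 1.0 64
```

With `r = alpha = 32` the adapter update is applied at unit scale, and 8 × 8 on one device gives the effective batch size of 64 stated in the table.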

## Quickstart

```python
from unsloth import FastVisionModel, get_chat_template
import torch
from PIL import Image

model_name = "dipta007/BartaLens-E2B"

# Load the fine-tuned model and its processor in bfloat16 (no 4-bit quantization).
model, processor = FastVisionModel.from_pretrained(
    model_name,
    max_seq_length=4096,
    load_in_4bit=False,
    dtype=torch.bfloat16,
)
processor = get_chat_template(processor, "gemma-4")
FastVisionModel.for_inference(model)

# Bengali instruction prompt. In English: "You are an experienced Bangla news
# editor. Write a suitable headline for the news article below." The rules ask
# for factual accuracy, a concise newspaper style, no quotes/markdown/emoji/
# list markers/numbering, no commentary, and a single-line headline only.
INSTRUCTION = (
    "আপনি একজন অভিজ্ঞ বাংলা সংবাদ সম্পাদক। নিচের সংবাদ নিবন্ধটির জন্য একটি উপযুক্ত শিরোনাম তৈরি করুন।\n"
    "\n"
    "নিয়মাবলী:\n"
    "- নিবন্ধের মূল ঘটনা ও তথ্য সঠিকভাবে প্রকাশ করুন; কাল্পনিক তথ্য যোগ করবেন না।\n"
    "- সংবাদপত্রের সাধারণ শিরোনামের শৈলীতে, সংক্ষিপ্ত ও আকর্ষণীয়ভাবে লিখুন।\n"
    "- উদ্ধৃতি চিহ্ন, মার্কডাউন, ইমোজি, তালিকা চিহ্ন (*, -), অথবা নম্বর (১., 1.) ব্যবহার করবেন না।\n"
    "- কোনো ভূমিকা, ব্যাখ্যা, একাধিক বিকল্প বা অতিরিক্ত মন্তব্য যোগ করবেন না।\n"
    "- শুধু শিরোনামটি একটি লাইনে লিখুন, অন্য কিছু নয়।\n"
)


def generate_headline(article: str, image: Image.Image | None = None) -> str:
    """Generate a Bengali headline for a news article."""
    user_text = f"{INSTRUCTION}\nনিবন্ধ:\n{article}\n\nশিরোনাম:"

    # Text-only input: fall back to a black placeholder image, as in training.
    if image is None:
        image = Image.new("RGB", (224, 224), color="black")

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": user_text},
            ],
        }
    ]

    # Render the chat template to text, then tokenize it together with the image.
    input_text = processor.apply_chat_template(
        messages, add_generation_prompt=True
    )
    inputs = processor(
        images=[[image]],
        text=[input_text],
        add_special_tokens=False,
        return_tensors="pt",
    ).to("cuda")

    # Greedy decoding; a headline fits comfortably within 64 new tokens.
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=64,
            use_cache=True,
            do_sample=False,
        )
    # Decode only the newly generated tokens, skipping the prompt.
    headline = processor.tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    ).strip()
    return headline


# Usage: text-only
# (Sample article: the Bangladesh cricket captain says the team is ready for
# the upcoming Test series.)
article = """বাংলাদেশ জাতীয় ক্রিকেট দলের অধিনায়ক নাজমুল হোসেন শান্ত আজ
সংবাদ সম্মেলনে জানিয়েছেন, দল আগামী টেস্ট সিরিজের জন্য পুরোপুরি প্রস্তুত।
তিনি বলেন, তরুণ খেলোয়াড়দের পারফরম্যান্স দলের শক্তি বাড়িয়েছে।"""

headline = generate_headline(article)
print(headline)

# Usage: with image
# image = Image.open("news_image.jpg")
# headline = generate_headline(article, image=image)
```
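
The prompt asks for a single line with no quotes, markdown, or list markers, but generated text can still violate those rules occasionally. A defensive post-processing step might look like the sketch below. `clean_headline` is a hypothetical helper and is not part of the model card or the training pipeline; the exact set of characters to strip is an assumption.

```python
import re

# Quote characters that may wrap a generated headline (assumed set).
_QUOTES = "\"'“”‘’«»`"
# Leading list markers or numbering such as "- ", "* ", "1. " or "১. ".
_LIST_PREFIX = re.compile(r"^\s*(?:[-*•]+|[0-9০-৯]+[.)])\s+")


def clean_headline(raw: str) -> str:
    """Keep the first non-empty line and strip markdown emphasis,
    list markers, and wrapping quotes (hypothetical helper)."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    if not lines:
        return ""
    text = _LIST_PREFIX.sub("", lines[0])
    text = text.replace("**", "").replace("__", "")
    return text.strip(_QUOTES + " ")
```

It could be applied to the model output as `clean_headline(generate_headline(article))`. Note that it also strips a legitimate quote character at the very start or end of a headline, which may or may not be acceptable for a given newsroom style.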

## Performance

Bold marks the best score in each column.

### Comparison with zero-shot baselines (own test split, n=15,000)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-4 | BERTScore | METEOR |
|-------|---------|---------|---------|--------|-----------|--------|
| Qwen3.5-0.8B | 0.1546 | 0.0541 | 0.1429 | 2.14 | 0.8373 | 0.1002 |
| Qwen3.5-2B | 0.2029 | 0.0613 | 0.1821 | 1.83 | 0.8498 | 0.1191 |
| Ministral-3 3B | 0.2892 | 0.0903 | 0.2445 | 2.48 | 0.8725 | 0.1625 |
| Qwen3.5-4B | 0.2984 | 0.1065 | 0.2618 | 3.98 | 0.8729 | 0.1924 |
| Gemma4-E2B (zero-shot) | 0.3484 | 0.1507 | 0.3127 | 6.37 | 0.8874 | 0.2531 |
| **BartaLens-E2B (ours)** | **0.3851** | **0.1807** | **0.3551** | **10.43** | **0.8969** | **0.2644** |

### Cross-dataset generalization (Shironam test split, text-only, n=15,012)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-4 | BERTScore | METEOR |
|-------|---------|---------|---------|--------|-----------|--------|
| Gemma4-E2B (zero-shot) | 0.3535 | 0.1526 | 0.3190 | 6.36 | 0.8883 | **0.2620** |
| **BartaLens-E2B (ours)** | **0.3840** | **0.1809** | **0.3553** | **11.02** | **0.8970** | 0.2590 |
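
To read the two tables as relative gains over the zero-shot base model, the reported scores can be compared directly. The sketch below only re-uses the numbers printed above; the dictionary names and the `relative_gain` helper are illustrative and do not come from the evaluation code.

```python
# Scores copied from the tables above: (zero-shot Gemma4-E2B, BartaLens-E2B).
OWN_EVAL = {
    "ROUGE-1": (0.3484, 0.3851),
    "ROUGE-2": (0.1507, 0.1807),
    "ROUGE-L": (0.3127, 0.3551),
    "BLEU-4": (6.37, 10.43),
    "BERTScore": (0.8874, 0.8969),
    "METEOR": (0.2531, 0.2644),
}
SHIRONAM_EVAL = {
    "ROUGE-1": (0.3535, 0.3840),
    "ROUGE-2": (0.1526, 0.1809),
    "ROUGE-L": (0.3190, 0.3553),
    "BLEU-4": (6.36, 11.02),
    "BERTScore": (0.8883, 0.8970),
    "METEOR": (0.2620, 0.2590),
}


def relative_gain(baseline: float, finetuned: float) -> float:
    """Percentage change of the fine-tuned score relative to the baseline."""
    return 100.0 * (finetuned - baseline) / baseline


for name, table in (("own", OWN_EVAL), ("shironam", SHIRONAM_EVAL)):
    for metric, (base, ours) in table.items():
        print(f"{name:9s} {metric:10s} {relative_gain(base, ours):+6.1f}%")
```

On these numbers, BLEU-4 improves by roughly 64% on the own test split and roughly 73% on Shironam, while METEOR on Shironam is about 1% below the zero-shot baseline.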

## Training Details

- **Dataset**: [dipta007/shironam-pro-max](https://huggingface.co/datasets/dipta007/shironam-pro-max) (train split)
- **Supervision**: 50% of training samples include the news image; 50% use a black placeholder (text-only). This trains the model to be robust when no image is available.
- **Early stopping**: patience=3 on validation loss, eval every 200 steps
- **Optimizer**: AdamW, LR=5e-4, cosine schedule, warmup 5%
- **Hardware**: NVIDIA L40S (46 GB)
- **Metrics**: csebuetnlp multilingual ROUGE (Bengali stemmer), HF BLEU, BanglaBERT BERTScore, METEOR
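
The optimizer and early-stopping settings listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the training code: it assumes linear warmup followed by a half-cosine decay to zero (a common trainer default), and that a plateau counts as "no improvement" for early stopping. The names `lr_at` and `should_stop` are hypothetical.

```python
import math

PEAK_LR = 5e-4          # from the card: LR=5e-4
WARMUP_FRACTION = 0.05  # from the card: warmup 5%
PATIENCE = 3            # from the card: patience=3 on validation loss


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    warmup_steps = max(1, int(WARMUP_FRACTION * total_steps))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def should_stop(val_losses: list[float], patience: int = PATIENCE) -> bool:
    """True once the last `patience` evaluations all failed to beat
    the best validation loss seen before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return all(loss >= best_before for loss in val_losses[-patience:])
```

With evaluation every 200 steps, `should_stop` would be called once per evaluation with the running list of validation losses, so training halts at the earliest 600 steps after the best checkpoint.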

## Intended Use

- **In-scope**: generating concise Bengali news headlines from article text (optionally with an image), headline suggestion for editors, summarization benchmarks.
- **Out-of-scope**: generating headlines in other languages, creative/clickbait headline generation, summarization of non-news content.

## Citation

```bibtex

```

## License

Released under the Gemma License.