Instructions for using dipta007/BartaLens-E2B with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use dipta007/BartaLens-E2B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="dipta007/BartaLens-E2B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("dipta007/BartaLens-E2B")
model = AutoModelForImageTextToText.from_pretrained("dipta007/BartaLens-E2B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use dipta007/BartaLens-E2B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dipta007/BartaLens-E2B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dipta007/BartaLens-E2B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
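
The same request can be sent from Python. The sketch below rebuilds the JSON body of the curl call and posts it with the standard library only. It assumes the server started above is listening on `localhost:8000`; the helper names `build_chat_payload` and `post_chat` are illustrative and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload(
    "dipta007/BartaLens-E2B",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# The call needs a running server, so it is left commented out:
# reply = post_chat("http://localhost:8000", payload)
# print(reply["choices"][0]["message"]["content"])
```

Pointing `post_chat` at `http://localhost:30000` should work the same way for the SGLang server, since both expose the same chat-completions route.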

Use Docker

docker model run hf.co/dipta007/BartaLens-E2B

SGLang

How to use dipta007/BartaLens-E2B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "dipta007/BartaLens-E2B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dipta007/BartaLens-E2B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dipta007/BartaLens-E2B" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use dipta007/BartaLens-E2B with Docker Model Runner:
```
docker model run hf.co/dipta007/BartaLens-E2B
```

	---
	library_name: transformers
	license: gemma
	base_model: unsloth/gemma-4-E2B-it
	pipeline_tag: image-text-to-text
	language:
	- bn
	tags:
	- headline-generation
	- bangla
	- bengali
	- news
	- vlm
	- lora
	- gemma4
	---

	# BartaLens-E2B

	<p align="center">
	<a href="https://arxiv.org/abs/0000.00000">
	<img src="https://img.shields.io/badge/%F0%9F%93%84_Paper-Coming_Soon-b12a00?style=for-the-badge&labelColor=ffb300" alt="Paper Coming Soon">
	</a>
	</p>

	[![Paper](https://img.shields.io/badge/arXiv-coming--soon-red)](https://arxiv.org/abs/0000.00000)
	[![Dataset](https://img.shields.io/badge/HuggingFace-Dataset-yellow)](https://huggingface.co/datasets/dipta007/shironam-pro-max)
	[![Models](https://img.shields.io/badge/HuggingFace-Models-orange)](https://huggingface.co/dipta007/BartaLens-E2B)

	BartaLens-E2B is a Bangla multimodal headline generation model fine-tuned from `gemma-4-E2B-it`. Given a Bengali news article (and optionally an accompanying image), it generates a concise, accurate newspaper-style headline. It was trained with LoRA on the `dipta007/shironam-pro-max` dataset with 50% image supervision, so it handles both image-present and text-only inputs.

	## Highlights

	- ROUGE-1: 0.3851 | ROUGE-L: 0.3551 | BLEU-4: 10.43 | BERTScore: 0.8969 on own test split (with images)
	- ROUGE-1: 0.3840 | ROUGE-L: 0.3553 | BLEU-4: 11.02 | BERTScore: 0.8970 on Shironam test split (text-only)
	- Outperforms zero-shot Gemma4-E2B, Qwen3.5-4B, Ministral-3 3B, and prior Bangla headline systems
	- Robust to missing images: trained with 50% text-only supervision so performance doesn't degrade without images
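
The card does not say how the 50% split was drawn. As a rough illustration only, the sketch below assumes an independent per-sample coin flip that either keeps the real image or swaps in a placeholder. The function name `choose_training_image` and its parameters are hypothetical, and plain strings stand in for image objects so the example has no dependencies.

```python
import random
from typing import Callable, Optional, TypeVar

ImageT = TypeVar("ImageT")


def choose_training_image(
    image: Optional[ImageT],
    make_placeholder: Callable[[], ImageT],
    rng: random.Random,
    keep_probability: float = 0.5,
) -> ImageT:
    """Return the real image with probability keep_probability, else a placeholder.

    Samples that have no image at all always receive the placeholder.
    """
    if image is not None and rng.random() < keep_probability:
        return image
    return make_placeholder()


rng = random.Random(0)  # fixed seed, only so the demo is reproducible
# Strings stand in for image objects to keep the demo dependency-free.
choices = [choose_training_image("photo", lambda: "black", rng) for _ in range(10_000)]
kept_fraction = choices.count("photo") / len(choices)
print(f"fraction of samples that kept their image: {kept_fraction:.3f}")
```

A fixed half-and-half partition of the dataset would give the same overall ratio; the coin-flip version is simply the shortest one to write down.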

	## Model Overview

	| Property | Value |
	|----------|-------|
	| Model Type | Vision-Language Model (Causal LM + Vision Encoder) |
	| Base Model | unsloth/gemma-4-E2B-it |
	| Training | SFT + LoRA (r=32, alpha=32) |
	| LoRA Targets | all-linear (vision + language + attention + MLP) |
	| Max Sequence Length | 4,096 tokens |
	| Language | Bengali (বাংলা) |
	| Image Supervision | 50% (model sees image for half of training samples) |
	| Effective Batch Size | 64 (8 per device x 8 gradient accumulation) |
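
Two of the table entries are simple arithmetic, worked through in the sketch below. It assumes the standard LoRA convention of scaling the low-rank update by alpha / r (the card does not say whether a variant such as rank-stabilized scaling was used) and a single training device. `TrainingSetup` is a hypothetical container for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    lora_rank: int = 32                   # r, from the table
    lora_alpha: int = 32                  # alpha, from the table
    per_device_batch_size: int = 8        # from the table
    gradient_accumulation_steps: int = 8  # from the table
    num_devices: int = 1                  # assumption: a single GPU

    @property
    def lora_scaling(self) -> float:
        """Standard LoRA scaling factor applied to the low-rank update: alpha / r."""
        return self.lora_alpha / self.lora_rank

    @property
    def effective_batch_size(self) -> int:
        """Number of samples that contribute to one optimizer step."""
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )


setup = TrainingSetup()
print(f"LoRA scaling (alpha / r): {setup.lora_scaling}")
print(f"effective batch size: {setup.effective_batch_size}")
```

With r equal to alpha the scaling factor is 1, so under this convention the adapter update is added to the frozen weights without extra rescaling.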

	## Quickstart

	```python
	from unsloth import FastVisionModel, get_chat_template
	import torch
	from PIL import Image

	model_name = "dipta007/BartaLens-E2B"

	model, processor = FastVisionModel.from_pretrained(
	    model_name,
	    max_seq_length=4096,
	    load_in_4bit=False,
	    dtype=torch.bfloat16,
	)
	processor = get_chat_template(processor, "gemma-4")
	FastVisionModel.for_inference(model)

	INSTRUCTION = (
	    "আপনি একজন অভিজ্ঞ বাংলা সংবাদ সম্পাদক। নিচের সংবাদ নিবন্ধটির জন্য একটি উপযুক্ত শিরোনাম তৈরি করুন।\n"
	    "\n"
	    "নিয়মাবলী:\n"
	    "- নিবন্ধের মূল ঘটনা ও তথ্য সঠিকভাবে প্রকাশ করুন; কাল্পনিক তথ্য যোগ করবেন না।\n"
	    "- সংবাদপত্রের সাধারণ শিরোনামের শৈলীতে, সংক্ষিপ্ত ও আকর্ষণীয়ভাবে লিখুন।\n"
	    "- উদ্ধৃতি চিহ্ন, মার্কডাউন, ইমোজি, তালিকা চিহ্ন (*, -), অথবা নম্বর (১., 1.) ব্যবহার করবেন না।\n"
	    "- কোনো ভূমিকা, ব্যাখ্যা, একাধিক বিকল্প বা অতিরিক্ত মন্তব্য যোগ করবেন না।\n"
	    "- শুধু শিরোনামটি একটি লাইনে লিখুন, অন্য কিছু নয়।\n"
	)


	def generate_headline(article: str, image: Image.Image | None = None):
	    """Generate a Bengali headline for a news article."""
	    user_text = f"{INSTRUCTION}\nনিবন্ধ:\n{article}\n\nশিরোনাম:"

	    # Text-only input: pass a black placeholder image in place of a real one.
	    if image is None:
	        image = Image.new("RGB", (224, 224), color="black")

	    messages = [
	        {
	            "role": "user",
	            "content": [
	                {"type": "image"},
	                {"type": "text", "text": user_text},
	            ],
	        }
	    ]

	    input_text = processor.apply_chat_template(
	        messages, add_generation_prompt=True
	    )
	    inputs = processor(
	        images=[[image]],
	        text=[input_text],
	        add_special_tokens=False,
	        return_tensors="pt",
	    ).to("cuda")

	    # Greedy decoding; a headline fits comfortably in 64 new tokens.
	    with torch.no_grad():
	        out = model.generate(
	            **inputs,
	            max_new_tokens=64,
	            use_cache=True,
	            do_sample=False,
	        )
	    # Decode only the newly generated tokens, skipping the prompt.
	    headline = processor.tokenizer.decode(
	        out[0][inputs["input_ids"].shape[1]:],
	        skip_special_tokens=True,
	    ).strip()
	    return headline


	# Usage: text-only
	article = """বাংলাদেশ জাতীয় ক্রিকেট দলের অধিনায়ক নাজমুল হোসেন শান্ত আজ
	সংবাদ সম্মেলনে জানিয়েছেন, দল আগামী টেস্ট সিরিজের জন্য পুরোপুরি প্রস্তুত।
	তিনি বলেন, তরুণ খেলোয়াড়দের পারফরম্যান্স দলের শক্তি বাড়িয়েছে।"""

	headline = generate_headline(article)
	print(headline)

	# Usage: with image
	# image = Image.open("news_image.jpg")
	# headline = generate_headline(article, image=image)
	```
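
The instruction prompt asks for a single line with no quotation marks, markdown, or list markers. Generated text can still contain them occasionally, so a defensive cleanup step may be useful. The helper below is a hypothetical sketch, not part of the released code: it keeps the first non-empty line and strips the wrappers the prompt forbids.

```python
import re

# Quote characters the prompt asks the model not to wrap a headline in.
_QUOTES = "\"'“”‘’«»"
# Leading list markers: "*", "-", "•", or a number (ASCII or Bengali digits)
# followed by "." or ")", and then whitespace.
_LIST_MARKER = re.compile(r"^\s*(?:[*\-•]+|[0-9০-৯]+[.)])\s+")


def clean_headline(raw: str) -> str:
    """Keep the first non-empty line and strip list markers, markdown, and quotes."""
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return ""
    headline = _LIST_MARKER.sub("", lines[0])
    headline = headline.strip("*_#` ")        # markdown emphasis or heading residue
    headline = headline.strip(_QUOTES + " ")  # surrounding quotation marks
    return headline


print(clean_headline('১. "টেস্ট সিরিজের জন্য প্রস্তুত বাংলাদেশ"\nব্যাখ্যা: ...'))
```

In the Quickstart this would be applied to the return value of `generate_headline`. A headline that genuinely begins with a number followed by a period would also lose that prefix, so the pattern may need loosening for such cases.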

	## Performance

	### Comparison with zero-shot baselines (own test split, with images, n=15,000)

	| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-4 | BERTScore | METEOR |
	|-------|---------|---------|---------|--------|-----------|--------|
	| Qwen3.5-0.8B | 0.1546 | 0.0541 | 0.1429 | 2.14 | 0.8373 | 0.1002 |
	| Qwen3.5-2B | 0.2029 | 0.0613 | 0.1821 | 1.83 | 0.8498 | 0.1191 |
	| Ministral-3 3B | 0.2892 | 0.0903 | 0.2445 | 2.48 | 0.8725 | 0.1625 |
	| Qwen3.5-4B | 0.2984 | 0.1065 | 0.2618 | 3.98 | 0.8729 | 0.1924 |
	| Gemma4-E2B (zero-shot) | 0.3484 | 0.1507 | 0.3127 | 6.37 | 0.8874 | 0.2531 |
	| BartaLens-E2B (ours) | 0.3851 | 0.1807 | 0.3551 | 10.43 | 0.8969 | 0.2644 |

	### Cross-dataset generalization (Shironam eval, text-only, n=15,012)

	| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-4 | BERTScore | METEOR |
	|-------|---------|---------|---------|--------|-----------|--------|
	| Gemma4-E2B (zero-shot) | 0.3535 | 0.1526 | 0.3190 | 6.36 | 0.8883 | 0.2620 |
	| BartaLens-E2B (ours) | 0.3840 | 0.1809 | 0.3553 | 11.02 | 0.8970 | 0.2590 |
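
To make the size of the gap over the zero-shot base model easier to read, the sketch below computes the relative change for each metric. The scores are copied from the two tables above; `relative_gain` is an illustrative helper.

```python
# Scores copied from the tables: (zero-shot Gemma4-E2B, BartaLens-E2B).
OWN_EVAL = {
    "ROUGE-1": (0.3484, 0.3851),
    "ROUGE-2": (0.1507, 0.1807),
    "ROUGE-L": (0.3127, 0.3551),
    "BLEU-4": (6.37, 10.43),
    "BERTScore": (0.8874, 0.8969),
    "METEOR": (0.2531, 0.2644),
}
SHIRONAM_EVAL = {
    "ROUGE-1": (0.3535, 0.3840),
    "ROUGE-2": (0.1526, 0.1809),
    "ROUGE-L": (0.3190, 0.3553),
    "BLEU-4": (6.36, 11.02),
    "BERTScore": (0.8883, 0.8970),
    "METEOR": (0.2620, 0.2590),
}


def relative_gain(baseline: float, tuned: float) -> float:
    """Relative change of the fine-tuned score over the zero-shot baseline."""
    return (tuned - baseline) / baseline


own_gains = {name: relative_gain(*scores) for name, scores in OWN_EVAL.items()}
shironam_gains = {name: relative_gain(*scores) for name, scores in SHIRONAM_EVAL.items()}

for label, gains in (("own test split", own_gains), ("Shironam", shironam_gains)):
    for name, gain in gains.items():
        print(f"{label:>14} | {name:<9} | {gain:+.1%}")
```

On these numbers every metric improves on the own test split, while METEOR on Shironam is slightly below the zero-shot baseline (0.2590 vs 0.2620).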

	## Training Details

	- Dataset: [dipta007/shironam-pro-max](https://huggingface.co/datasets/dipta007/shironam-pro-max) (train split)
	- Supervision: 50% of training samples include the news image; 50% use a black placeholder (text-only). This trains the model to be robust when no image is available.
	- Early stopping: patience=3 on validation loss, eval every 200 steps
	- Optimizer: AdamW, LR=5e-4, cosine schedule, warmup 5%
	- Hardware: NVIDIA L40S (46 GB)
	- Metrics: csebuetnlp multilingual ROUGE (Bengali stemmer), HF BLEU, BanglaBERT BERTScore, METEOR
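
For reference, a cosine schedule with 5% linear warmup and a peak learning rate of 5e-4 looks roughly like the sketch below. The total step count and the final learning-rate floor are not stated in this card, so `total_steps=1000` and `min_lr=0.0` are placeholders, and the function is a generic illustration, not the training code.

```python
import math


def learning_rate(
    step: int,
    total_steps: int,
    peak_lr: float = 5e-4,       # from the list above
    warmup_ratio: float = 0.05,  # from the list above
    min_lr: float = 0.0,         # assumption: decay all the way to zero
) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay down to min_lr."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical run length
for step in (0, 25, 50, 525, 1000):
    print(f"step {step:>4}: lr = {learning_rate(step, total_steps):.2e}")
```

Because early stopping (patience=3, evaluated every 200 steps) can end a run before the last scheduled step, the learning rate at the final checkpoint may be well above the floor.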

	## Intended Use

	- In-scope: generating concise Bengali news headlines from article text (optionally with an image), headline suggestion for editors, summarization benchmarks.
	- Out-of-scope: generating headlines in other languages, creative/clickbait headline generation, summarization of non-news content.

	## Citation

	```bibtex

	```

	## License

	Released under the Gemma License.