Ranker-Gemma-4B

Ranker-Gemma-4B is a LoRA fine-tune built on top of google/gemma-3-4b-it, designed for fast search-style answering, grounded summarization, and concise web-assisted responses.

The model is tuned to behave like a lightweight answer engine: it prioritizes speed, directness, readable structure, and strong grounding behavior when search results or retrieved passages are included in the prompt. Instead of drifting into generic assistant chatter, it is shaped to answer first, synthesize quickly, and stay close to the evidence it is given.

Benchmark

[Benchmark figure]

What It Does

This model is intended for search-centric use cases such as:

  • fast question answering over retrieved web results
  • concise evidence-grounded summaries
  • direct comparison answers with citations
  • retrieval-augmented chat
  • lightweight ranking-oriented answer synthesis

It is especially useful when your application already has a retrieval layer and needs a smaller model that can turn search snippets, passages, and source blocks into a clean final answer.

Model Overview

  • Base model: google/gemma-3-4b-it
  • Fine-tune type: LoRA
  • Task style: grounded answer generation
  • Package name: GorankLabs/Ranker-Gemma-4B
  • Focus: fast search-engine style response generation

The adapter targets key attention and MLP projection layers to steer the base instruct model toward sharper retrieval use, tighter answer formatting, and more disciplined grounded outputs.
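As a rough illustration, a PEFT LoRA configuration targeting those projection layers might look like the following. The exact rank, alpha, dropout, and module list used for this adapter are not published here, so every value below is an assumption, not the training recipe:

```python
from peft import LoraConfig

# Hypothetical configuration -- r, lora_alpha, lora_dropout, and the
# target_modules list are assumptions, not the published recipe.
lora_config = LoraConfig(
    r=16,                      # assumed LoRA rank
    lora_alpha=32,             # assumed scaling factor
    lora_dropout=0.05,         # assumed dropout
    bias="none",
    task_type="CAUSAL_LM",
    # Typical attention + MLP projection targets for Gemma-style models:
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```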

Intended Behavior

GorankLabs/Ranker-Gemma-4B is tuned to:

  • lead with the answer instead of a long preamble
  • keep outputs compact, readable, and information-dense
  • synthesize multiple sources into one coherent response
  • distinguish evidence from inference
  • mention uncertainty when retrieval is weak or incomplete
  • use source IDs like [1] when inline search results are provided
  • avoid pretending it searched beyond the context it actually received

When no search evidence is provided, the model still behaves like a concise instruct assistant. When search evidence is present, it shifts into grounded answer mode.
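Because answers are expected to carry inline markers like [1], a downstream check can verify that every cited ID maps back to a supplied source. A minimal sketch (the function names and example answer are illustrative, not part of this repository):

```python
import re

def extract_citations(answer: str) -> set[int]:
    """Collect the numeric source IDs cited as [n] in a model answer."""
    return {int(m) for m in re.findall(r"\[(\d+)\]", answer)}

def uncited_sources(answer: str, num_sources: int) -> set[int]:
    """Return the supplied source IDs that the answer never cited."""
    return set(range(1, num_sources + 1)) - extract_citations(answer)

answer = "The capital is Paris [1], which both sources confirm [2]."
print(extract_citations(answer))   # {1, 2}
print(uncited_sources(answer, 3))  # {3}
```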

Built-In Search-Aware Prompting

This repository includes a packaged prompt setup that makes the model search-aware at inference time.

Relevant files:

  • chat_template.jinja
  • tokenizer_config.json
  • search_config.json
  • citation_schema.json
  • generation_config.json

The chat template includes instructions for handling inline retrieved evidence. If your inference layer injects search results directly into the prompt, the model is guided to treat them as sources and produce a more search-engine-like answer.

Recommended inline format:

SEARCH RESULTS

[1] Example title
URL: https://example.com
Published: 2026-04-08
Snippet: Important fact here.
- Supporting passage here.

[2] Another source
URL: https://example.org
Snippet: Another relevant fact.

With this format, the model is expected to:

  • answer from the supplied evidence first
  • cite source IDs where appropriate
  • mention concrete dates for time-sensitive topics
  • state when the available evidence is weak, conflicting, or insufficient
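A small helper that renders retrieved results into the recommended inline format could look like this (the dict keys and function name are illustrative choices, not part of the packaged prompt setup):

```python
def format_search_results(results: list[dict]) -> str:
    """Render retrieved passages into the SEARCH RESULTS block
    described above, with [n] source IDs the model can cite."""
    lines = ["SEARCH RESULTS", ""]
    for i, r in enumerate(results, start=1):
        lines.append(f"[{i}] {r['title']}")
        lines.append(f"URL: {r['url']}")
        if r.get("published"):  # the date line is optional
            lines.append(f"Published: {r['published']}")
        lines.append(f"Snippet: {r['snippet']}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

block = format_search_results([
    {"title": "Example title", "url": "https://example.com",
     "published": "2026-04-08", "snippet": "Important fact here."},
    {"title": "Another source", "url": "https://example.org",
     "snippet": "Another relevant fact."},
])
print(block)
```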

Use Cases

  • search answer engines
  • retrieval-augmented generation pipelines
  • document-grounded assistants
  • web result synthesis
  • ranking plus summarization workflows
  • compact research copilots

Limitations

This repository does not include a live web search engine, browser, crawler, or reranker by itself. The model is search-aware, not search-capable on its own. It performs best when another system retrieves the documents, snippets, or web results and passes them into the prompt.

Like other small models, it may still:

  • over-compress nuanced topics
  • inherit errors from bad retrieval
  • struggle when sources are sparse or contradictory
  • need careful prompt formatting for best citation behavior

Recommended Deployment Pattern

For best results, use the model inside a retrieval pipeline:

  1. Run search or document retrieval upstream.
  2. Select the highest-quality passages.
  3. Inject them into the prompt using a clear source format.
  4. Let the model synthesize the final grounded answer.

This setup works well for products that want search-engine style output without the cost and latency of a much larger model.
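The four steps above can be sketched end to end. Retrieval and generation are stubbed out here, since the actual retriever and inference stack are up to your application; the function names and scores are placeholders:

```python
def retrieve(query: str) -> list[dict]:
    # Stub: in a real pipeline this calls your search engine or vector store.
    return [
        {"title": "Doc A", "url": "https://example.com/a",
         "snippet": "Relevant passage about the query.", "score": 0.92},
        {"title": "Doc B", "url": "https://example.com/b",
         "snippet": "Weaker match.", "score": 0.41},
    ]

def generate(prompt: str) -> str:
    # Stub: replace with Ranker-Gemma-4B inference (e.g. transformers + PEFT).
    return f"(model answer grounded in:\n{prompt})"

def answer(query: str, top_k: int = 1) -> str:
    # 1. Retrieve upstream, 2. keep the highest-scoring passages,
    # 3. inject them with clear source IDs, 4. let the model synthesize.
    hits = sorted(retrieve(query), key=lambda h: h["score"], reverse=True)[:top_k]
    sources = "\n\n".join(
        f"[{i}] {h['title']}\nURL: {h['url']}\nSnippet: {h['snippet']}"
        for i, h in enumerate(hits, start=1)
    )
    prompt = f"SEARCH RESULTS\n\n{sources}\n\nQuestion: {query}"
    return generate(prompt)

print(answer("What does Doc A say?"))
```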

Training Notes

This is a LoRA adapter trained on top of Gemma 3 4B Instruct. The goal of the fine-tune is to improve:

  • answer directness
  • grounding discipline
  • search-result usage
  • compact explanatory style
  • source-aware response formatting

License

This adapter is built on top of google/gemma-3-4b-it. Use is subject to the applicable Gemma license terms and upstream access conditions.

Artifact Details

  • Format: Safetensors
  • Adapter size: 32.8M params
  • Tensor type: BF16