Instructions for using ZySec-AI/gemma-3-4b-document-writer with libraries, inference servers, and local apps.

Libraries

How to use ZySec-AI/gemma-3-4b-document-writer with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The messages below include an image, so use the image-text-to-text task
pipe = pipeline("image-text-to-text", model="ZySec-AI/gemma-3-4b-document-writer")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ZySec-AI/gemma-3-4b-document-writer")
model = AutoModelForImageTextToText.from_pretrained("ZySec-AI/gemma-3-4b-document-writer")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use ZySec-AI/gemma-3-4b-document-writer with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ZySec-AI/gemma-3-4b-document-writer"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ZySec-AI/gemma-3-4b-document-writer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/ZySec-AI/gemma-3-4b-document-writer

SGLang

How to use ZySec-AI/gemma-3-4b-document-writer with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ZySec-AI/gemma-3-4b-document-writer" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ZySec-AI/gemma-3-4b-document-writer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ZySec-AI/gemma-3-4b-document-writer" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ZySec-AI/gemma-3-4b-document-writer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ZySec-AI/gemma-3-4b-document-writer with Docker Model Runner:
```
docker model run hf.co/ZySec-AI/gemma-3-4b-document-writer
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

GEMMA Document Rewriter for RAG Pipeline

Overview

The GEMMA Document Rewriter for RAG Pipeline is a text rewriting model built on the pre-trained Google Gemma 3 4B language model. It was fine-tuned with LoRA (Low-Rank Adaptation); the adapter weights are available at ZySec-AI/gemma-3-4b-document-writer-lora. Its primary goal is to rewrite documents by removing unnecessary information, stray bytes, extra whitespace, and redundant content. It extracts and emphasizes the information that matters for Retrieval-Augmented Generation (RAG) pipelines and outputs a clean, structured version of the document in Markdown with appropriate headings.

Key Features

Efficient Document Rewriting:
Extracts the essential content from lengthy documents, removing extraneous details and whitespace to create a more concise version ideal for RAG systems.
Markdown Output:
The model reformats content into Markdown, automatically generating headings and subheadings for improved readability and further processing.
Cost-Effective and Speed Optimized:
Built on top of a relatively small language model (Gemma 3 4B), this approach offers a cost-effective solution while delivering fast inference speeds suitable for production pipelines.
LoRA Fine-Tuning:
Utilizes LoRA adapter layers to efficiently fine-tune the base model, enabling rapid adaptation to the document rewriting task without the need for full-scale model retraining.
RAG Pipeline Integration:
Designed to slot into modern RAG pipelines as a preprocessing step, so that only the most relevant, structured information is preserved and highlighted.

Intended Use Cases

This model is ideal for a range of document processing and natural language understanding tasks, including:

Document Summarization & Rewriting:
Simplify and restructure long documents or articles by extracting key information and presenting it in an organized, Markdown-formatted style.
Data Preprocessing for RAG Pipelines:
Serve as a preprocessing step in retrieval-augmented generation systems by providing clean, condensed documents that enhance retrieval quality and downstream performance.
Content Cleanup & Standardization:
Remove noise such as extra whitespace, irrelevant bytes, and redundant verbiage, ensuring that documents conform to a standardized format before further processing.
Cost-Effective Deployment:
For organizations that require document rewriting capabilities without the overhead of large, resource-intensive models, this solution provides an excellent balance between performance and efficiency.
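
Because the model emits Markdown with headings, a downstream RAG indexer can split the rewritten document into one chunk per section. The following is a minimal sketch of such a splitter, not part of the model or its tooling: the function name and the chunk shape are hypothetical, and it assumes ATX-style `#` headings and no fenced code blocks in the output (a `#` comment inside a fence would be mistaken for a heading).

```python
import re

# Matches ATX headings such as "# Title" or "### Subsection".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_sections(markdown: str) -> list[dict]:
    """Split Markdown into chunks, one per heading (hypothetical helper)."""
    sections = []
    current = {"level": 0, "title": "", "lines": []}
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # Close the running section before starting a new one.
            sections.append(current)
            current = {"level": len(match.group(1)), "title": match.group(2), "lines": []}
        else:
            current["lines"].append(line)
    sections.append(current)

    chunks = []
    for section in sections:
        body = "\n".join(section["lines"]).strip()
        # Skip an empty preamble that has neither a heading nor text.
        if section["title"] or body:
            chunks.append({"level": section["level"], "title": section["title"], "text": body})
    return chunks


sample = "# Policy\nAll access is logged.\n\n## Retention\nLogs are kept for 90 days.\n"
for chunk in split_markdown_sections(sample):
    print(chunk["level"], chunk["title"], "->", chunk["text"])
```

Keeping the heading level and title alongside each chunk lets the retriever store them as metadata, which is one common way to preserve document structure in an index.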

Model Architecture

The model is built on the Google Gemma 3 4B architecture, a transformer-based language model designed for high-speed inference. On top of this base model, LoRA adapter layers are applied to efficiently specialize the model for document rewriting. The adapter mechanism allows the model to learn task-specific modifications with only a fraction of the parameters updated, making the fine-tuning process both memory- and compute-efficient.
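
To make the "fraction of the parameters" point concrete, the sketch below counts trainable weights for a single linear layer under full fine-tuning versus a LoRA adapter. In LoRA, the frozen weight matrix (d_out x d_in) is augmented with a low-rank product B·A, where A is r x d_in and B is d_out x r, and only A and B are trained. The layer size and rank used here are illustrative assumptions, not the values used to train this model.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


# Hypothetical layer size and rank, chosen only for illustration.
D_IN, D_OUT, RANK = 2048, 2048, 16

full = full_finetune_params(D_IN, D_OUT)
lora = lora_params(D_IN, D_OUT, RANK)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

With these example numbers the adapter trains about 1.6% of the layer's weights; the actual ratio depends on the rank and on which layers receive adapters.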

How It Works

Input Processing:
The model accepts input as a raw text string, which can be an entire document or a section of text. It first tokenizes the input and identifies extraneous content such as stray bytes, extra whitespace, and redundant sentences.
Information Extraction:
Using its fine-tuned attention mechanisms, the model extracts content that is semantically important for the intended downstream RAG tasks. It evaluates context and relevance to determine which pieces of information should be retained.
Content Rewriting & Formatting:
The extracted information is then rewritten into a concise format. The model organizes the output into Markdown format, automatically adding appropriate headings and subheadings based on the structure and flow of the content.
Output Generation:
The final output is a clean, structured document that preserves key insights and removes unnecessary noise, ready for use in RAG pipelines or other downstream applications.
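
The flow above can be exercised against the OpenAI-compatible endpoint exposed by the vLLM server from the serving instructions. The sketch below builds a chat-completions payload for a raw document and provides a helper that posts it. The instruction wording, the `max_tokens` and `temperature` values, and the helper names are assumptions: this model card does not document the exact prompt, so the Colab notebook under Usage is the place to check what the authors use.

```python
import json
import urllib.request

MODEL_ID = "ZySec-AI/gemma-3-4b-document-writer"
# Assumed instruction text; the model card does not specify a prompt.
INSTRUCTION = "Rewrite the following document as clean, structured Markdown:"


def build_rewrite_payload(document: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat payload asking the model to rewrite `document`."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": f"{INSTRUCTION}\n\n{document.strip()}"}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


def rewrite_document(document: str, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to a running server and return the generated Markdown."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_rewrite_payload(document)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the payload needs no server; rewrite_document() does.
payload = build_rewrite_payload("  Quarterly   report:   revenue grew 4%.  ")
print(json.dumps(payload, indent=2))
```

Separating payload construction from the network call keeps the prompt easy to inspect and adjust before any request is sent.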

Usage

A worked example is available in this Colab notebook:
https://colab.research.google.com/drive/11yIG9FFp3cU5G5iUXxHjJrXEXH-7zOYw?usp=sharing

Downloads last month: 18

Model size: 4B params (Safetensors)
Tensor type: F16

Model tree for ZySec-AI/gemma-3-4b-document-writer

Base model: google/gemma-3-4b-pt
Finetuned: google/gemma-3-4b-it
Finetuned (695): this model
Merges: 2 models
Quantizations: 3 models
