Instructions to use ZySec-AI/gemma-3-4b-document-writer with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ZySec-AI/gemma-3-4b-document-writer with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# The chat input below contains an image, so the image-text-to-text task is used;
# that pipeline accepts chat messages through the `text` argument.
pipe = pipeline("image-text-to-text", model="ZySec-AI/gemma-3-4b-document-writer")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ZySec-AI/gemma-3-4b-document-writer")
model = AutoModelForImageTextToText.from_pretrained("ZySec-AI/gemma-3-4b-document-writer")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
# Apply the chat template and move the tensors to the model's device.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use ZySec-AI/gemma-3-4b-document-writer with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ZySec-AI/gemma-3-4b-document-writer"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ZySec-AI/gemma-3-4b-document-writer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```bash
docker model run hf.co/ZySec-AI/gemma-3-4b-document-writer
```
- SGLang
How to use ZySec-AI/gemma-3-4b-document-writer with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ZySec-AI/gemma-3-4b-document-writer" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ZySec-AI/gemma-3-4b-document-writer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ZySec-AI/gemma-3-4b-document-writer" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ZySec-AI/gemma-3-4b-document-writer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use ZySec-AI/gemma-3-4b-document-writer with Docker Model Runner:
```bash
docker model run hf.co/ZySec-AI/gemma-3-4b-document-writer
```
GEMMA Document Rewriter for RAG Pipeline
Overview
The GEMMA Document Rewriter for RAG Pipeline is a text rewriting model built on the pre-trained Google Gemma 3 4B language model. It was fine-tuned with LoRA (Low-Rank Adaptation); the adapter weights are provided in ZySec-AI/gemma-3-4b-document-writer-lora. The model rewrites documents by removing unnecessary information, stray bytes and whitespace, and redundant content. It extracts and emphasizes the information that matters for Retrieval-Augmented Generation (RAG) pipelines and outputs a clean, structured version of the document in Markdown with appropriate headings.
Key Features
- Efficient Document Rewriting: Extracts the essential content from lengthy documents, removing extraneous details and whitespace to produce a more concise version suited to RAG systems.
- Markdown Output: Reformats content into Markdown, automatically generating headings and subheadings for readability and further processing.
- Cost-Effective and Speed Optimized: Built on a relatively small language model (Gemma 3 4B), which keeps costs low and inference fast enough for production pipelines.
- LoRA Fine-Tuning: Uses LoRA adapter layers to fine-tune the base model efficiently, adapting it to the document rewriting task without full-model retraining.
- RAG Pipeline Integration: Designed to slot into modern RAG pipelines, so that only the most relevant, structured information is preserved and highlighted.
Intended Use Cases
This model is ideal for a range of document processing and natural language understanding tasks, including:
- Document Summarization & Rewriting: Simplify and restructure long documents or articles by extracting key information and presenting it in an organized, Markdown-formatted style.
- Data Preprocessing for RAG Pipelines: Serve as a preprocessing step in retrieval-augmented generation systems by providing clean, condensed documents that improve retrieval quality and downstream performance.
- Content Cleanup & Standardization: Remove noise such as extra whitespace, irrelevant bytes, and redundant verbiage, so that documents conform to a standardized format before further processing.
- Cost-Effective Deployment: For organizations that need document rewriting without the overhead of large, resource-intensive models, this model offers a balance between performance and efficiency.
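To make "noise" concrete, here is a minimal rule-based sketch of the kind of cleanup described under Content Cleanup & Standardization. It is an illustration only: the model performs this cleanup as part of learned rewriting, not through these rules, and the function name `strip_noise` is hypothetical.

```python
import re
import unicodedata


def strip_noise(text: str) -> str:
    """Rule-based cleanup: drop control characters and collapse extra whitespace."""
    # Remove control/format characters (Unicode categories starting with "C"),
    # keeping newlines and tabs so line structure survives.
    kept = "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )
    # Collapse runs of spaces and tabs into a single space.
    kept = re.sub(r"[ \t]+", " ", kept)
    # Trim leading and trailing whitespace on every line.
    kept = "\n".join(line.strip() for line in kept.split("\n"))
    # Collapse three or more consecutive newlines into one blank line.
    kept = re.sub(r"\n{3,}", "\n\n", kept)
    return kept.strip()


raw = "Title\x00 \t  of   report\n\n\n\n  Body\u200b text.  \n"
print(repr(strip_noise(raw)))
```

A deterministic pass like this could run before the model to shrink the input, but it cannot judge redundancy or relevance, which is the part the fine-tuned model is meant to handle.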
Model Architecture
The model is built on the Google Gemma 3 4B architecture, a transformer-based language model designed for high-speed inference. On top of this base model, LoRA adapter layers are applied to efficiently specialize the model for document rewriting. The adapter mechanism allows the model to learn task-specific modifications with only a fraction of the parameters updated, making the fine-tuning process both memory- and compute-efficient.
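The "fraction of the parameters" claim can be made concrete with a small calculation. In the standard LoRA formulation, a weight matrix of shape d_out x d_in is left frozen and two low-rank matrices of shapes d_out x r and r x d_in are trained instead. The sketch below compares the two parameter counts; the sizes and rank are hypothetical values chosen for illustration and are not taken from this model's configuration.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out            # training the d_out x d_in matrix directly
    lora = rank * (d_in + d_out)   # training B (d_out x rank) and A (rank x d_in)
    return full, lora


# Hypothetical sizes for illustration only, not Gemma's actual dimensions.
full, lora = lora_param_counts(d_in=2048, d_out=2048, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

With these assumed sizes the adapter trains roughly 1.6% of the parameters of the matrix it adapts, which is where the memory and compute savings come from.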
How It Works
- Input Processing: The model accepts input as a raw text string, which can be an entire document or a section of text. It first tokenizes the input and identifies areas with extraneous content such as stray bytes, extra whitespace, and redundant sentences.
- Information Extraction: Using its fine-tuned attention mechanisms, the model extracts content that is semantically important for the intended downstream RAG tasks. It evaluates context and relevance to determine which pieces of information should be retained.
- Content Rewriting & Formatting: The extracted information is then rewritten into a concise form. The model organizes the output as Markdown, automatically adding appropriate headings and subheadings based on the structure and flow of the content.
- Output Generation: The final output is a clean, structured document that preserves key insights and removes unnecessary noise, ready for use in RAG pipelines or other downstream applications.
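Because the output is Markdown with headings, a downstream RAG pipeline can split it into one chunk per section before indexing. The following is a minimal sketch of that step, not part of the model or its repository: the function name `split_markdown_sections` and the sample text are hypothetical, and the splitter assumes ATX-style headings (`#` to `######`) and does not special-case headings inside code fences.

```python
import re


def split_markdown_sections(markdown: str) -> list[dict]:
    """Split Markdown into sections, one per ATX heading (# to ######)."""
    sections = []
    current = {"heading": "", "level": 0, "body": []}

    def has_content(section: dict) -> bool:
        # A section is worth keeping if it has a heading or any non-blank body line.
        return bool(section["heading"]) or any(l.strip() for l in section["body"])

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A new heading closes the previous section.
            if has_content(current):
                sections.append(current)
            current = {
                "heading": match.group(2).strip(),
                "level": len(match.group(1)),
                "body": [],
            }
        else:
            current["body"].append(line)
    if has_content(current):
        sections.append(current)

    # Join each section's body lines into one trimmed string.
    return [
        {"heading": s["heading"], "level": s["level"], "body": "\n".join(s["body"]).strip()}
        for s in sections
    ]


rewritten = """# Incident Report

Summary of the outage.

## Root Cause

Expired TLS certificate.
"""
for section in split_markdown_sections(rewritten):
    print(section["level"], section["heading"], "->", section["body"])
```

Each returned dictionary can then be embedded and indexed as its own chunk, with the heading kept as metadata for retrieval.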
Usage
A usage notebook is available on Google Colab: https://colab.research.google.com/drive/11yIG9FFp3cU5G5iUXxHjJrXEXH-7zOYw?usp=sharing
Model tree for ZySec-AI/gemma-3-4b-document-writer
Base model
google/gemma-3-4b-pt