
Internal Model Release: Gemma 3 4B Dolci SFT Instruct Alignment-Free (LoRA Merged)

Summary

This model is Gemma 3 4B fine-tuned on Dolci SFT Instruct data (derived from OLMo 3 SFT data), using an alignment-filtered ("alignment-free") variant of the corpus intended to remove low-quality alignment examples. The release is a merged full-weight model produced from a LoRA adapter.

Tool-calling is an important part of the training mix, but the primary objective is broad instruction tuning on the Dolci SFT Instruct alignment-free corpus.

Data and Curation

  • Data lineage: Dolci SFT Instruct derived from OLMo 3 SFT data.
  • Curation goal: alignment-filtered (alignment-free) training subset.
  • Effective training samples: 51,476.
  • Preprocessing format: chat_tool_calls_v5_hermes_with_im_end.
  • Chat style: ChatML with Hermes-style tool-call representation.
  • Overlength policy: trim oldest turns.
  • Samples with unparsed tool calls were dropped.
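The "trim oldest turns" overlength policy can be sketched as follows. This is a hypothetical illustration, not the actual preprocessing code: the `trim_oldest` helper and the whitespace token counter are assumptions, and training used the Gemma tokenizer with a 32,768-token limit.

```python
# Hypothetical sketch of the "trim oldest turns" overlength policy:
# drop turns from the front of the conversation until the token budget
# fits. Whitespace token counting here is a stand-in for the real
# tokenizer.
def trim_oldest(turns, count_tokens, max_tokens=32_768):
    turns = list(turns)
    while turns and sum(count_tokens(t) for t in turns) > max_tokens:
        turns.pop(0)  # discard the oldest turn
    return turns

# Toy example: whitespace "tokens" and a tiny budget.
convo = ["sys prompt here", "user asks a long question", "assistant answers"]
kept = trim_oldest(convo, lambda t: len(t.split()), max_tokens=6)
print(kept)  # only the newest turn fits the budget
```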

Tool-Calling Formatting

Tool interactions are represented with XML markers in assistant/tool turns:

  • <tool_call> ... </tool_call>
  • <tool_response> ... </tool_response>
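A minimal sketch of how a ChatML turn with a Hermes-style tool call might be rendered. The helper names and the JSON payload shape (`name`/`arguments`) are illustrative assumptions; only the `<|im_start|>`/`<|im_end|>`, `<tool_call>`, and `<tool_response>` markers come from the card.

```python
import json

# Hypothetical helpers sketching the ChatML + Hermes-style tool-call
# representation described above.
def render_turn(role, content):
    """Wrap one chat turn in ChatML <|im_start|>/<|im_end|> markers."""
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"

def render_tool_call(name, arguments):
    """Embed a JSON tool invocation inside <tool_call> ... </tool_call>."""
    payload = json.dumps({"name": name, "arguments": arguments})
    return f"<tool_call>\n{payload}\n</tool_call>"

def render_tool_response(result):
    """Embed a tool result inside <tool_response> ... </tool_response>."""
    return f"<tool_response>\n{json.dumps(result)}\n</tool_response>"

conversation = (
    render_turn("user", "What's the weather in Paris?")
    + render_turn("assistant", render_tool_call("get_weather", {"city": "Paris"}))
    + render_turn("tool", render_tool_response({"temp_c": 18}))
)
print(conversation)
```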

Special token behavior validated after merge:

  • <|im_start|> -> 105
  • <|im_end|> -> 106
  • <tool_call> -> 8
  • </tool_call> -> 9

EOS/PAD behavior used in training:

  • eos_token = <|im_end|>
  • pad_token = <|im_end|>
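The post-merge checks above can be sketched as a small validation routine. The expected IDs and the eos/pad values come from this card; the `check` function and the `SimpleNamespace` stub are assumptions standing in for a real tokenizer loaded from the merged checkpoint (e.g. via `transformers.AutoTokenizer`).

```python
from types import SimpleNamespace

# Expected mapping documented in this card.
EXPECTED_IDS = {
    "<|im_start|>": 105,
    "<|im_end|>": 106,
    "<tool_call>": 8,
    "</tool_call>": 9,
}

def check(tokenizer):
    """Return a list of mismatches between the tokenizer and the card."""
    problems = []
    for tok, want in EXPECTED_IDS.items():
        got = tokenizer.vocab.get(tok)
        if got != want:
            problems.append((tok, got, want))
    if tokenizer.eos_token != "<|im_end|>" or tokenizer.pad_token != "<|im_end|>":
        problems.append(("eos/pad", tokenizer.eos_token, tokenizer.pad_token))
    return problems

# Stub mirroring the documented values, for illustration only; pass the
# merged model's tokenizer in practice.
stub = SimpleNamespace(
    vocab={"<|im_start|>": 105, "<|im_end|>": 106, "<tool_call>": 8, "</tool_call>": 9},
    eos_token="<|im_end|>",
    pad_token="<|im_end|>",
)
print(check(stub))  # an empty list means everything matches
```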

Training Setup

  • Base model: google/gemma-3-4b-pt.
  • Distributed setup: 4 nodes x 8 GPUs (32 GPUs total).
  • Precision: bf16.
  • Sequence length: 32,768.
  • Epochs: 1.0.
  • Per-device batch size: 2.
  • Gradient accumulation: 1.
  • Learning rate: 3e-4.
  • Checkpoint interval: 500 steps.
  • Gradient checkpointing enabled.
  • Liger kernel enabled.
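The hyperparameters above imply the following back-of-envelope numbers; all inputs are copied from this card, and the relations are standard arithmetic rather than anything taken from the training logs.

```python
# Effective global batch size and step count implied by the setup above.
gpus = 4 * 8                 # 4 nodes x 8 GPUs
per_device_bs = 2
grad_accum = 1
global_batch = gpus * per_device_bs * grad_accum   # sequences per step

samples = 51_476             # effective training samples (from the card)
steps_per_epoch = samples / global_batch

# Cross-check the reported throughput against the reported runtime.
runtime_s = 65_354
samples_per_sec = 0.788
print(global_batch, round(steps_per_epoch), round(runtime_s * samples_per_sec))
```

The throughput-implied sample count (~51,499) is within rounding error of the 51,476 effective samples for one epoch, so the reported numbers are internally consistent.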

LoRA Configuration

  • Method: PEFT LoRA (peft 0.18.1).
  • Rank: r = 64.
  • Alpha: 32.
  • Dropout: 0.05.
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj.
  • Excluded modules: vision tower modules.
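The rank/alpha choice above gives a LoRA scaling factor of alpha / r = 32 / 64 = 0.5. A minimal pure-Python sketch of the LoRA forward pass, h = W x + (alpha / r) * B (A x), with toy shapes; the real adapters are rank-64 matrices applied by peft's LoRA layers to the listed projection modules.

```python
# LoRA scaling from the configuration above.
r, alpha = 64, 32
scaling = alpha / r  # 0.5

def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

# Toy 2x2 base weight and rank-1 adapter (illustrative shapes only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]             # r_toy x in_features
B = [[2.0], [0.0]]           # out_features x r_toy
x = [3.0, 4.0]

# h = W x + scaling * B (A x)
h = [w + scaling * b for w, b in zip(matvec(W, x), matvec(B, matvec(A, x)))]
print(h)  # -> [10.0, 4.0]
```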

Training Outcomes

  • Train loss: 0.7651.
  • Train runtime: 65,354 s (~18.2 h).
  • Train samples/sec: 0.788.
  • Train steps/sec: 0.012.
  • Total FLOPs: 3.79e19.
  • Approximate parameter count observed in run telemetry: 4,419,128,176.

Merge Details

  • Adapter was merged into base weights to produce this full model.
  • Output dtype: bf16.
  • Merge executed on CPU (safe_serialization enabled).
  • Sharded save with 5GB shard target.
  • Merge-time stack:
    • torch 2.9.1+rocm6.4
    • transformers 4.57.3
    • peft 0.18.1
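The 5 GB shard target implies the following shard count for a checkpoint of this size. The parameter count is the run-telemetry figure from this card; the estimate assumes 2 bytes per parameter for bf16 and ignores per-shard metadata overhead.

```python
import math

# Rough shard-count estimate for the sharded bf16 save described above.
params = 4_419_128_176
bytes_per_param = 2                           # bf16
total_gb = params * bytes_per_param / 1e9     # decimal GB, as HF shard targets use
shard_target_gb = 5
shards = math.ceil(total_gb / shard_target_gb)
print(round(total_gb, 2), shards)  # ~8.84 GB across 2 shards
```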

Compatibility adjustment applied during merge:

  • Tokenizer config was sanitized for current Gemma fast-tokenizer loading behavior (an extra_special_tokens list field was removed).
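The sanitization step amounts to dropping one field from the tokenizer config before saving. The field name is from this card; the surrounding config keys shown here are illustrative, not the full merged `tokenizer_config.json`.

```python
import json

# Sketch of the merge-time sanitization: remove the
# `extra_special_tokens` field so current Gemma fast tokenizers load
# the config cleanly.
config = {
    "eos_token": "<|im_end|>",
    "pad_token": "<|im_end|>",
    "extra_special_tokens": [],   # field removed during merge
}
config.pop("extra_special_tokens", None)
print(json.dumps(config))
```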

License and Terms

Use remains subject to the upstream google/gemma-3-4b-pt license and terms.

Repository: synquid/gemma-3-4b-dolci-sft