
Internal Model Release: Gemma 3 4B Dolci SFT Instruct Alignment-Free (LoRA Merged)

Summary

This model is Gemma 3 4B fine-tuned on Dolci SFT Instruct data (derived from OLMo 3 SFT data), using an alignment-filtered ("alignment-free") variant of the corpus intended to remove low-quality alignment examples. The release is a merged full-weight model produced from a LoRA adapter.

Tool-calling is an important part of the training mix, but the primary objective is broad instruction tuning on the Dolci SFT Instruct alignment-free corpus.

Data and Curation

  • Data lineage: Dolci SFT Instruct derived from OLMo 3 SFT data.
  • Curation goal: alignment-filtered (alignment-free) training subset.
  • Effective training samples: 51,476.
  • Preprocessing format: chat_tool_calls_v5_hermes_with_im_end.
  • Chat style: ChatML with Hermes-style tool-call representation.
  • Overlength policy: trim oldest turns.
  • Samples with unparsed tool calls were dropped.
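The "trim oldest turns" overlength policy can be sketched as follows. This is a hypothetical illustration, not the actual preprocessing code: the `trim_oldest` helper and the whitespace token counter are assumptions, and training used the Gemma tokenizer with a 32,768-token limit.

```python
# Hypothetical sketch of the "trim oldest turns" overlength policy:
# drop turns from the front of the conversation until the token budget
# fits. Whitespace token counting here is a stand-in for the real
# tokenizer.
def trim_oldest(turns, count_tokens, max_tokens=32_768):
    turns = list(turns)
    while turns and sum(count_tokens(t) for t in turns) > max_tokens:
        turns.pop(0)  # discard the oldest turn
    return turns

# Toy example: whitespace "tokens" and a tiny budget.
convo = ["sys prompt here", "user asks a long question", "assistant answers"]
kept = trim_oldest(convo, lambda t: len(t.split()), max_tokens=6)
print(kept)  # only the newest turn fits the budget
```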

Tool-Calling Formatting

Tool interactions are represented with XML markers in assistant/tool turns:

  • <tool_call> ... </tool_call>
  • <tool_response> ... </tool_response>
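A minimal sketch of how a ChatML turn with a Hermes-style tool call might be rendered. The helper names and the JSON payload shape (`name`/`arguments`) are illustrative assumptions; only the `<|im_start|>`/`<|im_end|>`, `<tool_call>`, and `<tool_response>` markers come from the card.

```python
import json

# Hypothetical helpers sketching the ChatML + Hermes-style tool-call
# representation described above.
def render_turn(role, content):
    """Wrap one chat turn in ChatML <|im_start|>/<|im_end|> markers."""
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"

def render_tool_call(name, arguments):
    """Embed a JSON tool invocation inside <tool_call> ... </tool_call>."""
    payload = json.dumps({"name": name, "arguments": arguments})
    return f"<tool_call>\n{payload}\n</tool_call>"

def render_tool_response(result):
    """Embed a tool result inside <tool_response> ... </tool_response>."""
    return f"<tool_response>\n{json.dumps(result)}\n</tool_response>"

conversation = (
    render_turn("user", "What's the weather in Paris?")
    + render_turn("assistant", render_tool_call("get_weather", {"city": "Paris"}))
    + render_turn("tool", render_tool_response({"temp_c": 18}))
)
print(conversation)
```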

Special token behavior validated after merge:

  • <|im_start|> -> 105
  • <|im_end|> -> 106
  • <tool_call> -> 8
  • </tool_call> -> 9

EOS/PAD behavior used in training:

  • eos_token = <|im_end|>
  • pad_token = <|im_end|>
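The post-merge checks above can be sketched as a small validation routine. The expected IDs and the eos/pad values come from this card; the `check` function and the `SimpleNamespace` stub are assumptions standing in for a real tokenizer loaded from the merged checkpoint (e.g. via `transformers.AutoTokenizer`).

```python
from types import SimpleNamespace

# Expected mapping documented in this card.
EXPECTED_IDS = {
    "<|im_start|>": 105,
    "<|im_end|>": 106,
    "<tool_call>": 8,
    "</tool_call>": 9,
}

def check(tokenizer):
    """Return a list of mismatches between the tokenizer and the card."""
    problems = []
    for tok, want in EXPECTED_IDS.items():
        got = tokenizer.vocab.get(tok)
        if got != want:
            problems.append((tok, got, want))
    if tokenizer.eos_token != "<|im_end|>" or tokenizer.pad_token != "<|im_end|>":
        problems.append(("eos/pad", tokenizer.eos_token, tokenizer.pad_token))
    return problems

# Stub mirroring the documented values, for illustration only; pass the
# merged model's tokenizer in practice.
stub = SimpleNamespace(
    vocab={"<|im_start|>": 105, "<|im_end|>": 106, "<tool_call>": 8, "</tool_call>": 9},
    eos_token="<|im_end|>",
    pad_token="<|im_end|>",
)
print(check(stub))  # an empty list means everything matches
```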

Training Setup

  • Base model: google/gemma-3-4b-pt.
  • Distributed setup: 4 nodes x 8 GPUs (32 GPUs total).
  • Precision: bf16.
  • Sequence length: 32,768.
  • Epochs: 1.0.
  • Per-device batch size: 2.
  • Gradient accumulation: 1.
  • Learning rate: 3e-4.
  • Checkpoint interval: 500 steps.
  • Gradient checkpointing enabled.
  • Liger kernel enabled.
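The hyperparameters above imply the following back-of-envelope numbers; all inputs are copied from this card, and the relations are standard arithmetic rather than anything taken from the training logs.

```python
# Effective global batch size and step count implied by the setup above.
gpus = 4 * 8                 # 4 nodes x 8 GPUs
per_device_bs = 2
grad_accum = 1
global_batch = gpus * per_device_bs * grad_accum   # sequences per step

samples = 51_476             # effective training samples (from the card)
steps_per_epoch = samples / global_batch

# Cross-check the reported throughput against the reported runtime.
runtime_s = 65_354
samples_per_sec = 0.788
print(global_batch, round(steps_per_epoch), round(runtime_s * samples_per_sec))
```

The throughput-implied sample count (~51,499) is within rounding error of the 51,476 effective samples for one epoch, so the reported numbers are internally consistent.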

LoRA Configuration

  • Method: PEFT LoRA (peft 0.18.1).
  • Rank: r = 64.
  • Alpha: 32.
  • Dropout: 0.05.
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj.
  • Excluded modules: vision tower modules.
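The rank/alpha choice above gives a LoRA scaling factor of alpha / r = 32 / 64 = 0.5. A minimal pure-Python sketch of the LoRA forward pass, h = W x + (alpha / r) * B (A x), with toy shapes; the real adapters are rank-64 matrices applied by peft's LoRA layers to the listed projection modules.

```python
# LoRA scaling from the configuration above.
r, alpha = 64, 32
scaling = alpha / r  # 0.5

def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

# Toy 2x2 base weight and rank-1 adapter (illustrative shapes only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]             # r_toy x in_features
B = [[2.0], [0.0]]           # out_features x r_toy
x = [3.0, 4.0]

# h = W x + scaling * B (A x)
h = [w + scaling * b for w, b in zip(matvec(W, x), matvec(B, matvec(A, x)))]
print(h)  # -> [10.0, 4.0]
```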

Training Outcomes

  • Train loss: 0.7651.
  • Train runtime: 65,354 s (~18.2 h).
  • Train samples/sec: 0.788.
  • Train steps/sec: 0.012.
  • Total FLOPs: 3.79e19.
  • Approximate parameter count observed in run telemetry: 4,419,128,176.

Merge Details

  • Adapter was merged into base weights to produce this full model.
  • Output dtype: bf16.
  • Merge executed on CPU (safe_serialization enabled).
  • Sharded save with 5GB shard target.
  • Merge-time stack:
    • torch 2.9.1+rocm6.4
    • transformers 4.57.3
    • peft 0.18.1
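The 5 GB shard target implies the following shard count for a checkpoint of this size. The parameter count is the run-telemetry figure from this card; the estimate assumes 2 bytes per parameter for bf16 and ignores per-shard metadata overhead.

```python
import math

# Rough shard-count estimate for the sharded bf16 save described above.
params = 4_419_128_176
bytes_per_param = 2                           # bf16
total_gb = params * bytes_per_param / 1e9     # decimal GB, as HF shard targets use
shard_target_gb = 5
shards = math.ceil(total_gb / shard_target_gb)
print(round(total_gb, 2), shards)  # ~8.84 GB across 2 shards
```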

Compatibility adjustment applied during merge:

  • Tokenizer config was sanitized for current Gemma fast-tokenizer loading behavior (an extra_special_tokens list field was removed).
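The sanitization step amounts to dropping one field from the tokenizer config before saving. The field name is from this card; the surrounding config keys shown here are illustrative, not the full merged `tokenizer_config.json`.

```python
import json

# Sketch of the merge-time sanitization: remove the
# `extra_special_tokens` field so current Gemma fast tokenizers load
# the config cleanly.
config = {
    "eos_token": "<|im_end|>",
    "pad_token": "<|im_end|>",
    "extra_special_tokens": [],   # field removed during merge
}
config.pop("extra_special_tokens", None)
print(json.dumps(config))
```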

License and Terms

Use remains subject to the upstream google/gemma-3-4b-pt license and terms.

Repository: synquid/gemma-3-4b-dolci-sft