CIx-Gemma-3-270M Reasoning SFT

Model Summary

This model is a fine-tuned derivative of google/gemma-3-270m, adapted using the Convergent Intelligence sparse fine-tuning setup originally tested on Liquid Foundation Models.

The checkpoint was trained on reasoning-style English examples from angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k using a targeted adaptation strategy and the custom CIxOpt optimizer framework.

The goal of this model is to test whether a compact Gemma 3 270M backbone can be shaped toward reasoning-style text generation through selective parameter participation rather than broad full-model modification.

This is an experimental research checkpoint intended for evaluation, local testing, optimizer research, and continued fine-tuning.

Base Model

  • Base model: google/gemma-3-270m
  • Model family: Gemma 3
  • Approximate size: 270M parameters
  • Task: Causal language modeling / text generation
  • Language: English-focused fine-tuning
  • Library: Hugging Face Transformers
  • License: Gemma license

If this checkpoint was instead trained from google/gemma-3-270m-it, update the base_model field accordingly.

Dataset

Fine-tuning data:

  • angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k

The dataset was processed into text-generation / chat-style training examples. Empty, malformed, or unusable samples were filtered before tokenization.

Training used standard causal language modeling labels, with padding positions set to -100 so that they are excluded from the loss.
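
The label masking described above can be sketched in plain Python. This is a minimal illustration, not the preprocessing code used for this checkpoint: the helper name `pad_and_mask`, the right-padding choice, and the toy token ids are all assumptions.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def pad_and_mask(sequences, pad_token_id):
    """Right-pad token-id sequences and build matching causal-LM labels.

    Hypothetical helper for illustration only.
    """
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        batch["input_ids"].append(list(seq) + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # Labels copy the real tokens; padded positions get IGNORE_INDEX.
        batch["labels"].append(list(seq) + [IGNORE_INDEX] * n_pad)
    return batch


batch = pad_and_mask([[5, 6, 7], [8, 9]], pad_token_id=0)
print(batch["labels"])
```

Masking by position rather than by token value matters when the pad token is the same as the EOS token, since a genuine end-of-sequence token should still contribute to the loss.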

Training Method

This model was trained using the same CIx sparse-adaptation setup used for the Liquid Foundation Model (LFM) experiments.

The training approach emphasized:

  • Preserve the compact pretrained backbone
  • Adapt selected reasoning and response-shaping surfaces
  • Avoid unnecessary full-model disturbance
  • Use heterogeneous optimizer routing by parameter type

CIxOpt Optimizer

Training used CIxOpt, a custom heterogeneous optimizer designed for architecture-aware routing.

CIxOpt supports:

  • AdamW-style adaptive updates
  • Lion-style sign momentum
  • AdaMax-compatible routing
  • Optional ASGD-style averaging
  • Optional low-rank projected momentum
  • Gradient centralization
  • Decoupled weight decay
  • Discrepancy-aware caution filtering for sign updates
  • fp32 optimizer state for bf16/fp16 safety
  • Parameter-name-aware routing
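
Two of the listed features, Lion-style sign momentum and decoupled weight decay, can be illustrated with the publicly described Lion update rule. The sketch below is not CIxOpt's implementation, which is not published in this card: `lion_step` is a hypothetical name, the hyperparameter values are arbitrary, and the caution filtering and other CIxOpt features are left out.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion-style update over flat lists of floats, applied in place."""
    for i in range(len(params)):
        p, g, m = params[i], grads[i], momentum[i]
        # Blend momentum and gradient, then keep only the sign of the result.
        direction = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay acts on the weight itself, not on the gradient.
        params[i] = p - lr * (direction + weight_decay * p)
        # The stored momentum is refreshed with a second, slower coefficient.
        momentum[i] = beta2 * m + (1.0 - beta2) * g
    return params, momentum


params, momentum = lion_step(
    [1.0, -2.0], [0.5, -0.25], [0.0, 0.0], lr=0.1, weight_decay=0.1
)
print(params, momentum)
```

Because every coordinate moves by the same magnitude, sign updates are usually paired with a smaller learning rate than an AdamW-style update would use.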

The intended optimizer behavior is:

  • Large projection matrices -> Lion-style sign momentum
  • Normalization / sensitive params -> AdamW-style updates
  • Embedding / lm-head surfaces -> conservative adaptive routing
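
A minimal sketch of name-based routing along these lines is shown below. The substring patterns, the family labels, and the example parameter names are assumptions chosen to resemble Hugging Face Gemma naming. They are not CIxOpt's actual rules.

```python
def route_parameter(name, ndim):
    """Map a parameter to an update family from its name and tensor rank.

    Hypothetical routing rules for illustration only.
    """
    lowered = name.lower()
    if "embed" in lowered or "lm_head" in lowered:
        return "conservative_adaptive"
    if "norm" in lowered or lowered.endswith(".bias") or ndim < 2:
        return "adamw"
    if "_proj" in lowered and ndim == 2:
        return "lion"
    return "adamw"  # fall back to adaptive updates for anything unrecognized


# Parameter name -> tensor rank; names are illustrative.
example_params = {
    "model.embed_tokens.weight": 2,
    "model.layers.0.self_attn.q_proj.weight": 2,
    "model.layers.0.mlp.down_proj.weight": 2,
    "model.layers.0.input_layernorm.weight": 1,
    "model.norm.weight": 1,
}

groups = {}
for name, ndim in example_params.items():
    groups.setdefault(route_parameter(name, ndim), []).append(name)

for family, names in groups.items():
    print(family, names)
```

Checking the embedding and normalization patterns before the projection pattern keeps sensitive parameters out of the sign-update group even if their names also match a later rule.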

This makes the checkpoint useful for testing whether small models can be adapted efficiently with custom optimizer routing rather than by applying AdamW uniformly to every parameter.

Sparse Fine-Tuning Strategy

The setup used sparse parameter participation rather than unrestricted full-model training.

The intended adaptation pattern was:

  • Freeze or reduce movement in lower representational structure
  • Train selected higher-level adaptation surfaces
  • Preserve base language structure where possible
  • Shape reasoning and response behavior through targeted updates
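
One way to express this kind of sparse participation is to decide trainability from the parameter name. The sketch below is an assumption-laden illustration: the layer-name pattern, the choice to train only the last few blocks, and the decision to freeze embeddings are hypothetical and may differ from the selection used for this checkpoint.

```python
import re

# Matches names such as "model.layers.7.mlp.down_proj.weight" (assumed naming).
LAYER_PATTERN = re.compile(r"\.layers\.(\d+)\.")


def is_trainable(name, num_layers, top_k_layers=4):
    """Decide from the name alone whether a parameter participates in training."""
    match = LAYER_PATTERN.search(name)
    if match is None:
        # Non-layer parameters: keep embeddings frozen, let the rest move.
        return "embed" not in name
    # Only the last `top_k_layers` transformer blocks are trained.
    return int(match.group(1)) >= num_layers - top_k_layers


names = [
    "model.embed_tokens.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.6.self_attn.o_proj.weight",
    "model.layers.7.mlp.down_proj.weight",
    "model.norm.weight",
]
trainable = [n for n in names if is_trainable(n, num_layers=8, top_k_layers=2)]
print(trainable)
```

With a PyTorch model, the same predicate could be applied while iterating over `model.named_parameters()`, calling `param.requires_grad_(...)` with the result for each parameter.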

This checkpoint should be treated as an experimental adaptation artifact, not a fully benchmarked general-purpose assistant.

Intended Use

This model is intended for:

  • Research on compact Gemma fine-tuning
  • CIxOpt optimizer experiments
  • Small-model reasoning-style generation
  • Local text-generation experiments
  • Instruction-following and response-style studies
  • Efficient adaptation research
  • Continued fine-tuning and ablation testing
  • Comparison against the base google/gemma-3-270m

Potential use cases:

  • Technical explanation
  • Lightweight reasoning experiments
  • Prompt-response generation
  • Local prototyping
  • Small agent backbone testing
  • Educational model behavior analysis

Out-of-Scope Use

This model is not intended for high-stakes autonomous deployment.

Do not use this model as the sole decision-maker for:

  • Medical diagnosis
  • Legal judgment
  • Financial decisions
  • Emergency response
  • Cyber offensive automation
  • Personnel screening
  • Surveillance or targeting decisions
  • Critical infrastructure decisions
  • Any setting requiring verified factual accuracy

Limitations

This is an experimental fine-tuned checkpoint. Expected limitations include:

  • May hallucinate facts, dates, citations, or technical details
  • May inherit limitations from the Gemma 3 270M base model
  • May overproduce reasoning-style outputs
  • May be sensitive to prompt format
  • May repeat or drift during longer generations
  • Has not been fully evaluated for factuality, safety, math, coding, or instruction-following
  • Fine-tuning on reasoning-style data does not guarantee correct reasoning
  • Sparse adaptation may change some behaviors unevenly while leaving others close to the base model
  • Small model size limits world knowledge, reasoning depth, and robustness

Safety Notes

Users should independently validate important outputs.

Before deployment, additional evaluation is recommended:

  • Hallucination testing
  • Bias and toxicity evaluation
  • Refusal behavior testing
  • Prompt-injection sensitivity testing
  • Side-by-side comparison against the base model
  • Domain-specific factuality testing
  • Human review of outputs
  • Guardrails for public-facing applications

Example Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompt = "Explain why small language models are useful for edge reasoning experiments."

inputs = tokenizer(
    prompt,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))

Chat-Style Usage

If the tokenizer provides a chat template:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Why is sparse fine-tuning useful for compact language models?"
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=384,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
generated = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))

Suggested Generation Settings

Balanced exploratory generation:

generation_config = {
    "max_new_tokens": 384,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "repetition_penalty": 1.05,
}

More deterministic generation:

generation_config = {
    "max_new_tokens": 384,
    "do_sample": False,
}

For a model of this size, shorter outputs are often more stable:

generation_config = {
    "max_new_tokens": 128,
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
}
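
To make the effect of `temperature` and `top_p` concrete, the sketch below applies both to a toy list of logits in plain Python. It is an illustration of the general technique under simplified assumptions, not the Transformers implementation, and `top_p_filter` is a hypothetical helper name.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.95):
    """Return {token_index: probability} for the nucleus kept by top-p sampling."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


nucleus = top_p_filter([2.0, 1.0, 0.0, -5.0], temperature=1.0, top_p=0.9)
print(sorted(nucleus))
```

Lowering `top_p` or `temperature`, as in the shorter-output settings above, shrinks the set of candidate tokens and tends to reduce drift at the cost of variety.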

Training Configuration

Approximate training configuration:

base_model: google/gemma-3-270m
dataset: angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k
task: causal language modeling / reasoning-style SFT
optimizer: CIxOpt
state_dtype: fp32 optimizer state
model_dtype: bf16 where supported

Evaluation

Formal benchmark results have not yet been added.

Recommended evaluations:

  • Held-out perplexity
  • Base model comparison against google/gemma-3-270m
  • Short-form reasoning checks
  • IFEval-style instruction-following tests
  • Repetition and degeneration testing
  • Human preference review
  • Truthfulness / hallucination checks
  • Prompt-format robustness testing
  • CIxOpt vs AdamW ablation
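
Two of the recommended checks, held-out perplexity and repetition testing, reduce to short formulas once per-token log-probabilities and generated token sequences are available. The helpers below are a minimal sketch with hypothetical names. Obtaining the log-probabilities from the model is left out, and the n-gram size is an arbitrary choice.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) over scored tokens."""
    if not token_log_probs:
        raise ValueError("need at least one scored token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def repeated_ngram_fraction(tokens, n=3):
    """Share of n-grams that duplicate an earlier n-gram in the same sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


# A model that assigns probability 0.25 to every token has perplexity 4.
ppl = perplexity([math.log(0.25)] * 4)
loop_score = repeated_ngram_fraction("a b c a b c a b c".split(), n=3)
print(ppl, loop_score)
```

Running both functions on the same prompts for this checkpoint and for the base model gives a simple side-by-side comparison. Padding positions should be excluded before computing perplexity.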

Responsible Use

This model may generate plausible but incorrect text. It should be used with human oversight.

Developers should follow the Gemma usage terms and apply appropriate safety review before deploying the model in user-facing or operational settings.

Citation

Base model:

@misc{google_gemma_3_270m,
  title = {Gemma 3 270M},
  author = {Google DeepMind},
  publisher = {Hugging Face},
  year = {2025}
}

Fine-tuning dataset:

@misc{angrygiraffe_reasoning_dataset,
  title = {claude-opus-4.6-4.7-reasoning-8.7k},
  author = {angrygiraffe},
  publisher = {Hugging Face}
}

Author / Maintainer

Fine-tuning and optimizer experimentation by:

Convergent Intelligence LLC

Research focus: AI systems, intelligence analysis, mathematical frameworks, optimizer design, and efficient model adaptation.

Disclaimer

This model is provided for research and experimentation. It should not be treated as a verified expert system. Outputs require human review, especially in factual, technical, legal, medical, financial, operational, or safety-critical contexts.
