Instructions for using reaperdoesntknow/Gemma-3-270m-Opus-Distil with libraries and local apps.

Libraries

How to use reaperdoesntknow/Gemma-3-270m-Opus-Distil with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="reaperdoesntknow/Gemma-3-270m-Opus-Distil")
print(pipe("Once upon a time,", max_new_tokens=64)[0]["generated_text"])

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/Gemma-3-270m-Opus-Distil")
model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/Gemma-3-270m-Opus-Distil", device_map="auto")
```


vLLM

How to use reaperdoesntknow/Gemma-3-270m-Opus-Distil with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "reaperdoesntknow/Gemma-3-270m-Opus-Distil"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "reaperdoesntknow/Gemma-3-270m-Opus-Distil",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
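You can also call the same OpenAI-compatible `/v1/completions` endpoint from Python. This is a minimal sketch that uses only the standard library (`json`, `urllib.request`). The helper names `build_completion_request` and `complete` are hypothetical. The default base URL assumes the vLLM port 8000 used above; the SGLang server in the next section listens on port 30000.

```python
import json
import urllib.request

MODEL_ID = "reaperdoesntknow/Gemma-3-270m-Opus-Distil"


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",  # vLLM default; SGLang example uses :30000
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the first completion's text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the generated text in choices[i]["text"].
    return body["choices"][0]["text"]
```

Once a server is running, `print(complete("Once upon a time,"))` returns the continuation. To target the SGLang server, pass `base_url="http://localhost:30000"`. Building the request separately from sending it makes the payload easy to inspect without a server.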

Use Docker

```shell
docker model run hf.co/reaperdoesntknow/Gemma-3-270m-Opus-Distil
```

SGLang

How to use reaperdoesntknow/Gemma-3-270m-Opus-Distil with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "reaperdoesntknow/Gemma-3-270m-Opus-Distil" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "reaperdoesntknow/Gemma-3-270m-Opus-Distil",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "reaperdoesntknow/Gemma-3-270m-Opus-Distil" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "reaperdoesntknow/Gemma-3-270m-Opus-Distil",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```


CIx-Gemma-3-270M Reasoning SFT

Model Summary

This model is a fine-tuned derivative of google/gemma-3-270m. It was adapted using the Convergent Intelligence sparse fine-tuning setup originally tested on Liquid Foundation Models (LFMs).

The checkpoint was trained on reasoning-style English examples from angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k using a targeted adaptation strategy and the custom CIxOpt optimizer framework.

The goal of this model is to test whether a compact Gemma 3 270M backbone can be shaped toward reasoning-style text generation through selective parameter participation rather than broad full-model modification.

This is an experimental research checkpoint intended for evaluation, local testing, optimizer research, and continued fine-tuning.

Base Model

Base model: google/gemma-3-270m
Model family: Gemma 3
Approximate size: 270M parameters
Task: Causal language modeling / text generation
Language: English-focused fine-tuning
Library: Hugging Face Transformers
License: Gemma license

If this checkpoint was instead trained from google/gemma-3-270m-it, update the base_model field accordingly.

Dataset

Fine-tuning data:

angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k

The dataset was processed into text-generation / chat-style training examples. Empty, malformed, or unusable samples were filtered before tokenization.

Training used causal language modeling labels, with padding positions set to -100 so that they are ignored by the loss.
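Here is a minimal sketch of that labeling step. It assumes right-padded batches given as plain lists alongside an attention mask; the function name `build_causal_lm_labels` is hypothetical. Labels copy the input IDs, and padded positions become -100. That value is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so those positions add nothing to the loss. Hugging Face causal-LM models shift labels internally, so the sketch applies no manual shift.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_causal_lm_labels(
    input_ids: list[list[int]],
    attention_mask: list[list[int]],
) -> list[list[int]]:
    """Copy input IDs into labels, masking padded positions with IGNORE_INDEX."""
    labels = []
    for ids, mask in zip(input_ids, attention_mask):
        if len(ids) != len(mask):
            raise ValueError("input_ids and attention_mask rows must have equal length")
        # Keep real tokens; replace padding (mask == 0) with the ignore value.
        labels.append([tok if keep else IGNORE_INDEX for tok, keep in zip(ids, mask)])
    return labels
```

The sketch masks by attention mask rather than by pad token ID on purpose. When the pad token is set to EOS, masking by ID would also hide genuine end-of-sequence tokens from the loss.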

Training Method

This model was trained using the same CIx sparse-adaptation setup used for LFM experiments.

The training approach emphasized:

Preserve the compact pretrained backbone
Adapt selected reasoning and response-shaping surfaces
Avoid unnecessary full-model disturbance
Use heterogeneous optimizer routing by parameter type

CIxOpt Optimizer

Training used CIxOpt, a custom heterogeneous optimizer designed for architecture-aware routing.

CIxOpt supports:

AdamW-style adaptive updates
Lion-style sign momentum
AdaMax-compatible routing
Optional ASGD-style averaging
Optional low-rank projected momentum
Gradient centralization
Decoupled weight decay
Discrepancy-aware caution filtering for sign updates
fp32 optimizer state for bf16/fp16 safety
Parameter-name-aware routing
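For reference, here is a minimal sketch of a Lion-style sign-momentum step. It includes decoupled weight decay and optional gradient centralization, and it operates on plain Python lists for one output row of a weight matrix. It follows the published Lion update rule and is not CIxOpt's implementation, which this card does not include. Caution filtering, low-rank projection, and averaging are omitted, and the hyperparameter defaults are illustrative assumptions.

```python
def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def lion_step(
    params: list[float],
    grads: list[float],
    momentum: list[float],
    lr: float = 1e-4,
    beta1: float = 0.9,
    beta2: float = 0.99,
    weight_decay: float = 0.1,
    centralize: bool = True,
) -> None:
    """Update params and momentum in place for one row of a weight matrix."""
    if centralize and len(grads) > 1:
        # Gradient centralization: subtract the row mean from the gradient (caller's list untouched).
        mean = sum(grads) / len(grads)
        grads = [g - mean for g in grads]
    for i, g in enumerate(grads):
        # Update direction: sign of an interpolation between stored momentum and the gradient.
        direction = sign(beta1 * momentum[i] + (1.0 - beta1) * g)
        # Decoupled weight decay shrinks the weight independently of the gradient signal.
        params[i] -= lr * (direction + weight_decay * params[i])
        # Stored momentum uses a separate, usually slower, EMA coefficient.
        momentum[i] = beta2 * momentum[i] + (1.0 - beta2) * g
```

Every coordinate moves by the same magnitude `lr`, because only the sign of the interpolated momentum is used. That uniform step size is why sign-momentum rules are usually paired with smaller learning rates than AdamW.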

The intended optimizer behavior is:

Large projection matrices -> Lion-style sign momentum
Normalization / sensitive params -> AdamW-style updates
Embedding / lm-head surfaces -> conservative adaptive routing

This makes the checkpoint useful for testing whether small models can be efficiently adapted with custom optimizer routing rather than full uniform AdamW updates.
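The routing idea can be sketched as a function from parameter name and shape to an update-rule label. The name patterns follow Hugging Face Gemma 3 naming (`embed_tokens`, `lm_head`, `*norm*`, `*_proj.weight`). The size threshold, the labels, and the `route_parameter` helper are hypothetical; CIxOpt's actual routing table is not published in this card.

```python
def route_parameter(name: str, shape: tuple[int, ...], min_matrix_numel: int = 65_536) -> str:
    """Return a hypothetical update-rule label for one named parameter."""
    lowered = name.lower()
    # Embedding and output-head surfaces: conservative adaptive updates.
    if "embed" in lowered or "lm_head" in lowered:
        return "adaptive_conservative"
    # Normalization weights, biases, and other 1-D tensors are treated as sensitive.
    if "norm" in lowered or lowered.endswith(".bias") or len(shape) < 2:
        return "adamw"
    numel = 1
    for dim in shape:
        numel *= dim
    # Large 2-D projection matrices get Lion-style sign momentum.
    if lowered.endswith("_proj.weight") and numel >= min_matrix_numel:
        return "lion"
    # Anything else falls back to AdamW-style updates.
    return "adamw"
```

With a loaded model, `{name: route_parameter(name, tuple(p.shape)) for name, p in model.named_parameters()}` would give the grouping from which per-rule optimizer parameter groups could be built. The checks run from most to least specific, so an embedding matrix is never routed to sign momentum just because it is large.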

Sparse Fine-Tuning Strategy

The setup used sparse parameter participation rather than unrestricted full-model training.

The intended adaptation pattern was:

Freeze or reduce movement in lower representational structure
Train selected higher-level adaptation surfaces
Preserve base language structure where possible
Shape reasoning and response behavior through targeted updates
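One simple way to realize this pattern is to freeze everything by default and then unfreeze only selected upper decoder layers and surfaces, chosen by parameter name. The sketch below works on name strings, so it runs without loading the model. The cutoff layer, the opted-in surfaces, and the `select_trainable` helper are illustrative assumptions, not the configuration used for this checkpoint.

```python
import re

LAYER_PATTERN = re.compile(r"\.layers\.(\d+)\.")


def select_trainable(
    names: list[str],
    first_trainable_layer: int,
    extra_surfaces: tuple[str, ...] = ("model.norm.",),
) -> set[str]:
    """Return the parameter names that should keep requires_grad=True."""
    trainable = set()
    for name in names:
        match = LAYER_PATTERN.search(name)
        if match is not None:
            # Decoder-layer parameters: train only layers at or above the cutoff.
            if int(match.group(1)) >= first_trainable_layer:
                trainable.add(name)
        elif any(name.startswith(prefix) for prefix in extra_surfaces):
            # Non-layer surfaces (here, the final norm) are opted in explicitly.
            trainable.add(name)
        # Everything else, such as the embeddings, stays frozen.
    return trainable
```

With a loaded model, the selection would be applied as `for name, p in model.named_parameters(): p.requires_grad_(name in trainable)`. Only the unfrozen parameters should then be passed to the optimizer.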

This checkpoint should be treated as an experimental adaptation artifact, not a fully benchmarked general-purpose assistant.

Intended Use

This model is intended for:

Research on compact Gemma fine-tuning
CIxOpt optimizer experiments
Small-model reasoning-style generation
Local text-generation experiments
Instruction-following and response-style studies
Efficient adaptation research
Continued fine-tuning and ablation testing
Comparison against the base google/gemma-3-270m

Potential use cases:

Technical explanation
Lightweight reasoning experiments
Prompt-response generation
Local prototyping
Small agent backbone testing
Educational model behavior analysis

Out-of-Scope Use

This model is not intended for high-stakes autonomous deployment.

Do not use this model as the sole decision-maker for:

Medical diagnosis
Legal judgment
Financial decisions
Emergency response
Offensive cyber automation
Personnel screening
Surveillance or targeting decisions
Critical infrastructure decisions
Any setting requiring verified factual accuracy

Limitations

This is an experimental fine-tuned checkpoint. Expected limitations include:

May hallucinate facts, dates, citations, or technical details
May inherit limitations from the Gemma 3 270M base model
May overproduce reasoning-style outputs
May be sensitive to prompt format
May repeat or drift during longer generations
Has not been fully evaluated for factuality, safety, math, coding, or instruction-following
Fine-tuning on reasoning-style data does not guarantee correct reasoning
Sparse adaptation may change some behaviors unevenly while leaving others close to the base model
Small model size limits world knowledge, reasoning depth, and robustness

Safety Notes

Users should independently validate important outputs.

Before deployment, additional evaluation is recommended:

Hallucination testing
Bias and toxicity evaluation
Refusal behavior testing
Prompt-injection sensitivity testing
Side-by-side comparison against the base model
Domain-specific factuality testing
Human review of outputs
Guardrails for public-facing applications

Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "reaperdoesntknow/Gemma-3-270m-Opus-Distil"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Fall back to EOS as the padding token if the tokenizer defines none.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompt = "Explain why small language models are useful for edge reasoning experiments."

inputs = tokenizer(
    prompt,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

# The decoded text includes the prompt followed by the continuation.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Chat-Style Usage

If the tokenizer provides a chat template:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "reaperdoesntknow/Gemma-3-270m-Opus-Distil"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Why is sparse fine-tuning useful for compact language models?"
    }
]

# Render the chat template and tokenize in one step; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=384,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
generated = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```
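If the tokenizer has no chat template, you can build the prompt by hand. This sketch assumes the fine-tuning data used Gemma's standard turn markers (`<start_of_turn>`, `<end_of_turn>`, with the assistant role named `model`). The card does not confirm the exact training format, so compare against `apply_chat_template` output whenever a template is available. The `<bos>` token is left out because the Gemma tokenizer normally prepends it during encoding. The helper `format_gemma_prompt` is hypothetical and supports only user and assistant turns.

```python
ROLE_TAGS = {"user": "user", "assistant": "model"}  # Gemma names the assistant turn "model"


def format_gemma_prompt(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render user/assistant messages with Gemma-style turn markers (no <bos>)."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ROLE_TAGS:
            raise ValueError(f"unsupported role for this sketch: {role!r}")
        parts.append(f"<start_of_turn>{ROLE_TAGS[role]}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)
```

The resulting string would then be encoded normally, e.g. `inputs = tokenizer(format_gemma_prompt(messages), return_tensors="pt").to(model.device)`.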

Suggested Generation Settings

Balanced exploratory generation:

```python
generation_config = {
    "max_new_tokens": 384,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "repetition_penalty": 1.05,
}
```

More deterministic generation:

```python
generation_config = {
    "max_new_tokens": 384,
    "do_sample": False,
}
```

Because this is a small model, shorter outputs are often more stable:

```python
generation_config = {
    "max_new_tokens": 128,
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
}
```
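To switch between these presets without copying dictionaries, you can keep them in one table and merge them with per-call overrides. The preset names and the `generation_kwargs` helper are hypothetical; the values are the ones listed above. When `do_sample` is explicitly `False`, the sketch drops the sampling-only keys, which avoids the Transformers warning about sampling parameters being set for greedy decoding.

```python
SAMPLING_ONLY_KEYS = ("temperature", "top_p", "top_k")

GENERATION_PRESETS = {
    "balanced": {"max_new_tokens": 384, "do_sample": True, "temperature": 0.7,
                 "top_p": 0.95, "repetition_penalty": 1.05},
    "deterministic": {"max_new_tokens": 384, "do_sample": False},
    "short_stable": {"max_new_tokens": 128, "do_sample": True, "temperature": 0.6,
                     "top_p": 0.9, "repetition_penalty": 1.1},
}


def generation_kwargs(preset: str, **overrides) -> dict:
    """Return a fresh kwargs dict for model.generate() from a named preset plus overrides."""
    if preset not in GENERATION_PRESETS:
        raise KeyError(f"unknown preset {preset!r}; choose from {sorted(GENERATION_PRESETS)}")
    # Merge into a new dict so the preset table itself is never mutated.
    kwargs = {**GENERATION_PRESETS[preset], **overrides}
    if kwargs.get("do_sample") is False:
        # Greedy decoding ignores these keys, and Transformers warns if they are set.
        for key in SAMPLING_ONLY_KEYS:
            kwargs.pop(key, None)
    return kwargs
```

Usage would look like `model.generate(**inputs, **generation_kwargs("short_stable"), pad_token_id=tokenizer.eos_token_id)`. `repetition_penalty` is kept for greedy decoding because it also applies there.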

Training Configuration

Approximate training configuration:

base_model: google/gemma-3-270m
dataset: angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k
task: causal language modeling / reasoning-style SFT
optimizer: CIxOpt
state_dtype: fp32 optimizer state
model_dtype: bf16 where supported
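Keeping optimizer state in fp32 matters because bfloat16 has only 8 bits of significand precision. Near 1.0, its representable values are 2^-7 apart, so small increments can round away entirely. The sketch below emulates storage rounding with `struct`: it rounds to nearest-even bfloat16 via float32, for finite values only. It then runs an Adam-style second-moment EMA stored in each format. The EMA coefficient, gradient value, and helper names are illustrative assumptions.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round a finite float to bfloat16 (via float32, round-to-nearest-even on the low 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def run_ema(round_fn, steps: int = 1000, beta: float = 0.999, grad_sq: float = 2.0) -> float:
    """Second-moment EMA v <- beta*v + (1-beta)*g^2, re-stored via round_fn after every step."""
    v = round_fn(1.0)
    for _ in range(steps):
        v = round_fn(beta * v + (1.0 - beta) * grad_sq)
    return v


print("bf16 state:", run_ema(to_bf16))  # stays at 1.0: each increment is under half the bf16 spacing
print("fp32 state:", run_ema(to_fp32))  # about 1.63 after 1000 steps, moving toward 2.0
```

In exact arithmetic the EMA after n steps is 2 - 0.999^n. The fp32 buffer tracks this closely, while the bf16 buffer never leaves its starting value.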

Evaluation

Formal benchmark results have not yet been added.

Recommended evaluations:

Held-out perplexity
Base model comparison against google/gemma-3-270m
Short-form reasoning checks
IFEval-style instruction-following tests
Repetition and degeneration testing
Human preference review
Truthfulness / hallucination checks
Prompt-format robustness testing
CIxOpt vs AdamW ablation
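For the held-out perplexity and repetition checks, here is a minimal sketch. Perplexity is the exponential of the mean negative log-likelihood over target tokens, skipping positions labeled -100 as in training. A simple repetition signal is the fraction of n-grams in a generation that already appeared earlier in it. The helper names are hypothetical, and the sketch assumes per-token NLLs are already aligned with their labels; in practice they would come from the model's shifted logits.

```python
import math


def perplexity(token_nlls: list[float], labels: list[int], ignore_index: int = -100) -> float:
    """exp(mean NLL) over positions whose label is not ignore_index."""
    kept = [nll for nll, label in zip(token_nlls, labels) if label != ignore_index]
    if not kept:
        raise ValueError("no target tokens to score")
    return math.exp(sum(kept) / len(kept))


def repeated_ngram_rate(tokens: list[str], n: int = 3) -> float:
    """Fraction of n-grams that already occurred earlier in the same sequence."""
    ngrams = [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    seen, repeats = set(), 0
    for gram in ngrams:
        if gram in seen:
            repeats += 1
        seen.add(gram)
    return repeats / len(ngrams)
```

Comparing both numbers between this checkpoint and google/gemma-3-270m on the same held-out text gives a first signal for the base-model comparison and the degeneration checks listed above.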

Responsible Use

This model may generate plausible but incorrect text. It should be used with human oversight.

Developers should follow the Gemma usage terms and apply appropriate safety review before deploying the model in user-facing or operational settings.

Citation

Base model:

```bibtex
@misc{google_gemma_3_270m,
  title = {Gemma 3 270M},
  author = {Google DeepMind},
  publisher = {Hugging Face},
  year = {2025}
}
```

Fine-tuning dataset:

```bibtex
@misc{angrygiraffe_reasoning_dataset,
  title = {claude-opus-4.6-4.7-reasoning-8.7k},
  author = {angrygiraffe},
  publisher = {Hugging Face}
}
```

Author / Maintainer

Fine-tuning and optimizer experimentation by:

Convergent Intelligence LLC

Research focus: AI systems, intelligence analysis, mathematical frameworks, optimizer design, and efficient model adaptation.

Disclaimer

This model is provided for research and experimentation. It should not be treated as a verified expert system. Outputs require human review, especially in factual, technical, legal, medical, financial, operational, or safety-critical contexts.