CIx-Gemma-3-270M Reasoning SFT
Model Summary
This model is a fine-tuned derivative of google/gemma-3-270m, adapted using the Convergent Intelligence sparse fine-tuning setup originally tested on Liquid Foundation Models.
The checkpoint was trained on reasoning-style English examples from angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k using a targeted adaptation strategy and the custom CIxOpt optimizer framework.
The goal of this model is to test whether a compact Gemma 3 270M backbone can be shaped toward reasoning-style text generation through selective parameter participation rather than broad full-model modification.
This is an experimental research checkpoint intended for evaluation, local testing, optimizer research, and continued fine-tuning.
Base Model
- Base model: google/gemma-3-270m
- Model family: Gemma 3
- Approximate size: 270M parameters
- Task: Causal language modeling / text generation
- Language: English-focused fine-tuning
- Library: Hugging Face Transformers
- License: Gemma license
If this checkpoint was instead trained from google/gemma-3-270m-it, update the base_model field accordingly.
Dataset
Fine-tuning data:
- angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k
The dataset was processed into text-generation / chat-style training examples. Empty, malformed, or unusable samples were filtered before tokenization.
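The filtering step described above could look roughly like the sketch below. The `messages` field name, the accepted roles, and the `is_usable` helper are assumptions made for illustration; the actual dataset schema and filter rules are not documented in this card.

```python
def is_usable(example, min_chars=1):
    """Return True if a chat-style example has non-empty user and assistant turns.

    The "messages" field and the role names are hypothetical; adjust them to
    the real dataset schema.
    """
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return False  # empty or missing conversation

    roles_seen = set()
    for turn in messages:
        if not isinstance(turn, dict):
            return False  # malformed turn
        role = turn.get("role")
        content = turn.get("content")
        if role not in {"system", "user", "assistant"}:
            return False  # unknown role
        if not isinstance(content, str) or len(content.strip()) < min_chars:
            return False  # empty or non-text content
        roles_seen.add(role)

    # Require at least one prompt and one response
    return {"user", "assistant"} <= roles_seen


raw = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello"}]},
    {"messages": [{"role": "user", "content": "   "},
                  {"role": "assistant", "content": "Hello"}]},
    {"messages": []},
    {"text": "no messages field"},
]
clean = [ex for ex in raw if is_usable(ex)]
print(len(clean))  # 1
```

With the Hugging Face `datasets` library, the same predicate could be passed to `Dataset.filter` before tokenization.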
Training used causal language modeling labels with padding masked using -100.
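A minimal sketch of that masking, written in plain Python so it runs without a tokenizer. The token IDs and the `build_labels` helper are illustrative assumptions, not the training script. The value -100 matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so masked positions contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(input_ids, attention_mask, ignore_index=IGNORE_INDEX):
    """Copy input_ids into labels, replacing padded positions with ignore_index."""
    if len(input_ids) != len(attention_mask):
        raise ValueError("input_ids and attention_mask must have the same length")
    return [tok if keep else ignore_index
            for tok, keep in zip(input_ids, attention_mask)]


# Hypothetical token IDs; 0 stands in for the pad token here
input_ids = [2, 1841, 603, 0, 0]
attention_mask = [1, 1, 1, 0, 0]

labels = build_labels(input_ids, attention_mask)
print(labels)  # [2, 1841, 603, -100, -100]
```

Transformers causal language models shift the labels internally, so the labels are an unshifted copy of `input_ids` apart from the masked positions.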
Training Method
This model was trained using the same CIx sparse-adaptation setup used in the Liquid Foundation Model (LFM) experiments.
The training approach emphasized:
- Preserve the compact pretrained backbone
- Adapt selected reasoning and response-shaping surfaces
- Avoid unnecessary full-model disturbance
- Use heterogeneous optimizer routing by parameter type
CIxOpt Optimizer
Training used CIxOpt, a custom heterogeneous optimizer designed for architecture-aware routing.
CIxOpt supports:
- AdamW-style adaptive updates
- Lion-style sign momentum
- AdaMax-compatible routing
- Optional ASGD-style averaging
- Optional low-rank projected momentum
- Gradient centralization
- Decoupled weight decay
- Discrepancy-aware caution filtering for sign updates
- fp32 optimizer state for bf16/fp16 safety
- Parameter-name-aware routing
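To make two of the listed features concrete, the sketch below shows a Lion-style sign-momentum step with decoupled weight decay on plain Python floats. It follows the published Lion update rule and is not CIxOpt's code; the `lion_step` name and the default hyperparameters are assumptions for illustration. The other listed features, such as gradient centralization and caution filtering, are not shown.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion-style update over flat lists of floats.

    The step direction is the sign of an interpolation between the momentum
    and the current gradient. Weight decay is applied directly to the
    parameter (decoupled), not added to the gradient.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        direction = sign(beta1 * m + (1.0 - beta1) * g)
        new_params.append(p - lr * (direction + weight_decay * p))
        # The momentum buffer uses a second, slower interpolation factor
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


params, momentum = lion_step(
    params=[1.0, -2.0],
    grads=[0.5, -0.1],
    momentum=[0.0, 0.0],
    lr=0.1,
)
print(params)
print(momentum)
```

Because every coordinate moves by the same magnitude `lr`, Lion-style updates are usually run with a smaller learning rate than AdamW would use on the same parameters.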
The intended optimizer behavior is:
- large projection matrices -> Lion-style sign momentum
- normalization / sensitive params -> AdamW-style updates
- embedding / lm-head surfaces -> conservative adaptive routing
This makes the checkpoint useful for testing whether small models can be efficiently adapted with custom optimizer routing rather than full uniform AdamW updates.
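The routing described above can be sketched as a function that maps a parameter name to an update rule. The matching rules, the group labels, and the `route_parameter` helper are assumptions chosen to mirror the intended behavior; CIxOpt's actual routing logic is not published in this card. The example names follow the usual Transformers naming for Gemma-style decoder layers.

```python
from collections import defaultdict


def route_parameter(name, ndim):
    """Pick an update rule for a parameter from its name and rank.

    Hypothetical rules:
      - embeddings and lm_head      -> conservative adaptive routing
      - norms and other 1-D params  -> AdamW-style updates
      - 2-D projection matrices     -> Lion-style sign momentum
    """
    lowered = name.lower()
    if "embed" in lowered or "lm_head" in lowered:
        return "conservative_adaptive"
    if "norm" in lowered or ndim < 2:
        return "adamw"
    if lowered.endswith("_proj.weight"):
        return "lion"
    return "adamw"  # default for anything unrecognized


# (name, number of dimensions) pairs standing in for model.named_parameters()
example_params = [
    ("model.embed_tokens.weight", 2),
    ("model.layers.0.self_attn.q_proj.weight", 2),
    ("model.layers.0.mlp.down_proj.weight", 2),
    ("model.layers.0.input_layernorm.weight", 1),
]

groups = defaultdict(list)
for name, ndim in example_params:
    groups[route_parameter(name, ndim)].append(name)

for rule, names in groups.items():
    print(rule, names)
```

With a PyTorch model, the same function could be applied while iterating `model.named_parameters()`, passing `param.ndim` as the rank.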
Sparse Fine-Tuning Strategy
The setup used sparse parameter participation rather than unrestricted full-model training.
The intended adaptation pattern was:
- Freeze or reduce movement in lower representational structure
- Train selected higher-level adaptation surfaces
- Preserve base language structure where possible
- Shape reasoning and response behavior through targeted updates
This checkpoint should be treated as an experimental adaptation artifact, not a fully benchmarked general-purpose assistant.
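One simple way to realize such a pattern is to train only the top few decoder blocks and freeze everything else. The sketch below selects parameters by name. The `is_trainable` helper, the layer counts, and the choice to freeze all non-layer parameters are assumptions for illustration; the card does not specify which surfaces were actually trained.

```python
import re

# Matches the block index in names such as "model.layers.3.mlp.up_proj.weight"
LAYER_PATTERN = re.compile(r"\.layers\.(\d+)\.")


def is_trainable(name, num_layers, train_top_k):
    """Return True only for parameters inside the top `train_top_k` blocks."""
    match = LAYER_PATTERN.search(name)
    if match is None:
        # Embeddings, final norm, and lm_head stay frozen in this sketch
        return False
    return int(match.group(1)) >= num_layers - train_top_k


# Hypothetical 4-block model with only the last block trainable
names = [
    "model.embed_tokens.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.2.self_attn.q_proj.weight",
    "model.layers.3.self_attn.o_proj.weight",
    "model.norm.weight",
]
selected = [n for n in names if is_trainable(n, num_layers=4, train_top_k=1)]
print(selected)  # ['model.layers.3.self_attn.o_proj.weight']
```

On a PyTorch model, the result could be applied by setting `param.requires_grad = is_trainable(name, num_layers, train_top_k)` for each entry of `model.named_parameters()`.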
Intended Use
This model is intended for:
- Research on compact Gemma fine-tuning
- CIxOpt optimizer experiments
- Small-model reasoning-style generation
- Local text-generation experiments
- Instruction-following and response-style studies
- Efficient adaptation research
- Continued fine-tuning and ablation testing
- Comparison against the base google/gemma-3-270m
Potential use cases:
- Technical explanation
- Lightweight reasoning experiments
- Prompt-response generation
- Local prototyping
- Small agent backbone testing
- Educational model behavior analysis
Out-of-Scope Use
This model is not intended for high-stakes autonomous deployment.
Do not use this model as the sole decision-maker for:
- Medical diagnosis
- Legal judgment
- Financial decisions
- Emergency response
- Cyber offensive automation
- Personnel screening
- Surveillance or targeting decisions
- Critical infrastructure decisions
- Any setting requiring verified factual accuracy
Limitations
This is an experimental fine-tuned checkpoint. Expected limitations include:
- May hallucinate facts, dates, citations, or technical details
- May inherit limitations from the Gemma 3 270M base model
- May overproduce reasoning-style outputs
- May be sensitive to prompt format
- May repeat or drift during longer generations
- Has not been fully evaluated for factuality, safety, math, coding, or instruction-following
- Fine-tuning on reasoning-style data does not guarantee correct reasoning
- Sparse adaptation may change some behaviors unevenly while leaving others close to the base model
- Small model size limits world knowledge, reasoning depth, and robustness
Safety Notes
Users should independently validate important outputs.
Before deployment, additional evaluation is recommended:
- Hallucination testing
- Bias and toxicity evaluation
- Refusal behavior testing
- Prompt-injection sensitivity testing
- Side-by-side comparison against the base model
- Domain-specific factuality testing
- Human review of outputs
- Guardrails for public-facing applications
Example Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hub repository ID of this checkpoint
model_id = "reaperdoesntknow/Gemma-3-270m-Opus-Distil"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

# Fall back to EOS as the padding token if none is defined
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompt = "Explain why small language models are useful for edge reasoning experiments."
inputs = tokenizer(
    prompt,
    return_tensors="pt",
).to(model.device)

# Sample a continuation without tracking gradients
with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

# The decoded text includes the prompt followed by the continuation
print(tokenizer.decode(output[0], skip_special_tokens=True))
Chat-Style Usage
If the tokenizer provides a chat template:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hub repository ID of this checkpoint
model_id = "reaperdoesntknow/Gemma-3-270m-Opus-Distil"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

messages = [
    {
        "role": "user",
        "content": "Why is sparse fine-tuning useful for compact language models?",
    }
]

# Render the conversation with the tokenizer's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=384,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

# Drop the prompt tokens so only the newly generated reply is decoded
generated = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
Suggested Generation Settings
Balanced exploratory generation:
generation_config = {
    "max_new_tokens": 384,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "repetition_penalty": 1.05,
}
More deterministic generation:
generation_config = {
    "max_new_tokens": 384,
    "do_sample": False,
}
For smaller models, shorter outputs are often more stable:
generation_config = {
    "max_new_tokens": 128,
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
}
Training Configuration
Approximate training configuration:
base_model: google/gemma-3-270m
dataset: angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k
task: causal language modeling / reasoning-style SFT
optimizer: CIxOpt
state_dtype: fp32 optimizer state
model_dtype: bf16 where supported
Evaluation
Formal benchmark results have not yet been added.
Recommended evaluations:
- Held-out perplexity
- Base model comparison against google/gemma-3-270m
- Short-form reasoning checks
- IFEval-style instruction-following tests
- Repetition and degeneration testing
- Human preference review
- Truthfulness / hallucination checks
- Prompt-format robustness testing
- CIxOpt vs AdamW ablation
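Two of the recommended checks reduce to short formulas, sketched below in plain Python. Perplexity is the exponential of the mean per-token negative log-likelihood, and a repeated n-gram ratio is one common proxy for degeneration. The helper names and the choice of trigram repetition are assumptions for illustration; no results are implied for this checkpoint.

```python
import math


def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    if not token_nlls:
        raise ValueError("token_nlls must not be empty")
    return math.exp(sum(token_nlls) / len(token_nlls))


def repeated_ngram_ratio(tokens, n=3):
    """Fraction of n-grams that duplicate an earlier n-gram in the sequence.

    0.0 means every n-gram is unique; values near 1.0 indicate heavy looping.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


# A model that assigns probability 1/4 to every held-out token
print(perplexity([math.log(4)] * 3))

# "a b c" appears twice among the four trigrams
print(repeated_ngram_ratio("a b c a b c".split(), n=3))  # 0.25
```

Computing both metrics for the base model and for this checkpoint on the same held-out text gives a simple side-by-side comparison.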
Responsible Use
This model may generate plausible but incorrect text. It should be used with human oversight.
Developers should follow the Gemma usage terms and apply appropriate safety review before deploying the model in user-facing or operational settings.
Citation
Base model:
@misc{google_gemma_3_270m,
  title = {Gemma 3 270M},
  author = {Google DeepMind},
  publisher = {Hugging Face},
  year = {2025}
}
Fine-tuning dataset:
@misc{angrygiraffe_reasoning_dataset,
  title = {claude-opus-4.6-4.7-reasoning-8.7k},
  author = {angrygiraffe},
  publisher = {Hugging Face}
}
Author / Maintainer
Fine-tuning and optimizer experimentation by:
Convergent Intelligence LLC
Research focus: AI systems, intelligence analysis, mathematical frameworks, optimizer design, and efficient model adaptation.
Disclaimer
This model is provided for research and experimentation. It should not be treated as a verified expert system. Outputs require human review, especially in factual, technical, legal, medical, financial, operational, or safety-critical contexts.