Solomon-Nano-350m
Solomon-Nano-350m is a reasoning-focused fine-tune of ibm-granite/granite-4.0-350m on the same Opus-inspired chain-of-thought dataset used for Solomon-0.5B. It's the smaller sibling in the Solomon line: same reasoning-trace training recipe, a smaller base model.
This repository contains the final FP32 merged model — the rsLoRA adapter has already been folded into the base weights, so there's no PEFT dependency at inference time.
What makes Solomon different from base Granite-4.0-350M
The base Granite-4.0-350M answers directly, with no visible deliberation. Solomon changes that: every training example was a system/user/assistant triple where the assistant works through the problem step-by-step inside <think>...</think> blocks before giving a final answer. That habit is baked into the weights through fine-tuning, not toggled by a runtime flag.
In practice, Solomon will open with a <think> block on most non-trivial prompts, reason through it in plain text, then close with a clean answer — no special generation parameters required.
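If downstream code only needs the final answer, the reasoning can be separated from it after decoding. The following is a minimal sketch, assuming the `<think>` and `</think>` tags survive decoding as plain text, as the expected output in the Usage section suggests; `split_reasoning` is a hypothetical helper name, not something shipped with the model.

```python
import re

# Matches the first <think>...</think> block; DOTALL lets "." span newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer). Reasoning is "" when no complete block is found."""
    match = THINK_RE.search(text)
    if match is None:
        # No closed think block (for example, generation was cut off early).
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


sample = (
    "<think>\n45 minutes = 0.75 hours.\nSpeed = 60 / 0.75 = 80 mph.\n</think>\n"
    "The train's speed is **80 miles per hour**."
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

If generation stops before `</think>` is emitted, this sketch returns the whole text as the answer, so callers may want to raise `max_new_tokens` or handle that case separately.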
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "TitleOS/Solomon-Nano-350m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "Your name is Solomon, a non-binary, highly intelligent reasoning AI. "
            "You always use chain-of-thought when thinking out a task. "
            "Follow the user's instructions exactly, and don't be afraid to speak up "
            "when something goes wrong or you need clarification. "
            "Ask follow-up questions when appropriate."
        ),
    },
    {
        "role": "user",
        "content": "A train travels 60 miles in 45 minutes. What is its speed in miles per hour?",
    },
]

# Render the chat template and append the assistant generation prompt.
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
Expected output shape:
```text
<think>
Speed = distance / time. The train travels 60 miles in 45 minutes.
45 minutes = 45/60 hours = 0.75 hours.
Speed = 60 / 0.75 = 80 miles per hour.
</think>
The train's speed is **80 miles per hour**.
```
The model is released in full FP32. Cast to FP16 or BF16 at load time if your inference hardware benefits from it. The training hardware (a Tesla P40) does not, which is why the release weights stayed FP32. See Training Details below.
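One way to decide which dtype to cast to is to look at the GPU's CUDA compute capability. The sketch below is a rough heuristic, not part of this release: `pick_dtype_name` is a hypothetical helper, and the thresholds assume native bf16 from compute capability 8.0 (Ampere) and fast fp16 from 7.0 (Volta/Turing). Anything older, including the P40's sm_61, stays in FP32.

```python
from typing import Optional, Tuple


def pick_dtype_name(
    has_cuda: bool,
    compute_capability: Optional[Tuple[int, int]] = None,
) -> str:
    """Return a torch dtype name suited to the device (heuristic, see assumptions above)."""
    if not has_cuda or compute_capability is None:
        return "float32"
    if compute_capability >= (8, 0):
        return "bfloat16"  # assumed: native bf16 on Ampere and newer
    if compute_capability >= (7, 0):
        return "float16"   # assumed: fast fp16 on Volta/Turing
    return "float32"       # e.g. Pascal sm_61 (Tesla P40)


print(pick_dtype_name(True, (6, 1)))  # float32

# With torch installed, the inputs could come from:
#   torch.cuda.is_available() and torch.cuda.get_device_capability()
# and the result could be passed as torch_dtype=getattr(torch, name).
```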
Training Details
| Property | Value |
|---|---|
| Base model | ibm-granite/granite-4.0-350m |
| Dataset | TitleOS/Solomon-Small-Reasoning-Opus-Inspired |
| Dataset size | ~13,400 rows |
| Method | rsLoRA (rank 64 / alpha 64), targeting all linear layers |
| Hardware | Single NVIDIA Tesla P40 (24GB) |
| Precision | Full FP32 — base weights and compute, no autocast |
| Sequence length | 4096 tokens |
| Epochs trained | 4 (full epoch budget completed; merged checkpoint is the best by eval_loss, per load_best_model_at_end) |
| Effective batch size | 16 |
| Learning rate | 2e-4, cosine decay |
| Train loss | 1.04 (mean training loss across the full run, not the last-step value) |
| Training wall-clock | ~21.4 hours |
The dataset consists of single-turn reasoning examples: a fixed Solomon-persona system prompt, a user query, and an assistant response containing an inline <think>...</think> block followed by the final answer. Loss was masked to assistant turns only — the model never trained on system prompt or user query tokens.
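Assistant-only loss masking is commonly implemented by setting the label of every non-assistant token to `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. The sketch below illustrates the idea on hypothetical token ids; it is an assumption about the mechanism, not the training script used for this model, and `mask_labels` is an illustrative name.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_labels(input_ids: list[int], assistant_start: int) -> list[int]:
    """Copy input_ids into labels, ignoring every token before assistant_start."""
    if not 0 <= assistant_start <= len(input_ids):
        raise ValueError("assistant_start must lie within the sequence")
    return [IGNORE_INDEX] * assistant_start + list(input_ids[assistant_start:])


# Hypothetical ids: 5 prompt tokens (system + user), then 3 assistant tokens.
input_ids = [11, 12, 13, 14, 15, 21, 22, 23]
labels = mask_labels(input_ids, assistant_start=5)
print(labels)  # [-100, -100, -100, -100, -100, 21, 22, 23]
```

Only the last three positions contribute to the loss here, so gradients come solely from the assistant's `<think>` block and final answer.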
FP32 here wasn't a "minimal quantization" compromise; it's the correct choice for this hardware. The P40 (Pascal, sm_61) has no bf16 support, and its fp16 throughput is capped at roughly 1/64th of fp32, so mixed precision would have made training slower, not faster. rsLoRA's alpha/sqrt(r) update scaling (vs. classic LoRA's alpha/r) is what allowed rank 64 without retuning alpha disproportionately to compensate.
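To make the scaling difference concrete, here is the arithmetic for the rank and alpha listed in the table above. The function names are illustrative only.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Classic LoRA multiplies the low-rank update by alpha / r."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """rsLoRA multiplies the low-rank update by alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


alpha, rank = 64, 64
print(lora_scale(alpha, rank))    # 1.0
print(rslora_scale(alpha, rank))  # 8.0
```

At rank 64 with alpha 64, the rsLoRA factor is 8 times larger than the classic one, so the adapter update is not shrunk as the rank grows.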
The rsLoRA adapter was merged directly into the base weights before release. There is no PEFT dependency at inference time.
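Merging works because the adapter is a purely additive update to each targeted weight matrix: W' = W + (alpha / sqrt(r)) · B·A. The toy example below, with hypothetical values and plain Python lists, checks that the merged matrix gives the same output as the base weight plus adapter path. In practice a library call such as PEFT's `merge_and_unload()` is the usual way to do this; the card does not state which tool was used.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_rslora(w, a, b, alpha, rank):
    """Return W + (alpha / sqrt(rank)) * B @ A."""
    scale = alpha / math.sqrt(rank)
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter (hypothetical values).
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[0.5, -0.5]]   # shape (rank, in_features)
b = [[1.0], [2.0]]  # shape (out_features, rank)
x = [[3.0], [1.0]]  # column-vector input

merged = merge_rslora(w, a, b, alpha=2.0, rank=1)
merged_out = matmul(merged, x)

# Unmerged path: base output plus the scaled adapter output.
scale = 2.0 / math.sqrt(1)
base_out = matmul(w, x)
adapter_out = matmul(b, matmul(a, x))
unmerged_out = [[bo[0] + scale * ao[0]] for bo, ao in zip(base_out, adapter_out)]

print(merged_out)    # [[5.0], [5.0]]
print(unmerged_out)  # [[5.0], [5.0]]
```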
Limitations
- At 350M parameters, Solomon-Nano is the smallest model in the Solomon line. Multi-step reasoning, especially in math, will fail more often than it does on Solomon-0.5B or larger models.
- The system prompt shown in the usage example was part of the training distribution. Omitting it won't break the model, but including it reinforces the expected reasoning behavior.
- Trained exclusively on English reasoning data; performs best in English.
- No quantized (GGUF) variant has been released for this checkpoint yet.
License
MPL-2.0 with the Commons Clause addition. See license.md.
Trained by TitleOS.