Inelly 4.5

Model Description

Inelly 4.5 is a fine-tuned version of Qwen2.5-3B-Instruct, trained on a diverse mixture of conversational, reasoning, math, coding, and politeness data. It is designed to be a compact, friendly, and capable assistant that excels at step-by-step reasoning while maintaining a warm, polite conversational tone.

Developed by: bry
Base model: Qwen2.5-3B-Instruct
Fine-tuning method: QLoRA (4-bit NF4, rank 16)
Parameters: 3.09B (base) + ~4.2M trainable (LoRA adapters)
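License: Qwen Research License (inherited from Qwen2.5-3B-Instruct; unlike most Qwen2.5 sizes, the 3B checkpoints are not released under Apache 2.0)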

Intended Use

Inelly 4.5 is intended for:

Conversational AI – Natural, polite, helpful dialogue
Chain-of-Thought reasoning – Step-by-step problem solving
Math & Logic – Algebraic word problems, arithmetic, deductive reasoning
Code generation – Python functions with comments
General knowledge Q&A – Science, everyday facts, explanations
Creative writing – Short poems, comparisons, lists

Out of Scope

Not intended for production deployment without further safety evaluation
Safety alignment inherited from Qwen2.5 base; fine-tuning data did not include adversarial safety examples
May struggle with highly specialized domains (law, medicine, finance)

Training Data

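Inelly 4.5 was fine-tuned for 1 epoch on ~5,700 samples drawn from the sources below. The per-source counts sum to 9,000, so the ~5,700 figure presumably refers to what remained after filtering and deduplication: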

Dataset	Samples	Purpose
Bespoke-Stratos-35k	2,500	Chain-of-thought math & reasoning
OpenThoughts-114k	2,000	Code generation with reasoning
dolphin-r1	1,500	General reasoning (DeepSeek-R1 distill)
OpenHermes	2,000	Diverse conversational data
HelpSteer2	1,000	Helpful, polite response style

All samples were deduplicated and reasoning-weighted (2x oversample for CoT examples). Maximum sequence length: 512 tokens.
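
The deduplication and 2x reasoning weighting described above can be sketched as follows. This is a minimal illustration, not the actual preprocessing script: the `text` and `is_cot` keys, the normalization rule, and the `dedupe_and_weight` helper are all hypothetical.

```python
import hashlib
import random


def dedupe_and_weight(samples, cot_weight=2, seed=0):
    """Drop duplicate samples, then repeat chain-of-thought ones `cot_weight` times.

    Each sample is a dict with the hypothetical keys "text" and "is_cot".
    """
    seen = set()
    unique = []
    for sample in samples:
        # Normalize case and whitespace so trivially different copies collide.
        normalized = " ".join(sample["text"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)

    weighted = []
    for sample in unique:
        copies = cot_weight if sample["is_cot"] else 1
        weighted.extend([sample] * copies)

    # Shuffle with a fixed seed so repeated CoT samples are not adjacent by construction.
    random.Random(seed).shuffle(weighted)
    return weighted


samples = [
    {"text": "What is 2 + 2? Step 1: ...", "is_cot": True},
    {"text": "what is 2 + 2?  Step 1: ...", "is_cot": True},  # duplicate after normalization
    {"text": "Hello, how are you?", "is_cot": False},
]
mixed = dedupe_and_weight(samples)
print(len(mixed))  # 3: one CoT sample counted twice, plus one chat sample
```

Exact-match hashing only catches near-verbatim copies; whether the original pipeline used something stronger (for example fuzzy matching) is not stated.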

Training Hyperparameters

Parameter	Value
Base model	Qwen2.5-3B-Instruct
Quantization	4-bit NF4 (bitsandbytes)
LoRA rank	16
LoRA alpha	32
LoRA dropout	0.05
Learning rate	2e-4
Batch size	8 (gradient accumulation)
Epochs	1
Max seq length	512
Optimizer	AdamW 8-bit
LR scheduler	cosine
Warmup ratio	0.05
Training time	~67 min
Hardware	RTX 2080 Ti (11GB VRAM)
Final training loss	~0.30
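
To make the learning-rate settings concrete, the sketch below evaluates a cosine schedule with linear warmup using the values from the table. It assumes an effective batch of 8 sequences per optimizer step and ~5,700 training samples, and it follows the common linear-warmup-then-cosine formula; the training script itself may have differed in detail.

```python
import math

PEAK_LR = 2e-4
WARMUP_RATIO = 0.05
NUM_SAMPLES = 5_700    # approximate figure from the training-data section
EFFECTIVE_BATCH = 8    # assumption: 8 sequences per optimizer step

total_steps = math.ceil(NUM_SAMPLES / EFFECTIVE_BATCH)
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at a given optimizer step (linear warmup, then cosine decay)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps, warmup_steps)  # 713 36
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate climbs to 2e-4 over the first 36 steps and decays to zero by step 713.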

Model Architecture

Property	Value
Model type	Qwen2ForCausalLM
Hidden size	2,048
Layers	36
Attention heads	16
Head dim	128
Intermediate size	11,008
Vocab size	151,936
Context length	32,768
Total parameters	~3.09B
Trainable parameters	~4.2M (LoRA)
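
As a sanity check on the ~3.09B total, the parameter count can be reproduced from the architecture values. In the sketch below, the hidden size, layer count, head count, head dimension, and vocabulary size come from the table. The intermediate size of 11,008, the 2 key/value heads, the Q/K/V biases, and the tied input/output embeddings are assumptions taken from the public Qwen2.5-3B configuration.

```python
HIDDEN = 2048
LAYERS = 36
HEADS = 16
HEAD_DIM = 128
VOCAB = 151_936
INTERMEDIATE = 11_008  # assumption: value from the public Qwen2.5-3B config
KV_HEADS = 2           # assumption: grouped-query attention with 2 KV heads

q_dim = HEADS * HEAD_DIM      # 2048
kv_dim = KV_HEADS * HEAD_DIM  # 256

attention = (
    HIDDEN * q_dim + q_dim              # q_proj weight + bias
    + 2 * (HIDDEN * kv_dim + kv_dim)    # k_proj and v_proj, weight + bias each
    + q_dim * HIDDEN                    # o_proj weight, no bias
)
mlp = 3 * HIDDEN * INTERMEDIATE         # gate, up, and down projections
norms = 2 * HIDDEN                      # two RMSNorm weight vectors per layer
per_layer = attention + mlp + norms

# Embedding matrix counted once (assumed tied with the output head), plus final norm.
total = LAYERS * per_layer + VOCAB * HIDDEN + HIDDEN
print(f"{total:,}")  # 3,085,938,688
```

The result rounds to 3.09B, matching the stated total.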

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("path/to/inelly-4.5", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path/to/inelly-4.5")

messages = [{"role": "user", "content": "Explain why the sky is blue, step by step."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Chat Format

Inelly 4.5 uses the Qwen2 chat template:

<|im_start|>system
You are Inelly 4.5, a helpful and polite assistant.<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
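
For illustration, the layout above can be built by hand as in the sketch below. The `to_chatml` helper is hypothetical and only mirrors the format shown here. In practice `tokenizer.apply_chat_template` is the safer choice, since the template shipped with the tokenizer may add a default system message or handle other cases this sketch ignores.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout shown above."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are Inelly 4.5, a helpful and polite assistant."},
    {"role": "user", "content": "What is 12 * 7?"},
])
print(prompt)
```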

Performance

Informal testing across 8 categories (15 test prompts):

Category	Result
Chain-of-Thought reasoning	✅ Correct step-by-step logic
Math (algebra, word problems)	✅ Accurate with work shown
Code generation	✅ Clean, commented Python
Logic & deduction	✅ Sound reasoning
General knowledge	✅ Accurate explanations
Conversational ability	✅ Polite, natural responses
Creative writing	✅ Poems, lists, comparisons
Safety	⚠️ Inherited from base; not specifically fine-tuned

Limitations

Safety: The fine-tuning data did not include adversarial safety training. The model inherits Qwen2.5's base safety alignment, which is imperfect. It may occasionally follow harmful instructions.
Context length: Fine-tuned on 512-token sequences. Performance may degrade on longer contexts.
Coherence: As with most small models, very long or complex multi-step tasks may lose coherence.
Factual accuracy: May hallucinate facts, especially in specialized domains.

Other Models in the Inelly Family

Model	Size	Focus
Inelly 4.5 (this model)	3B	Conversation + politeness + CoT
Matrix 2	7B	Deep reasoning, math, coding
Inelly 4.5 Blaze	1.5B	Compact reasoning

Acknowledgments

Qwen2.5 by Alibaba Cloud (base model)
Bespoke Labs for the Bespoke-Stratos dataset
OpenThoughts team
Cognitive Computations for dolphin-r1

Citation

@misc{inelly45,
  title = {Inelly 4.5: A Compact Conversational Model with Chain-of-Thought Reasoning},
  author = {GenueAI},
  year = {2026},
  note = {Fine-tuned from Qwen2.5-3B-Instruct using QLoRA},
}