# Inelly 4.5

## Model Description

**Inelly 4.5** is a fine-tuned version of [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained on a diverse mixture of conversational, reasoning, math, coding, and politeness data. It is designed to be a compact, friendly, and capable assistant that excels at step-by-step reasoning while maintaining a warm, polite conversational tone.

- **Developed by:** bry
- **Base model:** Qwen2.5-3B-Instruct
- **Fine-tuning method:** QLoRA (4-bit NF4, rank 16)
- **Parameters:** 3.09B (base) + ~4.2M trainable (LoRA adapters)
- **License:** Qwen Research License (inherited from Qwen2.5-3B-Instruct; unlike most Qwen2.5 sizes, the 3B model is not released under Apache 2.0)
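
The parameter figures above explain why QLoRA is practical on a single consumer GPU. The sketch below is back-of-the-envelope arithmetic using only the numbers reported in this card: the share of parameters that receive gradient updates, and the raw weight storage at 16-bit versus 4-bit precision (roughly 6.2 GB versus about 1.5 GB). The helper names are illustrative, and the memory estimate deliberately ignores quantization constants, layers kept in higher precision, activations, and optimizer state, so real usage will be higher.

```python
# Back-of-the-envelope figures derived from the numbers in this card.
BASE_PARAMS = 3.09e9       # total parameters reported for the base model
TRAINABLE_PARAMS = 4.2e6   # approximate LoRA adapter parameters reported above


def trainable_fraction(trainable: float, total: float) -> float:
    """Share of all parameters (base + adapters) that receive gradient updates."""
    return trainable / (total + trainable)


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Raw weight storage in GB (10^9 bytes).

    Ignores quantization constants, layers kept in higher precision,
    activations, and optimizer state.
    """
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    share = trainable_fraction(TRAINABLE_PARAMS, BASE_PARAMS)
    print(f"trainable share: {share:.3%}")
    print(f"16-bit weights: ~{weight_memory_gb(BASE_PARAMS, 16):.2f} GB")
    print(f"4-bit weights:  ~{weight_memory_gb(BASE_PARAMS, 4):.2f} GB")
```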

---

## Intended Use

Inelly 4.5 is intended for:

- **Conversational AI** – Natural, polite, helpful dialogue
- **Chain-of-Thought reasoning** – Step-by-step problem solving
- **Math & Logic** – Algebraic word problems, arithmetic, deductive reasoning
- **Code generation** – Python functions with comments
- **General knowledge Q&A** – Science, everyday facts, explanations
- **Creative writing** – Short poems, comparisons, lists

### Out of Scope

- Not intended for production deployment without further safety evaluation
- Safety alignment inherited from Qwen2.5 base; fine-tuning data did not include adversarial safety examples
- May struggle with highly specialized domains (law, medicine, finance)

---

## Training Data

Inelly 4.5 was fine-tuned for 1 epoch on ~5,700 samples drawn from:

| Dataset | Samples | Purpose |
|---|---|---|
| [Bespoke-Stratos-35k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-35k) | 2,500 | Chain-of-thought math & reasoning |
| [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) | 2,000 | Code generation with reasoning |
| [dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) | 1,500 | General reasoning (DeepSeek-R1 distill) |
| [OpenHermes](https://huggingface.co/datasets/teknium/openhermes) | 2,000 | Diverse conversational data |
| [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2) | 1,000 | Helpful, polite response style |

All samples were deduplicated, and chain-of-thought (CoT) examples were oversampled 2x to weight the mixture toward reasoning. Maximum sequence length: 512 tokens.
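
The deduplication and 2x CoT weighting described above can be sketched as follows. This is a minimal illustration, not the actual preprocessing script: the card does not say how duplicates were detected, so matching on a case- and whitespace-normalized prompt is an assumption, and the `prompt` and `is_cot` field names are hypothetical.

```python
import random
from typing import Dict, List


def build_mixture(samples: List[Dict], cot_weight: int = 2, seed: int = 0) -> List[Dict]:
    """Deduplicate samples, then repeat CoT samples `cot_weight` times."""
    seen = set()
    unique = []
    for sample in samples:
        # Assumed dedup rule: prompts match after lowercasing and collapsing whitespace.
        key = " ".join(sample["prompt"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(sample)

    mixture = []
    for sample in unique:
        # Hypothetical `is_cot` flag marks chain-of-thought examples.
        copies = cot_weight if sample.get("is_cot", False) else 1
        mixture.extend([sample] * copies)

    # Fixed seed so the shuffled order is reproducible.
    random.Random(seed).shuffle(mixture)
    return mixture


if __name__ == "__main__":
    demo = [
        {"prompt": "What is 2 + 2?", "is_cot": True},
        {"prompt": "what is  2 + 2?", "is_cot": True},   # duplicate after normalization
        {"prompt": "Say hello.", "is_cot": False},
    ]
    print(len(build_mixture(demo)))
```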

---

## Training Hyperparameters

| Parameter | Value |
|---|---|
| Base model | Qwen2.5-3B-Instruct |
| Quantization | 4-bit NF4 (bitsandbytes) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 8 (gradient accumulation) |
| Epochs | 1 |
| Max seq length | 512 |
| Optimizer | AdamW 8-bit |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Training time | ~67 min |
| Hardware | RTX 2080 Ti (11 GB VRAM) |
| Final training loss | ~0.30 |
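
To make the learning-rate settings in the table concrete, the sketch below computes a linear warmup followed by cosine decay with a peak of 2e-4 and a warmup ratio of 0.05. It shows the general shape of such a schedule rather than the exact scheduler used in training, and step rounding may differ between libraries. The total of 713 steps in the demo is an assumption (about 5,700 samples at an effective batch size of 8); the card does not report the step count.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 2e-4,
               warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 713  # assumed: ceil(5700 / 8); not stated in the card
    for step in (0, 10, 35, 374, 713):
        print(step, f"{lr_at_step(step, total):.2e}")
```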

---

## Model Architecture

| Property | Value |
|---|---|
| Model type | Qwen2ForCausalLM |
| Hidden size | 2,048 |
| Layers | 36 |
| Attention heads | 16 |
| Head dim | 128 |
| Intermediate size | 11,008 |
| Vocab size | 151,936 |
| Context length | 32,768 |
| Total parameters | ~3.09B |
| Trainable parameters | ~4.2M (LoRA) |
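
The trainable-parameter count follows from the architecture: a rank-r LoRA adapter on a linear layer with shape d_in x d_out adds r * (d_in + d_out) parameters. The sketch below applies that formula to the attention projections. It rests on two assumptions that this card does not state: the number of key/value heads (2, taken from the public Qwen2.5-3B configuration) and the set of adapted modules. The target lists are therefore hypothetical, and the totals illustrate the order of magnitude rather than reproduce the reported ~4.2M.

```python
from typing import Dict, Iterable, Tuple

HIDDEN = 2048    # hidden size from the table above
LAYERS = 36      # number of transformer layers
HEAD_DIM = 128   # per-head dimension
Q_HEADS = 16     # attention (query) heads
KV_HEADS = 2     # assumption: from the public Qwen2.5-3B config, not from this card

# (d_in, d_out) of each attention projection in one layer.
PROJ_SHAPES: Dict[str, Tuple[int, int]] = {
    "q_proj": (HIDDEN, Q_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (Q_HEADS * HEAD_DIM, HIDDEN),
}


def lora_param_count(targets: Iterable[str], rank: int = 16) -> int:
    """Total LoRA parameters: rank * (d_in + d_out) per adapted module, per layer."""
    per_layer = sum(rank * sum(PROJ_SHAPES[name]) for name in targets)
    return per_layer * LAYERS


if __name__ == "__main__":
    # Hypothetical target sets; the card does not list the adapted modules.
    print(lora_param_count(["q_proj", "v_proj"]))
    print(lora_param_count(["q_proj", "k_proj", "v_proj", "o_proj"]))
```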

---

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned weights and tokenizer (replace the placeholder path).
model = AutoModelForCausalLM.from_pretrained("path/to/inelly-4.5", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path/to/inelly-4.5")

# Build the prompt with the chat template so the special tokens are correct.
messages = [{"role": "user", "content": "Explain why the sky is blue, step by step."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True ensures the temperature and top_p settings take effect.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Chat Format

Inelly 4.5 uses the Qwen2.5 chat template (ChatML format):

```
<|im_start|>system
You are Inelly 4.5, a helpful and polite assistant.<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```
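
For readers who want to see how a message list maps onto the format above, here is a minimal hand-rolled renderer. It is a simplified sketch: the `render_chatml` helper is hypothetical, and it covers only plain system, user, and assistant turns, not the tool-calling or default-system-prompt logic in the full template. In practice, `tokenizer.apply_chat_template` remains the safer way to build prompts.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render role/content messages into the ChatML layout shown above."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = render_chatml([
        {"role": "system", "content": "You are Inelly 4.5, a helpful and polite assistant."},
        {"role": "user", "content": "Explain why the sky is blue, step by step."},
    ])
    print(prompt)
```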

---

## Performance

Informal, qualitative testing on 15 prompts across 8 categories:

| Category | Result |
|---|---|
| Chain-of-Thought reasoning | ✅ Correct step-by-step logic |
| Math (algebra, word problems) | ✅ Accurate with work shown |
| Code generation | ✅ Clean, commented Python |
| Logic & deduction | ✅ Sound reasoning |
| General knowledge | ✅ Accurate explanations |
| Conversational ability | ✅ Polite, natural responses |
| Creative writing | ✅ Poems, lists, comparisons |
| Safety | ⚠️ Inherited from base; not specifically fine-tuned |

---

## Limitations

- **Safety:** The fine-tuning data did not include adversarial safety training. The model inherits Qwen2.5's base safety alignment, which is imperfect. It may occasionally follow harmful instructions.
- **Context length:** Fine-tuned on 512-token sequences. Performance may degrade on longer contexts.
- **Coherence:** As with most small models, very long or complex multi-step tasks may lose coherence.
- **Factual accuracy:** May hallucinate facts, especially in specialized domains.

---

## Other Models in the Inelly Family

| Model | Size | Focus |
|---|---|---|
| **Inelly 4.5** (this model) | 3B | Conversation + politeness + CoT |
| Matrix 2 | 7B | Deep reasoning, math, coding |
| Inelly 4.5 Blaze | 1.5B | Compact reasoning |

---

## Acknowledgments

- [Qwen2.5](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) by Alibaba Cloud (base model)
- [Bespoke Labs](https://huggingface.co/bespokelabs) for the Bespoke-Stratos-35k dataset
- The [OpenThoughts](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) team for the OpenThoughts-114k dataset
- [Cognitive Computations](https://huggingface.co/cognitivecomputations) for the dolphin-r1 dataset

---

## Citation

```
@misc{inelly45,
title = {Inelly 4.5: A Compact Conversational Model with Chain-of-Thought Reasoning},
author = {GenueAI},
year = {2026},
note = {Fine-tuned from Qwen2.5-3B-Instruct using QLoRA},
}
```