# Inelly 4.5
## Model Description
**Inelly 4.5** is a fine-tuned version of [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained on a diverse mixture of conversational, reasoning, math, coding, and politeness data. It is designed to be a compact, friendly, and capable assistant that excels at step-by-step reasoning while maintaining a warm, polite conversational tone.
- **Developed by:** bry
- **Base model:** Qwen2.5-3B-Instruct
- **Fine-tuning method:** QLoRA (4-bit NF4, rank 16)
- **Parameters:** 3.09B (base) + ~4.2M trainable (LoRA adapters)
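- **License:** Listed as Apache 2.0 (described as inherited from Qwen2.5). Note that Qwen2.5-3B-Instruct itself is distributed under the Qwen Research License, so the base model's terms should be checked before use.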
---
## Intended Use
Inelly 4.5 is intended for:
- **Conversational AI** – Natural, polite, helpful dialogue
- **Chain-of-Thought reasoning** – Step-by-step problem solving
- **Math & Logic** – Algebraic word problems, arithmetic, deductive reasoning
- **Code generation** – Python functions with comments
- **General knowledge Q&A** – Science, everyday facts, explanations
- **Creative writing** – Short poems, comparisons, lists
### Out of Scope
- Not intended for production deployment without further safety evaluation
- Safety alignment inherited from Qwen2.5 base; fine-tuning data did not include adversarial safety examples
- May struggle with highly specialized domains (law, medicine, finance)
---
## Training Data
Inelly 4.5 was fine-tuned for 1 epoch on ~5,700 samples drawn from:
| Dataset | Samples | Purpose |
|---|---|---|
| [Bespoke-Stratos-35k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-35k) | 2,500 | Chain-of-thought math & reasoning |
| [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) | 2,000 | Code generation with reasoning |
| [dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) | 1,500 | General reasoning (DeepSeek-R1 distill) |
| [OpenHermes](https://huggingface.co/datasets/teknium/openhermes) | 2,000 | Diverse conversational data |
| [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2) | 1,000 | Helpful, polite response style |
All samples were deduplicated, and chain-of-thought (CoT) examples were oversampled 2x to weight the mixture toward reasoning. Maximum sequence length: 512 tokens.
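
The deduplication and 2x CoT oversampling described above could look roughly like the sketch below. The actual preprocessing script is not part of this card, so the `text` and `is_cot` fields, the normalization used as a dedup key, and the function names are all hypothetical.

```python
import random


def build_mixture(samples, cot_factor=2, seed=0):
    """Deduplicate samples by text, then repeat CoT samples `cot_factor` times.

    Each sample is a dict with a "text" field and an optional "is_cot" flag
    (both field names are assumptions made for this illustration).
    """
    seen = set()
    unique = []
    for sample in samples:
        key = sample["text"].strip().lower()  # hypothetical dedup key
        if key not in seen:
            seen.add(key)
            unique.append(sample)

    mixture = []
    for sample in unique:
        copies = cot_factor if sample.get("is_cot") else 1
        mixture.extend([sample] * copies)

    # Shuffle so the repeated CoT samples are not adjacent.
    random.Random(seed).shuffle(mixture)
    return mixture


def truncate_tokens(token_ids, max_len=512):
    """Cut a token sequence to the maximum training length."""
    return token_ids[:max_len]
```

With two CoT samples that differ only in case or whitespace plus one non-CoT sample, this yields three entries: the surviving CoT sample twice and the non-CoT sample once.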
---
## Training Hyperparameters
| Parameter | Value |
|---|---|
| Base model | Qwen2.5-3B-Instruct |
| Quantization | 4-bit NF4 (bitsandbytes) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 8 (gradient accumulation) |
| Epochs | 1 |
| Max seq length | 512 |
| Optimizer | AdamW 8-bit |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Training time | ~67 min |
| Hardware | RTX 2080 Ti (11 GB VRAM) |
| Final training loss | ~0.30 |
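
To make the schedule in the table concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the peak rate (2e-4) and warmup ratio (0.05) listed above. The step count is an assumption: it treats 8 as the effective batch size and uses the ~5,700-sample figure, and the exact rounding of warmup steps in the real training run may differ.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.05):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed step count: ~5,700 samples, effective batch size 8, 1 epoch.
total_steps = math.ceil(5700 / 8)

for step in (0, total_steps // 2, total_steps):
    print(step, lr_at_step(step, total_steps))
```

The rate starts at zero, reaches the peak once warmup ends, and decays to zero by the final step.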
---
## Model Architecture
| Property | Value |
|---|---|
| Model type | Qwen2ForCausalLM |
| Hidden size | 2,048 |
| Layers | 36 |
| Attention heads | 16 |
| Head dim | 128 |
| Intermediate size | 11,008 |
| Vocab size | 151,936 |
| Context length | 32,768 |
| Total parameters | ~3.09B |
| Trainable parameters | ~4.2M (LoRA) |
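
The trainable-parameter figure depends on which projection matrices receive LoRA adapters, which this card does not list. As a rough illustration of the arithmetic only, a rank-`r` adapter on a `d_in x d_out` linear layer adds `r * (d_in + d_out)` parameters. The example below adapts a single hypothetical 2,048 x 2,048 projection in each of the 36 layers and is not expected to reproduce the ~4.2M figure.

```python
def lora_params(d_in, d_out, rank=16):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


hidden_size = 2048
num_layers = 36

# One hypothetical square projection (hidden -> hidden) per layer.
per_projection = lora_params(hidden_size, hidden_size)
total = per_projection * num_layers

print(per_projection)  # 65536
print(total)           # 2359296
```

Adapting more projections per layer, or projections with different output widths, changes the total accordingly.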
---
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in half precision; device_map="auto" places it on the available device(s).
model = AutoModelForCausalLM.from_pretrained("path/to/inelly-4.5", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path/to/inelly-4.5")

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Explain why the sky is blue, step by step."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True makes sure temperature and top_p take effect.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
### Chat Format
Inelly 4.5 uses the Qwen2 chat template:
```
<|im_start|>system
You are Inelly 4.5, a helpful and polite assistant.<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```
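
For readers who want to see how the layout above is assembled, the following is a minimal sketch that builds the same string by hand. The function name `format_chatml` is hypothetical, and the sketch covers only the plain system/user/assistant case shown above. In practice `tokenizer.apply_chat_template` should be preferred, since the tokenizer's template may handle details (such as a default system prompt) that this sketch does not.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the layout shown above."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        prompt += "<|im_start|>assistant\n"
    return prompt


messages = [
    {"role": "system", "content": "You are Inelly 4.5, a helpful and polite assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]
print(format_chatml(messages))
```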
---
## Performance
Informal testing across 8 categories (15 test prompts):
| Category | Result |
|---|---|
| Chain-of-Thought reasoning | ✅ Correct step-by-step logic |
| Math (algebra, word problems) | ✅ Accurate with work shown |
| Code generation | ✅ Clean, commented Python |
| Logic & deduction | ✅ Sound reasoning |
| General knowledge | ✅ Accurate explanations |
| Conversational ability | ✅ Polite, natural responses |
| Creative writing | ✅ Poems, lists, comparisons |
| Safety | ⚠️ Inherited from base; not specifically fine-tuned |
---
## Limitations
- **Safety:** The fine-tuning data did not include adversarial safety training. The model inherits Qwen2.5's base safety alignment, which is imperfect. It may occasionally follow harmful instructions.
- **Context length:** Fine-tuned on 512-token sequences. Performance may degrade on longer contexts.
- **Coherence:** As with most small models, very long or complex multi-step tasks may lose coherence.
- **Factual accuracy:** May hallucinate facts, especially in specialized domains.
---
## Other Models in the Inelly Family
| Model | Size | Focus |
|---|---|---|
| **Inelly 4.5** (this model) | 3B | Conversation + politeness + CoT |
| Matrix 2 | 7B | Deep reasoning, math, coding |
| Inelly 4.5 Blaze | 1.5B | Compact reasoning |
---
## Acknowledgments
- [Qwen2.5](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) by Alibaba Cloud (base model)
- [Bespoke Labs](https://huggingface.co/bespokelabs) for Stratos dataset
- [OpenThoughts](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) team
- [Cognitive Computations](https://huggingface.co/cognitivecomputations) for dolphin-r1
---
## Citation
```
@misc{inelly45,
title = {Inelly 4.5: A Compact Conversational Model with Chain-of-Thought Reasoning},
author = {GenueAI},
year = {2026},
note = {Fine-tuned from Qwen2.5-3B-Instruct using QLoRA},
}
```