# Inelly 4.5

## Model Description

Inelly 4.5 is a fine-tuned version of [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained on a mixture of conversational, reasoning, math, coding, and politeness data. It is designed to be a compact, friendly assistant that works through problems step by step while keeping a warm, polite conversational tone.

- Developed by: bry
- Base model: Qwen2.5-3B-Instruct
- Fine-tuning method: QLoRA (4-bit NF4 quantization, LoRA rank 16)
- Parameters: ~3.09B (base) + ~4.2M trainable (LoRA adapters)
- License: Qwen Research License (inherited from Qwen2.5-3B-Instruct; unlike most Qwen2.5 sizes, the 3B model is not released under Apache 2.0)
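LoRA attaches two low-rank matrices to each adapted weight, A (r × d_in) and B (d_out × r), so every adapted projection adds r·(d_in + d_out) trainable parameters. The card does not say which projections were adapted, so the sketch below takes the target modules as an input. The shapes are assumptions taken from the Qwen2.5-3B configuration (hidden size 2,048, 2 key/value heads of dim 128, intermediate size 11,008, 36 layers), and `lora_params` is a hypothetical helper. Different module choices give noticeably different totals, so the numbers illustrate the formula rather than reconstruct the ~4.2M figure.

```python
# Sketch: count LoRA trainable parameters for a Qwen2.5-3B-shaped model.
# Assumption: shapes follow the Qwen2.5-3B config; the target modules actually
# used for Inelly 4.5 are not stated in the card.

HIDDEN = 2048
KV_DIM = 2 * 128          # 2 key/value heads x head dim 128 (grouped-query attention)
INTERMEDIATE = 11008
NUM_LAYERS = 36

# (in_features, out_features) of each linear projection in one decoder layer
PROJ_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, target_modules: list[str]) -> int:
    """Trainable LoRA parameters: A is (rank x d_in) and B is (d_out x rank), no biases."""
    per_layer = sum(
        rank * (d_in + d_out)
        for d_in, d_out in (PROJ_SHAPES[name] for name in target_modules)
    )
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    for modules in (["q_proj", "v_proj"], ["q_proj", "k_proj", "v_proj", "o_proj"]):
        print(modules, f"{lora_params(16, modules):,}")
```

At rank 16 this gives 3,686,400 parameters for `q_proj`/`v_proj` and 7,372,800 for all four attention projections.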

---

## Intended Use

Inelly 4.5 is intended for:

- Conversational AI – Natural, polite, helpful dialogue
- Chain-of-Thought reasoning – Step-by-step problem solving
- Math & Logic – Algebraic word problems, arithmetic, deductive reasoning
- Code generation – Python functions with comments
- General knowledge Q&A – Science, everyday facts, explanations
- Creative writing – Short poems, comparisons, lists

### Out of Scope

- Production deployment without further safety evaluation
- Safety: alignment is inherited from the Qwen2.5 base; the fine-tuning data did not include adversarial safety examples
- Highly specialized domains (law, medicine, finance), where the model may struggle

---

## Training Data

Inelly 4.5 was fine-tuned for 1 epoch on ~5,700 samples drawn from:

| Dataset | Samples | Purpose |
|---|---|---|
| [Bespoke-Stratos-35k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-35k) | 2,500 | Chain-of-thought math & reasoning |
| [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) | 2,000 | Code generation with reasoning |
| [dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) | 1,500 | General reasoning (DeepSeek-R1 distill) |
| [OpenHermes](https://huggingface.co/datasets/teknium/openhermes) | 2,000 | Diverse conversational data |
| [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2) | 1,000 | Helpful, polite response style |

All samples were deduplicated, and chain-of-thought (CoT) examples were oversampled 2x to weight the mix toward reasoning. Maximum sequence length: 512 tokens.
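The three preprocessing steps named above (deduplication, 2x CoT oversampling, a 512-token cap) can be sketched as follows. This is one plausible approach, not the pipeline actually used: exact-match hashing of normalized text, the hypothetical `is_cot` flag, and whitespace splitting as a stand-in for the Qwen2 tokenizer are all assumptions.

```python
import hashlib
from dataclasses import dataclass

MAX_TOKENS = 512      # maximum sequence length from the card
COT_WEIGHT = 2        # "2x oversample for CoT examples"


@dataclass
class Sample:
    text: str
    is_cot: bool      # hypothetical flag marking chain-of-thought samples


def fingerprint(text: str) -> str:
    """Hash of lowercased, whitespace-normalized text for exact-duplicate detection."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def prepare(samples: list[Sample]) -> list[Sample]:
    seen: set[str] = set()
    out: list[Sample] = []
    for s in samples:
        key = fingerprint(s.text)
        if key in seen:
            continue  # drop duplicates, keeping the first occurrence
        seen.add(key)
        # Truncate to the token budget (whitespace tokens stand in for real tokens).
        kept = Sample(" ".join(s.text.split()[:MAX_TOKENS]), s.is_cot)
        # CoT samples appear COT_WEIGHT times; all others appear once.
        out.extend([kept] * (COT_WEIGHT if s.is_cot else 1))
    return out


if __name__ == "__main__":
    data = [
        Sample("What is 2 + 2? Let's think step by step. 2 + 2 = 4.", True),
        Sample("what is 2 + 2?  let's think step by step. 2 + 2 = 4.", True),  # duplicate
        Sample("Hello! How can I help you today?", False),
    ]
    print(len(prepare(data)))
```

Here the second sample is dropped as a duplicate, the CoT sample is repeated twice, and the conversational sample is kept once.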

---

## Training Hyperparameters

| Parameter | Value |
|---|---|
| Base model | Qwen2.5-3B-Instruct |
| Quantization | 4-bit NF4 (bitsandbytes) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 8 (effective, via gradient accumulation) |
| Epochs | 1 |
| Max sequence length | 512 tokens |
| Optimizer | AdamW (8-bit) |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Training time | ~67 min |
| Hardware | RTX 2080 Ti (11 GB VRAM) |
| Final training loss | ~0.30 |
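The scheduler settings combine a linear warmup over the first 5% of optimizer steps with cosine decay to zero. The sketch below computes the learning rate at a given step. It assumes ~5,700 samples with an effective batch size of 8 (about 713 optimizer steps for one epoch, counting the last partial batch) and rounds warmup steps up; this resembles the cosine-with-warmup schedule in Hugging Face `transformers`, but the exact trainer configuration is not given in the card, and `total_steps`/`lr_at` are hypothetical helpers.

```python
import math

PEAK_LR = 2e-4
WARMUP_RATIO = 0.05
NUM_SAMPLES = 5700        # assumption: "~5,700 samples" from the card
EFFECTIVE_BATCH = 8       # assumption: batch size 8 is the accumulated batch
EPOCHS = 1


def total_steps(num_samples: int, batch: int, epochs: int) -> int:
    """Optimizer steps, assuming the last partial batch still triggers an update."""
    return math.ceil(num_samples / batch) * epochs


def lr_at(step: int, num_steps: int, peak: float = PEAK_LR,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    warmup = math.ceil(num_steps * warmup_ratio)
    if step < warmup:
        return peak * step / max(1, warmup)  # linear warmup from 0 to peak
    progress = (step - warmup) / max(1, num_steps - warmup)
    # Cosine decay from peak (progress = 0) to 0 (progress = 1).
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    n = total_steps(NUM_SAMPLES, EFFECTIVE_BATCH, EPOCHS)
    for s in (0, 18, 36, n // 2, n - 1):
        print(f"step {s:4d}: lr = {lr_at(s, n):.2e}")
```

Under these assumptions warmup lasts 36 steps, so the rate reaches 1e-4 at step 18 and peaks at 2e-4 at step 36 before decaying.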

---

## Model Architecture

| Property | Value |
|---|---|
| Model type | Qwen2ForCausalLM |
| Hidden size | 2,048 |
| Layers | 36 |
| Attention heads (query) | 16 |
| Key/value heads | 2 (grouped-query attention) |
| Head dim | 128 |
| Intermediate size | 11,008 |
| Vocab size | 151,936 |
| Context length | 32,768 tokens |
| Total parameters | ~3.09B |
| Trainable parameters | ~4.2M (LoRA) |
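As a consistency check, the ~3.09B total can be rebuilt from the architecture values. The sketch assumes the Qwen2.5-3B layout: 2 key/value heads, an intermediate size of 11,008, biases on the q/k/v projections only, two RMSNorm weight vectors per layer plus a final norm, and tied input/output embeddings. `qwen2_param_count` is a hypothetical helper, not part of any library.

```python
def qwen2_param_count(
    hidden: int = 2048,
    layers: int = 36,
    heads: int = 16,
    kv_heads: int = 2,          # assumption: Qwen2.5-3B grouped-query attention
    head_dim: int = 128,
    intermediate: int = 11008,  # assumption: Qwen2.5-3B config value
    vocab: int = 151936,
    tied_embeddings: bool = True,
) -> int:
    q_out = heads * head_dim
    kv_out = kv_heads * head_dim
    attn = (
        hidden * q_out + q_out              # q_proj weight + bias
        + 2 * (hidden * kv_out + kv_out)    # k_proj and v_proj weights + biases
        + q_out * hidden                    # o_proj weight (no bias)
    )
    mlp = 3 * hidden * intermediate         # gate_proj, up_proj, down_proj
    norms = 2 * hidden                      # input and post-attention RMSNorm weights
    per_layer = attn + mlp + norms
    embed = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden
    return layers * per_layer + embed + lm_head + hidden  # + final RMSNorm


if __name__ == "__main__":
    print(f"{qwen2_param_count():,}")
```

With these defaults the count is 3,085,938,688, which rounds to the ~3.09B listed above.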

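---

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "path/to/inelly-4.5" is a placeholder for a local directory or Hub repo ID.
model_path = "path/to/inelly-4.5"
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [{"role": "user", "content": "Explain why the sky is blue, step by step."}]
# Render the conversation with the model's chat template and open an assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True makes temperature/top_p apply even if the saved generation config is greedy.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```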

### Chat Format

Inelly 4.5 uses the Qwen2 (ChatML) chat template:

```
<|im_start|>system
You are Inelly 4.5, a helpful and polite assistant.<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```
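To make the template concrete, the sketch below assembles the same ChatML-style string by hand. It is only an illustration: `to_chatml` is a hypothetical helper, `tokenizer.apply_chat_template` remains the source of truth, and the real Qwen2 template may insert a default system message when none is supplied.

```python
def to_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render each message as <|im_start|>role, a newline, the content, then <|im_end|>."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # open the assistant turn for generation
    return "".join(parts)


if __name__ == "__main__":
    msgs = [
        {"role": "system", "content": "You are Inelly 4.5, a helpful and polite assistant."},
        {"role": "user", "content": "What is 12 * 7?"},
    ]
    print(to_chatml(msgs))
```

Setting `add_generation_prompt=True` leaves an open assistant turn, which is what the model completes during generation.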

---

## Performance

Informal testing across 8 categories (15 test prompts):

| Category | Result |
|---|---|
| Chain-of-Thought reasoning | ✅ Correct step-by-step logic |
| Math (algebra, word problems) | ✅ Accurate with work shown |
| Code generation | ✅ Clean, commented Python |
| Logic & deduction | ✅ Sound reasoning |
| General knowledge | ✅ Accurate explanations |
| Conversational ability | ✅ Polite, natural responses |
| Creative writing | ✅ Poems, lists, comparisons |
| Safety | ⚠️ Inherited from base; not specifically fine-tuned |

---

## Limitations

- Safety: The fine-tuning data did not include adversarial safety training. The model inherits Qwen2.5's base safety alignment, which is imperfect, so it may occasionally follow harmful instructions.
- Context length: Fine-tuning used sequences of at most 512 tokens. Although the architecture supports a 32,768-token context, quality may degrade on inputs much longer than 512 tokens.
- Coherence: As with most small models, very long or complex multi-step tasks may lose coherence.
- Factual accuracy: The model may hallucinate facts, especially in specialized domains.
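Because fine-tuning used sequences of at most 512 tokens, one possible way to stay close to that length in multi-turn chats is to drop the oldest turns first. The sketch below does this with a pluggable token counter. `fit_to_budget` and `whitespace_tokens` are hypothetical helpers; the whitespace counter is a crude stand-in, and in practice you would count with the model's tokenizer (for example `len(tokenizer(text).input_ids)`). The 512 budget mirrors the training length, not a hard limit of the model.

```python
from typing import Callable

Message = dict[str, str]


def whitespace_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_to_budget(
    messages: list[Message],
    budget: int = 512,
    count: Callable[[str], int] = whitespace_tokens,
) -> list[Message]:
    """Drop the oldest non-system turns until the total token count fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(ms: list[Message]) -> int:
        return sum(count(m["content"]) for m in ms)

    # Always keep the latest turn, even if it alone exceeds the budget.
    while len(turns) > 1 and total(system + turns) > budget:
        turns.pop(0)
    return system + turns
```

System messages are always kept and moved to the front, since they usually carry the persona and instructions.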

---

## Other Models in the Inelly Family

| Model | Size | Focus |
|---|---|---|
| Inelly 4.5 (this model) | 3B | Conversation + politeness + CoT |
| Matrix 2 | 7B | Deep reasoning, math, coding |
| Inelly 4.5 Blaze | 1.5B | Compact reasoning |

---

## Acknowledgments

- [Qwen2.5](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) by Alibaba Cloud (base model)
- [Bespoke Labs](https://huggingface.co/bespokelabs) for the Stratos dataset
- [OpenThoughts](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) team
- [Cognitive Computations](https://huggingface.co/cognitivecomputations) for dolphin-r1

---

## Citation

```bibtex
@misc{inelly45,
  title  = {Inelly 4.5: A Compact Conversational Model with Chain-of-Thought Reasoning},
  author = {GenueAI},
  year   = {2026},
  note   = {Fine-tuned from Qwen2.5-3B-Instruct using QLoRA},
}
```
