# Matrix 2

## Model Description

**Matrix 2** is a fine-tuned version of [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B), trained on a focused mixture of chain-of-thought reasoning, math, coding, and logic data. It is the flagship reasoning model of the Inelly lineup, built for deep, accurate, step-by-step problem solving.

- **Developed by:** Bry (GenueAI)
- **Base model:** DeepSeek-R1-Distill-Qwen-7B
- **Fine-tuning method:** QLoRA (4-bit NF4, rank 16)
- **Parameters:** 7.62B (base) + ~6.5M trainable (LoRA adapters)
- **License:** MIT (inherited from DeepSeek-R1)

---

## Intended Use

Matrix 2 is intended for:

- **Deep Chain-of-Thought reasoning** – Multi-step problem solving with clear logic
- **Mathematics** – Algebra, arithmetic, word problems, multi-step calculations
- **Code generation** – Python functions with proper logic and comments
- **Logical deduction** – Syllogisms, puzzles, transitive reasoning
- **Scientific explanations** – Physics, biology, general science
- **Complex instruction following** – Multi-part tasks requiring structured thinking

### Out of Scope

- Not intended for production deployment without further safety evaluation
- Safety alignment inherited from DeepSeek-R1 base; fine-tuning data did not include adversarial safety examples
- Larger memory footprint than the 1.5B/3B variants (~5.2GB reported, a figure consistent with 4-bit loading; FP16 weights alone need roughly 15GB)

---

## Training Data

Matrix 2 was fine-tuned for 1 epoch on ~5,225 samples drawn from:

| Dataset | Samples | Purpose |
|---|---|---|
| [Bespoke-Stratos-35k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-35k) | 3,000 | Chain-of-thought math & reasoning |
| [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) | 2,500 | Code generation with reasoning |
| [dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) | 2,000 | General reasoning (DeepSeek-R1 distill) |

All samples were deduplicated and reasoning-weighted (2x oversample for CoT examples). Maximum sequence length: 512 tokens.
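
The deduplication and 2x oversampling step could look roughly like the following. This is a minimal sketch, not the actual preprocessing script: the `text` and `is_cot` field names, the exact-match deduplication key, and the `dedupe_and_oversample` helper are all assumptions, since the card does not describe how samples were stored or compared.

```python
import random

def dedupe_and_oversample(samples, cot_factor=2, seed=0):
    """Drop exact-duplicate texts, then repeat CoT samples `cot_factor` times."""
    seen = set()
    unique = []
    for sample in samples:
        key = sample["text"].strip()  # assumed dedup key: whitespace-trimmed text
        if key not in seen:
            seen.add(key)
            unique.append(sample)

    weighted = []
    for sample in unique:
        copies = cot_factor if sample["is_cot"] else 1  # hypothetical CoT flag
        weighted.extend([sample] * copies)

    # Shuffle so repeated CoT samples are not adjacent in the training order
    random.Random(seed).shuffle(weighted)
    return weighted

toy = [
    {"text": "2 + 2 = 4 because ...", "is_cot": True},
    {"text": "2 + 2 = 4 because ...", "is_cot": True},   # duplicate, dropped
    {"text": "def add(a, b): return a + b", "is_cot": False},
]
mixed = dedupe_and_oversample(toy)
print(len(mixed))  # 3: the CoT sample twice plus the non-CoT sample once
```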

---

## Training Hyperparameters

| Parameter | Value |
|---|---|
| Base model | DeepSeek-R1-Distill-Qwen-7B |
| Quantization | 4-bit NF4 (bitsandbytes) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 8 (gradient accumulation) |
| Epochs | 1 |
| Max seq length | 512 |
| Optimizer | AdamW 8-bit |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Training time | ~74 min |
| Hardware | RTX 3090 (24GB VRAM) |
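
To make the schedule concrete, here is a small sketch of a cosine schedule with linear warmup using the learning rate and warmup ratio from the table. It assumes an effective batch size of 8 over ~5,225 samples (so about 654 optimizer steps), linear warmup from zero, and decay to zero; the trainer actually used may differ in these details, and `lr_at_step` is a hypothetical helper.

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.05):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Assumed step count: ~5,225 samples, 1 epoch, effective batch size 8
total_steps = math.ceil(5225 / 8)
warmup_steps = math.ceil(total_steps * 0.05)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

Under these assumptions the rate climbs to 2e-4 over the first 33 steps and then decays along a half cosine for the remainder of the epoch.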

---

## Model Architecture

| Property | Value |
|---|---|
| Model type | Qwen2ForCausalLM |
| Hidden size | 3,584 |
| Layers | 28 |
| Attention heads | 28 |
| Head dim | 128 |
| Intermediate size | 18,944 |
| Vocab size | 152,064 |
| Context length | 131,072 |
| Total parameters | ~7.62B |
| Trainable parameters | ~6.5M (LoRA) |

---

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in half precision; device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained("path/to/matrix-2", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path/to/matrix-2")

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show all steps."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is needed for temperature and top_p to take effect
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
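
R1-distilled models typically write their reasoning first and close it with a `</think>` tag before the final answer. The helper below is a minimal sketch for separating the two. It assumes the tag survives decoding, which depends on the tokenizer configuration, and `split_reasoning` is a hypothetical name, not part of `transformers`.

```python
def split_reasoning(response, close_tag="</think>"):
    """Return (reasoning, answer); reasoning is empty if the closing tag is absent."""
    reasoning, sep, answer = response.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole response as the answer
        return "", response.strip()
    # Drop an opening tag if the model emitted one
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()

sample = "<think>3x = 22 - 7 = 15, so x = 5.</think>\nx = 5"
reasoning, answer = split_reasoning(sample)
print(answer)
```

If generation stops before the reasoning is closed (for example because `max_new_tokens` is small), the tag will be missing and the helper returns the full text as the answer.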

---

## Performance

Informal GPU testing; qualitative results for the six categories listed:

| Category | Result |
|---|---|
| Chain-of-Thought reasoning | ✅ Excellent multi-step logic |
| Math | ✅ Accurate with detailed work shown |
| Code generation | ✅ Clean, well-commented Python |
| Logic puzzles | ✅ Thorough deductive reasoning |
| General knowledge | ✅ Accurate, detailed explanations |
| Complex reasoning | ✅ Handles multi-step word problems well |

---

## Inelly / GenueAI Model Family

| Model | Size | Focus |
|---|---|---|
| **Matrix 2** (this model) | 7B | Deep CoT reasoning, math, coding |
| Inelly 4.5 | 3B | Conversation + politeness + CoT |
| Inelly 4.5 Blaze | 1.5B | Fast reasoning + CoT |

---

## Limitations

- **Safety:** Inherited from DeepSeek-R1 base; not specifically safety-tuned. May occasionally follow harmful instructions.
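- **Memory:** Reported at ~5.2GB VRAM for inference, which is consistent with 4-bit quantized loading; the FP16 load shown in Usage needs roughly 15GB for the weights alone (7.62B parameters × 2 bytes)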
- **Context length:** Fine-tuned on 512-token sequences; base supports 128K but fine-tuned performance is optimized for shorter contexts
- **Factual accuracy:** May hallucinate in specialized domains (law, medicine, finance)
- **Speed:** Slower than 1.5B/3B variants due to size
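
As a rough guide to the memory point above, weight storage can be estimated as parameters × bytes per parameter. This is a weights-only sketch using the 7.62B parameter count from this card; it ignores the KV cache, activations, and quantization metadata, so real usage will be higher. `weight_gb` is a hypothetical helper.

```python
PARAMS = 7.62e9  # total parameters, from this card

# Nominal storage cost per parameter for common load formats
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "nf4": 0.5}

def weight_gb(params, dtype):
    """Approximate weight storage in GB (10^9 bytes), excluding runtime overhead."""
    return params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gb(PARAMS, dtype):.1f} GB")
```

By this estimate FP16 weights alone come to about 15 GB, while 4-bit weights come to just under 4 GB, so a footprint near 5 GB is more in line with 4-bit loading plus overhead.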

---

## Acknowledgments

- [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) by DeepSeek AI (base model)
- [Bespoke Labs](https://huggingface.co/bespokelabs) for the Bespoke-Stratos-35k dataset
- [OpenThoughts](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) team
- [Cognitive Computations](https://huggingface.co/cognitivecomputations) for dolphin-r1

---

## Citation

```
@misc{matrix2,
  title = {Matrix 2: A 7B Chain-of-Thought Reasoning Model},
  author = {Bry},
  organization = {GenueAI},
  year = {2026},
  note = {Fine-tuned from DeepSeek-R1-Distill-Qwen-7B using QLoRA},
}
```