	# Matrix 2

	## Model Description

	Matrix 2 is a fine-tuned version of [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B), trained on a focused mixture of chain-of-thought reasoning, math, coding, and logic data. It is the flagship reasoning model of the Inelly lineup, built for deep, accurate, step-by-step problem solving.

	- Developed by: Bry (GenueAI)
	- Base model: DeepSeek-R1-Distill-Qwen-7B
	- Fine-tuning method: QLoRA (4-bit NF4, rank 16)
	- Parameters: 7.62B (base) + ~6.5M trainable (LoRA adapters)
	- License: MIT (inherited from DeepSeek-R1)

	---

	## Intended Use

	Matrix 2 is intended for:

	- Deep Chain-of-Thought reasoning – Multi-step problem solving with clear logic
	- Mathematics – Algebra, arithmetic, word problems, multi-step calculations
	- Code generation – Python functions with proper logic and comments
	- Logical deduction – Syllogisms, puzzles, transitive reasoning
	- Scientific explanations – Physics, biology, general science
	- Complex instruction following – Multi-part tasks requiring structured thinking

	### Out of Scope

	- Not intended for production deployment without further safety evaluation
	- Safety alignment inherited from DeepSeek-R1 base; fine-tuning data did not include adversarial safety examples
	- Larger memory footprint than 1.5B/3B variants (~5.2GB)

	---

	## Training Data

	Matrix 2 was fine-tuned for 1 epoch on ~5,225 samples drawn from:

	| Dataset | Samples | Purpose |
	|---|---|---|
	| [Bespoke-Stratos-35k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-35k) | 3,000 | Chain-of-thought math & reasoning |
	| [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) | 2,500 | Code generation with reasoning |
	| [dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) | 2,000 | General reasoning (DeepSeek-R1 distill) |

	All samples were deduplicated, and chain-of-thought (CoT) examples were oversampled 2x to weight the mixture toward reasoning. Maximum sequence length: 512 tokens.
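
The deduplication and 2x CoT oversampling described above can be sketched roughly as follows. This is a minimal illustration, not the actual preprocessing script: the `text` and `is_cot` fields, the whitespace/case normalization, and the `dedup_and_oversample` helper are all assumptions made for the example.

```python
import hashlib
import random


def dedup_and_oversample(samples, cot_factor=2, seed=0):
    """Drop duplicate texts, then repeat CoT samples cot_factor times."""
    seen = set()
    unique = []
    for sample in samples:
        # Hypothetical normalization: lowercase and collapse whitespace before hashing.
        normalized = " ".join(sample["text"].lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(sample)

    weighted = []
    for sample in unique:
        # CoT samples appear cot_factor times; everything else appears once.
        copies = cot_factor if sample.get("is_cot") else 1
        weighted.extend([sample] * copies)

    # Shuffle deterministically so repeated samples are not adjacent by construction.
    random.Random(seed).shuffle(weighted)
    return weighted


toy = [
    {"text": "Solve 2 + 2. Step 1: ...", "is_cot": True},
    {"text": "solve 2 + 2.  step 1: ...", "is_cot": True},  # duplicate after normalization
    {"text": "Capital of France?", "is_cot": False},
]
mixed = dedup_and_oversample(toy)
print(len(mixed))  # 3: one CoT sample kept and doubled, plus one non-CoT sample
```

Hashing a normalized string only catches exact and near-exact duplicates; fuzzy deduplication would need a different approach.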

	---

	## Training Hyperparameters

	| Parameter | Value |
	|---|---|
	| Base model | DeepSeek-R1-Distill-Qwen-7B |
	| Quantization | 4-bit NF4 (bitsandbytes) |
	| LoRA rank | 16 |
	| LoRA alpha | 32 |
	| LoRA dropout | 0.05 |
	| Learning rate | 2e-4 |
	| Batch size | 8 (gradient accumulation) |
	| Epochs | 1 |
	| Max seq length | 512 |
	| Optimizer | AdamW 8-bit |
	| LR scheduler | cosine |
	| Warmup ratio | 0.05 |
	| Training time | ~74 min |
	| Hardware | RTX 3090 (24GB VRAM) |
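
To make the schedule in the table concrete, here is a small sketch of a cosine learning-rate schedule with warmup. It assumes that ~5,225 is the number of training examples seen in the single epoch, that 8 is the effective batch size, and that warmup is linear from zero; none of these details are confirmed by the card, and `lr_at_step` is a hypothetical helper rather than the scheduler actually used.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.05):
    """Linear warmup from 0 to peak_lr, then cosine decay toward 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


num_samples = 5225      # assumed examples per epoch
effective_batch = 8     # assumed effective batch size
total_steps = math.ceil(num_samples / effective_batch)

print(total_steps)                    # 654 optimizer steps
print(int(total_steps * 0.05))        # 32 warmup steps
print(lr_at_step(32, total_steps))    # 0.0002, the peak learning rate
```

Under these assumptions the run is short (a few hundred optimizer steps), so the warmup phase covers only about 32 updates.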

	---

	## Model Architecture

	| Property | Value |
	|---|---|
	| Model type | Qwen2ForCausalLM |
	| Hidden size | 3,584 |
	| Layers | 28 |
	| Attention heads | 28 |
	| Head dim | 128 |
	| Intermediate size | 18,944 |
	| Vocab size | 152,064 |
	| Context length | 131,072 |
	| Total parameters | ~7.62B |
	| Trainable parameters | ~6.5M (LoRA) |
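
A few of the figures in the table can be cross-checked with simple arithmetic. The sketch below also shows the standard per-matrix LoRA parameter count, `rank * (d_in + d_out)`. The card does not say which modules were adapted, so the example only computes the cost of one square projection and does not attempt to reproduce the ~6.5M total; `lora_params` is a hypothetical helper.

```python
hidden_size = 3584
num_heads = 28
head_dim = 128
context_length = 131_072
lora_rank = 16


def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hidden size is the number of attention heads times the per-head dimension.
print(hidden_size == num_heads * head_dim)  # True

# 131,072 tokens is the "128K" context length.
print(context_length // 1024)  # 128

# One rank-16 adapter on a square hidden-to-hidden projection.
print(lora_params(hidden_size, hidden_size, lora_rank))  # 114688
```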

	---

	## Usage

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Load the weights in FP16; device_map="auto" requires the accelerate package.
	model = AutoModelForCausalLM.from_pretrained("path/to/matrix-2", torch_dtype=torch.float16, device_map="auto")
	tokenizer = AutoTokenizer.from_pretrained("path/to/matrix-2")

	# Format the conversation with the model's chat template.
	messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show all steps."}]
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)

	# do_sample=True is needed for temperature and top_p to take effect.
	output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
	# Decode only the newly generated tokens, skipping the prompt.
	response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
	print(response)
	```
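
If the model keeps the base model's habit of wrapping its reasoning in `<think>...</think>` tags, the response can be split into reasoning and final answer after decoding. This is an assumption about the output format, not something this card documents, and `split_reasoning` is a hypothetical helper.

```python
def split_reasoning(response, open_tag="<think>", close_tag="</think>"):
    """Return (reasoning, answer); answer is empty if the closing tag never appears."""
    # The opening tag may be absent if the chat template already emitted it in the prompt.
    text = response.replace(open_tag, "", 1)
    reasoning, sep, answer = text.partition(close_tag)
    if not sep:
        # No closing tag: generation was probably cut off mid-reasoning.
        return reasoning.strip(), ""
    return reasoning.strip(), answer.strip()


example = "<think>\n3x = 22 - 7 = 15, so x = 5.\n</think>\nx = 5"
reasoning, answer = split_reasoning(example)
print(answer)  # x = 5
```

An empty answer is a hint that `max_new_tokens` was too small for the reasoning to finish.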

	---

	## Performance

	Informal GPU testing across 8 categories; the six summarized below are qualitative impressions, not benchmark scores:

	| Category | Result |
	|---|---|
	| Chain-of-Thought reasoning | ✅ Excellent multi-step logic |
	| Math | ✅ Accurate with detailed work shown |
	| Code generation | ✅ Clean, well-commented Python |
	| Logic puzzles | ✅ Thorough deductive reasoning |
	| General knowledge | ✅ Accurate, detailed explanations |
	| Complex reasoning | ✅ Handles multi-step word problems well |

	---

	## Inelly / GenueAI Model Family

	| Model | Size | Focus |
	|---|---|---|
	| Matrix 2 (this model) | 7B | Deep CoT reasoning, math, coding |
	| Inelly 4.5 | 3B | Conversation + politeness + CoT |
	| Inelly 4.5 Blaze | 1.5B | Fast reasoning + CoT |

	---

	## Limitations

	- Safety: Inherited from DeepSeek-R1 base; not specifically safety-tuned. May occasionally follow harmful instructions.
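	- Memory: Requires ~5.2GB VRAM for inference. That figure is consistent with 4-bit quantized weights; FP16 weights alone for 7.62B parameters take about 15GB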
	- Context length: Fine-tuned on 512-token sequences; base supports 128K but fine-tuned performance is optimized for shorter contexts
	- Factual accuracy: May hallucinate in specialized domains (law, medicine, finance)
	- Speed: Slower than 1.5B/3B variants due to size
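
The memory figure above can be sanity-checked with a weights-only estimate. This sketch ignores the KV cache, activations, and quantization overhead, so real usage will be higher; `weight_memory_gb` is a hypothetical helper, and 1 GB is taken as 10^9 bytes.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


total_params = 7.62e9  # total parameter count from the card

fp16_gb = weight_memory_gb(total_params, 16)
nf4_gb = weight_memory_gb(total_params, 4)

print(round(fp16_gb, 2))  # 15.24
print(round(nf4_gb, 2))   # 3.81
```

The 4-bit estimate plus runtime overhead lands in the neighborhood of the ~5.2GB figure, while FP16 weights alone need roughly three times that.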

	---

	## Acknowledgments

	- [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) by DeepSeek AI (base model)
	- [Bespoke Labs](https://huggingface.co/bespokelabs) for Stratos dataset
	- [OpenThoughts](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) team
	- [Cognitive Computations](https://huggingface.co/cognitivecomputations) for dolphin-r1

	---

	## Citation

	```bibtex
	@misc{matrix2,
	  title = {Matrix 2: A 7B Chain-of-Thought Reasoning Model},
	  author = {Bry},
	  organization = {GenueAI},
	  year = {2026},
	  note = {Fine-tuned from DeepSeek-R1-Distill-Qwen-7B using QLoRA},
	}
	```
