Update README.md

e97a7e7 verified 20 days ago

4.24 kB

	---
	license: apache-2.0
	base_model: Nebulixlabs/Nutral-Base
	tags:
	- text-generation
	- custom-architecture
	- qwen
	- instruct
	- reasoning
	- chain-of-thought
	- peft
	- lora
	- alpaca
	language:
	- en
	pipeline_tag: text-generation
	---

	# 🧠 Nutral Reasoning Instruct

	Nutral Reasoning Instruct is a highly optimized, lightweight instructional language model capable of structured Chain-of-Thought (CoT) Reasoning. Built on top of the custom pre-trained Nutral Base, this model has undergone Supervised Fine-Tuning (SFT) to follow user instructions while explicitly generating analytical thought processes before providing the final answer.

	The model naturally outputs its internal reasoning steps inside `<think>` and `</think>` blocks, making its decision-making process transparent and highly structured.

	---

	## 📌 Model Details

	* Base Architecture: Qwen2 (`Qwen2ForCausalLM`)
	* Training Type: Supervised Fine-Tuning (SFT) with LoRA (Merged & Unloaded)
	* Natural Language: English (`en`)
	* Programming Language: Python
	* Primary Task: Instruction Following & Analytical Reasoning
	* Format Supported: ChatML + Explicit `<think>` blocks

	---

	## 📊 Architecture & Parameters

	The core architecture shares the exact high-speed, lightweight blueprint of the Nutral Base model. During Phase 2, LoRA adapters were trained and permanently merged into the base weights for zero-latency inference.

	\| Hyperparameter \| Configuration Value \|
	\| :--- \| :--- \|
	\| Total Parameters \| ~17.5 Million (17,498,368) \|
	\| Embedding Dimension \| 512 \|
	\| Number of Layers \| 8 \|
	\| Attention Heads \| 8 \|
	\| Context Window \| 256 tokens \|
	\| LoRA Configuration \| `r=8`, `alpha=16`, `dropout=0.05` \|
	\| Target Modules \| `q_proj`, `v_proj` \|

	---

	## 🛠️ Fine-Tuning Dataset & SFT Strategy

	The model was fine-tuned using a dynamically generated synthetic reasoning methodology to bypass standard TRL library limitations, ensuring perfect ChatML alignment.

	* Dataset Name: `tatsu-lab/alpaca` (Train split subset: 2,500 highly curated samples)
	* Reasoning Injection: Each instruction was dynamically categorized (e.g., Analytical Reasoning, Creative Generation, Instructional Breakdown) to synthetically generate a multi-phase thought process (Intent, Retrieval, Logic, Output).
	* Objective: Causal Language Modeling applied to structured instruction-response pairs.

	---

	## ⚙️ Hardware & SFT Infrastructure

	The Instruct phase utilized Parameter-Efficient Fine-Tuning (PEFT) on Kaggle's multi-GPU infrastructure to optimize VRAM utilization:

	* Hardware Used: 2x NVIDIA T4 Tensor Core GPUs
	* Precision Mode: FP16 (Mixed Precision)
	* Optimizer Setup: AdamW with a learning rate of `3e-4`
	* Batching: Per-device batch size of 8 with 4 gradient accumulation steps.
	* Epochs: 1

	---

	## 📦 Core Technical Libraries Used

	* `transformers` - Core model loading, ChatML formatting, and primary training loop (`Trainer`).
	* `peft` - Applied Low-Rank Adaptation (LoRA) to efficiently train specific attention weights without catastrophically forgetting base knowledge.
	* `datasets` - Used to fetch and process the Hugging Face Alpaca instruction dataset.
	* `llama.cpp` - Utilized post-training to compile the raw FP16 PyTorch tensors into highly optimized GGUF binaries for edge-device deployment.

	---

	## 💬 Prompt Format (Crucial for Reasoning)

	To utilize the reasoning capabilities correctly, you must use the ChatML format. The model is trained to expect `<\|im_start\|>system`, `<\|im_start\|>user`, and `<\|im_start\|>assistant` tags.

	```text
	<\|im_start\|>system
	You are Nutral_Qwen, a highly intelligent AI. Always reason your thoughts inside <think> and </think> blocks.<\|im_end\|>
	<\|im_start\|>user
	Write a short poem about the moon.<\|im_end\|>
	<\|im_start\|>assistant
	<think>
	[Phase 1: Intent] Task classified as 'Creative Generation'. Analyzing: 'Write a short poem about the m...'
	[Phase 2: Retrieval] Gathering key facts and constraints.
	[Phase 3: Logic] Formulating step-by-step response.
	[Phase 4: Output] Structuring final answer.
	</think>
	The silver orb in the velvet night,
	Casting down its gentle light...<\|im_end\|>