license: apache-2.0
base_model: Nebulixlabs/Nutral-Base
tags:
  - text-generation
  - custom-architecture
  - qwen
  - instruct
  - reasoning
  - chain-of-thought
  - peft
  - lora
  - alpaca
language:
  - en
pipeline_tag: text-generation

🧠 Nutral Reasoning Instruct

Nutral Reasoning Instruct is a lightweight, instruction-tuned language model that produces structured Chain-of-Thought (CoT) reasoning. Built on the custom pre-trained Nutral Base, it has undergone Supervised Fine-Tuning (SFT) to follow user instructions and to write out its analytical reasoning before giving the final answer.

The model emits its reasoning steps inside <think> and </think> blocks, which keeps the reasoning separate from the final answer and easy to inspect.
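Because the reasoning is wrapped in <think> and </think> tags, a caller can separate it from the final answer with a few lines of code. The following is a minimal sketch, not part of the model's tooling: the helper name `split_reasoning` is hypothetical, and it assumes at most one <think> block per completion.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Split a generated completion into (reasoning, answer).

    Hypothetical helper: assumes at most one <think> block and strips
    the ChatML end-of-turn marker if it is present.
    """
    cleaned = text.replace("<|im_end|>", "")
    match = THINK_RE.search(cleaned)
    if match is None:
        # No reasoning block found: treat everything as the answer.
        return "", cleaned.strip()
    reasoning = match.group(1).strip()
    answer = cleaned[match.end():].strip()
    return reasoning, answer


completion = "<think>\n[Phase 1: Intent] example\n</think>\nHello there<|im_end|>"
reasoning, answer = split_reasoning(completion)
print(reasoning)
print(answer)
```

Returning an empty reasoning string when no block is found lets downstream code handle both cases without raising.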

📌 Model Details

Base Architecture: Qwen2 (Qwen2ForCausalLM)
Training Type: Supervised Fine-Tuning (SFT) with LoRA (Merged & Unloaded)
Natural Language: English (en)
Programming Language: Python
Primary Task: Instruction Following & Analytical Reasoning
Format Supported: ChatML + Explicit <think> blocks

📊 Architecture & Parameters

The core architecture is identical to the lightweight blueprint of the Nutral Base model. During Phase 2 (fine-tuning), LoRA adapters were trained and then permanently merged into the base weights, so inference carries no extra adapter overhead.

Hyperparameter	Configuration Value
Total Parameters	~17.5 Million (17,498,368)
Embedding Dimension	512
Number of Layers	8
Attention Heads	8
Context Window	256 tokens
LoRA Configuration	`r=8`, `alpha=16`, `dropout=0.05`
Target Modules	`q_proj`, `v_proj`
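To illustrate why merged adapters add no inference cost, the sketch below applies the standard LoRA merge, W' = W + (alpha / r) * B * A, to a toy matrix and checks that the merged weight gives the same output as the separate base-plus-adapter path. Only `r=8` and `alpha=16` come from the table above; the 4x4 dimensions, the matrix values, and the helper names are made up for illustration, and this is not the code used to train the model.

```python
R, ALPHA = 8, 16  # LoRA rank and alpha from the table above


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, r, alpha):
    """Return W + (alpha / r) * (B @ A), the merged weight matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy shapes (assumption): a 4x4 projection instead of the real model's sizes.
d_in, d_out = 4, 4
w = [[float(i == j) for j in range(d_in)] for i in range(d_out)]      # base weight
lora_a = [[0.01 * (i + j) for j in range(d_in)] for i in range(R)]    # shape (r, d_in)
lora_b = [[0.1 * (i - j) for j in range(R)] for i in range(d_out)]    # shape (d_out, r)
x = [[1.0], [2.0], [3.0], [4.0]]                                      # one input column

# Path 1: a single matmul with the merged weight.
y_merged = matmul(merge_lora(w, lora_a, lora_b, R, ALPHA), x)

# Path 2: base output plus the scaled adapter output, computed separately.
wx = matmul(w, x)
bax = matmul(lora_b, matmul(lora_a, x))
y_adapter = [[wx[i][0] + (ALPHA / R) * bax[i][0]] for i in range(d_out)]

max_diff = max(abs(m[0] - a[0]) for m, a in zip(y_merged, y_adapter))
print(max_diff < 1e-9)
```

With `alpha=16` and `r=8` the scaling factor is 2, so the low-rank update is doubled before it is added to the base weight.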

🛠️ Fine-Tuning Dataset & SFT Strategy

The model was fine-tuned with a dynamically generated synthetic reasoning methodology. Samples were formatted directly rather than through the TRL library, to work around its limitations and keep each sample aligned with the ChatML format.

Dataset Name: tatsu-lab/alpaca (Train split subset: 2,500 highly curated samples)
Reasoning Injection: Each instruction was dynamically categorized (e.g., Analytical Reasoning, Creative Generation, Instructional Breakdown) to synthetically generate a multi-phase thought process (Intent, Retrieval, Logic, Output).
Objective: Causal Language Modeling applied to structured instruction-response pairs.
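The reasoning injection described above can be sketched as a small template function that produces the four-phase trace for one instruction. This is an illustration under assumptions, not the author's actual pipeline: the function name `make_think_trace` is hypothetical, the category is passed in by the caller because the card does not specify the categorization rule, and the phase wording and 30-character preview are inferred from the example in the Prompt Format section.

```python
def make_think_trace(instruction, category, preview_chars=30):
    """Build a four-phase <think> block for one instruction.

    Hypothetical helper: `category` must be supplied by the caller, and
    the preview length is an assumption inferred from the card's example.
    """
    preview = instruction[:preview_chars]
    if len(instruction) > preview_chars:
        preview += "..."  # mark that the instruction was truncated
    lines = [
        "<think>",
        f"[Phase 1: Intent] Task classified as '{category}'. Analyzing: '{preview}'",
        "[Phase 2: Retrieval] Gathering key facts and constraints.",
        "[Phase 3: Logic] Formulating step-by-step response.",
        "[Phase 4: Output] Structuring final answer.",
        "</think>",
    ]
    return "\n".join(lines)


print(make_think_trace("Write a short poem about the moon.", "Creative Generation"))
```

The trace would then be prepended to the Alpaca response to form the assistant turn of each training sample.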

⚙️ Hardware & SFT Infrastructure

The Instruct phase used Parameter-Efficient Fine-Tuning (PEFT) on Kaggle's multi-GPU infrastructure to reduce VRAM usage:

Hardware Used: 2x NVIDIA T4 Tensor Core GPUs
Precision Mode: FP16 (Mixed Precision)
Optimizer Setup: AdamW with a learning rate of 3e-4
Batching: Per-device batch size of 8 with 4 gradient accumulation steps
Epochs: 1
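The batching settings above imply an effective batch size and a rough number of optimizer updates. The arithmetic below is a sketch that assumes both T4 GPUs ran in data-parallel mode and that all 2,500 samples were used for training; neither detail is stated explicitly, and the exact step count reported by a trainer may differ by one depending on how it rounds the last partial batch.

```python
import math

per_device_batch = 8     # from the card
grad_accum_steps = 4     # from the card
num_gpus = 2             # assumption: both T4s used in data-parallel mode
num_samples = 2500       # assumption: the full Alpaca subset was used
epochs = 1               # from the card

# Samples consumed per optimizer update.
effective_batch = per_device_batch * grad_accum_steps * num_gpus

# Approximate number of optimizer updates over the whole run.
approx_optimizer_steps = math.ceil(num_samples / effective_batch) * epochs

print(effective_batch)          # 64
print(approx_optimizer_steps)   # 40
```

Under these assumptions the run amounts to about 40 optimizer updates.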

📦 Core Technical Libraries Used

transformers - Core model loading, ChatML formatting, and primary training loop (Trainer).
peft - Applied Low-Rank Adaptation (LoRA) to train only selected attention projections, which limits the risk of catastrophically forgetting base knowledge.
datasets - Used to fetch and process the Hugging Face Alpaca instruction dataset.
llama.cpp - Used after training to convert the FP16 PyTorch weights into GGUF files for edge-device deployment.

💬 Prompt Format (Crucial for Reasoning)

<|im_start|>system
You are Nutral_Qwen, a highly intelligent AI. Always reason your thoughts inside <think> and </think> blocks.<|im_end|>
<|im_start|>user
Write a short poem about the moon.<|im_end|>
<|im_start|>assistant
<think>
[Phase 1: Intent] Task classified as 'Creative Generation'. Analyzing: 'Write a short poem about the m...'
[Phase 2: Retrieval] Gathering key facts and constraints.
[Phase 3: Logic] Formulating step-by-step response.
[Phase 4: Output] Structuring final answer.
</think>
The silver orb in the velvet night,
Casting down its gentle light...<|im_end|>
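
For inference, the same layout can be assembled programmatically, leaving the assistant turn open so the model continues with its <think> block. This is a minimal sketch: the function name `build_prompt` is hypothetical, and the system prompt is copied from the example above. With a 256-token context window, prompts should be kept short.

```python
SYSTEM_PROMPT = (
    "You are Nutral_Qwen, a highly intelligent AI. "
    "Always reason your thoughts inside <think> and </think> blocks."
)


def build_prompt(user_message, system_prompt=SYSTEM_PROMPT):
    """Return a ChatML prompt that ends with an open assistant turn.

    Hypothetical helper that mirrors the layout shown in this section.
    """
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(build_prompt("Write a short poem about the moon."))
```

The returned string can be tokenized and passed to the model as-is; generation should stop at the <|im_end|> token.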