metadata
license: apache-2.0
base_model: Nebulixlabs/Nutral-Base
tags:
  - text-generation
  - custom-architecture
  - qwen
  - instruct
  - reasoning
  - chain-of-thought
  - peft
  - lora
  - alpaca
language:
  - en
pipeline_tag: text-generation

🧠 Nutral Reasoning Instruct

Nutral Reasoning Instruct is a lightweight, instruction-tuned language model that produces structured Chain-of-Thought (CoT) reasoning. Built on the custom pre-trained Nutral Base, it was trained with Supervised Fine-Tuning (SFT) to follow user instructions and to write out its reasoning before giving the final answer.

The model writes its reasoning steps inside <think> and </think> blocks, so the path to each answer is visible and consistently structured.


📌 Model Details

  • Base Architecture: Qwen2 (Qwen2ForCausalLM)
  • Training Type: Supervised Fine-Tuning (SFT) with LoRA (Merged & Unloaded)
  • Natural Language: English (en)
  • Programming Language: Python
  • Primary Task: Instruction Following & Analytical Reasoning
  • Format Supported: ChatML + Explicit <think> blocks

📊 Architecture & Parameters

The architecture is identical to the lightweight Nutral Base model. During Phase 2, LoRA adapters were trained and then merged into the base weights, so inference carries no adapter overhead.

| Hyperparameter | Value |
|---|---|
| Total Parameters | ~17.5 million (17,498,368) |
| Embedding Dimension | 512 |
| Number of Layers | 8 |
| Attention Heads | 8 |
| Context Window | 256 tokens |
| LoRA Configuration | r=8, alpha=16, dropout=0.05 |
| Target Modules | q_proj, v_proj |
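To give a sense of how small the LoRA update is relative to the full model, the sketch below estimates the number of trainable adapter weights from the values in the table. It assumes that both q_proj and v_proj map 512 to 512 features; this is not stated in the model card, and with grouped-query attention v_proj would be narrower and the count lower.

```python
# Values taken from the hyperparameter table.
HIDDEN_SIZE = 512
NUM_LAYERS = 8
LORA_RANK = 8
LORA_ALPHA = 16
TARGET_MODULES = ("q_proj", "v_proj")

# ASSUMPTION: every targeted projection maps hidden_size -> hidden_size.
# This is not confirmed by the model card.
PROJ_IN = HIDDEN_SIZE
PROJ_OUT = HIDDEN_SIZE


def lora_params_per_module(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds two matrices: A (rank x in_features) and B (out_features x rank)."""
    return rank * in_features + out_features * rank


per_module = lora_params_per_module(PROJ_IN, PROJ_OUT, LORA_RANK)
trainable = per_module * len(TARGET_MODULES) * NUM_LAYERS
scaling = LORA_ALPHA / LORA_RANK  # factor applied to the low-rank update

print(f"per module: {per_module}, total trainable: {trainable}, scaling: {scaling}")
```

Under that assumption the adapters hold 131,072 weights, roughly 0.75% of the 17,498,368 total parameters, and the low-rank update is scaled by alpha / r = 2.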

🛠️ Fine-Tuning Dataset & SFT Strategy

The model was fine-tuned on synthetic reasoning traces that were generated on the fly for each sample. This custom pipeline was used in place of the TRL library's standard tooling to keep every example aligned with the ChatML format.

  • Dataset Name: tatsu-lab/alpaca (Train split subset: 2,500 highly curated samples)
  • Reasoning Injection: Each instruction was dynamically categorized (e.g., Analytical Reasoning, Creative Generation, Instructional Breakdown) to synthetically generate a multi-phase thought process (Intent, Retrieval, Logic, Output).
  • Objective: Causal Language Modeling applied to structured instruction-response pairs.
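The reasoning injection described above might look roughly like the following sketch. The four phase lines and the 30-character instruction preview are modelled on the sample trace in the Prompt Format section; the keyword rules in `CATEGORY_KEYWORDS`, the default category, and every function name are hypothetical, since the model card names the categories but not how they are assigned. Alpaca's optional `input` field is ignored here for brevity.

```python
# HYPOTHETICAL keyword rules: the source does not say how categories are chosen.
CATEGORY_KEYWORDS = (
    ("Creative Generation", ("poem", "story", "song")),
    ("Instructional Breakdown", ("how to", "steps", "explain")),
)
DEFAULT_CATEGORY = "Analytical Reasoning"  # assumed fallback
PREVIEW_CHARS = 30  # inferred from the truncated preview in the sample trace


def classify(instruction: str) -> str:
    """Return the first category whose keywords appear in the instruction."""
    text = instruction.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return category
    return DEFAULT_CATEGORY


def build_think_block(instruction: str) -> str:
    """Generate the four-phase synthetic trace wrapped in <think> tags."""
    preview = instruction[:PREVIEW_CHARS]
    if len(instruction) > PREVIEW_CHARS:
        preview += "..."
    lines = [
        "<think>",
        f"[Phase 1: Intent] Task classified as '{classify(instruction)}'. "
        f"Analyzing: '{preview}'",
        "[Phase 2: Retrieval] Gathering key facts and constraints.",
        "[Phase 3: Logic] Formulating step-by-step response.",
        "[Phase 4: Output] Structuring final answer.",
        "</think>",
    ]
    return "\n".join(lines)


def build_training_text(system: str, instruction: str, response: str) -> str:
    """Assemble one ChatML training example for causal language modeling."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n{build_think_block(instruction)}\n"
        f"{response}<|im_end|>"
    )


example = build_training_text(
    "You are Nutral_Qwen, a highly intelligent AI.",
    "Write a short poem about the moon.",
    "The silver orb in the velvet night,",
)
print(example)
```

Because the trace is produced from a template rather than from real reasoning, the model learns the layout of the `<think>` block more than the content of the reasoning itself.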

⚙️ Hardware & SFT Infrastructure

The Instruct phase used Parameter-Efficient Fine-Tuning (PEFT) on Kaggle's multi-GPU infrastructure to keep VRAM usage low:

  • Hardware Used: 2x NVIDIA T4 Tensor Core GPUs
  • Precision Mode: FP16 (Mixed Precision)
  • Optimizer Setup: AdamW with a learning rate of 3e-4
  • Batching: Per-device batch size of 8 with 4 gradient accumulation steps.
  • Epochs: 1
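The batching settings above imply a fairly small number of weight updates. The arithmetic below is a sketch that assumes both GPUs process separate batches in parallel and that the 2,500-sample subset from the dataset section is the full training set; the exact step count reported by the training loop may differ slightly depending on how the last partial batch is handled.

```python
import math

# Values taken from the model card.
NUM_GPUS = 2
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
NUM_SAMPLES = 2500
EPOCHS = 1

# ASSUMPTION: data parallelism, so each GPU sees its own batch every step.
effective_batch = NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

# Upper bound on optimizer updates: the final, partially filled batch still counts.
max_updates = math.ceil(NUM_SAMPLES / effective_batch) * EPOCHS

print(f"effective batch size: {effective_batch}, optimizer updates: <= {max_updates}")
```

With an effective batch of 64 samples, one epoch corresponds to at most 40 optimizer updates.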

📦 Core Technical Libraries Used

  • transformers - Core model loading, ChatML formatting, and primary training loop (Trainer).
  • peft - Applied Low-Rank Adaptation (LoRA) to train only selected attention weights, which limits catastrophic forgetting of base knowledge.
  • datasets - Used to fetch and process the Hugging Face Alpaca instruction dataset.
  • llama.cpp - Used after training to convert the FP16 PyTorch weights into GGUF files for edge-device deployment.

💬 Prompt Format (Crucial for Reasoning)

To use the reasoning capabilities correctly, format every prompt in ChatML. The model was trained on turns that open with <|im_start|>system, <|im_start|>user, and <|im_start|>assistant and close with <|im_end|>.

<|im_start|>system
You are Nutral_Qwen, a highly intelligent AI. Always reason your thoughts inside <think> and </think> blocks.<|im_end|>
<|im_start|>user
Write a short poem about the moon.<|im_end|>
<|im_start|>assistant
<think>
[Phase 1: Intent] Task classified as 'Creative Generation'. Analyzing: 'Write a short poem about the m...'
[Phase 2: Retrieval] Gathering key facts and constraints.
[Phase 3: Logic] Formulating step-by-step response.
[Phase 4: Output] Structuring final answer.
</think>
The silver orb in the velvet night,
Casting down its gentle light...<|im_end|>
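A minimal sketch of how a client could build this prompt and separate the reasoning from the final answer, using only the standard library. The helper names `build_prompt` and `split_reasoning` are illustrative and not part of any released package; the system prompt is copied from the example above.

```python
import re

SYSTEM_PROMPT = (
    "You are Nutral_Qwen, a highly intelligent AI. "
    "Always reason your thoughts inside <think> and </think> blocks."
)

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def build_prompt(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Return a ChatML prompt that ends with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def split_reasoning(completion: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Everything after the first <|im_end|> is discarded. If no <think>
    block is found, the reasoning is returned as an empty string.
    """
    text = completion.split("<|im_end|>", 1)[0]
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


prompt = build_prompt("Write a short poem about the moon.")
reasoning, answer = split_reasoning(
    "<think>\n[Phase 4: Output] Structuring final answer.\n</think>\n"
    "The silver orb in the velvet night,<|im_end|>"
)
print(prompt)
print(reasoning)
print(answer)
```

Keep in mind that the prompt, the reasoning block, and the answer all have to fit inside the 256-token context window, so short system prompts and short questions leave more room for the response.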