πŸš€ Sutra-Instruct-v2-350M

Sutra-Instruct-v2-350M is a highly optimized, 350-million parameter language model built for developers and researchers. It has undergone rigorous Continuous Pre-Training (CPT) to dramatically improve its logical reasoning, mathematical deduction, and Python coding capabilities.

Despite its lightweight footprint, Sutra-Instruct-v2 acts as a highly capable "Witty Coder" and educational assistant, punching well above its weight class thanks to a carefully curated, high-IQ dataset recipe.

🧠 Model Details

  • Architecture: 350M Parameter Custom Transformer (GPT-style)
  • Format: Safetensors (Blazing fast, zero-copy loading)
  • Primary Focus: Python Code Generation, Mathematical Reasoning, Educational QA.
  • Developer: Abhiray / Jay

πŸ“š Training Data Recipe (Continuous Pre-Training)

To transition the model from a general-purpose base toward stronger reasoning and light coding ability, we executed a Continuous Pre-Training phase over approximately 3.3 billion high-quality tokens.

The dataset mixture was meticulously balanced to prevent overfitting while maximizing logical deduction:

| Dataset | Source | Focus | Target Tokens |
| --- | --- | --- | --- |
| The Stack Dedup | bigcode/the-stack-dedup | Pure Python code (`content`) | 1.5 billion |
| DCLM-Edu | HuggingFaceTB/dclm-edu | High-IQ educational web data | 600 million |
| Cosmopedia (Stanford) | HuggingFaceTB/cosmopedia | Synthetic textbooks | 500 million |
| Tiny-Codes-QA | nampdn-ai/tiny-codes | Prompt/response coding tasks | 500 million |
| Tiny-Textbooks | nampdn-ai/tiny-textbooks | General knowledge / STEM | 100 million |
| MetaMathQA | meta-math/MetaMathQA | Mathematical reasoning | 100 million |
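The per-dataset token targets above imply a set of sampling proportions for the CPT mixture. The sketch below derives them; the token counts come straight from the table, but the weighting code itself is illustrative and not part of the released training pipeline:

```python
# Illustrative sketch: turn the CPT token budget above into sampling weights.
# Token counts are from the table; the normalization logic is ours.

TOKEN_BUDGET = {
    "bigcode/the-stack-dedup": 1_500_000_000,
    "HuggingFaceTB/dclm-edu": 600_000_000,
    "HuggingFaceTB/cosmopedia": 500_000_000,
    "nampdn-ai/tiny-codes": 500_000_000,
    "nampdn-ai/tiny-textbooks": 100_000_000,
    "meta-math/MetaMathQA": 100_000_000,
}

def mixture_weights(budget: dict) -> dict:
    """Normalize per-dataset token targets into sampling probabilities."""
    total = sum(budget.values())
    return {name: tokens / total for name, tokens in budget.items()}

weights = mixture_weights(TOKEN_BUDGET)
print(f"total tokens: {sum(TOKEN_BUDGET.values()):,}")  # 3,300,000,000
print(f"stack share:  {weights['bigcode/the-stack-dedup']:.3f}")  # 0.455
```

Note that code-heavy sources (The Stack plus Tiny-Codes) account for roughly 60% of the mixture, which matches the model's Python-first focus.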

πŸ› οΈ Intended Use Cases

  • Lightweight Code Copilot: Generating Python scripts, debugging syntax, and writing algorithms locally on low-VRAM hardware.
  • Edge Devices: Perfect for deployment on Raspberry Pi, older laptops, or mobile devices where massive LLMs cannot run.
  • Educational QA: Answering STEM, logic, and general knowledge questions.
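For local use on low-VRAM hardware, the checkpoint can be loaded through the standard `transformers` causal-LM path. This is a minimal sketch assuming the repository is compatible with `AutoModelForCausalLM`; the generation settings are illustrative, not recommended defaults:

```python
# Minimal local-inference sketch (assumes AutoModelForCausalLM compatibility;
# sampling parameters below are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Abhiray/Sutra-Instruct-v2-350M"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Write a Python function that reverses a string."))
```

At 350M parameters the model fits comfortably in a few hundred MB of memory in half precision, which is what makes the edge-device deployments above practical.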