Qwen3-0.6B-Base-CPT-Math
A continued pretraining (CPT) adapted version of Qwen3-0.6B-Base, fine-tuned on mathematics domain data to enhance the model's knowledge and reasoning capabilities in mathematical tasks.
Model Details
Model Description
This model is Qwen3-0.6B-Base further trained using Continued Pretraining (CPT) with full parameter updates on a curated mathematics pretraining dataset. Unlike instruction tuning, which uses Q&A pairs, CPT exposes the model to raw mathematical text to deepen its understanding of mathematical concepts, notation, and problem-solving patterns.
Key characteristics:
Base Model: Qwen/Qwen3-0.6B-Base
Training Method: Full finetuning (100% parameter updates, no LoRA)
Domain: Mathematics
Context Length: Trained on packed sequences of 1024-2048 tokens
Optimization: Unsloth with Flash Attention 2
Developed by: Dayanand (based on Alibaba Qwen team's Qwen3-0.6B-Base)
Model type: Language Model (Decoder-only, Causal LM)
Language(s): English, with strong mathematical domain coverage
License: Qwen model's license (see Qwen/Qwen3-0.6B-Base)
Finetuned from model: Qwen/Qwen3-0.6B-Base
Model Sources
- Repository: GitHub - CPT Full Finetuning
- Base Model: Qwen/Qwen3-0.6B-Base
- Training Data: pritamdeb68/Math-Pretraining-Data
Uses
Direct Use
This model can be used for:
- Mathematical text generation - Generate mathematical explanations, derivations, or proofs
- Domain-specific language modeling - Continue text in mathematical contexts
- Math problem analysis - Understand and analyze mathematical problems
- Knowledge retrieval - Answer questions about mathematical concepts
Example usage:
from transformers import AutoModelForCausalLM, AutoTokenizer
# Full Hub repo ID, including the namespace
model_id = "elitenandu/Qwen3-0.6B-Base-CPT-Math"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Given a quadratic equation ax^2 + bx + c = 0", return_tensors="pt")
# max_new_tokens bounds the generated continuation only, not prompt + continuation
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Downstream Use
This model can be fine-tuned for:
- Math Question Answering - Answer mathematical questions with detailed explanations
- Mathematical Reasoning - Solve step-by-step math problems
- Educational Content Generation - Create math tutorials and explanations
- Mathematical Code Generation - Generate code for mathematical algorithms
Out-of-Scope Use
- Non-English content generation - Model primarily trained on English mathematical texts
- Real-time critical applications - Not suitable for safety-critical systems
- General knowledge tasks outside mathematics - While it retains general language abilities, it's optimized for mathematical domain
- Instruction following without further fine-tuning - This is a base model, not instruction-tuned
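Because this is a base model, completion-style prompts are usually a better fit than bare instructions. The sketch below shows one way to assemble a few-shot prompt as plain text. The helper name `build_few_shot_prompt` and the `Problem:`/`Solution:` labels are illustrative assumptions, not a format the model is known to have been trained on.

```python
def build_few_shot_prompt(examples, question):
    """Format worked (problem, solution) pairs followed by an open problem.

    The "Problem:"/"Solution:" labels are an illustrative choice, not a
    format taken from the training data.
    """
    blocks = [f"Problem: {p}\nSolution: {s}" for p, s in examples]
    # End on an unfinished "Solution:" so the model continues from there.
    blocks.append(f"Problem: {question}\nSolution:")
    return "\n\n".join(blocks)


examples = [
    ("Solve for x: x + 3 = 7", "x = 4"),
    ("Solve for x: 3x = 12", "x = 4"),
]
prompt = build_few_shot_prompt(examples, "Solve for x: 2x + 5 = 13")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a single-sentence prompt.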
Bias, Risks, and Limitations
Limitations
- Domain Specificity - Model performs best on mathematical content; general language performance may vary
- Model Size - 0.6B parameters means lower capability compared to larger models (7B+)
- Context Length - Maximum sequence length of 1024-2048 tokens limits very long document processing
- Training Data Bias - Mathematical domain data may have specific biases and limitations
- Hallucination Risk - Like all language models, may generate plausible-sounding but incorrect mathematical statements
Risks
- Mathematical Errors - May produce mathematically incorrect but grammatically plausible content
- Computational Resource Requirements - While small, still requires GPU for efficient inference
- Overconfidence - Model may express high confidence in incorrect mathematical statements
Recommendations
- Validation Required - Always validate mathematical outputs for correctness
- Human Review - Use model outputs as assistance, not authoritative source
- Domain Expertise - Have domain experts review critical applications
- Testing - Thoroughly test on your specific use cases before deployment
- Prompt Engineering - Use clear, well-structured prompts for better results
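As one concrete form of the validation recommended above, a candidate answer to an equation can be checked by substituting it back in. This is a minimal sketch in plain Python; the helper name `satisfies` and the tolerance are assumptions for illustration, and extracting the candidate value from generated text is left out.

```python
import math


def satisfies(lhs, rhs, x, tol=1e-9):
    """Return True if lhs(x) equals rhs(x) within a small tolerance."""
    return math.isclose(lhs(x), rhs(x), abs_tol=tol)


# Equation used elsewhere in this card: 2x + 5 = 13
lhs = lambda x: 2 * x + 5
rhs = lambda x: 13

print(satisfies(lhs, rhs, 4))  # candidate answer x = 4
print(satisfies(lhs, rhs, 5))  # a wrong candidate
```

Substitution only confirms a final value; derivations and proofs still need human review.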
How to Get Started with the Model
Loading the Model
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load model and tokenizer
model_id = "elitenandu/Qwen3-0.6B-Base-CPT-Math"  # full Hub repo ID, including the namespace
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Generate text
prompt = "The derivative of f(x) = x^3 + 2x^2 is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature and top_p to take effect
outputs = model.generate(**inputs, max_length=100, do_sample=True, temperature=0.7, top_p=0.9)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
With Unsloth (Faster Inference)
from unsloth import FastLanguageModel
import torch
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="elitenandu/Qwen3-0.6B-Base-CPT-Math",
max_seq_length=1024,
dtype=torch.bfloat16,
load_in_4bit=True,
)
# Switch Unsloth to inference mode, then use as normal
FastLanguageModel.for_inference(model)
prompt = "Solve for x: 2x + 5 = 13"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0]))
Training Details
Training Data
- Dataset: pritamdeb68/Math-Pretraining-Data
- Split: train[:10000] (10,000 samples for this run)
- Domain: Mathematics (problem sets, derivations, proofs, explanations)
- Format: Raw text documents (continued pretraining format)
Data Preprocessing:
- Tokenized using Qwen tokenizer
- Packed into sequences of 1024-2048 tokens
- No special instruction formatting (raw domain text)
Training Procedure
Preprocessing
- Tokenization - All documents tokenized with Qwen tokenizer
- Packing - Short documents concatenated to fill context window (1024+ tokens)
- Sequence Masking - Standard causal language modeling masking applied
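The packing step can be sketched as follows. This is a simplified illustration, not the code used by Unsloth or TRL: it assumes documents are already tokenized, separates them with an end-of-sequence token, and drops any trailing tokens that do not fill a whole block. The function name `pack_sequences` and the toy token IDs are hypothetical.

```python
def pack_sequences(tokenized_docs, block_size, eos_id):
    """Concatenate tokenized documents and cut the stream into fixed-size blocks.

    Simplified illustration: an EOS token separates documents, and trailing
    tokens that do not fill a whole block are dropped.
    """
    stream = []
    for doc in tokenized_docs:
        stream.extend(doc)
        stream.append(eos_id)
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]  # toy token IDs
blocks = pack_sequences(docs, block_size=4, eos_id=0)
print(blocks)
```

With a block size of 1024 instead of 4, every training sequence is filled with real tokens rather than padding, which is why packing matters for CPT efficiency.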
Training Hyperparameters
- Training regime: bf16 mixed precision (bfloat16 with bf16 optimizer states)
- Learning rate: 2e-5 (lower than typical LoRA due to full finetuning)
- Warmup steps: 100
- Per-device batch size: 4
- Gradient accumulation steps: 4
- Effective batch size: 16 (4 × 4)
- Number of epochs: 1
- Optimizer: AdamW 8-bit (memory efficient)
- Weight decay: 0.01
- Max sequence length: 1024
- Logging steps: 20
- Packing enabled: True (critical for CPT efficiency)
Optimization Details
- Unsloth Optimization: Flash Attention 2 enabled
- Compute Capability Required: 8.0+ (A100, A10G, RTX 3090/4090, H100, etc.)
- Memory Optimization: 8-bit AdamW for reduced optimizer state memory
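To give a rough sense of what 8-bit AdamW saves, the arithmetic below compares optimizer state memory at 32-bit and 8-bit precision. It is a back-of-the-envelope estimate only: it assumes the nominal 0.6B parameter count, two state tensors per parameter, and ignores the small overhead of quantization statistics. The helper name `adamw_state_gb` is hypothetical.

```python
def adamw_state_gb(n_params, bytes_per_state):
    """AdamW keeps two state tensors (first and second moment) per parameter."""
    return 2 * n_params * bytes_per_state / 1e9


n_params = 0.6e9  # nominal parameter count from this card

fp32_states = adamw_state_gb(n_params, 4)  # 32-bit states
int8_states = adamw_state_gb(n_params, 1)  # 8-bit states
print(f"32-bit AdamW states: {fp32_states:.1f} GB")
print(f"8-bit AdamW states:  {int8_states:.1f} GB")
```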
Speeds, Sizes, Times
- Training Time: ~30-45 minutes on A10G GPU
- Training Tokens: ~10M tokens
- Model Size: ~1.2 GB (bf16 weights; 0.6B parameters × 2 bytes)
- Peak VRAM: ~18-20 GB (on a 24 GB A10G)
- Steps Completed: 312 total training steps
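The figures above can be cross-checked with simple arithmetic. The sketch below assumes a single GPU and that every packed sequence is completely full, so it gives an upper bound on tokens seen. It is evaluated at both 1024 and 2048 tokens because the card mentions both lengths.

```python
per_device_batch_size = 4
grad_accum_steps = 4
total_steps = 312

# Sequences consumed per optimizer step (single-GPU assumption)
effective_batch = per_device_batch_size * grad_accum_steps


def tokens_seen(seq_len, steps=total_steps, batch=effective_batch):
    """Upper bound on training tokens if every packed sequence is full."""
    return steps * batch * seq_len


for seq_len in (1024, 2048):
    print(f"seq_len={seq_len}: {tokens_seen(seq_len):,} tokens")
```

Under these assumptions the run covers about 5.1M tokens at 1024 tokens per sequence and about 10.2M at 2048, so the reported ~10M tokens lines up more closely with the longer sequence length.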
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Evaluation conducted on held-out samples from Math-Pretraining-Data
- Manual evaluation of mathematical accuracy and reasoning quality
Metrics
- Training Loss: Final loss ~2.34 (converged after 1 epoch)
- Perplexity: Calculated from validation loss
- Manual Evaluation: Spot-check of generated mathematical content for:
- Syntactic correctness
- Mathematical accuracy
- Coherence and relevance
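Perplexity is the exponential of the mean per-token cross-entropy loss (in nats). The card does not report a validation loss, so the sketch below applies the formula to the reported final training loss purely as an illustration; the result is not a measured validation perplexity.

```python
import math


def perplexity(mean_cross_entropy):
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_cross_entropy)


final_train_loss = 2.34  # reported final training loss, not a validation loss
print(f"perplexity at loss {final_train_loss}: {perplexity(final_train_loss):.2f}")
```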
Results
Results from continued pretraining show:
- Effective domain knowledge transfer on mathematics
- Improved mathematical terminology usage
- Better mathematical problem structure understanding
Note: Comprehensive benchmark results pending formal evaluation suite
Model Examination
Interpretability Insights
- Model successfully learned mathematical domain patterns through raw text exposure
- Context window effectively used for multi-step mathematical reasoning
- Maintains base model's general language capabilities while enhancing mathematical knowledge
Environmental Impact
Carbon emissions estimate:
- Hardware Type: NVIDIA A10G Tensor GPU
- Hours used: ~0.75 hours
- Cloud Provider: Hugging Face Endpoints
- Compute Region: US-based datacenter
- Carbon Emitted: ~0.12 kg CO2eq (estimated using ML Impact calculator)
Training a 0.6B model is relatively efficient compared to larger models (7B+).
Technical Specifications
Model Architecture
- Architecture: Transformer decoder-only (causal language model)
- Parameters: 600M (0.6B)
- Attention: Grouped-query attention (GQA) with causal masking
- Activation: SiLU (Swish)
- Positional Embeddings: Rotary Position Embeddings (RoPE)
Compute Infrastructure
Hardware
- GPU: NVIDIA A10G (24GB VRAM)
- Compute Capability: 8.6
- CPU: AMD EPYC processor
- Memory: 100+ GB system RAM
Software
- PyTorch: 2.1+
- Transformers: 4.40+
- Unsloth: Latest version with Flash Attention 2
- TRL: Hugging Face TRL library for SFTTrainer
- Python: 3.12+
Citation
If you use this model, please cite:
BibTeX:
@misc{qwen3_0.6b_cpt_math,
author = {Dayanand},
title = {Qwen3-0.6B-Base-CPT-Math: Continued Pretraining for Mathematical Domain Adaptation},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/elitenandu/Qwen3-0.6B-Base-CPT-Math}}
}
APA:
Dayanand. (2026). Qwen3-0.6B-Base-CPT-Math: Continued pretraining for mathematical domain adaptation. Hugging Face. https://huggingface.co/elitenandu/Qwen3-0.6B-Base-CPT-Math
Also cite the base model:
- Qwen Team (2025). Qwen3-0.6B-Base. Alibaba. https://huggingface.co/Qwen/Qwen3-0.6B-Base
Glossary
- CPT (Continued Pretraining): Further pretraining of a base model on domain-specific data
- Full Finetuning: Training all model parameters (vs. LoRA which only trains adapters)
- Flash Attention: Memory-efficient attention implementation enabling longer contexts
- Packing: Concatenating multiple short documents into longer sequences for training efficiency
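- BF16: bfloat16 (Brain Floating Point), a 16-bit format that keeps the exponent range of FP32 with a shorter mantissa; natively supported on NVIDIA Ampere and newer GPUs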
- Causal LM: Language model that predicts next token based on previous tokens
- Perplexity: Exponential of the average per-token cross-entropy loss; a measure of model uncertainty where lower is better
More Information
For detailed implementation and reproducibility:
- See GitHub Repository
- Training script: main.py
- Setup guide: README.md
- Original research: Refer to Continued Pretraining literature
Model Card Authors
- Card Author: Dayanand
- Model Developer: Dayanand
- Based on: Qwen Team (Alibaba Qwen3-0.6B-Base)
Model Card Contact
For questions or issues:
- GitHub Issues: GitHub Repository Issues
- Email: [Your Email Here]
- Hugging Face Discussions: Model Page Discussions