Libraries

How to use elitenandu/Qwen3-0.6B-Base-CPT-Math with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="elitenandu/Qwen3-0.6B-Base-CPT-Math")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("elitenandu/Qwen3-0.6B-Base-CPT-Math")
model = AutoModelForCausalLM.from_pretrained("elitenandu/Qwen3-0.6B-Base-CPT-Math")


vLLM

How to use elitenandu/Qwen3-0.6B-Base-CPT-Math with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "elitenandu/Qwen3-0.6B-Base-CPT-Math"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "elitenandu/Qwen3-0.6B-Base-CPT-Math",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/elitenandu/Qwen3-0.6B-Base-CPT-Math

SGLang

How to use elitenandu/Qwen3-0.6B-Base-CPT-Math with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "elitenandu/Qwen3-0.6B-Base-CPT-Math" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "elitenandu/Qwen3-0.6B-Base-CPT-Math",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "elitenandu/Qwen3-0.6B-Base-CPT-Math" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "elitenandu/Qwen3-0.6B-Base-CPT-Math",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Unsloth Studio

How to use elitenandu/Qwen3-0.6B-Base-CPT-Math with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for elitenandu/Qwen3-0.6B-Base-CPT-Math to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for elitenandu/Qwen3-0.6B-Base-CPT-Math to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for elitenandu/Qwen3-0.6B-Base-CPT-Math to start chatting

Load model with FastModel

# Install Unsloth first (shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="elitenandu/Qwen3-0.6B-Base-CPT-Math",
    max_seq_length=2048,
)

Docker Model Runner
How to use elitenandu/Qwen3-0.6B-Base-CPT-Math with Docker Model Runner:
```
docker model run hf.co/elitenandu/Qwen3-0.6B-Base-CPT-Math
```

Qwen3-0.6B-Base-CPT-Math

A continued-pretraining (CPT) adaptation of Qwen3-0.6B-Base, trained on mathematics-domain text to strengthen the model's knowledge and reasoning in mathematical tasks.

Model Details

Model Description

This model is Qwen3-0.6B-Base further trained with Continued Pretraining (CPT), using full parameter updates on a curated mathematics pretraining dataset. Unlike instruction tuning, which uses Q&A pairs, CPT exposes the model to raw mathematical text to deepen its grasp of mathematical concepts, notation, and problem-solving patterns.

Key characteristics:

Base Model: Qwen/Qwen3-0.6B-Base
Training Method: Full finetuning (100% parameter updates, no LoRA)
Domain: Mathematics
Context Length: 1024-2048 tokens (sequence length used during training)
Optimization: Unsloth with Flash Attention 2
Developed by: Dayanand (based on Alibaba Qwen team's Qwen3-0.6B-Base)
Model type: Language Model (Decoder-only, Causal LM)
Language(s): English, with strong mathematical domain coverage
License: Qwen model's license (see Qwen/Qwen3-0.6B-Base)
Finetuned from model: Qwen/Qwen3-0.6B-Base

Model Sources

Repository: GitHub - CPT Full Finetuning
Base Model: Qwen/Qwen3-0.6B-Base
Training Data: pritamdeb68/Math-Pretraining-Data

Uses

Direct Use

This model can be used for:

Mathematical text generation - Generate mathematical explanations, derivations, or proofs
Domain-specific language modeling - Continue text in mathematical contexts
Math problem analysis - Understand and analyze mathematical problems
Knowledge retrieval - Answer questions about mathematical concepts

Example usage:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the full Hub repository ID, including the namespace
model_id = "elitenandu/Qwen3-0.6B-Base-CPT-Math"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Base model: give it text to continue rather than an instruction
inputs = tokenizer("Given a quadratic equation ax^2 + bx + c = 0", return_tensors="pt")
outputs = model.generate(**inputs, max_length=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Downstream Use

This model can be fine-tuned for:

Math Question Answering - Answer mathematical questions with detailed explanations
Mathematical Reasoning - Solve step-by-step math problems
Educational Content Generation - Create math tutorials and explanations
Mathematical Code Generation - Generate code for mathematical algorithms

Out-of-Scope Use

Non-English content generation - Model primarily trained on English mathematical texts
Real-time critical applications - Not suitable for safety-critical systems
General knowledge tasks outside mathematics - The model retains general language abilities, but it is optimized for the mathematical domain
Instruction following without further fine-tuning - This is a base model, not instruction-tuned
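
Because this is a base model, prompts tend to work better when written as text to be continued rather than as instructions. The sketch below builds a few-shot completion prompt. The helper name `build_completion_prompt` and the `Problem:`/`Solution:` layout are illustrative assumptions, not a format the model is known to have been trained on.

```python
def build_completion_prompt(examples, question):
    """Format solved examples followed by an open problem for the model to continue.

    `examples` is a list of (problem, solution) pairs. The layout is a
    hypothetical convention chosen for illustration only.
    """
    parts = [f"Problem: {p}\nSolution: {s}\n" for p, s in examples]
    # Leave the final solution empty so the model completes it.
    parts.append(f"Problem: {question}\nSolution:")
    return "\n".join(parts)


prompt = build_completion_prompt(
    [("Solve for x: x + 3 = 7", "x = 4")],
    "Solve for x: 2x + 5 = 13",
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a bare question.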

Bias, Risks, and Limitations

Limitations

Domain Specificity - Model performs best on mathematical content; general language performance may vary
Model Size - 0.6B parameters means lower capability compared to larger models (7B+)
Context Length - Trained on sequences of 1024-2048 tokens, which limits quality on very long documents
Training Data Bias - Mathematical domain data may have specific biases and limitations
Hallucination Risk - Like all language models, may generate plausible-sounding but incorrect mathematical statements

Risks

Mathematical Errors - May produce mathematically incorrect but grammatically plausible content
Computational Resource Requirements - Although the model is small, a GPU is still needed for efficient inference
Overconfidence - Model may express high confidence in incorrect mathematical statements

Recommendations

Validation Required - Always validate mathematical outputs for correctness
Human Review - Use model outputs as assistance, not as an authoritative source
Domain Expertise - Have domain experts review critical applications
Testing - Thoroughly test on your specific use cases before deployment
Prompt Engineering - Use clear, well-structured prompts for better results
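
One way to act on the validation recommendation is to check a generated answer by substitution instead of trusting the text. The sketch below does this for a linear equation of the form a*x + b = c. The function name `check_linear_solution` and the regular expression used to extract `x = ...` from the output are assumptions made for illustration; real outputs may need more robust parsing.

```python
import re
from fractions import Fraction


def check_linear_solution(a, b, c, generated_text):
    """Return True if the last 'x = <number>' in the text satisfies a*x + b == c."""
    # Accept integers, decimals, or simple fractions such as 7/2.
    matches = re.findall(r"x\s*=\s*(-?\d+(?:\.\d+|/\d+)?)", generated_text)
    if not matches:
        return False  # no answer found, treat as unverified
    try:
        x = Fraction(matches[-1])
    except (ValueError, ZeroDivisionError):
        return False  # unparseable answer, treat as unverified
    return a * x + b == c


# Checks for "Solve for x: 2x + 5 = 13"
print(check_linear_solution(2, 5, 13, "2x = 8, so x = 4"))  # True
print(check_linear_solution(2, 5, 13, "Therefore x = 5"))   # False
```

Exact rational arithmetic via `Fraction` avoids false rejections caused by floating-point rounding.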

How to Get Started with the Model

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_id = "elitenandu/Qwen3-0.6B-Base-CPT-Math"  # full Hub repository ID
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate text
prompt = "The derivative of f(x) = x^3 + 2x^2 is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature and top_p to take effect
outputs = model.generate(**inputs, max_length=100, do_sample=True, temperature=0.7, top_p=0.9)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

With Unsloth (Faster Inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="elitenandu/Qwen3-0.6B-Base-CPT-Math",
    max_seq_length=1024,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)

# Switch Unsloth to its inference mode, then generate as usual
FastLanguageModel.for_inference(model)
prompt = "Solve for x: 2x + 5 = 13"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0]))

Training Details

Training Data

Dataset: pritamdeb68/Math-Pretraining-Data
Split: train[:10000] (10,000 samples for this run)
Domain: Mathematics (problem sets, derivations, proofs, explanations)
Format: Raw text documents (continued pretraining format)

Data Preprocessing:

Tokenized using Qwen tokenizer
Packed into sequences of 1024-2048 tokens
No special instruction formatting (raw domain text)

Training Procedure

Preprocessing

Tokenization - All documents tokenized with Qwen tokenizer
Packing - Short documents concatenated to fill context window (1024+ tokens)
Sequence Masking - Standard causal language modeling masking applied
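
The packing step can be illustrated with a short sketch: tokenized documents are joined with an end-of-sequence token and cut into fixed-length blocks. This is a simplified illustration, not the training code. The function name `pack_sequences`, the toy token IDs, and the choice to drop the final partial block are all assumptions.

```python
def pack_sequences(tokenized_docs, block_size, eos_id):
    """Concatenate token-ID lists (EOS after each) and split into full blocks."""
    stream = []
    for doc in tokenized_docs:
        stream.extend(doc)
        stream.append(eos_id)  # mark the document boundary
    n_blocks = len(stream) // block_size
    # Drop the trailing remainder so every block is exactly block_size long.
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = pack_sequences(docs, block_size=4, eos_id=0)
print(blocks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Each block is then trained with ordinary next-token prediction, so short documents no longer waste positions on padding.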

Training Hyperparameters

Training regime: bf16 mixed precision (optimizer states held in 8-bit by the AdamW 8-bit optimizer)
Learning rate: 2e-5 (lower than typical LoRA due to full finetuning)
Warmup steps: 100
Per-device batch size: 4
Gradient accumulation steps: 4
Effective batch size: 16 (4 × 4)
Number of epochs: 1
Optimizer: AdamW 8-bit (memory efficient)
Weight decay: 0.01
Max sequence length: 1024
Logging steps: 20
Packing enabled: True (critical for CPT efficiency)

Optimization Details

Unsloth Optimization: Flash Attention 2 enabled
Compute Capability Required: 8.0+ (A100, A10G, RTX 3090/4090, H100, etc.)
Memory Optimization: 8-bit AdamW for reduced optimizer state memory

Speeds, Sizes, Times

Training Time: ~30-45 minutes on A10G GPU
Training Tokens: ~10M tokens
Model Size: ~1.2 GB (BF16 weights, 2 bytes per parameter)
Peak VRAM: ~18-20 GB (on 24 GB A10G)
Steps Completed: 312 total training steps
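
The figures above can be related with some simple arithmetic. The sketch below assumes every packed sequence is completely full and that weights are stored at 2 bytes per parameter (BF16); both are approximations. The sequence length is evaluated at both 1024 and 2048 because the card cites that range. The helper name `tokens_seen` is hypothetical.

```python
def tokens_seen(steps, effective_batch, seq_len):
    """Upper bound on tokens processed, assuming fully packed sequences."""
    return steps * effective_batch * seq_len


per_device_batch = 4
grad_accum_steps = 4
total_steps = 312
n_params = 0.6e9       # nominal parameter count
bytes_per_param = 2    # BF16

effective_batch = per_device_batch * grad_accum_steps
print("effective batch size:", effective_batch)  # 16

for seq_len in (1024, 2048):
    tokens = tokens_seen(total_steps, effective_batch, seq_len)
    print(f"seq_len={seq_len}: ~{tokens / 1e6:.1f}M tokens")
# seq_len=1024: ~5.1M tokens
# seq_len=2048: ~10.2M tokens

weights_gb = n_params * bytes_per_param / 1e9
print(f"weights: ~{weights_gb:.1f} GB")  # ~1.2 GB
```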

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluation conducted on held-out samples from Math-Pretraining-Data
Manual evaluation of mathematical accuracy and reasoning quality

Metrics

Training Loss: Final loss ~2.34 (converged after 1 epoch)
Perplexity: Calculated from validation loss
Manual Evaluation: Spot-check of generated mathematical content for:
- Syntactic correctness
- Mathematical accuracy
- Coherence and relevance
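
Perplexity is the exponential of the mean per-token cross-entropy loss (measured in nats). As a minimal sketch, the conversion below is applied to the reported final training loss purely for illustration. The validation loss is not given in this card, so the resulting number is not a reported evaluation result.

```python
import math


def perplexity(mean_loss_nats):
    """Convert mean per-token cross-entropy (natural log) to perplexity."""
    return math.exp(mean_loss_nats)


print(round(perplexity(2.34), 2))  # 10.38
```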

Results

Results from continued pretraining show:

Effective domain knowledge transfer on mathematics
Improved mathematical terminology usage
Better mathematical problem structure understanding

Note: Comprehensive benchmark results pending formal evaluation suite

Model Examination

Interpretability Insights

Model successfully learned mathematical domain patterns through raw text exposure
Context window effectively used for multi-step mathematical reasoning
Maintains base model's general language capabilities while enhancing mathematical knowledge

Environmental Impact

Carbon emissions estimate:

Hardware Type: NVIDIA A10G Tensor GPU
Hours used: ~0.75 hours
Cloud Provider: Hugging Face Endpoints
Compute Region: US-based datacenter
Carbon Emitted: ~0.12 kg CO2eq (estimated using ML Impact calculator)

Training a 0.6B model is relatively efficient compared to larger models (7B+).

Technical Specifications

Model Architecture

Architecture: Transformer decoder-only (causal language model)
Parameters: 600M (0.6B)
Attention: Grouped-query attention (GQA) with causal masking
Activation: SiLU (Swish)
Positional Embeddings: Rotary Position Embeddings (RoPE)

Compute Infrastructure

Hardware

GPU: NVIDIA A10G (24GB VRAM)
Compute Capability: 8.6
CPU: AMD EPYC processor
Memory: 100+ GB system RAM

Software

PyTorch: 2.1+
Transformers: 4.51+ (first release with Qwen3 support)
Unsloth: Latest version with Flash Attention 2
TRL: Hugging Face TRL library for SFTTrainer
Python: 3.12+

Citation

If you use this model, please cite:

BibTeX:

@misc{qwen3_0.6b_cpt_math,
  author = {Dayanand},
  title = {Qwen3-0.6B-Base-CPT-Math: Continued Pretraining for Mathematical Domain Adaptation},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/elitenandu/Qwen3-0.6B-Base-CPT-Math}}
}

APA:

Dayanand. (2026). Qwen3-0.6B-Base-CPT-Math: Continued pretraining for mathematical domain adaptation. Hugging Face. https://huggingface.co/elitenandu/Qwen3-0.6B-Base-CPT-Math

Also cite the base model:

Qwen Team. (2025). Qwen3-0.6B-Base. Alibaba. https://huggingface.co/Qwen/Qwen3-0.6B-Base

Glossary

CPT (Continued Pretraining): Further pretraining of a base model on domain-specific data
Full Finetuning: Training all model parameters (vs. LoRA which only trains adapters)
Flash Attention: Memory-efficient attention implementation enabling longer contexts
Packing: Concatenating multiple short documents into longer sequences for training efficiency
BF16: bfloat16, a 16-bit floating-point format that keeps FP32's exponent range with fewer mantissa bits; natively supported on recent GPUs
Causal LM: Language model that predicts next token based on previous tokens
Perplexity: Measure of model uncertainty; lower is better

More Information

For detailed implementation and reproducibility:

See GitHub Repository
Training script: main.py
Setup guide: README.md
Original research: Refer to Continued Pretraining literature

Model Card Authors

Card Author: Dayanand
Model Developer: Dayanand
Based on: Qwen Team (Alibaba Qwen3-0.6B-Base)

Model Card Contact

For questions or issues:

GitHub Issues: GitHub Repository Issues
Email: [Your Email Here]
Hugging Face Discussions: Model Page Discussions
