Shivik 1.7B - Phase 1 (General Knowledge)

Model Description

Shivik Phase 1 is a 1.7B parameter language model trained for reasoning and chain-of-thought (CoT) generation.

  • Base Architecture: Llama 3.2 1B
  • Parameters: ~1.7B
  • Training: Phase 1 - General knowledge foundation (80K samples)
  • Capabilities: General reasoning, basic CoT structure, math, code, and language understanding

Training Details

Phase 1 Training (This Model)

  • Samples: 80,000
  • Data Mix:
    • 50% Web & General Knowledge (Cosmopedia, Tulu-3, PersonaHub, General-Knowledge)
    • 20% Textbooks & Education (TextbookReasoning, GPTscience)
    • 10% Medical & Health (medical-o1-reasoning, medical-QA)
    • 10% Code (Magicoder-OSS, Magicoder-Evol)
    • 5% STEM & Engineering (Electrical-engineering, OpenMathInstruct)
    • 5% Reasoning Basics (reasoning-base-20k, thinker)
  • Epochs: 1
  • Max Length: 1024 tokens
  • Training Method: LoRA fine-tuning (rank 64)
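The exact LoRA configuration is not published beyond the rank; a minimal sketch of a rank-64 setup using the `peft` library, with assumed values for everything except `r`, might look like this (the target modules, alpha, and dropout below are typical choices for Llama-style models, not confirmed settings):

```python
# Hypothetical rank-64 LoRA setup with the `peft` library.
# Only r=64 is stated in the card; all other values are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
lora_config = LoraConfig(
    r=64,                # LoRA rank, as stated in the card
    lora_alpha=128,      # assumed scaling factor (often 2 * r)
    lora_dropout=0.05,   # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```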

Architecture

  • Hidden Size: 2048
  • Layers: 16
  • Attention Heads: 32 (8 KV heads)
  • Vocabulary: 128,262 tokens (extended with reasoning tokens)
  • Context Length: 131,072 tokens
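The attention figures above imply grouped-query attention (GQA): 32 query heads share 8 key/value heads. A quick arithmetic check of the geometry:

```python
# Back-of-the-envelope check of the attention geometry listed above
# (grouped-query attention: 32 query heads sharing 8 KV heads).
hidden_size = 2048
num_heads = 32
num_kv_heads = 8

head_dim = hidden_size // num_heads            # per-head dimension: 64
queries_per_kv = num_heads // num_kv_heads     # 4 query heads per KV head
kv_cache_reduction = num_heads / num_kv_heads  # KV cache 4x smaller than full MHA

print(head_dim, queries_per_kv)  # 64 4
```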

Model Performance

Evaluation Results

  • Format Score: 6/9
  • Has <think> tags: ✅ Yes
  • Has <answer> tags: ✅ Yes
  • Correct answers: ✅ Yes (tested on math problems)
  • Content generation: ✅ 1500+ chars average
  • Status: ⚠️ Missing <step> tags (can be added with better prompting)

Comparison

  • vs Phase 2/3: Phase 1 is currently the only working model in the series (Phase 2 and 3 are broken)
  • vs Base Model: Significant improvement in reasoning structure
  • Use Case: Best for general Q&A with reasoning, not yet perfect CoT format

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
model_id = "abhishek-0122/Shivik-1.7B-Phase1-General"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Format prompt
prompt = '''<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are Shivik, an advanced reasoning AI. Show your thinking using <think> tags. Break down your reasoning into steps. Provide answers in <answer> tags.
<|eot_id|><|start_header_id|>user<|end_header_id|>

What is 15 × 24?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

'''

# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Expected Output Format

<think>

15 × 24 can be broken down step by step.

First, let me use the distributive property:
15 × 24 = 15 × (20 + 4)
       = (15 × 20) + (15 × 4)
       = 300 + 60
       = 360

</think>
<answer>
360
</answer>
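Given this output format, the final answer can be pulled out with a small regex helper (a sketch assuming the `<answer>` tags shown above are present; the tag names come from the card, the helper itself is illustrative):

```python
import re
from typing import Optional

def extract_answer(response: str) -> Optional[str]:
    """Pull the final answer out of a Shivik-style response.

    Assumes the <answer>...</answer> format shown above; returns None
    if the model did not emit the tags.
    """
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", response, re.DOTALL)
    return match.group(1) if match else None

response = "<think>\n15 x 24 = 360\n</think>\n<answer>\n360\n</answer>"
print(extract_answer(response))  # 360
```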

Recommended Generation Parameters

generation_config = {
    "max_new_tokens": 1024,      # Adjust based on task complexity
    "temperature": 0.7,           # Lower (0.3-0.5) for math, higher (0.7-0.9) for creative
    "top_p": 0.9,
    "repetition_penalty": 1.2,    # Prevents repetition
    "do_sample": True,
}

Limitations

  • โš ๏ธ Incomplete CoT format: Has <think> and <answer> tags, but missing <step> tags
  • โš ๏ธ Not production-ready: This is Phase 1, more training needed for perfect CoT
  • โš ๏ธ Better with prompting: Needs explicit instructions to use step-by-step reasoning
  • โš ๏ธ 1.7B size: Smaller than models like Qwen-3B, may have less knowledge

Recommended Use Cases

✅ Good for:

  • General Q&A with reasoning structure
  • Math problems with explanation
  • Code explanation
  • Educational content
  • Experimenting with CoT prompting

โŒ Not recommended for:

  • Production CoT applications (wait for Phase 2 distilled)
  • Tasks requiring perfect multi-step format
  • Safety-critical applications

Model Family

This is part of the Shivik model series:

  1. Phase 1 (This Model): General knowledge foundation - WORKING
  2. Phase 2: Long-form CoT training - BROKEN (only outputs tags)
  3. Phase 3: Format refinement - BROKEN (built on broken Phase 2)
  4. Phase 2 Distilled (Upcoming): Fixed with teacher distillation

Future Plans

  • 🔄 Phase 2 Distilled: Training with teacher models (DeepSeek-R1, Qwen-Math, Qwen-Coder)
  • ✨ Phase 3 Refined: Perfect CoT format with <step> and <verify> tags
  • 📈 Larger Models: 2.5B and 3.5B variants
  • 🧠 GNN Memory: Graph neural network for persistent memory

Citation

@misc{shivik-phase1-2025,
  title={Shivik 1.7B Phase 1: General Knowledge Foundation},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/abhishek-0122/Shivik-1.7B-Phase1-General}
}

License

Apache 2.0

Contact

  • Creator: [Your Name/Handle]
  • Project: Shivik - Reasoning-capable small language models

Note: This is an experimental model from an active research project. Phase 1 works but is not production-ready. A distilled version with proper CoT format is in development.
