Shivik 1.7B - Phase 1 (General Knowledge)

Model Description

Shivik Phase 1 is a 1.7B parameter language model trained for reasoning and chain-of-thought (CoT) generation.

  • Base Architecture: Llama 3.2 1B
  • Parameters: ~1.7B
  • Training: Phase 1 - General knowledge foundation (80K samples)
  • Capabilities: General reasoning, basic CoT structure, math, code, and language understanding

Training Details

Phase 1 Training (This Model)

  • Samples: 80,000
  • Data Mix:
    • 50% Web & General Knowledge (Cosmopedia, Tulu-3, PersonaHub, General-Knowledge)
    • 20% Textbooks & Education (TextbookReasoning, GPTscience)
    • 10% Medical & Health (medical-o1-reasoning, medical-QA)
    • 10% Code (Magicoder-OSS, Magicoder-Evol)
    • 5% STEM & Engineering (Electrical-engineering, OpenMathInstruct)
    • 5% Reasoning Basics (reasoning-base-20k, thinker)
  • Epochs: 1
  • Max Length: 1024 tokens
  • Training Method: LoRA fine-tuning (rank 64)
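The exact LoRA configuration is not published beyond the rank; a minimal sketch of a rank-64 setup using the `peft` library, with assumed values for everything except `r`, might look like this (the target modules, alpha, and dropout below are typical choices for Llama-style models, not confirmed settings):

```python
# Hypothetical rank-64 LoRA setup with the `peft` library.
# Only r=64 is stated in the card; all other values are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
lora_config = LoraConfig(
    r=64,                # LoRA rank, as stated in the card
    lora_alpha=128,      # assumed scaling factor (often 2 * r)
    lora_dropout=0.05,   # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```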

Architecture

  • Hidden Size: 2048
  • Layers: 16
  • Attention Heads: 32 (8 KV heads)
  • Vocabulary: 128,262 tokens (extended with reasoning tokens)
  • Context Length: 131,072 tokens
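The attention figures above imply grouped-query attention (GQA): 32 query heads share 8 key/value heads. A quick arithmetic check of the geometry:

```python
# Back-of-the-envelope check of the attention geometry listed above
# (grouped-query attention: 32 query heads sharing 8 KV heads).
hidden_size = 2048
num_heads = 32
num_kv_heads = 8

head_dim = hidden_size // num_heads            # per-head dimension: 64
queries_per_kv = num_heads // num_kv_heads     # 4 query heads per KV head
kv_cache_reduction = num_heads / num_kv_heads  # KV cache 4x smaller than full MHA

print(head_dim, queries_per_kv)  # 64 4
```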

Model Performance

Evaluation Results

  • Format Score: 6/9
  • Has <think> tags: ✅ Yes
  • Has <answer> tags: ✅ Yes
  • Correct answers: ✅ Yes (tested on math problems)
  • Content generation: ✅ 1500+ chars average
  • Status: ⚠️ Missing <step> tags (can be added with better prompting)

Comparison

  • vs Phase 2/3: Phase 1 is currently the only working model in the series (Phase 2 and 3 are broken)
  • vs Base Model: Significant improvement in reasoning structure
  • Use Case: Best for general Q&A with reasoning, not yet perfect CoT format

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
model_id = "abhishek-0122/Shivik-1.7B-Phase1-General"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Format prompt
prompt = '''<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are Shivik, an advanced reasoning AI. Show your thinking using <think> tags. Break down your reasoning into steps. Provide answers in <answer> tags.
<|eot_id|><|start_header_id|>user<|end_header_id|>

What is 15 × 24?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

'''

# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Expected Output Format

<think>

15 × 24 can be broken down step by step.

First, let me use the distributive property:
15 × 24 = 15 × (20 + 4)
       = (15 × 20) + (15 × 4)
       = 300 + 60
       = 360

</think>
<answer>
360
</answer>
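Given this output format, the final answer can be pulled out with a small regex helper (a sketch assuming the `<answer>` tags shown above are present; the tag names come from the card, the helper itself is illustrative):

```python
import re
from typing import Optional

def extract_answer(response: str) -> Optional[str]:
    """Pull the final answer out of a Shivik-style response.

    Assumes the <answer>...</answer> format shown above; returns None
    if the model did not emit the tags.
    """
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", response, re.DOTALL)
    return match.group(1) if match else None

response = "<think>\n15 x 24 = 360\n</think>\n<answer>\n360\n</answer>"
print(extract_answer(response))  # 360
```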

Recommended Generation Parameters

generation_config = {
    "max_new_tokens": 1024,      # Adjust based on task complexity
    "temperature": 0.7,           # Lower (0.3-0.5) for math, higher (0.7-0.9) for creative
    "top_p": 0.9,
    "repetition_penalty": 1.2,    # Prevents repetition
    "do_sample": True,
}

Limitations

  • โš ๏ธ Incomplete CoT format: Has <think> and <answer> tags, but missing <step> tags
  • โš ๏ธ Not production-ready: This is Phase 1, more training needed for perfect CoT
  • โš ๏ธ Better with prompting: Needs explicit instructions to use step-by-step reasoning
  • โš ๏ธ 1.7B size: Smaller than models like Qwen-3B, may have less knowledge

Recommended Use Cases

✅ Good for:

  • General Q&A with reasoning structure
  • Math problems with explanation
  • Code explanation
  • Educational content
  • Experimenting with CoT prompting

โŒ Not recommended for:

  • Production CoT applications (wait for Phase 2 distilled)
  • Tasks requiring perfect multi-step format
  • Safety-critical applications

Model Family

This is part of the Shivik model series:

  1. Phase 1 (This Model): General knowledge foundation - WORKING
  2. Phase 2: Long-form CoT training - BROKEN (only outputs tags)
  3. Phase 3: Format refinement - BROKEN (built on broken Phase 2)
  4. Phase 2 Distilled (Upcoming): Fixed with teacher distillation

Future Plans

  • 🔄 Phase 2 Distilled: Training with teacher models (DeepSeek-R1, Qwen-Math, Qwen-Coder)
  • ✨ Phase 3 Refined: Perfect CoT format with <step> and <verify> tags
  • 📈 Larger Models: 2.5B and 3.5B variants
  • 🧠 GNN Memory: Graph neural network for persistent memory

Citation

@misc{shivik-phase1-2025,
  title={Shivik 1.7B Phase 1: General Knowledge Foundation},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/abhishek-0122/Shivik-1.7B-Phase1-General}
}

License

Apache 2.0

Contact

  • Creator: [Your Name/Handle]
  • Project: Shivik - Reasoning-capable small language models

Note: This is an experimental model from an active research project. Phase 1 works but is not production-ready. A distilled version with proper CoT format is in development.
