# Shivik-1B 🧮

A 1B parameter model optimized for mathematical reasoning and chain-of-thought problem solving.

## 📊 Performance

| Benchmark | Score |
|-----------|-------|
| GSM8K (5-shot) | 29.72% |

๐Ÿ—๏ธ Architecture

Component Value
Parameters ~1.07B
Hidden Size 2048
Layers 16
Attention Heads 32 (8 KV heads - GQA)
Context Length 131,072 tokens
Vocabulary 128,262 tokens
Precision bfloat16
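With grouped-query attention, each of the 8 KV heads is shared by a group of 32 / 8 = 4 query heads, shrinking the KV cache 4x. A shapes-only NumPy sketch of that grouping (head_dim = 2048 / 32 = 64 follows from the table; the real model does this inside fused attention kernels, and the causal mask is omitted for brevity):

```python
import numpy as np

n_q_heads, n_kv_heads, head_dim = 32, 8, 64  # hidden 2048 / 32 heads = 64
group = n_q_heads // n_kv_heads              # 4 query heads per KV head
seq_len = 5

q = np.random.randn(n_q_heads, seq_len, head_dim)
k = np.random.randn(n_kv_heads, seq_len, head_dim)  # only 8 KV heads cached
v = np.random.randn(n_kv_heads, seq_len, head_dim)

# Repeat each KV head 4x so every query head has a matching K/V
k_rep = np.repeat(k, group, axis=0)  # (32, seq_len, head_dim)
v_rep = np.repeat(v, group, axis=0)

# Standard scaled dot-product attention per head (no causal mask here)
scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v_rep
print(out.shape)  # (32, 5, 64)
```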

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "theaicompany02/Shivik-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Math problem
prompt = """Question: A store sells apples for $2 each. If John buys 5 apples and pays with a $20 bill, how much change does he get?

Answer: Let me solve this step by step.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 📈 Training Details

- **Architecture:** Custom SHIVIK transformer with GQA
- **Training Data:** GSM8K, OpenMathReasoning, mathematical reasoning datasets
- **Method:** Supervised fine-tuning with chain-of-thought reasoning
- **Precision:** bfloat16 mixed precision

## 💡 Best Practices

1. **Use step-by-step prompting:** add "Let me solve this step by step" or "Think step by step" to the prompt.
2. **Temperature:** sample with `temperature=0.7` for varied reasoning; pass `do_sample=False` (greedy decoding) for deterministic answers.
3. **Format:** structure prompts as `Question: ... Answer:`.
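The recommended prompt format can be wrapped in a small helper (the function name is illustrative, not part of the model's tooling):

```python
def build_math_prompt(question: str) -> str:
    """Format a question in the 'Question: ... Answer:' style the model was
    tuned on, with a step-by-step cue appended."""
    return (
        f"Question: {question}\n\n"
        "Answer: Let me solve this step by step.\n"
    )

prompt = build_math_prompt("What is 15% of 80?")
print(prompt)
```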

โš ๏ธ Limitations

  • Optimized for math; may underperform on general tasks
  • Best with explicit reasoning prompts
  • May struggle with very complex multi-step problems (5+ steps)

## 📜 License

Apache 2.0

## 🙏 Acknowledgments

Built as part of the SHIVIK project: creating competitive small language models through intelligent training.
