# Shivik-1B
A 1B parameter model optimized for mathematical reasoning and chain-of-thought problem solving.
## Performance
| Benchmark | Score |
|---|---|
| GSM8K (5-shot) | 29.72% |
## Architecture
| Component | Value |
|---|---|
| Parameters | ~1.07B |
| Hidden Size | 2048 |
| Layers | 16 |
| Attention Heads | 32 query / 8 KV (GQA) |
| Context Length | 131,072 tokens |
| Vocabulary | 128,262 tokens |
| Precision | bfloat16 |
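The architecture figures above imply a concrete memory footprint for the KV cache, which grouped-query attention (GQA) shrinks by the query-to-KV head ratio. The sketch below is a back-of-the-envelope calculation using only the table's numbers; the assumption that the KV cache is stored in bfloat16 (2 bytes per value) is ours, not stated in the card.

```python
# Rough KV-cache size for Shivik-1B, derived from the architecture table.
# bytes_per_value=2 assumes a bfloat16 KV cache (an assumption, not from the card).

def kv_cache_bytes(layers=16, kv_heads=8, head_dim=2048 // 32,
                   seq_len=131_072, bytes_per_value=2):
    # Factor of 2 covers keys and values, stored per layer, per KV head, per position.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

gqa_cache = kv_cache_bytes()                 # 8 KV heads, as in the table
mha_cache = kv_cache_bytes(kv_heads=32)      # what full multi-head attention would need

print(f"GQA KV cache at full 131K context: {gqa_cache / 2**30:.0f} GiB")
print(f"MHA equivalent:                    {mha_cache / 2**30:.0f} GiB")
```

At the full 131,072-token context this works out to about 4 GiB with GQA versus 16 GiB for full multi-head attention, a 4x saving from sharing each KV head across four query heads.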
## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "theaicompany02/Shivik-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Math problem
prompt = """Question: A store sells apples for $2 each. If John buys 5 apples and pays with a $20 bill, how much change does he get?
Answer: Let me solve this step by step.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
- Architecture: Custom SHIVIK transformer with GQA
- Training Data: GSM8K, OpenMathReasoning, mathematical reasoning datasets
- Method: Supervised fine-tuning with chain-of-thought reasoning
- Precision: bfloat16 mixed precision
## Best Practices
- Use step-by-step prompting: Add "Let me solve this step by step" or "Think step by step"
- Temperature: Use 0.7 for sampled reasoning; use greedy decoding (`do_sample=False`) for deterministic answers
- Format: Structure prompts as "Question: ... Answer:"
## Limitations
- Optimized for math; may underperform on general tasks
- Best with explicit reasoning prompts
- May struggle with very complex multi-step problems (5+ steps)
## License
Apache 2.0
## Acknowledgments
Built as part of the SHIVIK project, which aims to create competitive small language models through intelligent training.