---
license: apache-2.0
language:
- en
tags:
- math
- reasoning
- chain-of-thought
- gsm8k
- shivik
metrics:
- accuracy
pipeline_tag: text-generation
model-index:
- name: Shivik-1B
  results:
  - task:
      type: mathematical-reasoning
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - name: Accuracy (5-shot)
      type: accuracy
      value: 29.72
---

# Shivik-1B 🧮

A 1B-parameter model optimized for mathematical reasoning and chain-of-thought problem solving.

## 📊 Performance

| Benchmark | Score |
|-----------|-------|
| **GSM8K (5-shot)** | **29.72%** |

## 🏗️ Architecture

| Component | Value |
|-----------|-------|
| Parameters | ~1.07B |
| Hidden Size | 2048 |
| Layers | 16 |
| Attention Heads | 32 (8 KV heads, GQA) |
| Context Length | 131,072 tokens |
| Vocabulary | 128,262 tokens |
| Precision | bfloat16 |

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "theaicompany02/Shivik-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Math problem in the recommended "Question: ... Answer:" format
prompt = """Question: A store sells apples for $2 each. If John buys 5 apples and pays with a $20 bill, how much change does he get?

Answer: Let me solve this step by step.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 📈 Training Details

- **Architecture**: Custom SHIVIK transformer with GQA
- **Training Data**: GSM8K, OpenMathReasoning, and other mathematical reasoning datasets
- **Method**: Supervised fine-tuning with chain-of-thought reasoning
- **Precision**: bfloat16 mixed precision

## 💡 Best Practices

1. **Use step-by-step prompting**: Add "Let me solve this step by step" or "Think step by step"
2. **Temperature**: Use 0.7 with sampling for reasoning; use greedy decoding (`do_sample=False`) for deterministic answers
3. **Format**: Structure prompts as "Question: ... Answer:"

## ⚠️ Limitations

- Optimized for math; may underperform on general tasks
- Works best with explicit reasoning prompts
- May struggle with very complex multi-step problems (5+ steps)

## 📜 License

Apache 2.0

## 🙏 Acknowledgments

Built as part of the SHIVIK project, which aims to create competitive small language models through intelligent training.
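The prompting and decoding advice above can be sketched as a small helper. This is a minimal illustration, not the model's actual evaluation harness: `build_few_shot_prompt`, the exemplar problems, and `greedy_kwargs` are all hypothetical names introduced here.

```python
def build_few_shot_prompt(exemplars, question):
    """Assemble a few-shot chain-of-thought prompt in the
    "Question: ... Answer:" format recommended above."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars]
    # End with the target question and a step-by-step cue for the model.
    blocks.append(f"Question: {question}\nAnswer: Let me solve this step by step.")
    return "\n\n".join(blocks)

# Hypothetical worked exemplar (a real 5-shot run would supply five of these).
exemplars = [
    ("A pen costs $3. How much do 4 pens cost?",
     "Let me solve this step by step. 4 pens x $3 = $12. The answer is 12."),
]

prompt = build_few_shot_prompt(exemplars, "A book costs $7. How much do 3 books cost?")
print(prompt)

# Greedy-decoding kwargs for deterministic answers, per the best practices.
greedy_kwargs = {"max_new_tokens": 256, "do_sample": False}
```

In the quick-start snippet, tokenize `prompt` the same way and pass `**greedy_kwargs` to `model.generate` when a reproducible answer matters more than sampling diversity.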