---
license: apache-2.0
language:
- en
tags:
- math
- reasoning
- chain-of-thought
- gsm8k
- shivik
metrics:
- accuracy
pipeline_tag: text-generation
model-index:
- name: Shivik-1B
  results:
  - task:
      type: mathematical-reasoning
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - name: Accuracy (5-shot)
      type: accuracy
      value: 29.72
---

# Shivik-1B 🧮

A 1B-parameter model optimized for mathematical reasoning and chain-of-thought problem solving.

## 📊 Performance

| Benchmark | Score |
|-----------|-------|
| **GSM8K (5-shot)** | **29.72%** |
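
The 5-shot setting prepends five worked examples to the test question. A minimal sketch of how such a prompt can be assembled (the exemplar content below is illustrative only, not taken from the actual evaluation harness):

```python
def build_fewshot_prompt(exemplars, question):
    """Assemble a GSM8K-style few-shot prompt: worked examples, then the test question."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Illustrative exemplars; a real 5-shot harness would draw these from the GSM8K train split.
exemplars = [("What is 2 + 3?", "2 + 3 = 5. The answer is 5.")] * 5
prompt = build_fewshot_prompt(exemplars, "A pen costs $3. How much do 4 pens cost?")
print(prompt.count("Question:"))  # 6: five exemplars plus the test question
```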

## 🏗️ Architecture

| Component | Value |
|-----------|-------|
| Parameters | ~1.07B |
| Hidden Size | 2048 |
| Layers | 16 |
| Attention Heads | 32 (8 KV heads, GQA) |
| Context Length | 131,072 tokens |
| Vocabulary | 128,262 tokens |
| Precision | bfloat16 |
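
The attention row implies grouped-query attention (GQA): the 32 query heads share 8 key/value heads, so the K/V projections are a quarter the width of the query projection. The bookkeeping, restating only values from the table above:

```python
# Values from the architecture table above.
hidden_size = 2048
num_heads = 32        # query heads
num_kv_heads = 8      # shared key/value heads (GQA)

head_dim = hidden_size // num_heads      # 64 dims per head
group_size = num_heads // num_kv_heads   # 4 query heads share each KV head
kv_dim = num_kv_heads * head_dim         # K/V projection width: 512, vs. 2048 for queries

print(head_dim, group_size, kv_dim)  # 64 4 512
```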

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "theaicompany02/Shivik-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Math problem
prompt = """Question: A store sells apples for $2 each. If John buys 5 apples and pays with a $20 bill, how much change does he get?

Answer: Let me solve this step by step.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 📚 Training Details

- **Architecture**: Custom SHIVIK transformer with GQA
- **Training Data**: GSM8K, OpenMathReasoning, and other mathematical reasoning datasets
- **Method**: Supervised fine-tuning with chain-of-thought reasoning
- **Precision**: bfloat16 mixed precision

## 💡 Best Practices

1. **Use step-by-step prompting**: Add "Let me solve this step by step" or "Think step by step"
2. **Temperature**: Use 0.7 for exploratory reasoning; for deterministic answers, use greedy decoding (`do_sample=False`) rather than sampling at temperature 0.0
3. **Format**: Structure prompts as "Question: ... Answer:"
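
For scoring, GSM8K-style outputs are usually reduced to the final number in the completion. A minimal sketch of that heuristic (the exact extraction rule behind the reported score is an assumption, not documented here):

```python
import re

def extract_final_answer(text):
    """Return the last number in the model's output, a common GSM8K scoring heuristic."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

completion = "John pays 5 * 2 = 10 dollars, so his change is 20 - 10 = 10. The answer is 10."
print(extract_final_answer(completion))  # 10
```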

## ⚠️ Limitations

- Optimized for math; may underperform on general tasks
- Works best with explicit reasoning prompts
- May struggle with very complex multi-step problems (5+ steps)

## 📄 License

Apache 2.0

## 🙏 Acknowledgments

Built as part of the SHIVIK project, creating competitive small language models through intelligent training.