---
license: apache-2.0
language:
- en
tags:
- math
- reasoning
- chain-of-thought
- gsm8k
- shivik
metrics:
- accuracy
pipeline_tag: text-generation
model-index:
- name: Shivik-1B
  results:
  - task:
      type: mathematical-reasoning
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - name: Accuracy (5-shot)
      type: accuracy
      value: 29.72
---

# Shivik-1B 🧮

A 1B-parameter model optimized for mathematical reasoning and chain-of-thought problem solving.

## 📊 Performance

| Benchmark | Score |
|-----------|-------|
| **GSM8K (5-shot)** | **29.72%** |

## 🏗️ Architecture

| Component | Value |
|-----------|-------|
| Parameters | ~1.07B |
| Hidden Size | 2048 |
| Layers | 16 |
| Attention Heads | 32 (8 KV heads, GQA) |
| Context Length | 131,072 tokens |
| Vocabulary | 128,262 tokens |
| Precision | bfloat16 |

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "theaicompany02/Shivik-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Math problem in the recommended "Question: ... Answer:" format
prompt = """Question: A store sells apples for $2 each. If John buys 5 apples and pays with a $20 bill, how much change does he get?

Answer: Let me solve this step by step.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 📈 Training Details

- **Architecture**: Custom SHIVIK transformer with GQA
- **Training Data**: GSM8K, OpenMathReasoning, and other mathematical reasoning datasets
- **Method**: Supervised fine-tuning with chain-of-thought reasoning
- **Precision**: bfloat16 mixed precision

## 💡 Best Practices

1. **Use step-by-step prompting**: Add "Let me solve this step by step" or "Think step by step"
2. **Temperature**: Use 0.7 with sampling for reasoning; use greedy decoding (`do_sample=False`) for deterministic answers
3. **Format**: Structure prompts as "Question: ... Answer:"

## ⚠️ Limitations

- Optimized for math; may underperform on general tasks
- Works best with explicit reasoning prompts
- May struggle with very complex multi-step problems (5+ steps)

## 📜 License

Apache 2.0

## 🙏 Acknowledgments

Built as part of the SHIVIK project, which aims to create competitive small language models through intelligent training.
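The prompting and decoding advice above can be sketched as a small helper. This is a minimal illustration, not the model's actual evaluation harness: `build_few_shot_prompt`, the exemplar problems, and `greedy_kwargs` are all hypothetical names introduced here.

```python
def build_few_shot_prompt(exemplars, question):
    """Assemble a few-shot chain-of-thought prompt in the
    "Question: ... Answer:" format recommended above."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars]
    # End with the target question and a step-by-step cue for the model.
    blocks.append(f"Question: {question}\nAnswer: Let me solve this step by step.")
    return "\n\n".join(blocks)

# Hypothetical worked exemplar (a real 5-shot run would supply five of these).
exemplars = [
    ("A pen costs $3. How much do 4 pens cost?",
     "Let me solve this step by step. 4 pens x $3 = $12. The answer is 12."),
]

prompt = build_few_shot_prompt(exemplars, "A book costs $7. How much do 3 books cost?")
print(prompt)

# Greedy-decoding kwargs for deterministic answers, per the best practices.
greedy_kwargs = {"max_new_tokens": 256, "do_sample": False}
```

In the quick-start snippet, tokenize `prompt` the same way and pass `**greedy_kwargs` to `model.generate` when a reproducible answer matters more than sampling diversity.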