---
license: apache-2.0
language:
  - en
tags:
  - math
  - reasoning
  - chain-of-thought
  - gsm8k
  - shivik
metrics:
  - accuracy
pipeline_tag: text-generation
model-index:
  - name: Shivik-1B
    results:
      - task:
          type: mathematical-reasoning
        dataset:
          name: GSM8K
          type: gsm8k
        metrics:
          - name: Accuracy (5-shot)
            type: accuracy
            value: 29.72
---

# Shivik-1B 🧮

A 1B parameter model optimized for mathematical reasoning and chain-of-thought problem solving.

## 📊 Performance

| Benchmark | Score |
|---|---|
| GSM8K (5-shot) | 29.72% |

๐Ÿ—๏ธ Architecture

| Component | Value |
|---|---|
| Parameters | ~1.07B |
| Hidden Size | 2048 |
| Layers | 16 |
| Attention Heads | 32 (8 KV heads, GQA) |
| Context Length | 131,072 tokens |
| Vocabulary | 128,262 tokens |
| Precision | bfloat16 |
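For intuition on the GQA row: a back-of-envelope sketch of how much the 8 KV heads shrink the KV cache at the full 131,072-token context, compared with keeping K/V for all 32 heads. The per-head dimension is an assumption here (hidden_size / attention heads = 2048 / 32 = 64), as is the standard 2-bytes-per-value bfloat16 layout.

```python
# Rough KV-cache estimate from the architecture table above (a sketch;
# head_dim is assumed to be hidden_size // num_heads = 64).
hidden_size = 2048
layers = 16
num_heads = 32
num_kv_heads = 8          # GQA: every 4 query heads share one KV head
head_dim = hidden_size // num_heads
bytes_per_value = 2       # bfloat16

def kv_cache_bytes(seq_len: int, kv_heads: int) -> int:
    # 2 tensors (K and V) per layer, each of shape [kv_heads, seq_len, head_dim]
    return 2 * layers * kv_heads * seq_len * head_dim * bytes_per_value

full_mha = kv_cache_bytes(131_072, num_heads)    # hypothetical: all 32 heads keep K/V
gqa = kv_cache_bytes(131_072, num_kv_heads)      # actual config: 8 KV heads
print(f"MHA: {full_mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB")
# → MHA: 16.0 GiB, GQA: 4.0 GiB per sequence at full context
```

So at full context the GQA layout cuts per-sequence KV memory by 4×, which is what makes the 131K window practical on a 1B model.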

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "theaicompany02/Shivik-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Math problem
prompt = """Question: A store sells apples for $2 each. If John buys 5 apples and pays with a $20 bill, how much change does he get?

Answer: Let me solve this step by step.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 📈 Training Details

- **Architecture:** Custom SHIVIK transformer with GQA
- **Training Data:** GSM8K, OpenMathReasoning, and other mathematical reasoning datasets
- **Method:** Supervised fine-tuning with chain-of-thought reasoning
- **Precision:** bfloat16 mixed precision
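A minimal sketch of what one chain-of-thought SFT example might look like when serialized. The exact training template is not published; this simply mirrors the `Question: ... Answer:` format the card recommends at inference time, and the helper name is illustrative.

```python
# Hedged sketch: serialize one GSM8K-style training example so the SFT
# format matches the "Question: ... Answer:" prompt used at inference.
def format_sft_example(question: str, cot_answer: str) -> str:
    return (
        f"Question: {question}\n\n"
        f"Answer: Let me solve this step by step.\n{cot_answer}"
    )

sample = format_sft_example(
    "A store sells apples for $2 each. John buys 5 apples. What is the total cost?",
    "5 apples * $2 = $10. The answer is 10.",
)
print(sample)
```

Keeping the training and inference templates identical is what makes the "step by step" cue from Best Practices below reliably trigger reasoning.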

## 💡 Best Practices

1. **Step-by-step prompting:** add "Let me solve this step by step" or "Think step by step" to the prompt.
2. **Temperature:** use `temperature=0.7` with `do_sample=True` for varied reasoning; for deterministic answers, use greedy decoding (`do_sample=False`) rather than temperature 0.
3. **Format:** structure prompts as `Question: ... Answer:`.

โš ๏ธ Limitations

- Optimized for math; may underperform on general tasks
- Works best with explicit reasoning prompts
- May struggle with very complex multi-step problems (5+ steps)

## 📜 License

Apache 2.0

๐Ÿ™ Acknowledgments

Built as part of the SHIVIK project, which aims to create competitive small language models through intelligent training.