πŸ”± Indro-Veda: The Sovereign Reasoning Model (500M)

Indro-Veda is a state-of-the-art Small Language Model (SLM) developed by Indro-ai. With 500 million parameters, it is specifically engineered to demonstrate high-level reasoning, logical deduction, and structured problem-solvingβ€”capabilities typically reserved for much larger models.

The name "Indro-Veda" signifies the fusion of supreme intelligence (Indro) and profound knowledge (Veda).

πŸš€ Model Highlights

  • Architecture: Optimized Transformer-based architecture (Llama-style).
  • Parameters: 500 Million.
  • Training Tokens: 3 Billion curated tokens.
  • Specialization: Mathematics, Algorithmic Code, and High-Quality Educational Reasoning.
  • Framework: Trained using PyTorch/XLA on TPU infrastructure for maximum efficiency.
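As a rough sanity check on the 500M figure, here is a back-of-the-envelope parameter count for a hypothetical Llama-style configuration. All dimensions below (vocabulary size, hidden size, layer count, FFN size) are illustrative assumptions, not published values, and the formula ignores refinements such as grouped-query attention:

```python
def llama_param_count(vocab, d_model, n_layers, d_ff, tied_embeddings=True):
    """Rough parameter count for a Llama-style decoder-only transformer."""
    embed = vocab * d_model                         # token embedding matrix
    attn = 4 * d_model * d_model                    # Q, K, V, O projections
    mlp = 3 * d_model * d_ff                        # gate, up, down projections
    norms = 2 * d_model                             # two RMSNorms per layer
    per_layer = attn + mlp + norms
    total = embed + n_layers * per_layer + d_model  # plus the final norm
    if not tied_embeddings:
        total += vocab * d_model                    # separate LM head
    return total

# Hypothetical dimensions chosen to land near half a billion parameters.
print(llama_param_count(vocab=32_000, d_model=1152, n_layers=28, d_ff=3072))
# prints 482836608 (roughly 0.48B)
```

With tied input/output embeddings, dimensions in this neighborhood yield a model of about the stated size.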

🧠 Training Philosophy: "Reasoning over Recall"

Unlike traditional small models that focus on memorizing facts, Indro-Veda is trained on a Reasoning-Heavy Dataset Mixture:

  1. Logical Core (Math): Powered by UltraData-Math to ensure the model understands step-by-step derivation.
  2. Structural Core (Code): Trained on starcoderdata to enhance algorithmic thinking and syntax awareness.
  3. Knowledge Core (Education): Built on FineWeb-Edu to provide a clean, high-signal educational foundation.

πŸ“Š Dataset Distribution

The model was pre-trained on the Indro-Veda Dataset (3B Tokens) with a fixed-ratio mixture designed to prevent catastrophic forgetting and maintain balanced intelligence across domains.

| Component | Focus                        | Data Source          |
|-----------|------------------------------|----------------------|
| Reasoning | Mathematics & Logic          | UltraData-Math       |
| Structure | Programming & Algorithms     | starcoderdata        |
| Knowledge | High-Quality Educational Web | FineWeb-Edu          |
| Identity  | Sovereign Alignment          | Indro-ai Proprietary |
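A fixed-ratio mixture can be sketched as a deterministic interleaving schedule that keeps each source's emitted batch count tracking its target share. The weights below are illustrative assumptions, not the published mixture:

```python
# Hypothetical mixture weights -- the actual ratios are not published.
MIXTURE = {
    "UltraData-Math": 0.4,
    "starcoderdata": 0.3,
    "FineWeb-Edu": 0.3,
}

def fixed_ratio_schedule(n_batches, weights):
    """Deterministically interleave sources so counts track the target ratios."""
    counts = {name: 0 for name in weights}
    schedule = []
    for step in range(1, n_batches + 1):
        # Pick the source furthest behind its target share (first one wins ties).
        pick = max(weights, key=lambda s: weights[s] * step - counts[s])
        counts[pick] += 1
        schedule.append(pick)
    return schedule

# Over 10 batches, a 0.4/0.3/0.3 mixture yields exactly 4/3/3 batches per source.
print(fixed_ratio_schedule(10, MIXTURE))
```

Because the schedule is deterministic, every epoch sees the same domain proportions, which is one simple way to keep the mixture "fixed-ratio" across training.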

πŸ› οΈ Usage

You can use Indro-Veda with the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Indro-ai/Indro-Veda-500M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain the concept of logical deduction."
inputs = tokenizer(prompt, return_tensors="pt")

# Cap the number of generated tokens (max_new_tokens) rather than the total
# sequence length, so longer prompts are not silently truncated.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
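By default, generate decodes greedily; for more varied output you can enable sampling (e.g. do_sample=True, top_p=0.9). As a minimal, library-free illustration of what the nucleus (top-p) filtering step does, here is a sketch that keeps the smallest set of tokens whose cumulative probability reaches p:

```python
import math

def top_p_filter(logits, p=0.9):
    """Return indices of the smallest token set with cumulative probability >= p."""
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Accumulate tokens from most to least probable until mass p is covered.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return sorted(kept)

# The two most probable tokens already cover >= 90% of the mass here.
print(top_p_filter([3.0, 1.0, 0.1, -2.0], p=0.9))  # prints [0, 1]
```

Sampling is then restricted to the surviving indices, which trims the low-probability tail while preserving diversity among plausible tokens.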