Instructions to use bharatgenai/Param2-17B-A2.4B-Thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use bharatgenai/Param2-17B-A2.4B-Thinking with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bharatgenai/Param2-17B-A2.4B-Thinking", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bharatgenai/Param2-17B-A2.4B-Thinking", trust_remote_code=True, dtype="auto")


vLLM

How to use bharatgenai/Param2-17B-A2.4B-Thinking with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "bharatgenai/Param2-17B-A2.4B-Thinking"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bharatgenai/Param2-17B-A2.4B-Thinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
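
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the previous step is listening on localhost:8000, and the helper names `build_chat_request` and `ask` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumption: a server started with `vllm serve` is reachable at this address.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "bharatgenai/Param2-17B-A2.4B-Thinking"


def build_chat_request(prompt: str, url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request that the curl command above sends."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` returns the reply text. For the SGLang server described further down, pass a URL on port 30000 instead.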

Use Docker

docker model run hf.co/bharatgenai/Param2-17B-A2.4B-Thinking

SGLang

How to use bharatgenai/Param2-17B-A2.4B-Thinking with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "bharatgenai/Param2-17B-A2.4B-Thinking" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bharatgenai/Param2-17B-A2.4B-Thinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "bharatgenai/Param2-17B-A2.4B-Thinking" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bharatgenai/Param2-17B-A2.4B-Thinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Param-2-17B

BharatGen presents Param-2-17B-MoE-A2.4B, a large-scale Mixture-of-Experts (MoE) language model designed to deliver high model capacity while retaining the inference efficiency of a much smaller dense model. It uses a Hybrid MoE architecture with 17B total parameters, while activating only 2.4B parameters per token.

The model is pretrained from scratch, with a strong emphasis on linguistic diversity, cultural grounding, and multilingual representation, particularly for Indian languages. It is released as an early post-training checkpoint with advanced capabilities including reasoning, tool calling, mathematics, and code generation, making it suitable for diverse downstream applications and further fine-tuning.

🌟 Key Highlights

17B parameter Mixture of Experts (MoE) language model
Multilingual: English, Hindi + 21 Indian languages
Trained on ~22 trillion tokens across two pretraining phases
Uses 64 specialized experts, dynamically activated per token
Supports a context window of up to 4096 tokens
Efficient inference: Only 2.4B active parameters per token
Advanced Capabilities: Thinking & Reasoning, Tool Calling, Mathematics, Code Generation
Designed for diverse downstream applications and further fine-tuning

🗣️ Supported Languages

In addition to English and Hindi, the model supports the following 21 Indian languages:

🌏 Languages
Assamese • Bengali • Bodo • Dogri • Gujarati • Kannada • Konkani • Kashmiri • Maithili • Malayalam • Manipuri • Marathi • Nepali • Oriya • Punjabi • Sanskrit • Santali • Sindhi • Tamil • Telugu • Urdu

This broad language coverage enables stronger performance on region-specific tasks and promotes inclusive representation of India’s diverse and underrepresented linguistic communities.

🏗️ Model Architecture

The model utilizes a Hybrid MoE approach. Unlike traditional MoE models, this architecture uses a combination of Shared Experts (always active) and Routed Experts (activated selectively) to ensure both general knowledge stability and specialized task performance.

Metric	Value
Architecture	Hybrid MoE
Total Parameters	17B
Active Parameters/Token	2.4B
Total Experts	64
Top-K Experts	6
Shared Experts	2
Hidden Size	2048
Intermediate Size	9216
Max Context Length	4096 tokens
Vocabulary Size	128,000
Activation Function	SiLU
Normalization	RMSNorm
Num. Hidden Layers	21
Attention Heads	32 (Query) / 8 (KV)
Attention Mechanism	Grouped Query Attention
Precision	bf16-mixed
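
To make the attention rows concrete, the sketch below estimates the key/value cache size at full context, with and without Grouped Query Attention. It is a back-of-the-envelope calculation, not a measurement. The head dimension is an assumption (hidden size divided by query heads; the table does not state it), and every one of the 21 layers is assumed to keep a cache, at batch size 1.

```python
# Values copied from the architecture table.
HIDDEN_SIZE = 2048
NUM_LAYERS = 21
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8
MAX_CONTEXT = 4096

# ASSUMPTION: head_dim = hidden_size / query heads (not stated in the table).
HEAD_DIM = HIDDEN_SIZE // NUM_QUERY_HEADS
BYTES_PER_VALUE = 2  # bf16


def kv_cache_bytes(seq_len: int, kv_heads: int) -> int:
    """Bytes for keys plus values across all layers, for one sequence."""
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * seq_len * BYTES_PER_VALUE


queries_per_kv_head = NUM_QUERY_HEADS // NUM_KV_HEADS
gqa_mib = kv_cache_bytes(MAX_CONTEXT, NUM_KV_HEADS) / 2**20
mha_mib = kv_cache_bytes(MAX_CONTEXT, NUM_QUERY_HEADS) / 2**20

print(f"Query heads sharing each KV head: {queries_per_kv_head}")
print(f"KV cache with 8 KV heads (GQA): {gqa_mib:.0f} MiB")
print(f"KV cache if all 32 heads had their own KV: {mha_mib:.0f} MiB")
```

Under these assumptions, sharing each key/value head among four query heads cuts the cache to a quarter of what full multi-head attention would need.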

Architecture Innovation

Our Hybrid MoE architecture introduces two key innovations: (1) a dense layer before the sparse MoE layers that transforms raw inputs into structured representations for improved stability, and (2) two shared experts that remain always active alongside 6 dynamically routed experts. The shared experts specifically address Indian language diversity by maintaining stable cross-lingual representations, enabling smooth code-switching, and facilitating knowledge transfer across all 23 supported languages.
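
The shared-plus-routed idea can be sketched in a few lines of plain Python. This is a toy illustration, not the model's implementation: the softmax router, the renormalization of the top-k weights, and the reading of "64 experts" as 64 routed experts are all assumptions, and experts are reduced to scalar functions for brevity.

```python
import math
import random

NUM_ROUTED_EXPERTS = 64   # ASSUMPTION: the 64 experts in the table are all routed
TOP_K = 6                 # routed experts selected per token
NUM_SHARED_EXPERTS = 2    # always active


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top-k routed experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in top)
    # ASSUMPTION: weights of the selected experts are renormalized to sum to 1.
    return [(i, probs[i] / kept) for i in top]


def hybrid_moe_output(token, shared_experts, routed_experts, router_logits):
    """Add the always-on shared experts to the weighted top-k routed experts."""
    output = sum(expert(token) for expert in shared_experts)
    for index, weight in route(router_logits):
        output += weight * routed_experts[index](token)
    return output


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
selected = route(logits)
print("Routed experts chosen for this token:", [i for i, _ in selected])
```

Only 6 of the 64 routed experts run for a given token, while the 2 shared experts run for every token, which is how the active parameter count stays far below the total.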

✨ Key Features

Multilingual-First Design: Pretrained to support English, Hindi, and 21 Indian languages with better regional linguistic coverage.
Shared + Routed Experts: Separates general language understanding from language-specific specialization for improved multilingual performance.
Efficient Sparse Scaling: 17B total parameters with 2.4B active parameters per token, enabling high capacity with efficient inference.
Cross-Lingual Representation Learning: Improves performance on low-resource Indian languages through shared multilingual knowledge.
Large-Scale Multilingual Pretraining: Trained on ~22 trillion tokens emphasizing linguistic diversity and culturally grounded data.
Long-Context Multilingual Processing: Supports up to 4096 tokens for improved understanding of long-form and multi-turn content. (A longer-context variant of this model, supporting up to 128k context length, will be released soon.)

📚 Training Data

Dataset Composition

PT-1: Pre-Training Phase 1

Data Type	Tokens
English	8.5T
Hindi	998.6B
Math	374.8B
Code	1.83T
Multilingual Indian Languages	1.5T
Technical	500B
Instruction	351B
Total Phase 1	14.05T

PT-2: Pre-Training Phase 2

Data Type	Tokens
English	2.5T
Math	773B
Code	2.46T
Multilingual Indian Languages	1T
Instruction	740B
Reasoning	607B
Total Phase 2	8.08T

Combined Total Training Data: 22.13 Trillion Tokens
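
As a quick consistency check, the per-category figures in the two tables can be summed directly. The snippet below only re-adds the numbers listed above, expressed in trillions of tokens.

```python
# Token counts in trillions, copied from the PT-1 and PT-2 tables.
PHASE_1 = {
    "English": 8.5,
    "Hindi": 0.9986,
    "Math": 0.3748,
    "Code": 1.83,
    "Multilingual Indian Languages": 1.5,
    "Technical": 0.5,
    "Instruction": 0.351,
}
PHASE_2 = {
    "English": 2.5,
    "Math": 0.773,
    "Code": 2.46,
    "Multilingual Indian Languages": 1.0,
    "Instruction": 0.74,
    "Reasoning": 0.607,
}

phase_1_total = sum(PHASE_1.values())
phase_2_total = sum(PHASE_2.values())
combined_total = phase_1_total + phase_2_total

print(f"Phase 1: {phase_1_total:.2f}T")    # Phase 1: 14.05T
print(f"Phase 2: {phase_2_total:.2f}T")    # Phase 2: 8.08T
print(f"Combined: {combined_total:.2f}T")  # Combined: 22.13T
```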

🎯 Pre-Training Details

Total Training Tokens: ~22 Trillion
Data Quality: Highly curated using 22 Indian language quality classifiers along with an in-house preprocessing pipeline.
Scheduler: Cosine Annealing
Training Framework: NVIDIA NeMo
Hardware: NVIDIA H100 GPUs
Total Training Duration: 24 Days
Precision: bf16-mixed

🌍 Environmental Impact

Carbon Emissions

Training large-scale foundation models requires substantial computational resources. In the interest of transparency and responsible AI development, we provide an estimated carbon footprint of our training process.

Training Configuration

Metric	Value
Total GPU Hours	576 Hours
Hardware Used	NVIDIA H100 GPUs
Estimated Power Usage	~258,048 kWh (approximate)
Carbon Footprint	~194 metric tons CO₂eq

Note:

Energy estimation is based on GPU TDP with a 25% infrastructure overhead.
Carbon intensity reflects the India national grid average (0.75 kg CO₂eq/kWh).
Actual emissions may vary depending on data center efficiency (PUE), renewable energy mix, and workload utilization.
Calculations follow standard ML CO₂ estimation practices and are aligned with methodologies used in ML CO₂ Impact reporting.
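
The arithmetic behind the table can be sketched as follows. Two inputs are assumptions that do not appear in this card: a cluster of 512 GPUs and a 700 W TDP per GPU (the rating of the H100 SXM). They are used here only because, combined with the stated 25% overhead and grid intensity, they reproduce the reported figures. The 576 hours in the table also equals 24 days of wall-clock time.

```python
WALL_CLOCK_HOURS = 24 * 24        # 24 days of training = 576 hours
NUM_GPUS = 512                    # ASSUMPTION: cluster size is not stated in this card
GPU_TDP_KW = 0.7                  # ASSUMPTION: 700 W TDP (H100 SXM)
INFRA_OVERHEAD = 1.25             # 25% infrastructure overhead, from the notes above
GRID_KG_CO2_PER_KWH = 0.75        # India national grid average, from the notes above

energy_kwh = NUM_GPUS * WALL_CLOCK_HOURS * GPU_TDP_KW * INFRA_OVERHEAD
co2_tonnes = energy_kwh * GRID_KG_CO2_PER_KWH / 1000

print(f"Estimated energy: {energy_kwh:,.0f} kWh")
print(f"Estimated emissions: {co2_tonnes:.0f} metric tons CO2eq")
```

Changing the overhead factor or the grid intensity in this sketch shows how sensitive the estimate is to data center efficiency and energy mix.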

Efficiency Measures

Sparse Activation: Only 2.4B out of 17B parameters are activated per token, reducing computational overhead during inference
Efficient Architecture: Hybrid MoE design optimizes compute-to-performance ratio
Multi-Phase Training: Structured training phases ensure efficient use of computational resources

🚀 Model Inference

Requirements:

Transformers version - 4.52.3

Support for Transformers v5 coming soon

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# parsers.py is provided in the model repository; place it next to this script.
from parsers import parse_model_output

model_name = "bharatgenai/Param2-17B-A2.4B-Thinking"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=False
)

# Load in bfloat16 (the released tensor type) to avoid the larger float32 footprint.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the BharatGen Mission?"}
]

# Apply the chat template and append the assistant generation prompt.
inputs = tokenizer.apply_chat_template(
    conversation=conversation,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

with torch.no_grad():
    output = model.generate(
        inputs,
        max_new_tokens=300,
        do_sample=True,
        top_k=50,
        top_p=0.9,
        temperature=0.7,
        eos_token_id=tokenizer.eos_token_id,
        use_cache=False,
    )

# Keep only the newly generated tokens (drop the prompt).
generated_tokens = output[0][inputs.shape[-1]:]

# 🔥 IMPORTANT: skip_special_tokens=False keeps the special tokens in the raw
# text so that the markers the parser looks for are not stripped.
generated_text = tokenizer.decode(
    generated_tokens,
    skip_special_tokens=False
)

parsed = parse_model_output(generated_text)

print("\n========== RAW ==========\n", generated_text)
print("\n========== REASONING ==========\n", parsed["reasoning"])
print("\n========== TOOL CALLS ==========\n", parsed["tool_calls"])
print("\n========== FINAL ANSWER ==========\n", parsed["final_answer"])

Note: The parser is available within the model directory. You may download and customize it according to your use case.
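
For readers who only need to separate the reasoning from the answer, a minimal stand-in can look like the sketch below. It is not the parser shipped with the model: `split_reasoning` is a hypothetical helper that assumes the output follows the `<think> ... </think>` pattern shown in the examples on this page, and it ignores tool calls and other special tokens.

```python
def split_reasoning(raw_text: str) -> dict:
    """Split raw model output into reasoning and final answer at </think>."""
    head, closing_tag, tail = raw_text.partition("</think>")
    if not closing_tag:
        # No closing tag found: treat the whole text as the answer.
        return {"reasoning": "", "final_answer": raw_text.strip()}
    # Drop the opening tag (if present) from the reasoning part.
    reasoning = head.replace("<think>", "", 1).strip()
    return {"reasoning": reasoning, "final_answer": tail.strip()}


example = "<think>30 * 70 = 2100, minus 850 and 500 leaves 750.</think>The average is 75."
parts = split_reasoning(example)
print(parts["reasoning"])
print(parts["final_answer"])
```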

✅ Quick Guide for Best Results

Use Deterministic Decoding for Consistency

Set do_sample = False to get stable, repeatable, and more reliable outputs. This is recommended for evaluations, production use, and structured tasks.
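
One way to switch between the two decoding modes is a small helper that builds the keyword arguments for `model.generate`. The helper name `generation_kwargs` is illustrative; the sampling values are the ones used in the inference example, and they are left out in deterministic mode because they only take effect when sampling is enabled.

```python
def generation_kwargs(deterministic: bool, max_new_tokens: int = 300) -> dict:
    """Build keyword arguments for model.generate()."""
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": not deterministic}
    if not deterministic:
        # Sampling settings from the inference example; unused under greedy decoding.
        kwargs.update(top_k=50, top_p=0.9, temperature=0.7)
    return kwargs


print(generation_kwargs(deterministic=True))
print(generation_kwargs(deterministic=False))
```

The result can be passed as `model.generate(inputs, **generation_kwargs(deterministic=True))`.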

Prompt Tuning is Essential

Clearly specify the task, format, and constraints in your prompt. Well-structured instructions significantly improve accuracy and relevance for specific use cases.

Leverage Strong Structured Output Capabilities

The model performs well at generating Markdown, tables, JSON, and other structured formats, making it ideal for documentation, reports, and formatted outputs.

🌟 Model Capabilities & Examples

🧮 Example: Mathematical Reasoning (Chain-of-Thought)

Prompt

A class has 30 students.

The average score of the whole class in a test is 70. The average score of the top 10 students is 85. The average score of the bottom 10 students is 50.

👉 What is the average score of the middle 10 students?

Model Response

🧠 Reasoning Process

<think>

Ahh.. Thinking about the problem..

Okay, let's see. The problem is about a class of 30 students. The average score of the whole class is 70. The top 10 students have an average of 85, and the bottom 10 have an average of 50. We need to find the average score of the middle 10 students.

First, I need to recall that the average score is the total sum of all scores divided by the number of students. So, for the entire class, the total sum of scores would be 30 students multiplied by 70, which is 2100. That makes sense because average is sum divided by number.

Now, the top 10 students have an average of 85. So their total sum would be 10 times 85, which is 850. Similarly, the bottom 10 students have an average of 50, so their total sum is 10 times 50, which is 500.

But here's the thing: the class is divided into three groups—top, middle, and bottom. Each group has 10 students. So, the total sum of all three groups should equal the total sum of the entire class. That is, 850 (top) + sum of middle 10 + 500 (bottom) = 2100.

To find the sum of the middle 10 students, we subtract the top and bottom sums from the total. Let's compute that: 2100 minus 850 minus 500. That would be 2100 - 850 = 1250, then 1250 - 500 = 750. So the middle 10 students have a total sum of 750.

Now, to find their average, we take that sum and divide it by the number of students in the middle group, which is 10. So 750 divided by 10 equals 75. Therefore, the average score of the middle 10 students is 75.

Wait, let me check my math again to make sure.

Total class sum: 30 * 70 = 2100. Correct.

Top 10 sum: 10 * 85 = 850. Bottom 10 sum: 10 * 50 = 500. Adding those: 850 + 500 = 1350.

Subtracting that from the total: 2100 - 1350 = 750. Divided by 10 students: 750 / 10 = 75. Yep, that seems right.

I don't think there's any overlap here because the top and bottom groups are separate, and the middle group is the remaining 10. So this should be straightforward. So the answer is 75.

Hmm, but just to think again, sometimes in these problems, there might be some trick where the middle group is not exactly 10 students, but in this case, the problem states the class has 30 students, divided into top 10, middle 10, bottom 10. So they add up to 30. So each is 10. Therefore, the logic holds. So the average of the middle 10 is 75. So I think that's the answer. </think>

The class has 30 students with an average score of 70. The total sum of all scores is \(30 \times 70 = 2100\).

The top 10 students have an average score of 85, giving a total of \(10 \times 85 = 850\). The bottom 10 students have an average score of 50, giving a total of \(10 \times 50 = 500\).

Subtracting the top and bottom sums from the total gives the middle 10 students' total: \(2100 - 850 - 500 = 750\).

Dividing this by 10 students gives the average score of the middle 10 students: \(\frac{750}{10} = 75\).

The average score of the middle 10 students is \(\boxed{75}\).

🇮🇳 Example: Multilingual Coding (Telugu)

Prompt

Write a binary search program in Python. All comments in the code must be written in Telugu language.

Model Response

🐍 Reasoning Process

<think>

Reasoning

User binary search కోసం Python program కోరారు. Comments అన్ని Telugu లో ఉండాలి. Binary search సాధారణ algorithm, array, low, high, mid. Telugu లో వివరణలు ఇవ్వాలి. Code లో comments Telugu లో ఉండాలి, code logic correct ఉండాలి. Python లో function name in English, comments Telugu. </think>

def binary_search(arr, target):
    # ఈ ఫంక్షన్ లక్ష్య విలువను కనుగుకోవడానికి binary search ను అమలు చేస్తుంది
    low = 0
    high = len(arr) - 1
    
    while low <= high:
        mid = (low + high) // 2
        
        # మధ్య విలువను పరిశీలిస్తాము
        if arr[mid] == target:
            # లక్ష్య విలువను కనుగొన్నాం, దాని index ను తిరిగి ఇస్తాం
            return mid
        elif arr[mid] < target:
            # లక్ష్య విలువ పెద్దది, కాబట్టి low ను పెంచుతాం
            low = mid + 1
        else:
            # target చిన్నది, కాబట్టి high ను తగ్గిస్తాం
            high = mid - 1
            
    # లక్ష్య విలువ లేదు, -1 ను తిరిగి ఇస్తాం
    return -1

# ఉదాహరణ వినియోగం
arr = [2, 3, 5, 7, 11, 13]
target = 7
result = binary_search(arr, target)
print(f"లక్ష్య విలువ {target} యొక్క index: {result}")

🌏 Example: Multilingual Cultural Question (Hindi)

Prompt

भारत अपनी विविध संस्कृति के लिए जाना जाता है। भारत में अलग-अलग राज्यों की भाषाएँ, भोजन और त्योहार क्यों अलग होते हैं? इस सांस्कृतिक विविधता का भारतीय समाज पर क्या सकारात्मक प्रभाव पड़ता है?

Model Response

💭 Reasoning Process

<think>

प्रश्न की समझ

उपयोगकर्ता ने भारत की विविध संस्कृति के कारण राज्य‑विशिष्ट भाषाओं, भोजन और त्योहारों में अंतर के बारे में पूछा है तथा इस विविधता के सामाजिक प्रभाव पर चर्चा चाहता है। इसलिए उत्तर में दो भाग चाहिए: (1) सांस्कृतिक विविधता के कारणों का स्पष्टीकरण, (2) इस विविधता के सकारात्मक प्रभावों की सूची।

मुख्य कारणों की पहचान

भौगोलिक विविधता – पहाड़, मैदान, समुद्र तट, रेगिस्तान आदि विभिन्न जलवायु और स्थल बनाते हैं, जिससे अलग‑अलग कृषि‑उत्पादन और खाद्य‑प्रणाली विकसित होती है।
जनजातीय एवं भाषा विविधता – विभिन्न जनजातीय समूह, उनके मूल भाषाएँ, लिपियाँ और सांस्कृतिक परम्पराएँ राज्य‑स्तर पर भाषा का प्रयोग निर्धारित करती हैं।
ऐतिहासिक एवं राजनैतिक प्रभाव – विभिन्न राजवंश, उपनिवेश, और स्वतंत्रता के बाद के राज्य‑निर्माण ने प्रशासनिक सीमाएँ और सांस्कृतिक पहचान को अलग किया।
धर्म और उत्सव परंपरा – प्रत्येक क्षेत्र में प्रमुख धर्म (हिंदू, मुस्लिम, सिख, बौद्ध आदि) और उनके स्थानीय उत्सव, रीति‑रिवाज अलग‑अलग विकसित होते हैं।
स्थानीय संसाधन एवं परम्पराएँ – प्राकृतिक संसाधन (जैसे जल, खनिज, समुद्र) और स्थानीय कारीगर, हस्तशिल्प आदि सांस्कृतिक अभिव्यक्ति को आकार देते हैं।

उत्तर का ढांचा तय करना

पहले कारणों को क्रमबद्ध बिंदुओं में प्रस्तुत करें, जिससे उपयोगकर्ता को स्पष्ट समझ मिले।
फिर प्रत्येक कारण के बाद तत्काल परिणाम (भाषाएँ, भोजन, त्योहार) को संक्षिप्त रूप में जोड़ें, जैसा कि संदर्भ उत्तर में किया गया है।
अंत में सामाजिक प्रभाव को पाँच प्रमुख बिंदुओं में विभाजित करें, प्रत्येक बिंदु को अलग‑अलग उप‑बिंदु के साथ विस्तृत करें, और अंत में एक समग्र निष्कर्ष दें जो विविधता को एकता के साथ जोड़े।

सकारात्मक प्रभावों की सूची बनाना

सांस्कृतिक अभिव्यक्ति – विविधता के कारण विभिन्न कला, संगीत, नृत्य, साहित्य आदि का विकास।
आर्थिक प्रभाव – पर्यटन, हस्तशिल्प, स्थानीय बाजारों की विविधता, जिससे आय उत्पन्न होती है।
शिक्षा एवं अनुसंधान – विभिन्न भाषा‑साहित्य, इतिहास, सामाजिक विज्ञान के अध्ययन के अवसर।
वैश्विक मंच पर पहचान – अंतरराष्ट्रीय मंच पर विविध सांस्कृतिक उत्पादों का प्रदर्शन, जिससे भारत की छवि मजबूत होती है।
सामाजिक मूल्य एवं सहिष्णुता – विविधता के माध्यम से विभिन्न धर्म, जातीय समूहों के बीच सहिष्णुता और सहयोग का विकास।

उत्तर का स्वर और शैली

संदर्भ उत्तर में सरल, बिंदु‑आधारित शैली है, प्रत्येक बिंदु को बोल्ड शीर्षक के साथ प्रस्तुत किया गया है। इसलिए इसी शैली को अपनाते हुए, कारणों को क्रमांकित सूची में लिखें और प्रभावों को भी समान रूप में प्रस्तुत करें। यह उपयोगकर्ता की अनुरोध के अनुसार उत्तर को स्पष्ट, संगठित और सकारात्मक बनाता है।

अंतिम रूपरेखा

परिचयात्मक वाक्य: “भारत अपनी विविध संस्कृति के लिए जाना जाता है, इसलिए प्रत्येक राज्य में अलग‑अलग भाषाएँ, भोजन और त्योहार होते हैं।”
कारणों की सूची (1‑5) और उनके तुरंत प्रभाव।
सकारात्मक प्रभावों की सूची (1‑5) के साथ विस्तृत बिंदु।
निष्कर्ष: विविधता एकता के साथ समृद्धि लाती है।

यह सोच प्रक्रिया सीधे संदर्भ उत्तर की सामग्री और संरचना की ओर ले जाती है, जिससे वही उत्तर उत्पन्न होगा। </think>

भारत अपनी विविध संस्कृति के लिए जाना जाता है, इसलिए प्रत्येक राज्य में अलग- अलग भाषाएँ, भोजन और त्योहार होते हैं।

भिन्न भौगोलिक स्थिति और जलवायु: भारत में पहाड़, मैदान, जंगल, समुद्र तट और रेगिस्तान जैसे विविध भू-भाग हैं। इसलिए प्रत्येक राज्य में अलग-अलग जलवायु है और अलग-अलग कृषि उत्पादन होता है। यह अलग-अलग भोजन और खाने की परंपराओं को जन्म देता है।
भाषाओं और जातीय विविधता: भारत में विविध जातीय समूह और भाषाएँ हैं। प्रत्येक राज्य में अपनी भाषा है, जो उस राज्य की पहचान है। भाषा के कारण अलग-अलग साहित्य, गीत और कहानियाँ होती हैं।
ऐतिहासिक और राजनीतिक प्रभाव: भारत में विभिन्न राजवंश, उपनिवेश और राज्य थे। प्रत्येक राज्य में अपनी प्रशासनिक और सांस्कृतिक परंपराएँ थीं। इस कारण भोजन, त्योहार और रीति-रिवाज़ अलग-अलग हो गए।
धर्म और त्योहार परंपरा: भारत में विभिन्न धर्म हैं, जैसे हिंदू, मुस्लिम, सिख, बौद्ध आदि। प्रत्येक धर्म का अपना त्योहार होता है और अलग-अलग रीति-रिवाज़ होते हैं।
भोजन और त्योहार के कारण अलग-अलग संस्कृति: प्रत्येक राज्य में अलग-अलग भोजन और त्योहार होते हैं, जो उस राज्य की सांस्कृतिक पहचान है।

भारतीय समाज पर सांस्कृतिक विविधता का सकारात्मक प्रभाव

सांस्कृतिक अभिव्यक्ति: विविधता के कारण भारत में विभिन्न कला, संगीत, नृत्य और साहित्य का विकास हुआ। यह भारत को एक सांस्कृतिक शक्ति बनाता है।
आर्थिक प्रभाव: विविधता के कारण भारत में पर्यटन, हस्तशिल्प और स्थानीय बाजारों की विविधता है। यह आय और रोजगार का स्रोत बनता है।
शिक्षा और अनुसंधान: विविधता के कारण भारत में विभिन्न भाषाओं, साहित्य, इतिहास और सामाजिक विज्ञान का अध्ययन होता है। यह भारतीय शिक्षा प्रणाली को मजबूत बनाता है।
वैश्विक मंच पर पहचान: विविधता के कारण भारत में विभिन्न सांस्कृतिक उत्पाद वैश्विक मंच पर दिखाई देते हैं। यह भारत की छवि को मजबूत बनाता है।
सामाजिक मूल्य और सहिष्णुता: विविधता के कारण भारत में विभिन्न धर्म, जातीय समूह और भाषाएँ हैं। यह सहिष्णुता और सामाजिक सहयोग का विकास करता है।

इस प्रकार, भारत की सांस्कृतिक विविधता एक सकारात्मक प्रभाव है, जो एकता के साथ समृद्धि और पहचान को विकसित करता है।

📊 Benchmarks

Our model has been evaluated across multiple benchmark categories to assess its capabilities in different domains. All results are reported as zero-shot performance unless otherwise specified.

🌐 General English Benchmarks

Performance on fundamental language understanding and reasoning tasks:

Task	Param-2-17B-A2.4B	Qwen1.5-MoE-A2.7B	DeepSeek-R1-Distill-Qwen-14B	DeepSeek-V2-Lite	gpt-oss-20b
ARC Challenge	56.83	44.03	53.75	45.31	48.98
ARC Easy	77.69	68.9	74.75	70.41	75.51
HellaSwag	77.43	77.26	79.04	73.53	57.8
SQuADv2	33.86	13.96	23.77	25.2	8.66
MMLU	57.79	62.22	73.05	46.57	66.37

💻 Code, Math & Reasoning Benchmarks

Performance on specialized reasoning, mathematical problem-solving, and code generation:

Task	Param-2-17B-A2.4B	Qwen1.5-MoE-A2.7B	DeepSeek-R1-Distill-Qwen-14B	DeepSeek-V2-Lite	gpt-oss-20b
GSM8K	57.32	61.49	87.87	38.21	36.54
MathQA	40.23	35.61	48.44	38.73	36.68
HumanEval	36.59	33.54	26.22	26.83	24.39
MBPP	47	38.6	61	26.6	-

📝 Instruction Following Benchmarks

Evaluation of instruction comprehension and task execution capabilities:

Task	Param-2-17B-A2.4B	Qwen1.5-MoE-A2.7B	DeepSeek-R1-Distill-Qwen-14B	DeepSeek-V2-Lite	gpt-oss-20b
TriviaQA	54.64	38	17.88	34.62	29.05
LogiQA	52.35	31.18	33.49	29.34	32.87
WinoGrande	67.96	69.22	72.69	67.01	67.48
TruthfulQA	46.42	43.57	58.63	28.89	52.93
IFEval	34.05	13.19	51.8	12.75	31.42

🇮🇳 Indic Language Benchmarks

Performance on Indian language understanding and culturally grounded tasks:

Task	Param-2-17B-A2.4B	Qwen1.5-MoE-A2.7B	DeepSeek-R1-Distill-Qwen-14B	DeepSeek-V2-Lite	gpt-oss-20b
Sanskriti	66.54	28.71	74.58	64.84	75.14
ARC Challenge (Indic)	36.37	21.71	23.63	21.76	31.55
TriviaQA (Indic MCQ)	72.95	25.21	53.99	29.18	71.67
Indic BoolQ	75.98	46.48	70.03	62.9	70.02
HellaSwag Hi	49.63	25.86	39.02	30.23	41.25
MMLU Hi	59.23	33.27	44.52	24.26	50.29

Note: ARC Challenge (Indic) and TriviaQA (Indic MCQ) scores are macro-averaged across 10 Indian languages:
bn (Bengali), gu (Gujarati), hi (Hindi), kn (Kannada), ml (Malayalam), mr (Marathi), or (Odia), pa (Punjabi), ta (Tamil), te (Telugu).

Notes:

All benchmarks reflect zero-shot performance unless noted

Evaluation pipeline follows standard LM Evaluation Harness protocols

All models have been tested with thinking=false

⚙️ Intended Use

This early post-training checkpoint is designed for:

Research and Development: Investigating MoE architectures and multilingual language models
Downstream Fine-Tuning: Building task-specific models for Indian language applications
Academic Research: Studying cross-lingual transfer and low-resource language modeling
Application Development: Creating culturally-aware AI applications for Indian markets
Community Innovation: Enabling the open-source community to build India-specific use cases

Recommended Use Cases

Multilingual chatbots and conversational AI
Content generation in Indian languages
Cross-lingual information retrieval
Educational tools for Indian language learning
Document understanding and summarization
Cultural and regional knowledge applications

⚠️ Limitations

Technical Limitations

Hardware Requirements: Despite sparse activation, the 17B total parameters require significant VRAM (approximately 34GB+ in bfloat16) to load the full model
Context Window: Optimized for 4096 tokens; performance may degrade on extremely long-form documents beyond this limit
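
The 34GB figure follows from the parameter count and the storage type. The short sketch below reproduces it; it counts weights only and leaves out activations, the KV cache, and framework overhead, which is why the requirement above is stated as "34GB+".

```python
TOTAL_PARAMS = 17e9      # total parameters
ACTIVE_PARAMS = 2.4e9    # parameters activated per token
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_gb(num_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


print(f"All weights, bfloat16: {weight_gb(TOTAL_PARAMS, 'bfloat16'):.1f} GB")
print(f"All weights, float32: {weight_gb(TOTAL_PARAMS, 'float32'):.1f} GB")
print(f"Weights used per token, bfloat16: {weight_gb(ACTIVE_PARAMS, 'bfloat16'):.1f} GB")
```

All experts must be resident in memory even though only a fraction is used per token, so sparse activation reduces compute per token but not the memory needed to load the model.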

Model Limitations

No Guardrails: The model has not undergone RLHF or other alignment procedures
Potential Biases: May contain biases present in the training data
Language Imbalance: Performance may vary across different Indian languages based on data availability
Not Production-Ready: Requires fine-tuning and safety evaluation before production deployment

🔒 Responsible AI & Safety

Important Considerations

Base Model Status: This is an early post-training checkpoint that requires additional fine-tuning and safety alignment before production use.
Data Sources and Artifacts: Training data includes content crawled from the internet, which may contain inherited artifacts, biases, or inaccuracies.
No Safety Alignment: The model has not undergone comprehensive safety training and may generate:
- Biased or stereotypical content
- Factually incorrect information
- Potentially harmful or offensive text
Deployment Requirements: Organizations deploying this model must:
- Implement appropriate content filtering and safety measures
- Conduct thorough testing for their specific use case
- Monitor outputs for quality and safety
- Comply with applicable legal and ethical guidelines
Community Responsibility: Users are expected to:
- Use the model ethically and responsibly
- Avoid malicious applications
- Report issues and contribute to model improvement
- Credit the authors as per licensing terms

🤝 Acknowledgement & Collaboration

A Home-Grown Effort

This work represents a bottom-up initiative to develop large language models from scratch within India with limited resources. It reflects our humble, resource-constrained journey to contribute meaningfully to the open-source AI ecosystem and foster collaboration within the broader community.

Community Collaboration

We welcome contributions and open dialogue:

Share feedback and report issues
Collaborate on improvements and extensions
Contribute to dataset curation and evaluation
Build innovative applications on top of this foundation

Future versions will introduce better alignment, improved training scale, and more curated datasets. Together, we aim to evolve toward safer and more capable AI systems.

📜 License

This post-trained checkpoint is released under the BharatGen non-commercial license.

Please refer to the LICENSE file for detailed terms and conditions.

Contact

For any questions or feedback, please contact:

Bhargav Patel (bhargav.patel@bharatgen.com)
