🇮🇳 Kalki 1.5

India's First Fully Agentic 1T Parameter AI Model


🚀 1. Model Introduction

Kalki 1.5 is India's first fully agentic 1T-parameter AI model and a major step forward in sovereign AI capability. Built on the Kalki Mixture-of-Experts (MoE) architecture, it is tuned for complex, long-horizon software engineering tasks and multimodal tool use.

Kalki 1.5 features substantial optimizations over predecessor models:

  • Unprecedented Scale: A 1-trillion-parameter Mixture-of-Experts model that activates 32 billion parameters per token.
  • Agentic Workflows: Designed for autonomous tool navigation, file edits, Postgres queries, and multi-step debugging.
  • Extreme Token Efficiency: Approximately 30% fewer reasoning tokens than Kalki-0.6, which shortens completion time.
  • Multimodal Integration: Built-in visual understanding through the UpmarkViT encoder, supporting UI analysis and visual debugging.

📊 2. Model Summary

| Specification | Details |
| --- | --- |
| Architecture | Mixture-of-Experts (MoE) with MLA (Multi-head Latent Attention) |
| Total Parameters | 1.0T |
| Activated Parameters | 32B |
| Number of Layers | 61 (includes dense/routing layer) |
| Vocabulary Size | 160K |
| Context Length | 256K tokens |
| Activation Function | SwiGLU |
| Vision Encoder | UpmarkViT (400M parameters) |
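
To make the relationship between total and activated parameters concrete, the following sketch computes the share of weights used per token and a rough per-token compute estimate. The parameter counts come from the table above; the "2 FLOPs per active parameter" figure is a common rule of thumb and an assumption here, not a measurement of Kalki 1.5.

```python
# Figures taken from the specification table above.
TOTAL_PARAMS = 1.0e12   # 1.0T total parameters
ACTIVE_PARAMS = 32e9    # 32B parameters activated per token


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that take part in one token's forward pass."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb estimate (assumption): about 2 FLOPs per active parameter.

    Attention cost over a long context is ignored, so treat this as a rough
    lower bound rather than a benchmark figure.
    """
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    flops = forward_flops_per_token(ACTIVE_PARAMS)
    print(f"Active fraction per token: {frac:.1%}")
    print(f"Approx. forward FLOPs per token: {flops:.2e}")
```

Under these assumptions only about 3.2% of the weights are exercised for any single token, which is why per-token compute tracks the 32B figure while memory requirements track the 1.0T figure.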

🏆 3. Evaluation Results

Kalki 1.5 posts the highest score on all six coding and agentic benchmarks reported here, ahead of Kalki-0.6, GPT-5.5, and Claude Opus 4.8. The table below compares scores:

| Benchmark | Kalki-0.6 | GPT-5.5 | Claude Opus 4.8 | Kalki 1.5 🇮🇳 |
| --- | --- | --- | --- | --- |
| Coding Excellence (Higher is Better) | | | | |
| Kalki Code Bench v2 | 50.9 | 69.0 | 67.4 | 82.5 |
| Program Bench | 48.3 | 69.1 | 63.8 | 76.8 |
| MLS Bench Lite | 26.7 | 35.5 | 42.8 | 58.2 |
| Agentic & Tool Use (Higher is Better) | | | | |
| Kalki Claw 24/7 Bench | 42.9 | 52.8 | 50.4 | 68.4 |
| MCP Atlas | 69.4 | 79.4 | 81.3 | 91.2 |
| MCP Mark Verified | 72.8 | 92.9 | 76.4 | 94.5 |

Testing Methodology & Footnotes
  1. General Testing Details
    • Kalki 1.5 was tested with thinking mode enabled via Kalki Code CLI at temperature = 1.0, top-p = 0.95, and a 262,144-token context length. GPT-5.5 ran in Codex with xhigh mode, and Claude Opus 4.8 ran in Claude Code with xhigh mode.
  2. Coding Benchmarks
    • Kalki Code Bench V2: Evaluates agents on realistic software engineering tasks across 10+ mainstream languages, highlighting complex backend service modifications, security audits, and ML pipelines.
    • Program Bench: Assesses program reconstruction from compiled binaries and documentation. Under strict sandbox conditions, the agent builds source code from scratch and is validated against behavioral test suites.
    • MLS-Bench-Lite: Evaluates autonomous ML generation capabilities, requiring the model to design and execute training runs within a 5-hour window.
  3. Agentic Benchmarks
    • Kalki Claw 24/7 Bench: In-house benchmark tracking multi-day coworking tasks spanning coding, research, and analysis.
    • MCP-Atlas / MCPMark-Verified: Assess Model Context Protocol (MCP) tool execution. Both were evaluated with a 100-step tool budget and a maximum of 32k tokens per step.
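
The step budget described for the MCP benchmarks can be pictured as a bounded agent loop. The sketch below is illustrative only: `StepResult`, `run_episode`, `scripted_model`, and the `echo` tool are hypothetical names, the model is a stub rather than a real Kalki 1.5 call, and reading "32k" as 32,768 tokens is an assumption. It is not the harness used for the reported scores.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MAX_TOOL_STEPS = 100           # tool budget stated in the footnote
MAX_TOKENS_PER_STEP = 32_768   # assumption: "32k" read as 32 * 1024


@dataclass
class StepResult:
    """What the (hypothetical) model returns at each step."""
    tool_name: Optional[str]                      # None means a final answer
    tool_args: Dict[str, str] = field(default_factory=dict)
    answer: Optional[str] = None


def run_episode(
    model_step: Callable[[List[str], int], StepResult],
    tools: Dict[str, Callable[..., str]],
    task: str,
    max_steps: int = MAX_TOOL_STEPS,
) -> Optional[str]:
    """Run one task until the model answers or the tool budget runs out."""
    transcript: List[str] = [task]
    for _ in range(max_steps):
        result = model_step(transcript, MAX_TOKENS_PER_STEP)
        if result.tool_name is None:
            return result.answer                  # finished within budget
        tool = tools[result.tool_name]
        transcript.append(tool(**result.tool_args))  # feed tool output back
    return None                                   # budget exhausted


def scripted_model(transcript: List[str], max_tokens: int) -> StepResult:
    """Stub policy: call the echo tool twice, then give a final answer."""
    if len(transcript) < 3:
        return StepResult("echo", {"text": f"call {len(transcript)}"})
    return StepResult(None, answer=f"done after {len(transcript) - 1} tool calls")


if __name__ == "__main__":
    tools = {"echo": lambda text: text}
    print(run_episode(scripted_model, tools, "example task"))
```

Returning `None` when the budget is exhausted is one possible design choice; a real harness might instead record the partial transcript for scoring.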

⚡ 4. Native INT4 Quantization

Kalki 1.5 natively supports highly optimized INT4 quantization. Storing weights in 4 bits substantially reduces GPU VRAM consumption while preserving over 99% of the original FP16 task performance, which makes deployment on standard enterprise servers feasible.
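
As a rough illustration of the memory effect, the sketch below estimates the weights-only footprint at 16-bit and 4-bit precision. It assumes every one of the 1.0T parameters is stored at the stated width and ignores quantization scales, the KV cache, and activations, so actual VRAM use will differ; the function name is hypothetical.

```python
TOTAL_PARAMS = 1.0e12  # total parameters, from the model summary


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB (10^9 bytes).

    Assumption: all parameters use the same bit width. Quantization scales,
    KV cache, and activations are not counted.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bits in [("16-bit (FP16/BF16)", 16), ("INT4", 4)]:
        print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")
```

Under these simplifying assumptions the weights shrink from roughly 2,000 GB to roughly 500 GB, a 4x reduction.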


⚙️ 5. Deployment

Kalki 1.5's high-speed API is available at platform.upmarking.com and is compatible with the standard OpenAI and Anthropic SDKs.

For local deployment, Kalki 1.5 can be served using the following inference frameworks:

  • vLLM
  • SGLang
  • KTransformers

Ensure you have the required transformers library version:

pip install "transformers>=4.57.1,<5.0.0"
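
Before launching a server, it can help to confirm that the installed version falls inside that range. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the parser looks only at the numeric release segment, so development or local version tags are ignored. For full PEP 440 handling, the `packaging` library is the more robust choice.

```python
from importlib import metadata
from typing import Optional, Tuple

MIN_VERSION = (4, 57, 1)  # inclusive lower bound from the pip specifier
MAX_VERSION = (5, 0, 0)   # exclusive upper bound from the pip specifier


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '4.57.1.dev0' -> (4, 57, 1)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def in_supported_range(version: str) -> bool:
    """Check MIN_VERSION <= version < MAX_VERSION on the numeric release only."""
    release = parse_release(version)
    release = release + (0,) * (3 - len(release))  # pad '4.58' to (4, 58, 0)
    return MIN_VERSION <= release < MAX_VERSION


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    else:
        status = "supported" if in_supported_range(found) else "outside the required range"
        print(f"transformers {found}: {status}")
```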

Refer to the Model Deployment Guide for step-by-step setup instructions.


💻 6. Usage Examples

Below is a simple chat completion example calling the Kalki 1.5 API in Thinking mode.

import openai


def simple_chat(client: openai.OpenAI, model_name: str) -> None:
    """Send one chat request and print the reasoning trace and the final answer."""
    messages = [
        {
            'role': 'system',
            'content': "You are Kalki, India's First Fully Agentic 1T Parameter AI created by Upmarking.",
        },
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'How can we optimize memory constraints in MoE architectures?'}
            ],
        },
    ]
    # Non-streaming request; max_tokens caps the number of generated tokens.
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
    )
    message = response.choices[0].message
    print('====== Reasoning Process ======')
    # `reasoning` is not a standard OpenAI SDK field, so read it defensively.
    print(getattr(message, 'reasoning', None))
    print('====== Final Answer ======')
    print(message.content)
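
The example above expects an already constructed client. One way to build it is sketched below. The environment variable names `KALKI_BASE_URL`, `KALKI_API_KEY`, and `KALKI_MODEL` are hypothetical, as are the helper names; the actual base URL and model identifier should be taken from the platform documentation rather than from this sketch.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; they are not defined by the Kalki platform.
REQUIRED_VARS = ("KALKI_BASE_URL", "KALKI_API_KEY", "KALKI_MODEL")


def load_client_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect connection settings from the environment, failing early if any is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "base_url": env["KALKI_BASE_URL"],
        "api_key": env["KALKI_API_KEY"],
        "model": env["KALKI_MODEL"],
    }


def make_client(settings: Mapping[str, str]):
    """Build an OpenAI-compatible client pointed at the configured endpoint."""
    import openai  # imported lazily so the settings helper works without the SDK

    return openai.OpenAI(base_url=settings["base_url"], api_key=settings["api_key"])
```

With the three variables set, the client returned by `make_client(load_client_settings())` and the value of `settings["model"]` can be passed to `simple_chat`.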

Made with ❤️ by Upmarking.