Instructions for using Kirim-ai/Kirim-V2 with libraries and local apps.

Libraries

How to use Kirim-ai/Kirim-V2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Kirim-ai/Kirim-V2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Kirim-ai/Kirim-V2", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use Kirim-ai/Kirim-V2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Kirim-ai/Kirim-V2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kirim-ai/Kirim-V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
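
The same request can be issued from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. The helper names `build_chat_payload` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_payload(prompt, model="Kirim-ai/Kirim-V2"):
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat_request(prompt, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the decoded JSON reply."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Assumes the server is reachable; urlopen raises URLError otherwise.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `send_chat_request("What is the capital of France?")` returns the parsed response as a dictionary.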

Use Docker

docker model run hf.co/Kirim-ai/Kirim-V2

SGLang

How to use Kirim-ai/Kirim-V2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Kirim-ai/Kirim-V2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kirim-ai/Kirim-V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Kirim-ai/Kirim-V2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kirim-ai/Kirim-V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
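
Both vLLM and SGLang reply in the OpenAI chat-completions format, so the assistant text sits under `choices[0].message.content`. A minimal sketch of extracting it follows. The helper name `extract_reply` is illustrative, and the sample response is hand-written to show the shape, not captured server output.

```python
def extract_reply(response):
    """Return the assistant message text from a chat-completions response dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written example of the response shape (not real model output).
sample_response = {
    "object": "chat.completion",
    "model": "Kirim-ai/Kirim-V2",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "placeholder reply"},
            "finish_reason": "stop",
        }
    ],
}

print(extract_reply(sample_response))
```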

Docker Model Runner
How to use Kirim-ai/Kirim-V2 with Docker Model Runner:
```
docker model run hf.co/Kirim-ai/Kirim-V2
```

Kirim-V2 26B

Kirim-V2 is a general-purpose language model with 26 billion total parameters. It uses a sparse activation architecture in which roughly 18 billion parameters are active during inference, which lowers compute cost compared with running all 26 billion parameters on every step.

Model Architecture

Total Parameters: 26B
Active Parameters: ~18B (sparse activation)
Context Length: Extended context window
Architecture: Transformer-based with mixture-of-experts components
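
As a rough back-of-the-envelope check on these figures, the sketch below computes the active fraction and the memory needed just to hold the weights. It assumes 2 bytes per parameter (BF16) and ignores activations, KV cache, and framework overhead. The function names are illustrative.

```python
def active_fraction(total_params, active_params):
    """Share of parameters that participate in a forward pass."""
    return active_params / total_params


def weight_memory_gb(total_params, bytes_per_param=2):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


TOTAL_PARAMS = 26e9   # total parameters stated in the card
ACTIVE_PARAMS = 18e9  # approximate active parameters stated in the card

print(f"Active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")  # 69.2%
print(f"Weights at 2 bytes/param: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")   # 52 GB
```

Sparse activation reduces the compute per step, but in typical mixture-of-experts deployments all weights still have to be resident in memory, so the memory estimate uses the total count rather than the active count.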

Key Capabilities

Core Competencies

Natural language understanding and generation across multiple domains
Complex reasoning and multi-step problem solving
Code generation and technical documentation
Creative writing and content creation

Advanced Features

Web Search Integration: Can search for and retrieve real-time information once the required API configuration is in place (see Limitations)
Tool Use: Seamless integration with external tools and APIs
Multilingual Support: Strong performance across multiple languages
Long-form Generation: Coherent output for extended documents and articles
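
The card does not document the model's tool-calling format, so the following is only a generic sketch of how an application might dispatch a tool call once the model has emitted one. Everything in it is hypothetical: the `TOOLS` registry, the `add` tool, and the assumed call shape (a dict with a `name` and a JSON-encoded `arguments` string).

```python
import json


def add(a, b):
    """Hypothetical example tool: add two numbers."""
    return a + b


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"add": add}


def dispatch_tool_call(call):
    """Run the tool named in `call` with its JSON-encoded arguments."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    arguments = json.loads(call["arguments"])
    return TOOLS[name](**arguments)


# Assumed call shape; the real format depends on the serving stack and chat template.
result = dispatch_tool_call({"name": "add", "arguments": '{"a": 2, "b": 3}'})
print(result)  # 5
```

In a real integration the tool result would be sent back to the model as a follow-up message so it can produce the final answer.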

Performance Highlights

Kirim-V2 improves on Kirim-V1 in the following areas:

Enhanced reasoning capabilities for complex tasks
Improved factual accuracy through integrated search
Better instruction following and task completion
More natural and contextually appropriate responses

Use Cases

Research & Analysis: Information gathering with real-time web search
Software Development: Code generation, debugging, and documentation
Content Creation: Articles, reports, creative writing, and technical documentation
Question Answering: Accurate responses with source verification
Task Automation: Multi-step workflows with tool integration

Model Specifications

Architecture: Sparse Transformer
Training Data: Diverse web corpus, code, and specialized datasets
Tokenizer: Custom trained tokenizer optimized for multilingual performance
Optimization: Mixed precision training with gradient checkpointing

Inference Example

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Kirim-ai/Kirim-V2"

# trust_remote_code=True matches the Transformers loading snippets above
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

prompt = "Explain quantum entanglement in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
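
The pipeline snippet earlier on this page accepts a list of chat messages, which suggests the tokenizer ships a chat template. Assuming that is the case, a chat-style variant of the example could look like the sketch below. `build_messages` and `generate_reply` are illustrative helper names, and `trust_remote_code=True` mirrors the Transformers loading snippets.

```python
def build_messages(user_prompt):
    """Wrap a prompt in the role/content message format used by chat templates."""
    return [{"role": "user", "content": user_prompt}]


def generate_reply(user_prompt, model_name="Kirim-ai/Kirim-V2", max_new_tokens=512):
    """Load the model, apply the chat template, and return only the new text."""
    # Imported lazily so build_messages works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        torch_dtype="auto",
        trust_remote_code=True,
    )

    # Assumes the tokenizer defines a chat template; this call raises otherwise.
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Slice off the prompt tokens so only the generated continuation is decoded.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_reply("Explain quantum entanglement in simple terms.")` downloads the weights on first use, so it needs enough disk space and accelerator memory for the full model.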

Limitations

May occasionally generate plausible-sounding but incorrect information
Performance depends on prompt quality and task complexity
Web search capability requires appropriate API configuration
Not specifically fine-tuned for safety-critical applications

Ethical Considerations

This model should be used responsibly. Users should verify critical information independently and be aware of potential biases in generated content. The model is not intended for making decisions in high-stakes scenarios without human oversight.

License

This model is released under the Apache 2.0 License.

Citation

@misc{kirimv2_2025,
  title={Kirim-V2: A 26B Parameter Sparse Activation Language Model},
  author={Qiling Tech},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/Kirim-ai/Kirim-V2}
}

Release Date: 2026
Model Type: Causal Language Model

Safetensors: model size 28B params, tensor type BF16