
AiCIPPY-Coder

The Agentic Coding Intelligence behind AiCIPPY
by AiVedha · AiVibe Software Services Private Limited

aicippy.com · aivedha.ai · aivibe.cloud · PyPI


Highlights

We are releasing AiCIPPY-Coder — the open-weight coding intelligence model powering the AiCIPPY agent platform. Built for real-world agentic software development, this model is the foundation of AiCIPPY's CLI and IDE-integrated coding workflows.

  • Efficient Yet Powerful: With only 3B activated parameters (80B total), AiCIPPY-Coder delivers performance comparable to models with 10–20x more active parameters — making it highly cost-effective for production agent deployment at scale.
  • Advanced Agentic Capabilities: Trained with an elaborate agentic recipe, the model excels at long-horizon reasoning, complex multi-step tool usage, and graceful recovery from execution failures — essential for robust real-world coding tasks.
  • Seamless IDE and CLI Integration: A native 256K context window, combined with full adaptability to diverse scaffold templates, enables plug-and-play integration with CLI agents (including AiCIPPY CLI), VS Code extensions, and platforms such as Cline, Kilo, Trae, and others.

Model Overview

AiCIPPY-Coder has the following architecture:

  • Model Type: Causal Language Model
  • Training Stage: Pretraining & Post-training
  • Total Parameters: 80B
  • Activated Parameters: 3B
  • Non-Embedding Parameters: 79B
  • Hidden Dimension: 2048
  • Number of Layers: 48
  • Context Length: 262,144 tokens (native)
  • Thinking Mode: Non-thinking (no <think> blocks)

Architecture Details:

  • Hybrid Layout: 12 × (3 × Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)
  • Gated Attention: 16 heads for Q, 2 for KV, Head Dim 256, RoPE Dim 64
  • Gated DeltaNet: 32 heads for V, 16 for QK, Head Dim 128
  • Mixture of Experts: 512 total experts, 10 activated, 1 shared, Expert Intermediate Dim 512

Note: This model operates in non-thinking mode only. The <think></think> output blocks are not generated. Setting enable_thinking=False is not required.
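To see how an 80B-parameter model can activate only ~3B parameters per token, here is a rough back-of-the-envelope calculation from the figures above. It assumes a SwiGLU-style expert (gate, up, and down projections) and that each of the 48 layers is followed by a MoE block, as the hybrid layout suggests; treat it as an illustrative sketch, not an exact parameter count.

```python
# Back-of-the-envelope MoE parameter estimate (illustrative; assumes
# SwiGLU-style experts with gate, up, and down projections).
hidden_dim = 2048          # hidden dimension
expert_dim = 512           # expert intermediate dimension
num_layers = 48            # each followed by a MoE block in the hybrid layout
active_experts = 10 + 1    # 10 routed + 1 shared expert per token

# One expert: gate + up + down projections.
params_per_expert = 3 * hidden_dim * expert_dim
print(params_per_expert)                  # 3145728 (~3.1M)

# Active expert parameters across all layers.
active_ffn_params = num_layers * active_experts * params_per_expert
print(f"{active_ffn_params / 1e9:.2f}B")  # ~1.66B

# Total expert parameters (512 experts per layer) dominate the 80B figure.
total_ffn_params = num_layers * 512 * params_per_expert
print(f"{total_ffn_params / 1e9:.1f}B")   # ~77.3B
```

Attention, DeltaNet, and embedding parameters make up the remainder, which is consistent with the stated ~3B activated out of 80B total.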


Quickstart

Ensure you are using the latest version of transformers before proceeding.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aivedha/aicippy-Coder"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("AiCIPPY-Coder:", content)

Note: If you encounter out-of-memory (OOM) issues, reduce the context length — for example, to 32,768 tokens.

For local use, AiCIPPY-Coder is compatible with Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers.


Deployment

AiCIPPY-Coder can be served via SGLang or vLLM as an OpenAI-compatible API endpoint, the same interface used by the AiCIPPY production platform.

SGLang

SGLang is a fast serving framework for large language and vision language models.

pip install 'sglang[all]>=0.5.8'

Launch the server with 256K context using tensor parallelism:

python -m sglang.launch_server \
  --model aivedha/aicippy-Coder \
  --port 30000 \
  --tp-size 2 \
  --tool-call-parser aicippy-coder

Note: If the server fails to start, reduce context length with --context-length 32768.

API endpoint available at: http://localhost:30000/v1


vLLM

vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs.

pip install 'vllm>=0.15.0'

Launch with 256K context:

vllm serve aivedha/aicippy-Coder \
  --port 8000 \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser aicippy-coder

Note: Reduce context length to 32768 if startup fails.

API endpoint available at: http://localhost:8000/v1


Agentic Coding with AiCIPPY-Coder

AiCIPPY-Coder is purpose-built for tool-calling agentic workflows. Define tools and invoke them directly:

# Tool implementation
def square_the_number(num: float) -> float:
    return num ** 2

# Tool definition
tools = [
    {
        "type": "function",
        "function": {
            "name": "square_the_number",
            "description": "Returns the square of the given number.",
            "parameters": {
                "type": "object",
                "required": ["input_num"],
                "properties": {
                    "input_num": {
                        "type": "number",
                        "description": "The number to be squared."
                    }
                }
            }
        }
    }
]

from openai import OpenAI

# Point to your AiCIPPY-Coder local endpoint
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY"
)

messages = [{"role": "user", "content": "Square the number 1024"}]

completion = client.chat.completions.create(
    messages=messages,
    model="aivedha/aicippy-Coder",
    max_tokens=65536,
    tools=tools,
)

print(completion.choices[0])
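A completion for the request above typically returns a tool call rather than final text. The sketch below shows one way to execute that call locally; the registry and dispatch logic are our own convention (not part of any AiCIPPY API), and the mocked tool call stands in for a live response. Note that the schema's argument name ("input_num") must be mapped onto the implementation's parameter.

```python
import json

# Tool implementation from above.
def square_the_number(num: float) -> float:
    return num ** 2

# Registry mapping tool names to callables (our convention, not an API requirement).
TOOL_REGISTRY = {"square_the_number": square_the_number}

def execute_tool_call(tool_call: dict) -> str:
    """Run a single OpenAI-style tool call and return its result as a string."""
    fn = TOOL_REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    # The model emits the argument name from the schema ("input_num");
    # map it onto the implementation's positional parameter.
    result = fn(args["input_num"])
    return json.dumps(result)

# Mocked tool call, shaped like completion.choices[0].message.tool_calls[0].
mock_call = {
    "id": "call_0",
    "function": {
        "name": "square_the_number",
        "arguments": '{"input_num": 1024}',
    },
}
print(execute_tool_call(mock_call))  # → 1048576
```

To continue the conversation, append a message of the form {"role": "tool", "tool_call_id": tool_call["id"], "content": result} to messages and call client.chat.completions.create again so the model can produce its final answer.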

Best Practices

For optimal generation quality, use the following sampling parameters:

  • temperature: 1.0
  • top_p: 0.95
  • top_k: 40
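When calling the model through an OpenAI-compatible endpoint, temperature and top_p are standard request fields, but top_k is not part of the OpenAI schema; vLLM (and SGLang) read such sampling extensions from the raw request body, which the openai Python client forwards via extra_body. A minimal sketch of building the request arguments:

```python
# Recommended sampling parameters from the table above.
sampling = {"temperature": 1.0, "top_p": 0.95, "top_k": 40}

# temperature and top_p are standard OpenAI request fields; top_k is not,
# so it must travel in extra_body for OpenAI-compatible servers to see it.
request_kwargs = {
    "model": "aivedha/aicippy-Coder",
    "temperature": sampling["temperature"],
    "top_p": sampling["top_p"],
    "extra_body": {"top_k": sampling["top_k"]},
}

# Usage (requires a running endpoint from the Deployment section):
# client.chat.completions.create(messages=messages, **request_kwargs)
print(request_kwargs["extra_body"])  # → {'top_k': 40}
```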

About AiCIPPY

AiCIPPY is AiVibe's production-grade agentic coding platform — available as a CLI tool on PyPI and deployable on AWS Bedrock. It combines multi-LLM orchestration, persistent memory via DynamoDB, WebSocket streaming, and enterprise SSO via AWS Cognito.

  • Platform: aicippy.com
  • CLI: pip install aicippy
  • Organisation: AiVibe Software Services Private Limited, Chennai, India

About AiVedha

AiVedha (aivedha.ai) is AiVibe's AI-powered cybersecurity audit and compliance platform — available on AWS Marketplace (prod-kulys2bmix2nm). AiVedha and AiCIPPY together form the core of AiVibe's enterprise AI product portfolio.


License

This model is released under the Apache 2.0 License. See LICENSE for full terms.

The underlying architecture is derived from Qwen3-Coder-Next (Qwen Team, Alibaba Cloud), used in accordance with its Apache 2.0 license terms.


Citation

If you use AiCIPPY-Coder in your research or products, please cite:

@misc{aivibe_aicippy_coder_2026,
  title        = {AiCIPPY-Coder: Agentic Coding Intelligence by AiVedha},
  author       = {{AiVibe Software Services Private Limited}},
  year         = {2026},
  url          = {https://huggingface.co/aivedha/aicippy-Coder}
}