
AiCIPPY-Coder

The Agentic Coding Intelligence behind AiCIPPY
by AiVedha · AiVibe Software Services Private Limited

aicippy.com · aivedha.ai · aivibe.cloud · PyPI


Highlights

We are releasing AiCIPPY-Coder — the open-weight coding intelligence model powering the AiCIPPY agent platform. Built for real-world agentic software development, this model is the foundation of AiCIPPY's CLI and IDE-integrated coding workflows.

  • Efficient Yet Powerful: With only 3B activated parameters (80B total), AiCIPPY-Coder delivers performance comparable to models with 10–20x more active parameters — making it highly cost-effective for production agent deployment at scale.
  • Advanced Agentic Capabilities: Trained with an elaborate agentic recipe, the model excels at long-horizon reasoning, complex multi-step tool usage, and graceful recovery from execution failures — essential for robust real-world coding tasks.
  • Seamless IDE and CLI Integration: A native 256K context window, combined with full adaptability to diverse scaffold templates, enables plug-and-play integration with CLI agents (including AiCIPPY CLI), VS Code extensions, and platforms such as Cline, Kilo, Trae, and others.

Model Overview

AiCIPPY-Coder has the following architecture:

  • Model Type: Causal Language Model
  • Training Stage: Pretraining & Post-training
  • Total Parameters: 80B
  • Activated Parameters: 3B
  • Non-Embedding Parameters: 79B
  • Hidden Dimension: 2048
  • Number of Layers: 48
  • Context Length: 262,144 tokens (native)
  • Thinking Mode: Non-thinking (no <think> blocks)

Architecture Details:

  • Hybrid Layout: 12 × (3 × Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)
  • Gated Attention: 16 heads for Q, 2 for KV, Head Dim 256, RoPE Dim 64
  • Gated DeltaNet: 32 heads for V, 16 for QK, Head Dim 128
  • Mixture of Experts: 512 total experts, 10 activated, 1 shared, Expert Intermediate Dim 512

Note: This model operates in non-thinking mode only. The <think></think> output blocks are not generated. Setting enable_thinking=False is not required.
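To see how an 80B-parameter model can activate only ~3B parameters per token, here is a rough back-of-the-envelope calculation from the figures above. It assumes a SwiGLU-style expert (gate, up, and down projections) and that each of the 48 layers is followed by a MoE block, as the hybrid layout suggests; treat it as an illustrative sketch, not an exact parameter count.

```python
# Back-of-the-envelope MoE parameter estimate (illustrative; assumes
# SwiGLU-style experts with gate, up, and down projections).
hidden_dim = 2048          # hidden dimension
expert_dim = 512           # expert intermediate dimension
num_layers = 48            # each followed by a MoE block in the hybrid layout
active_experts = 10 + 1    # 10 routed + 1 shared expert per token

# One expert: gate + up + down projections.
params_per_expert = 3 * hidden_dim * expert_dim
print(params_per_expert)                  # 3145728 (~3.1M)

# Active expert parameters across all layers.
active_ffn_params = num_layers * active_experts * params_per_expert
print(f"{active_ffn_params / 1e9:.2f}B")  # ~1.66B

# Total expert parameters (512 experts per layer) dominate the 80B figure.
total_ffn_params = num_layers * 512 * params_per_expert
print(f"{total_ffn_params / 1e9:.1f}B")   # ~77.3B
```

Attention, DeltaNet, and embedding parameters make up the remainder, which is consistent with the stated ~3B activated out of 80B total.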


Quickstart

Ensure you are using the latest version of transformers before proceeding.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aivedha/aicippy-Coder"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("AiCIPPY-Coder:", content)

Note: If you encounter out-of-memory (OOM) issues, reduce the context length — for example, to 32,768 tokens.

For local use, AiCIPPY-Coder is compatible with Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers.


Deployment

AiCIPPY-Coder can be served via SGLang or vLLM as an OpenAI-compatible API endpoint, the same interface used by the AiCIPPY production platform.

SGLang

SGLang is a fast serving framework for large language and vision language models.

pip install 'sglang[all]>=0.5.8'

Launch the server with 256K context using tensor parallelism:

python -m sglang.launch_server \
  --model aivedha/aicippy-Coder \
  --port 30000 \
  --tp-size 2 \
  --tool-call-parser aicippy-coder

Note: If the server fails to start, reduce context length with --context-length 32768.

API endpoint available at: http://localhost:30000/v1


vLLM

vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs.

pip install 'vllm>=0.15.0'

Launch with 256K context:

vllm serve aivedha/aicippy-Coder \
  --port 8000 \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser aicippy-coder

Note: Reduce context length to 32768 if startup fails.

API endpoint available at: http://localhost:8000/v1


Agentic Coding with AiCIPPY-Coder

AiCIPPY-Coder is purpose-built for tool-calling agentic workflows. Define tools and invoke them directly:

# Tool implementation
def square_the_number(num: float) -> float:
    return num ** 2

# Tool definition
tools = [
    {
        "type": "function",
        "function": {
            "name": "square_the_number",
            "description": "Returns the square of the given number.",
            "parameters": {
                "type": "object",
                "required": ["input_num"],
                "properties": {
                    "input_num": {
                        "type": "number",
                        "description": "The number to be squared."
                    }
                }
            }
        }
    }
]

from openai import OpenAI

# Point to your AiCIPPY-Coder local endpoint
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY"
)

messages = [{"role": "user", "content": "Square the number 1024"}]

completion = client.chat.completions.create(
    messages=messages,
    model="aivedha/aicippy-Coder",
    max_tokens=65536,
    tools=tools,
)

print(completion.choices[0])
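A completion for the request above typically returns a tool call rather than final text. The sketch below shows one way to execute that call locally; the registry and dispatch logic are our own convention (not part of any AiCIPPY API), and the mocked tool call stands in for a live response. Note that the schema's argument name ("input_num") must be mapped onto the implementation's parameter.

```python
import json

# Tool implementation from above.
def square_the_number(num: float) -> float:
    return num ** 2

# Registry mapping tool names to callables (our convention, not an API requirement).
TOOL_REGISTRY = {"square_the_number": square_the_number}

def execute_tool_call(tool_call: dict) -> str:
    """Run a single OpenAI-style tool call and return its result as a string."""
    fn = TOOL_REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    # The model emits the argument name from the schema ("input_num");
    # map it onto the implementation's positional parameter.
    result = fn(args["input_num"])
    return json.dumps(result)

# Mocked tool call, shaped like completion.choices[0].message.tool_calls[0].
mock_call = {
    "id": "call_0",
    "function": {
        "name": "square_the_number",
        "arguments": '{"input_num": 1024}',
    },
}
print(execute_tool_call(mock_call))  # → 1048576
```

To continue the conversation, append a message of the form {"role": "tool", "tool_call_id": tool_call["id"], "content": result} to messages and call client.chat.completions.create again so the model can produce its final answer.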

Best Practices

For optimal generation quality, use the following sampling parameters:

  • temperature: 1.0
  • top_p: 0.95
  • top_k: 40
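When calling the model through an OpenAI-compatible endpoint, temperature and top_p are standard request fields, but top_k is not part of the OpenAI schema; vLLM (and SGLang) read such sampling extensions from the raw request body, which the openai Python client forwards via extra_body. A minimal sketch of building the request arguments:

```python
# Recommended sampling parameters from the table above.
sampling = {"temperature": 1.0, "top_p": 0.95, "top_k": 40}

# temperature and top_p are standard OpenAI request fields; top_k is not,
# so it must travel in extra_body for OpenAI-compatible servers to see it.
request_kwargs = {
    "model": "aivedha/aicippy-Coder",
    "temperature": sampling["temperature"],
    "top_p": sampling["top_p"],
    "extra_body": {"top_k": sampling["top_k"]},
}

# Usage (requires a running endpoint from the Deployment section):
# client.chat.completions.create(messages=messages, **request_kwargs)
print(request_kwargs["extra_body"])  # → {'top_k': 40}
```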

About AiCIPPY

AiCIPPY is AiVibe's production-grade agentic coding platform — available as a CLI tool on PyPI and deployable on AWS Bedrock. It combines multi-LLM orchestration, persistent memory via DynamoDB, WebSocket streaming, and enterprise SSO via AWS Cognito.

  • Platform: aicippy.com
  • CLI: pip install aicippy
  • Organisation: AiVibe Software Services Private Limited, Chennai, India

About AiVedha

AiVedha (aivedha.ai) is AiVibe's AI-powered cybersecurity audit and compliance platform — available on AWS Marketplace (prod-kulys2bmix2nm). AiVedha and AiCIPPY together form the core of AiVibe's enterprise AI product portfolio.


License

This model is released under the Apache 2.0 License. See LICENSE for full terms.

The underlying architecture is derived from Qwen3-Coder-Next (Qwen Team, Alibaba Cloud), used in accordance with its Apache 2.0 license terms.


Citation

If you use AiCIPPY-Coder in your research or products, please cite:

@misc{aivibe_aicippy_coder_2026,
  title        = {AiCIPPY-Coder: Agentic Coding Intelligence by AiVedha},
  author       = {{AiVibe Software Services Private Limited}},
  year         = {2026},
  url          = {https://huggingface.co/aivedha/aicippy-Coder}
}