---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/aivedha/aicippy-Coder/blob/main/LICENSE
pipeline_tag: text-generation
base_model: aivedha/aicippy-Coder
tags:
  - aicippy
  - aivedha
  - aivibe
  - coding-agent
  - code-generation
  - agentic-coding
---

<p align="center">
  <img src="https://aivibe.cloud/assets/aivibe-logo.png" alt="AiVibe Logo" width="180"/>
</p>

<h1 align="center">AiCIPPY-Coder</h1>

<p align="center">
  <b>The Agentic Coding Intelligence behind AiCIPPY</b><br/>
  <i>by AiVedha · AiVibe Software Services Private Limited</i>
</p>

<p align="center">
  <a href="https://aicippy.com">aicippy.com</a> ·
  <a href="https://aivedha.ai">aivedha.ai</a> ·
  <a href="https://aivibe.cloud">aivibe.cloud</a> ·
  <a href="https://pypi.org/project/aicippy">PyPI</a>
</p>

---

## Highlights

We are releasing **AiCIPPY-Coder** — the open-weight coding intelligence model powering the AiCIPPY agent platform. Built for real-world agentic software development, this model is the foundation of AiCIPPY's CLI and IDE-integrated coding workflows.

- **Efficient Yet Powerful**: With only 3B activated parameters (80B total), AiCIPPY-Coder delivers performance comparable to models with 10–20x more active parameters — making it highly cost-effective for production agent deployment at scale.
- **Advanced Agentic Capabilities**: Trained with an elaborate agentic recipe, the model excels at long-horizon reasoning, complex multi-step tool usage, and graceful recovery from execution failures — essential for robust real-world coding tasks.
- **Seamless IDE and CLI Integration**: A native 256K context window, combined with full adaptability to diverse scaffold templates, enables plug-and-play integration with CLI agents (including AiCIPPY CLI), VS Code extensions, and platforms such as Cline, Kilo, Trae, and others.

---

## Model Overview

**AiCIPPY-Coder** carries the following architecture:

| Property | Value |
|---|---|
| Model Type | Causal Language Model |
| Training Stage | Pretraining & Post-training |
| Total Parameters | 80B |
| Activated Parameters | 3B |
| Non-Embedding Parameters | 79B |
| Hidden Dimension | 2048 |
| Number of Layers | 48 |
| Context Length | 262,144 tokens (native) |
| Thinking Mode | Non-thinking (no `<think>` blocks) |

**Architecture Details:**
- **Hybrid Layout:** 12 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)), which gives 12 × 4 = 48 layers
- **Gated Attention:** 16 heads for Q, 2 for KV, Head Dim 256, RoPE Dim 64
- **Gated DeltaNet:** 32 heads for V, 16 for QK, Head Dim 128
- **Mixture of Experts:** 512 total experts, 10 activated, 1 shared, Expert Intermediate Dim 512

> **Note:** This model operates in non-thinking mode only. The `<think></think>` output blocks are not generated. Setting `enable_thinking=False` is not required.
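As a quick sanity check on the hybrid layout, the sketch below expands the pattern into one label per layer and compares the count with the 48 layers listed in the table. It assumes the grouping is 12 repeats of three Gated DeltaNet blocks followed by one Gated Attention block, each followed by an MoE block. The function name `build_layer_pattern` and the string labels are illustrative only and do not come from any library.

```python
def build_layer_pattern(groups: int = 12, deltanet_per_group: int = 3) -> list[str]:
    """Expand the hybrid layout into one label per layer (illustrative only)."""
    pattern = []
    for _ in range(groups):
        # Three linear-attention style blocks, then one full-attention block
        pattern.extend(["gated_deltanet"] * deltanet_per_group)
        pattern.append("gated_attention")
    return pattern


layers = build_layer_pattern()
print(len(layers))                      # total number of layers
print(layers.count("gated_attention"))  # number of full-attention layers
```

Under this reading, only one layer in four uses full attention.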

---

## Quickstart

Ensure you are using the latest version of `transformers` before proceeding.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aivedha/aicippy-Coder"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("AiCIPPY-Coder:", content)
```

> **Note:** If you encounter out-of-memory (OOM) errors, reduce the context length, for example to `32768` tokens.
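To give a rough sense of why a shorter context helps, the sketch below estimates the key/value cache size of the Gated Attention layers from the figures in the Model Overview (2 KV heads, head dimension 256). It is a back-of-the-envelope estimate under several assumptions that are not stated in this card: 12 full-attention layers (one per group of four), 16-bit cache values, and a single sequence. It ignores model weights, activations, and the fixed-size state of the Gated DeltaNet layers. The helper name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(
    tokens: int,
    attention_layers: int = 12,   # assumption: one full-attention layer per group of four
    kv_heads: int = 2,
    head_dim: int = 256,
    bytes_per_value: int = 2,     # assumption: 16-bit cache (e.g. bf16)
) -> int:
    """Approximate KV-cache size for one sequence of `tokens` tokens."""
    # Keys and values each store kv_heads * head_dim values per token per layer
    per_token = 2 * attention_layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token


for ctx in (262_144, 32_768):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>7} tokens -> {gib:.2f} GiB of KV cache per sequence")
```

The estimate scales linearly with context length, so going from 262,144 to 32,768 tokens cuts this part of the memory footprint by a factor of eight.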

For local use, AiCIPPY-Coder is compatible with **Ollama**, **LMStudio**, **MLX-LM**, **llama.cpp**, and **KTransformers**.

---

## Deployment

AiCIPPY-Coder can be served via `sglang` or `vllm` as an OpenAI-compatible API endpoint — the same interface used by the AiCIPPY production platform.

### SGLang

[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision-language models.

```shell
pip install 'sglang[all]>=0.5.8'
```

Launch the server with 256K context using tensor parallelism:

```shell
python -m sglang.launch_server \
  --model aivedha/aicippy-Coder \
  --port 30000 \
  --tp-size 2 \
  --tool-call-parser aicippy-coder
```

> **Note:** If the server fails to start, reduce context length with `--context-length 32768`.

API endpoint available at: `http://localhost:30000/v1`
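Because the endpoint is OpenAI-compatible, it can be exercised with nothing but the Python standard library. The following is a minimal sketch that assumes the server from the launch command above is listening on port 30000; the helper names `build_chat_request` and `ask` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000/v1"  # SGLang endpoint from the launch command
MODEL = "aivedha/aicippy-Coder"


def build_chat_request(prompt: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) a POST request to the chat completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the reply text; needs a running server."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("Write a quick sort algorithm.")` should return the generated reply as a string.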

---

### vLLM

[vLLM](https://github.com/vllm-project/vllm) is a high-throughput, memory-efficient inference and serving engine for LLMs.

```shell
pip install 'vllm>=0.15.0'
```

Launch with 256K context:

```shell
vllm serve aivedha/aicippy-Coder \
  --port 8000 \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser aicippy-coder
```

> **Note:** If the server fails to start, reduce the context length, for example with `--max-model-len 32768`.

API endpoint available at: `http://localhost:8000/v1`
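Before sending requests, it can be useful to confirm which model id the server reports. The sketch below assumes the server exposes the OpenAI-style `GET /v1/models` route and returns the usual `{"data": [{"id": ...}]}` shape; the helper names and the hand-written `sample` response are illustrative, not captured output.

```python
import json
import urllib.request


def served_model_ids(models_response: dict) -> list[str]:
    """Extract model ids from an OpenAI-style `/v1/models` response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_models(base_url: str = "http://localhost:8000/v1") -> dict:
    """Query the running server for its model list; needs a running server."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return json.load(resp)


# Offline illustration using a hand-written response of the expected shape
sample = {"object": "list", "data": [{"id": "aivedha/aicippy-Coder", "object": "model"}]}
print(served_model_ids(sample))
```

Against a live server, `served_model_ids(fetch_models())` should contain the value passed as `model` in client requests.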

---

## Agentic Coding with AiCIPPY-Coder

AiCIPPY-Coder is purpose-built for tool-calling agentic workflows. Define tools and invoke them directly:

```python
# Tool implementation (the parameter name must match the schema below)
def square_the_number(input_num: float) -> float:
    return input_num ** 2

# Tool definition
tools = [
    {
        "type": "function",
        "function": {
            "name": "square_the_number",
            "description": "Returns the square of the given number.",
            "parameters": {
                "type": "object",
                "required": ["input_num"],
                "properties": {
                    "input_num": {
                        "type": "number",
                        "description": "The number to be squared."
                    }
                }
            }
        }
    }
]

from openai import OpenAI

# Point to your AiCIPPY-Coder local endpoint
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY"
)

messages = [{"role": "user", "content": "Square the number 1024"}]

completion = client.chat.completions.create(
    messages=messages,
    model="aivedha/aicippy-Coder",
    max_tokens=65536,
    tools=tools,
)

print(completion.choices[0])
```
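The request above only returns the model's tool-call request; the tool itself still has to be run by the caller. The sketch below shows one way to dispatch tool calls to local Python functions and build the `tool` messages to send back. It assumes the response follows the OpenAI chat completions format, where each tool call has an `id`, a `function.name`, and `function.arguments` as a JSON string. `TOOL_REGISTRY`, `run_tool_calls`, and the `fake_call` stand-in are hypothetical names for this illustration.

```python
import json
from types import SimpleNamespace


def square_the_number(input_num: float) -> float:
    return input_num ** 2


# Maps tool names from the schema to local callables
TOOL_REGISTRY = {"square_the_number": square_the_number}


def run_tool_calls(tool_calls, registry=TOOL_REGISTRY) -> list[dict]:
    """Execute each requested tool and return OpenAI-style `tool` messages."""
    results = []
    for call in tool_calls or []:
        func = registry.get(call.function.name)
        if func is None:
            content = json.dumps({"error": f"unknown tool {call.function.name}"})
        else:
            args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
            content = json.dumps(func(**args))
        results.append({"role": "tool", "tool_call_id": call.id, "content": content})
    return results


# Stand-in for completion.choices[0].message.tool_calls (no server needed)
fake_call = SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(name="square_the_number", arguments='{"input_num": 1024}'),
)
print(run_tool_calls([fake_call]))
```

In a real loop, the assistant message and the returned tool messages would be appended to `messages`, and `client.chat.completions.create` would be called again so the model can produce its final answer.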

---

## Best Practices

For optimal generation quality, use the following sampling parameters:

| Parameter | Recommended Value |
|---|---|
| `temperature` | `1.0` |
| `top_p` | `0.95` |
| `top_k` | `40` |
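These values can be passed either to `model.generate` in `transformers` or to the OpenAI client. The sketch below only builds the keyword arguments so that it runs without a model or server. It assumes the serving engine accepts `top_k` through the client's `extra_body`, since `top_k` is not a standard OpenAI parameter; the helper names are hypothetical.

```python
# Recommended sampling values from the table above
SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 40}


def transformers_generate_kwargs(max_new_tokens: int = 1024) -> dict:
    """Keyword arguments for `model.generate(**model_inputs, **kwargs)`."""
    # do_sample=True is needed for temperature/top_p/top_k to take effect
    return {"do_sample": True, "max_new_tokens": max_new_tokens, **SAMPLING}


def openai_chat_kwargs() -> dict:
    """Keyword arguments for `client.chat.completions.create(...)`."""
    return {
        "temperature": SAMPLING["temperature"],
        "top_p": SAMPLING["top_p"],
        # Assumption: the server reads non-standard fields from the request body
        "extra_body": {"top_k": SAMPLING["top_k"]},
    }


print(transformers_generate_kwargs())
print(openai_chat_kwargs())
```

For example, `model.generate(**model_inputs, **transformers_generate_kwargs())` applies the settings in the Quickstart snippet, and `client.chat.completions.create(messages=messages, model=..., **openai_chat_kwargs())` applies them to a served endpoint.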

---

## About AiCIPPY

**AiCIPPY** is AiVibe's production-grade agentic coding platform — available as a CLI tool on PyPI and deployable on AWS Bedrock. It combines multi-LLM orchestration, persistent memory via DynamoDB, WebSocket streaming, and enterprise SSO via AWS Cognito.

- **Platform:** [aicippy.com](https://aicippy.com)
- **CLI:** `pip install aicippy`
- **Organisation:** AiVibe Software Services Private Limited, Chennai, India

---

## About AiVedha

**AiVedha** (aivedha.ai) is AiVibe's AI-powered cybersecurity audit and compliance platform — available on AWS Marketplace (`prod-kulys2bmix2nm`). AiVedha and AiCIPPY together form the core of AiVibe's enterprise AI product portfolio.

---

## License

This model is released under the **Apache 2.0 License**. See [LICENSE](https://huggingface.co/aivedha/aicippy-Coder/blob/main/LICENSE) for full terms.

The underlying architecture is derived from Qwen3-Coder-Next (Qwen Team, Alibaba Cloud), used in accordance with its Apache 2.0 license terms.

---

## Citation

If you use AiCIPPY-Coder in your research or products, please cite:

```bibtex
@misc{aivibe_aicippy_coder_2026,
  title        = {AiCIPPY-Coder: Agentic Coding Intelligence by AiVedha},
  author       = {{AiVibe Software Services Private Limited}},
  year         = {2026},
  url          = {https://huggingface.co/aivedha/aicippy-Coder}
}
```