---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/aivedha/aicippy-Coder/blob/main/LICENSE
pipeline_tag: text-generation
base_model: aivedha/aicippy-Coder
tags:
- aicippy
- aivedha
- aivibe
- coding-agent
- code-generation
- agentic-coding
---
# AiCIPPY-Coder

**The Agentic Coding Intelligence behind AiCIPPY**
by AiVedha · AiVibe Software Services Private Limited

[aicippy.com](https://aicippy.com) · [aivedha.ai](https://aivedha.ai) · [aivibe.cloud](https://aivibe.cloud) · PyPI: `pip install aicippy`
---
## Highlights
We are releasing **AiCIPPY-Coder** — the open-weight coding intelligence model powering the AiCIPPY agent platform. Built for real-world agentic software development, this model is the foundation of AiCIPPY's CLI and IDE-integrated coding workflows.
- **Efficient Yet Powerful**: With only 3B activated parameters (80B total), AiCIPPY-Coder delivers performance comparable to models with 10–20x more active parameters — making it highly cost-effective for production agent deployment at scale.
- **Advanced Agentic Capabilities**: Trained with an elaborate agentic recipe, the model excels at long-horizon reasoning, complex multi-step tool usage, and graceful recovery from execution failures — essential for robust real-world coding tasks.
- **Seamless IDE and CLI Integration**: A native 256K context window, combined with full adaptability to diverse scaffold templates, enables plug-and-play integration with CLI agents (including AiCIPPY CLI), VS Code extensions, and platforms such as Cline, Kilo, Trae, and others.
---
## Model Overview
**AiCIPPY-Coder** carries the following architecture:
| Property | Value |
|---|---|
| Model Type | Causal Language Model |
| Training Stage | Pretraining & Post-training |
| Total Parameters | 80B |
| Activated Parameters | 3B |
| Non-Embedding Parameters | 79B |
| Hidden Dimension | 2048 |
| Number of Layers | 48 |
| Context Length | 262,144 tokens (native) |
| Thinking Mode | Non-thinking (no `<think>` blocks) |
**Architecture Details:**
- **Hybrid Layout:** 12 × [3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)], i.e. 48 layers in total
- **Gated Attention:** 16 heads for Q, 2 for KV, Head Dim 256, RoPE Dim 64
- **Gated DeltaNet:** 32 heads for V, 16 for QK, Head Dim 128
- **Mixture of Experts:** 512 total experts, 10 activated, 1 shared, Expert Intermediate Dim 512
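As a rough sanity check on the 3B activated-parameter figure, the MoE share can be estimated from the numbers above. This is a back-of-the-envelope sketch only; it assumes each expert is a standard three-matrix SwiGLU FFN (gate, up, down), which this card does not state explicitly:

```python
# Rough activated-parameter estimate for the MoE layers only
# (assumption: each expert is a 3-matrix SwiGLU FFN).
hidden_dim = 2048
expert_inter_dim = 512
experts_per_token = 10 + 1          # 10 routed + 1 shared
num_layers = 48

params_per_expert = 3 * hidden_dim * expert_inter_dim   # gate + up + down
moe_active_per_layer = experts_per_token * params_per_expert
moe_active_total = num_layers * moe_active_per_layer

print(f"{moe_active_total / 1e9:.2f}B MoE params active per token")
```

The attention/DeltaNet projections and embeddings account for the remainder of the roughly 3B activated total.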
> **Note:** This model operates in non-thinking mode only and does not generate `<think>` output blocks. Setting `enable_thinking=False` is not required.
---
## Quickstart
Ensure you are using the latest version of `transformers` before proceeding.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "aivedha/aicippy-Coder"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
# Prepare input
prompt = "Write a quick sort algorithm."
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate
generated_ids = model.generate(
**model_inputs,
max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("AiCIPPY-Coder:", content)
```
> **Note:** If you encounter out-of-memory (OOM) issues, reduce the context length — for example, to `32768` tokens.
For local use, AiCIPPY-Coder is compatible with **Ollama**, **LM Studio**, **MLX-LM**, **llama.cpp**, and **KTransformers**.
---
## Deployment
AiCIPPY-Coder can be served via `sglang` or `vllm` as an OpenAI-compatible API endpoint — the same interface used by the AiCIPPY production platform.
### SGLang
[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language and vision language models.
```shell
pip install 'sglang[all]>=0.5.8'
```
Launch the server with 256K context using tensor parallelism:
```shell
python -m sglang.launch_server \
--model aivedha/aicippy-Coder \
--port 30000 \
--tp-size 2 \
--tool-call-parser aicippy-coder
```
> **Note:** If the server fails to start, reduce context length with `--context-length 32768`.
API endpoint available at: `http://localhost:30000/v1`
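Any OpenAI-compatible client can then talk to this endpoint. A minimal, dependency-free sketch using only the standard library follows; the live call is commented out so the snippet can be inspected or run without a server:

```python
import json
import urllib.request

# Minimal request against the OpenAI-compatible endpoint (no SDK needed).
# The model name must match what the server was launched with.
payload = {
    "model": "aivedha/aicippy-Coder",
    "messages": [{"role": "user", "content": "Write a one-line Python palindrome check."}],
    "temperature": 1.0,
    "top_p": 0.95,
}

req = urllib.request.Request(
    "http://localhost:30000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer EMPTY"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```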
---
### vLLM
[vLLM](https://github.com/vllm-project/vllm) is a high-throughput, memory-efficient inference and serving engine for LLMs.
```shell
pip install 'vllm>=0.15.0'
```
Launch with 256K context:
```shell
vllm serve aivedha/aicippy-Coder \
--port 8000 \
--tensor-parallel-size 2 \
--enable-auto-tool-choice \
--tool-call-parser aicippy-coder
```
> **Note:** Reduce context length to `32768` if startup fails.
API endpoint available at: `http://localhost:8000/v1`
---
## Agentic Coding with AiCIPPY-Coder
AiCIPPY-Coder is purpose-built for tool-calling agentic workflows. Define tools and invoke them directly:
```python
# Tool implementation
def square_the_number(input_num: float) -> float:
    return input_num ** 2
# Tool definition
tools = [
{
"type": "function",
"function": {
"name": "square_the_number",
"description": "Returns the square of the given number.",
"parameters": {
"type": "object",
"required": ["input_num"],
"properties": {
"input_num": {
"type": "number",
"description": "The number to be squared."
}
}
}
}
}
]
from openai import OpenAI
# Point to your AiCIPPY-Coder local endpoint
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="EMPTY"
)
messages = [{"role": "user", "content": "Square the number 1024"}]
completion = client.chat.completions.create(
messages=messages,
model="aivedha/aicippy-Coder",
max_tokens=65536,
tools=tools,
)
print(completion.choices[0])
```
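A typical agent loop then executes the tool call returned in `completion` and feeds the result back as a `tool` message. A minimal sketch of the dispatch step (field names follow the OpenAI tool-calling schema; the second round-trip to the model is left as an exercise):

```python
import json

# Registry mapping tool names to local implementations.
TOOLS = {"square_the_number": lambda input_num: input_num ** 2}

def run_tool_call(tool_call):
    """Execute one OpenAI-style tool call and return a 'tool' message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOLS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Example: the shape the model returns for "Square the number 1024".
call = {
    "id": "call_0",
    "function": {"name": "square_the_number", "arguments": '{"input_num": 1024}'},
}
print(run_tool_call(call)["content"])  # → 1048576
```

Appending the returned message to `messages` and calling `client.chat.completions.create` again lets the model produce its final answer from the tool result.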
---
## Best Practices
For optimal generation quality, use the following sampling parameters:
| Parameter | Recommended Value |
|---|---|
| `temperature` | `1.0` |
| `top_p` | `0.95` |
| `top_k` | `40` |
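In the `transformers` Quickstart above, these map directly onto the sampling arguments of `generate()`. A small sketch; note that `do_sample=True` is required for temperature/top-p/top-k to take effect and is an assumption added here, not part of the table:

```python
# Recommended sampling settings expressed as generation kwargs.
# do_sample=True enables sampling in transformers (assumption: the
# table above implies sampling rather than greedy decoding).
SAMPLING_KWARGS = {
    "do_sample": True,
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 40,
}

# Usage with the Quickstart objects:
# generated_ids = model.generate(**model_inputs, max_new_tokens=65536, **SAMPLING_KWARGS)
```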
---
## About AiCIPPY
**AiCIPPY** is AiVibe's production-grade agentic coding platform — available as a CLI tool on PyPI and deployable on AWS Bedrock. It combines multi-LLM orchestration, persistent memory via DynamoDB, WebSocket streaming, and enterprise SSO via AWS Cognito.
- **Platform:** [aicippy.com](https://aicippy.com)
- **CLI:** `pip install aicippy`
- **Organisation:** AiVibe Software Services Private Limited, Chennai, India
---
## About AiVedha
**AiVedha** (aivedha.ai) is AiVibe's AI-powered cybersecurity audit and compliance platform — available on AWS Marketplace (`prod-kulys2bmix2nm`). AiVedha and AiCIPPY together form the core of AiVibe's enterprise AI product portfolio.
---
## License
This model is released under the **Apache 2.0 License**. See [LICENSE](https://huggingface.co/aivedha/aicippy-Coder/blob/main/LICENSE) for full terms.
The underlying architecture is derived from Qwen3-Coder-Next (Qwen Team, Alibaba Cloud), used in accordance with its Apache 2.0 license terms.
---
## Citation
If you use AiCIPPY-Coder in your research or products, please cite:
```bibtex
@misc{aivibe_aicippy_coder_2026,
  title  = {AiCIPPY-Coder: Agentic Coding Intelligence by AiVedha},
  author = {{AiVibe Software Services Private Limited}},
  year   = {2026},
  url    = {https://huggingface.co/aivedha/aicippy-Coder}
}
```