---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/aivedha/aicippy-Coder/blob/main/LICENSE
pipeline_tag: text-generation
base_model: aivedha/aicippy-Coder
tags:
- aicippy
- aivedha
- aivibe
- coding-agent
- code-generation
- agentic-coding
---

<p align="center">
  <img src="https://aivibe.cloud/assets/aivibe-logo.png" alt="AiVibe Logo" width="180"/>
</p>

<h1 align="center">AiCIPPY-Coder</h1>

<p align="center">
  <b>The Agentic Coding Intelligence behind AiCIPPY</b><br/>
  <i>by AiVedha · AiVibe Software Services Private Limited</i>
</p>

<p align="center">
  <a href="https://aicippy.com">aicippy.com</a> ·
  <a href="https://aivedha.ai">aivedha.ai</a> ·
  <a href="https://aivibe.cloud">aivibe.cloud</a> ·
  <a href="https://pypi.org/project/aicippy">PyPI</a>
</p>

---

## Highlights

We are releasing **AiCIPPY-Coder**, the open-weight coding model that powers the AiCIPPY agent platform. Built for real-world agentic software development, it is the foundation of AiCIPPY's CLI and IDE-integrated coding workflows.

- **Efficient Yet Powerful**: With only 3B activated parameters (80B total), AiCIPPY-Coder delivers performance comparable to models with 10–20x more active parameters, making it cost-effective for production agent deployment at scale.
- **Advanced Agentic Capabilities**: Trained with an elaborate agentic recipe, the model excels at long-horizon reasoning, complex multi-step tool use, and recovery from execution failures, all of which are essential for robust real-world coding tasks.
- **Seamless IDE and CLI Integration**: A native 256K context window and adaptability to diverse scaffold templates enable plug-and-play integration with CLI agents (including AiCIPPY CLI), VS Code extensions, and platforms such as Cline, Kilo, and Trae.

---

## Model Overview

**AiCIPPY-Coder** has the following architecture:

| Property | Value |
|---|---|
| Model Type | Causal Language Model |
| Training Stage | Pretraining & Post-training |
| Total Parameters | 80B |
| Activated Parameters | 3B |
| Non-Embedding Parameters | 79B |
| Hidden Dimension | 2048 |
| Number of Layers | 48 |
| Context Length | 262,144 tokens (native) |
| Thinking Mode | Non-thinking (no `<think>` blocks) |

**Architecture Details:**
- **Hybrid Layout:** 12 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
- **Gated Attention:** 16 heads for Q, 2 for KV, Head Dim 256, RoPE Dim 64
- **Gated DeltaNet:** 32 heads for V, 16 for QK, Head Dim 128
- **Mixture of Experts:** 512 total experts, 10 activated, 1 shared, Expert Intermediate Dim 512
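
As a quick sanity check on the hybrid layout, the sketch below expands it into a flat list of layers and counts them. It reads the layout as 12 repeated groups, each holding three Gated DeltaNet layers followed by one Gated Attention layer, with every layer paired with an MoE block. That reading is an assumption, chosen because it matches the 48 layers listed in the table. The names `GROUPS`, `DELTANET_PER_GROUP`, `ATTENTION_PER_GROUP`, and `expand_layout` are illustrative and not part of any library.

```python
from collections import Counter

# Assumed reading of the layout: 12 x (3 x DeltaNet-MoE -> 1 x Attention-MoE).
GROUPS = 12
DELTANET_PER_GROUP = 3
ATTENTION_PER_GROUP = 1


def expand_layout() -> list[str]:
    """Return the token-mixer type of every layer, in order."""
    group = (
        ["gated_deltanet"] * DELTANET_PER_GROUP
        + ["gated_attention"] * ATTENTION_PER_GROUP
    )
    return group * GROUPS


layers = expand_layout()
counts = Counter(layers)
print(len(layers))                # 48, matching "Number of Layers"
print(counts["gated_deltanet"])   # 36
print(counts["gated_attention"])  # 12
```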

> **Note:** This model operates in non-thinking mode only. It does not generate `<think></think>` blocks, so setting `enable_thinking=False` is not required.

---

## Quickstart

Ensure you are using the latest version of `transformers` before proceeding.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aivedha/aicippy-Coder"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Prepare input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536,
)
# Keep only the newly generated tokens, dropping the prompt
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("AiCIPPY-Coder:", content)
```

> **Note:** If you encounter out-of-memory (OOM) issues, reduce the context length (prompt plus `max_new_tokens`), for example to 32,768 tokens.
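
For a feel of how context length affects memory, here is a rough estimate of the attention KV cache at two context lengths. It is a back-of-envelope sketch under assumptions that this card does not state: only the 12 Gated Attention layers (one in every four of the 48 layers) keep a cache that grows with sequence length, the cache is stored in 16-bit precision, and the batch size is 1. Model weights, Gated DeltaNet state, and activations are ignored, so real memory use will be considerably higher. `kv_cache_bytes` is an illustrative helper, not a library function.

```python
# Values taken from the architecture details in this card.
KV_HEADS = 2
HEAD_DIM = 256
ATTENTION_LAYERS = 48 // 4  # assumption: 1 Gated Attention layer per 4 layers

BYTES_PER_VALUE = 2  # assumption: bf16/fp16 cache


def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    per_token_per_layer = 2 * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # K and V
    return per_token_per_layer * ATTENTION_LAYERS * context_tokens


for tokens in (262_144, 32_768):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens} tokens -> {gib:.2f} GiB")
# 262144 tokens -> 6.00 GiB
# 32768 tokens -> 0.75 GiB
```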

For local use, AiCIPPY-Coder is compatible with **Ollama**, **LM Studio**, **MLX-LM**, **llama.cpp**, and **KTransformers**.

---

## Deployment

AiCIPPY-Coder can be served with `sglang` or `vllm` as an OpenAI-compatible API endpoint, the same interface used by the AiCIPPY production platform.

### SGLang

[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision-language models.

```shell
pip install 'sglang[all]>=v0.5.8'
```

Launch the server with the native 256K context, using tensor parallelism across two GPUs:

```shell
# qwen3_coder is the tool-call parser for the Qwen3-Coder family this model derives from
python -m sglang.launch_server \
    --model aivedha/aicippy-Coder \
    --port 30000 \
    --tp-size 2 \
    --tool-call-parser qwen3_coder
```

> **Note:** If the server fails to start, reduce context length with `--context-length 32768`.

API endpoint available at: `http://localhost:30000/v1`

---

### vLLM

[vLLM](https://github.com/vllm-project/vllm) is a high-throughput, memory-efficient inference and serving engine for LLMs.

```shell
pip install 'vllm>=0.15.0'
```

Launch with 256K context:

```shell
# qwen3_coder is the tool-call parser for the Qwen3-Coder family this model derives from
vllm serve aivedha/aicippy-Coder \
    --port 8000 \
    --tensor-parallel-size 2 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
```
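
Once the server is up, a plain HTTP request is enough to confirm that the endpoint responds. The sketch below builds a chat-completions request with only the standard library. It assumes the server exposes the usual OpenAI-style `/chat/completions` route under `http://localhost:8000/v1` (port 8000 as set by `--port`; an SGLang server started as shown earlier would listen on port 30000 instead). `build_chat_request` is an illustrative helper and not part of vLLM or SGLang, and the request is only sent if you uncomment the last lines.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, prompt: str, max_tokens: int = 256
) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-style chat endpoint."""
    payload = {
        "model": "aivedha/aicippy-Coder",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("http://localhost:8000/v1", "Write a quick sort algorithm.")
print(request.full_url)  # http://localhost:8000/v1/chat/completions

# Uncomment to send the request to a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```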

> **Note:** If startup fails, reduce the context length with `--max-model-len 32768`.

API endpoint available at: `http://localhost:8000/v1`

---

## Agentic Coding with AiCIPPY-Coder

AiCIPPY-Coder is purpose-built for tool-calling agentic workflows. Define your tools and pass them with the request:

```python
from openai import OpenAI


# Tool implementation (the parameter name matches the schema below)
def square_the_number(input_num: float) -> float:
    return input_num ** 2


# Tool definition
tools = [
    {
        "type": "function",
        "function": {
            "name": "square_the_number",
            "description": "Returns the square of the given number.",
            "parameters": {
                "type": "object",
                "required": ["input_num"],
                "properties": {
                    "input_num": {
                        "type": "number",
                        "description": "The number to be squared."
                    }
                }
            }
        }
    }
]

# Point to your AiCIPPY-Coder local endpoint
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY"
)

messages = [{"role": "user", "content": "Square the number 1024"}]

completion = client.chat.completions.create(
    messages=messages,
    model="aivedha/aicippy-Coder",
    max_tokens=65536,
    tools=tools,
)

print(completion.choices[0])
```
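
The request above only returns the model's tool call; your code still has to run the tool and send the result back. The following is a minimal sketch of that dispatch step, assuming the response follows the OpenAI chat-completions shape, where each tool call carries a function name and JSON-encoded `arguments`. `TOOL_REGISTRY`, `run_tool_call`, and the hand-written `example_call` are illustrative; `example_call` stands in for a real `completion.choices[0].message.tool_calls[0]`. Appending the assistant message and the resulting `tool` message to `messages` and calling the API again lets the model produce its final answer.

```python
import json


def square_the_number(input_num: float) -> float:
    return input_num ** 2


# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY = {"square_the_number": square_the_number}


def run_tool_call(name: str, arguments_json: str, call_id: str) -> dict:
    """Execute one tool call and wrap the result as a `tool` message."""
    arguments = json.loads(arguments_json)
    result = TOOL_REGISTRY[name](**arguments)
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}


# Stand-in for a tool call returned by the server (shape assumed, values made up).
example_call = {
    "id": "call_0",
    "function": {"name": "square_the_number", "arguments": '{"input_num": 1024}'},
}

tool_message = run_tool_call(
    example_call["function"]["name"],
    example_call["function"]["arguments"],
    example_call["id"],
)
print(tool_message["content"])  # 1048576
```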

---

## Best Practices

For optimal generation quality, use the following sampling parameters:

| Parameter | Recommended Value |
|---|---|
| `temperature` | `1.0` |
| `top_p` | `0.95` |
| `top_k` | `40` |
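
As an illustration, these values can be passed through the OpenAI-compatible client. `temperature` and `top_p` are standard request fields. `top_k` is not part of the OpenAI schema, so this sketch sends it through the client's `extra_body` argument, on the assumption that the serving engine accepts it as an extra sampling parameter (worth checking against your server version). `RECOMMENDED_SAMPLING` and `sampling_kwargs` are illustrative names.

```python
RECOMMENDED_SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 40}

# Fields that belong to the standard OpenAI chat-completions schema.
STANDARD_FIELDS = ("temperature", "top_p")


def sampling_kwargs(params: dict) -> dict:
    """Split sampling settings into standard fields and server-specific extras."""
    standard = {k: v for k, v in params.items() if k in STANDARD_FIELDS}
    extras = {k: v for k, v in params.items() if k not in STANDARD_FIELDS}
    return {**standard, "extra_body": extras}


kwargs = sampling_kwargs(RECOMMENDED_SAMPLING)
print(kwargs)
# {'temperature': 1.0, 'top_p': 0.95, 'extra_body': {'top_k': 40}}

# Usage with a client pointed at a running server (not executed here):
# completion = client.chat.completions.create(
#     model="aivedha/aicippy-Coder",
#     messages=[{"role": "user", "content": "Write a quick sort algorithm."}],
#     **kwargs,
# )
```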

---

## About AiCIPPY

**AiCIPPY** is AiVibe's production-grade agentic coding platform, available as a CLI tool on PyPI and deployable on AWS Bedrock. It combines multi-LLM orchestration, persistent memory via DynamoDB, WebSocket streaming, and enterprise SSO via AWS Cognito.

- **Platform:** [aicippy.com](https://aicippy.com)
- **CLI:** `pip install aicippy`
- **Organisation:** AiVibe Software Services Private Limited, Chennai, India

---

## About AiVedha

**AiVedha** (aivedha.ai) is AiVibe's AI-powered cybersecurity audit and compliance platform, available on AWS Marketplace (`prod-kulys2bmix2nm`). AiVedha and AiCIPPY together form the core of AiVibe's enterprise AI product portfolio.

---

## License

This model is released under the **Apache 2.0 License**. See [LICENSE](https://huggingface.co/aivedha/aicippy-Coder/blob/main/LICENSE) for full terms.

The underlying architecture is derived from Qwen3-Coder-Next (Qwen Team, Alibaba Cloud), used in accordance with its Apache 2.0 license terms.

---

## Citation

If you use AiCIPPY-Coder in your research or products, please cite:

```bibtex
@misc{aivibe_aicippy_coder_2026,
  title  = {AiCIPPY-Coder: Agentic Coding Intelligence by AiVedha},
  author = {{AiVibe Software Services Private Limited}},
  year   = {2026},
  url    = {https://huggingface.co/aivedha/aicippy-Coder}
}
```