	---
	library_name: transformers
	license: apache-2.0
	license_link: https://huggingface.co/aivedha/aicippy-Coder/blob/main/LICENSE
	pipeline_tag: text-generation
	base_model: aivedha/aicippy-Coder
	tags:
	- aicippy
	- aivedha
	- aivibe
	- coding-agent
	- code-generation
	- agentic-coding
	---

	<p align="center">
	<img src="https://aivibe.cloud/assets/aivibe-logo.png" alt="AiVibe Logo" width="180"/>
	</p>

	<h1 align="center">AiCIPPY-Coder</h1>

	<p align="center">
	<b>The Agentic Coding Intelligence behind AiCIPPY</b><br/>
	<i>by AiVedha · AiVibe Software Services Private Limited</i>
	</p>

	<p align="center">
	<a href="https://aicippy.com">aicippy.com</a> ·
	<a href="https://aivedha.ai">aivedha.ai</a> ·
	<a href="https://aivibe.cloud">aivibe.cloud</a> ·
	<a href="https://pypi.org/project/aicippy">PyPI</a>
	</p>

	---

	## Highlights

	We are releasing AiCIPPY-Coder — the open-weight coding intelligence model powering the AiCIPPY agent platform. Built for real-world agentic software development, this model is the foundation of AiCIPPY's CLI and IDE-integrated coding workflows.

	- Efficient Yet Powerful: With only 3B activated parameters (80B total), AiCIPPY-Coder delivers performance comparable to models with 10–20x more active parameters — making it highly cost-effective for production agent deployment at scale.
	- Advanced Agentic Capabilities: Trained with an elaborate agentic recipe, the model excels at long-horizon reasoning, complex multi-step tool usage, and graceful recovery from execution failures — essential for robust real-world coding tasks.
	- Seamless IDE and CLI Integration: A native 256K context window, combined with full adaptability to diverse scaffold templates, enables plug-and-play integration with CLI agents (including AiCIPPY CLI), VS Code extensions, and platforms such as Cline, Kilo, and Trae.

	---

	## Model Overview

	AiCIPPY-Coder carries the following architecture:

	| Property | Value |
	|---|---|
	| Model Type | Causal Language Model |
	| Training Stage | Pretraining & Post-training |
	| Total Parameters | 80B |
	| Activated Parameters | 3B |
	| Non-Embedding Parameters | 79B |
	| Hidden Dimension | 2048 |
	| Number of Layers | 48 |
	| Context Length | 262,144 tokens (native) |
	| Thinking Mode | Non-thinking (no `<think>` blocks) |

	Architecture Details:
	- Hybrid Layout: 12 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)), giving 12 × 4 = 48 layers
	- Gated Attention: 16 heads for Q, 2 for KV, Head Dim 256, RoPE Dim 64
	- Gated DeltaNet: 32 heads for V, 16 for QK, Head Dim 128
	- Mixture of Experts: 512 total experts, 10 activated, 1 shared, Expert Intermediate Dim 512
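
The figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the hybrid layout means 12 repeats of three Gated DeltaNet blocks followed by one Gated Attention block (the reading that matches the 48 layers in the table), and that the shared expert is used in addition to the 10 routed experts. The variable names are hypothetical and do not come from the model's code.

```python
# Expand the hybrid layout: 12 repeats of (3 Gated DeltaNet blocks, then 1 Gated Attention block).
# Each block is paired with an MoE layer, so one block is counted as one model layer.
REPEATS = 12
DELTANET_PER_GROUP = 3
ATTENTION_PER_GROUP = 1

layout = []
for _ in range(REPEATS):
    layout += ["gated_deltanet"] * DELTANET_PER_GROUP
    layout += ["gated_attention"] * ATTENTION_PER_GROUP

num_layers = len(layout)                          # 48, matching "Number of Layers"
num_deltanet = layout.count("gated_deltanet")     # 36
num_attention = layout.count("gated_attention")   # 12

# Grouped-query attention: 16 query heads share 2 key/value heads.
queries_per_kv_head = 16 // 2                     # 8

# Experts used per token, assuming 10 routed experts plus the 1 shared expert.
experts_per_token = 10 + 1                        # 11

print(num_layers, num_deltanet, num_attention, queries_per_kv_head, experts_per_token)
```

Only one layer in four uses full attention under this reading, which is consistent with the long native context the model advertises.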

	> Note: This model operates in non-thinking mode only. The `<think></think>` output blocks are not generated. Setting `enable_thinking=False` is not required.

	---

	## Quickstart

	Ensure you are using the latest version of `transformers` before proceeding.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "aivedha/aicippy-Coder"

	# Load tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto"
	)

	# Prepare input
	prompt = "Write a quick sort algorithm."
	messages = [
	    {"role": "user", "content": prompt}
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# Generate
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=65536
	)
	output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

	content = tokenizer.decode(output_ids, skip_special_tokens=True)
	print("AiCIPPY-Coder:", content)
	```

	> Note: If you encounter out-of-memory (OOM) issues, reduce the context length — for example, to `32,768` tokens.

	For local use, AiCIPPY-Coder is compatible with Ollama, LM Studio, MLX-LM, llama.cpp, and KTransformers.

	---

	## Deployment

	AiCIPPY-Coder can be served via `sglang` or `vllm` as an OpenAI-compatible API endpoint — the same interface used by the AiCIPPY production platform.

	### SGLang

	[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision-language models.

	```shell
	pip install 'sglang[all]>=v0.5.8'
	```

	Launch the server with 256K context using tensor parallelism:

	```shell
	python -m sglang.launch_server \
	--model aivedha/aicippy-Coder \
	--port 30000 \
	--tp-size 2 \
	--tool-call-parser aicippy-coder
	```

	> Note: If the server fails to start, reduce context length with `--context-length 32768`.

	API endpoint available at: `http://localhost:30000/v1`

	---

	### vLLM

	[vLLM](https://github.com/vllm-project/vllm) is a high-throughput, memory-efficient inference and serving engine for LLMs.

	```shell
	pip install 'vllm>=0.15.0'
	```

	Launch with 256K context:

	```shell
	vllm serve aivedha/aicippy-Coder \
	--port 8000 \
	--tensor-parallel-size 2 \
	--enable-auto-tool-choice \
	--tool-call-parser aicippy-coder
	```

	> Note: If the server fails to start, reduce context length with `--max-model-len 32768`.

	API endpoint available at: `http://localhost:8000/v1`

	---

	## Agentic Coding with AiCIPPY-Coder

	AiCIPPY-Coder is purpose-built for tool-calling agentic workflows. Define tools and invoke them directly:

	```python
	# Tool implementation; the parameter name matches the schema below
	def square_the_number(input_num: float) -> float:
	    return input_num ** 2

	# Tool definition
	tools = [
	    {
	        "type": "function",
	        "function": {
	            "name": "square_the_number",
	            "description": "Returns the square of the given number.",
	            "parameters": {
	                "type": "object",
	                "required": ["input_num"],
	                "properties": {
	                    "input_num": {
	                        "type": "number",
	                        "description": "The number to be squared."
	                    }
	                }
	            }
	        }
	    }
	]

	from openai import OpenAI

	# Point to your AiCIPPY-Coder local endpoint
	client = OpenAI(
	    base_url="http://localhost:8000/v1",
	    api_key="EMPTY"
	)

	messages = [{"role": "user", "content": "Square the number 1024"}]

	completion = client.chat.completions.create(
	    messages=messages,
	    model="aivedha/aicippy-Coder",
	    max_tokens=65536,
	    tools=tools,
	)

	print(completion.choices[0])
	```
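
The request above only returns the model's tool call; executing it and sending the result back is left to the caller. Below is a minimal sketch of that step, assuming the standard OpenAI chat-completions response shape (`message.tool_calls[i].function.name` and a JSON string in `.arguments`). `TOOL_REGISTRY`, `run_tool_call`, and `fake_call` are hypothetical names used for illustration, and the tool's parameter is named `input_num` here so that it matches the tool schema.

```python
import json
from types import SimpleNamespace


def square_the_number(input_num: float) -> float:
    return input_num ** 2


# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY = {"square_the_number": square_the_number}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and build the `tool` role message for the next turn."""
    func = TOOL_REGISTRY[tool_call.function.name]
    arguments = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    result = func(**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }


# Stand-in for completion.choices[0].message.tool_calls[0]; a real call comes from the server.
fake_call = SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(name="square_the_number", arguments='{"input_num": 1024}'),
)

tool_message = run_tool_call(fake_call)
print(tool_message)
```

In a live session, append the assistant message that contains the tool call and then `tool_message` to `messages`, and call `client.chat.completions.create` again so the model can produce its final answer.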

	---

	## Best Practices

	For optimal generation quality, use the following sampling parameters:

	| Parameter | Recommended Value |
	|---|---|
	| `temperature` | `1.0` |
	| `top_p` | `0.95` |
	| `top_k` | `40` |
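
One way to apply these values is sketched below, for both the `transformers` Quickstart and an OpenAI-compatible endpoint. The dictionary names are hypothetical. Passing `top_k` through `extra_body` is an assumption about the serving backend, since `top_k` is not part of the OpenAI request schema; check your server's documentation.

```python
# Recommended sampling values from the table above.
SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 40}

# For transformers: sampling has to be enabled for these values to take effect,
# unless the model's own generation config already enables it.
hf_generate_kwargs = {"do_sample": True, **SAMPLING}

# For an OpenAI-compatible server: temperature and top_p are standard request fields;
# top_k is forwarded as an extra, server-specific field (assumption, see lead-in).
openai_request_kwargs = {
    "temperature": SAMPLING["temperature"],
    "top_p": SAMPLING["top_p"],
    "extra_body": {"top_k": SAMPLING["top_k"]},
}

print(hf_generate_kwargs)
print(openai_request_kwargs)
```

These dictionaries can be unpacked into the existing calls, for example `model.generate(**model_inputs, max_new_tokens=65536, **hf_generate_kwargs)` or `client.chat.completions.create(messages=messages, model="aivedha/aicippy-Coder", **openai_request_kwargs)`.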

	---

	## About AiCIPPY

	AiCIPPY is AiVibe's production-grade agentic coding platform — available as a CLI tool on PyPI and deployable on AWS Bedrock. It combines multi-LLM orchestration, persistent memory via DynamoDB, WebSocket streaming, and enterprise SSO via AWS Cognito.

	- Platform: [aicippy.com](https://aicippy.com)
	- CLI: `pip install aicippy`
	- Organisation: AiVibe Software Services Private Limited, Chennai, India

	---

	## About AiVedha

	AiVedha (aivedha.ai) is AiVibe's AI-powered cybersecurity audit and compliance platform — available on AWS Marketplace (`prod-kulys2bmix2nm`). AiVedha and AiCIPPY together form the core of AiVibe's enterprise AI product portfolio.

	---

	## License

	This model is released under the Apache 2.0 License. See [LICENSE](https://huggingface.co/aivedha/aicippy-Coder/blob/main/LICENSE) for full terms.

	The underlying architecture is derived from Qwen3-Coder-Next (Qwen Team, Alibaba Cloud), used in accordance with its Apache 2.0 license terms.

	---

	## Citation

	If you use AiCIPPY-Coder in your research or products, please cite:

	```bibtex
	@misc{aivibe_aicippy_coder_2026,
	  title = {AiCIPPY-Coder: Agentic Coding Intelligence by AiVedha},
	  author = {{AiVibe Software Services Private Limited}},
	  year = {2026},
	  url = {https://huggingface.co/aivedha/aicippy-Coder}
	}
	```