---
license: other
license_name: deepseek-license
license_link: LICENSE
pipeline_tag: text-generation
tags:
- code
- mixture-of-experts
- SarvaCode
- india-stack
language:
- en
base_model:
- deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
---

# SarvaCode-16B-Indigenous

**SarvaCode** is an indigenously customized, open-source Mixture-of-Experts (MoE) code language model. It is built on the DeepSeek-Coder-V2 architecture and optimized for the **Indian Software Ecosystem**.

Whereas general-purpose models target generic code, SarvaCode is fine-tuned to understand **Indian English instructions**, local financial protocols (GST, TDS), and the technical frameworks of **India Stack** (UPI, ONDC, Aadhaar/UIDAI).

## 1. Key Improvements
Compared to the base Lite model, **SarvaCode** features:
- **Higher Active Parameters:** Increased from 6 to **8 active experts per token**, raising the active parameter count to **~3.2B per token**.
- **Indigenous Logic:** Enhanced accuracy for India-specific tasks such as GST calculation logic, IFSC validation, and regional date/currency formatting.
- **India Stack Awareness:** Pre-loaded context for integrating with NPCI (UPI), ONDC, and DigiLocker APIs.
- **Massive Context:** Maintains a **128K context window**, so that entire Indian government technical gazettes or large codebases fit in a single prompt.

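To make the task categories in the list above concrete, here is a minimal sketch of the kind of helper logic they refer to. It is hand-written for illustration and is not output from the model. The function names are hypothetical, the GST split assumes an intra-state supply (equal CGST and SGST halves), and the half-up rounding to two decimals is an assumption rather than a statement of the official rounding rule.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# IFSC layout: 4 letters (bank), a literal 0, then 6 alphanumerics (branch).
IFSC_PATTERN = re.compile(r"[A-Z]{4}0[A-Z0-9]{6}")


def is_valid_ifsc(code: str) -> bool:
    """Check only the format of an IFSC code, not whether the branch exists."""
    return bool(IFSC_PATTERN.fullmatch(code.strip().upper()))


def split_intra_state_gst(taxable_value: str, gst_rate_percent: str = "18") -> dict:
    """Split GST on an intra-state supply into equal CGST and SGST halves."""
    value = Decimal(taxable_value)
    # Half of the GST rate, expressed as a fraction (18% -> 0.09).
    half_rate = Decimal(gst_rate_percent) / Decimal("200")
    # Assumption: round each half to paise using half-up rounding.
    half_tax = (value * half_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"cgst": half_tax, "sgst": half_tax, "total": value + 2 * half_tax}


def format_inr(amount: int) -> str:
    """Group digits the Indian way: last three digits, then pairs."""
    digits = str(abs(amount))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    sign = "-" if amount < 0 else ""
    return sign + "₹" + ",".join(groups + [tail])
```

For example, `split_intra_state_gst("1000")` separates the 18% slab into a 9% CGST and a 9% SGST component, and `format_inr(1234567)` groups the digits in lakh/crore style.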
## 2. Model Specifications

| **Model** | **#Total Params** | **#Active Params** | **Context Length** | **Specialization** |
| :---: | :---: | :---: | :---: | :---: |
| **SarvaCode-16B** | 16B | **3.2B** | 128K | India Stack & Fintech |

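The routing figures quoted in this card (such as 8 active experts per token) can be checked against a downloaded checkpoint by reading its `config.json`. The sketch below assumes the checkpoint keeps the DeepSeek-V2 style key names `n_routed_experts`, `n_shared_experts`, and `num_experts_per_tok`, and that the model sits in a local `./SarvaCode` directory; both are assumptions, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def summarize_moe_routing(config: dict) -> dict:
    """Pull the expert-routing fields out of a DeepSeek-V2 style config dict.

    Keys missing from the config are reported as None rather than guessed.
    """
    return {
        "routed_experts": config.get("n_routed_experts"),
        "shared_experts": config.get("n_shared_experts"),
        "active_routed_experts_per_token": config.get("num_experts_per_tok"),
    }


def load_config(model_dir: str) -> dict:
    """Read config.json from a local model directory."""
    return json.loads(Path(model_dir, "config.json").read_text(encoding="utf-8"))


if __name__ == "__main__":
    model_dir = "./SarvaCode"  # assumed local directory
    if Path(model_dir, "config.json").exists():
        print(summarize_moe_routing(load_config(model_dir)))
```

Reading the JSON file directly avoids loading any weights, so the check runs without a GPU or the `transformers` library.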
## 3. How to Run Locally

### Inference with Transformers
Pass `trust_remote_code=True` when loading the tokenizer and model so that the specialized MoE configuration is loaded.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "./SarvaCode"  # Your local directory
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Load the weights in bfloat16 and move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Example: Indian Financial Logic
input_text = "User: Write a Python function to calculate the GST for a service with an 18% slab, ensuring the output separates CGST and SGST.\n\nAssistant:"

# Tokenize on the same device as the model, then generate up to 256 new tokens.
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# The decoded text contains the prompt followed by the generated completion.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))