---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- code
- industrial-code
- verilog
- cuda
- triton
- chip-design
- cad
---
# InCoder-32B: Code Foundation Model for Industrial Scenarios
<div align="center">

[Model](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder) | [Code](https://github.com/CSJianYang/Industrial-Coder) | [Paper](https://huggingface.co/papers/2603.16790) | [License](LICENSE)

</div>
## Model Summary
**InCoder-32B** (Industrial-Coder-32B) is the first 32B-parameter code foundation model purpose-built for industrial code intelligence. While general-purpose code LLMs excel at mainstream software tasks, they often struggle with the unique demands of industrial programming: hardware semantics, specialized language constructs, strict resource constraints, and domain-specific correctness verification.
Presented in the paper [InCoder-32B: Code Foundation Model for Industrial Scenarios](https://huggingface.co/papers/2603.16790), InCoder-32B unifies code intelligence across five industrial domains:
| Domain | Languages & Frameworks |
|---|---|
| **Chip Design** | Verilog, SystemVerilog, RTL |
| **GPU Kernel Optimization** | CUDA, Triton |
| **Embedded Systems** | C/C++, ARM Cortex-M4, STM32 |
| **Compiler Optimization** | x86-64 ASM, C/C++, LLVM-IR |
| **3D Modeling / CAD** | CadQuery, OpenCascade, Python |
InCoder-32B achieves highly competitive performance on general tasks while establishing strong open-weight baselines across the evaluated industrial domains.
---
## Key Results
### General Code Benchmarks
| Benchmark | InCoder-32B |
|---|---|
| SWE-bench Verified | **74.8%** |
| LiveCodeBench (Pass@1) | **49.14%** |
| BFCL v3 | **60.99%** |
| HumanEval+ | **89.6%** |
| MBPP+ | **78.3%** |
| BigCodeBench (Full) | **49.8%** |
### Industrial Code Benchmarks
| Benchmark | Domain | InCoder-32B | Best Competing Open-Weight |
|---|---|---|---|
| VeriScope Score | Chip Design | 80.7 | **83.2** (GLM-5) |
| CAD-Coder Compile | 3D Modeling | **82.0%** | 48.0% (Kimi-K2-Thinking) |
| KernelBench L1 | GPU Optimization | **22.2%** | 16.2% (GLM-5) |
| KernelBench L2 | GPU Optimization | **36.0%** | 28.0% (KernelBench L2) |
> InCoder-32B leads all open-weight baselines on CAD-Coder and KernelBench (all three levels), and even surpasses proprietary models like Claude-Sonnet-4.6 on CAD-Coder IoU and KernelBench L1/L2/L3.
---
## Model Architecture
InCoder-32B adopts a standard decoder-only Transformer architecture with the following configuration:
| Hyperparameter | Value |
|---|---|
| Parameters | ~32B |
| Layers | 64 |
| Hidden Size | 5,120 |
| Max Context Length | 131,072 (128K) |
| Positional Encoding | RoPE (θ = 500,000) |
| Precision | BFloat16 |
---
## Training Pipeline: Code-Flow
InCoder-32B is trained through a three-stage **Code-Flow** pipeline:
### Stage 1: Pre-training & Annealing
- **Industrial Recall**: Data pipeline using rule-based filtering, FastText classifiers, and semantic retrieval for Verilog, CUDA, firmware C, and CadQuery.
- **Refinement**: OCR extraction from technical manuals, multi-level deduplication, and repository-level fork consolidation.
- **Training**: 15T total tokens using Autoregressive LM + Fill-in-the-Middle (FIM) objectives.
### Stage 2: Mid-Training (Context Extension)
Context window extended progressively from 8K to 128K tokens:
- **8K → 32K**: Targets file-level tasks like completing RTL modules or kernel functions.
- **32K → 128K**: Unlocks long-context capabilities for extended debugging and cross-module projects.
### Stage 3: Post-Training
Post-training uses 2.5M supervised fine-tuning (SFT) samples constructed from real industrial tasks, with execution-grounded verification using toolchains such as Icarus Verilog, `nvcc`, and Renode (an STM32 simulator).
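The execution-grounded idea can be reproduced locally, for example by checking whether generated Verilog at least compiles. A minimal sketch, assuming Icarus Verilog's `iverilog` binary is on `PATH` (this is an illustration, not the paper's verification harness):

```python
import os
import shutil
import subprocess
import tempfile

def compiles_with_iverilog(verilog_src: str):
    """Return True if `iverilog` accepts the source, False if it rejects it,
    or None when the toolchain is not installed."""
    if shutil.which("iverilog") is None:
        return None  # toolchain unavailable; nothing to check
    with tempfile.NamedTemporaryFile("w", suffix=".v", delete=False) as f:
        f.write(verilog_src)
        src_path = f.name
    try:
        # compile only; discard the compiled output
        result = subprocess.run(
            ["iverilog", "-o", os.devnull, src_path],
            capture_output=True,
            text=True,
        )
        return result.returncode == 0
    finally:
        os.remove(src_path)
```

A real harness would also simulate the design against a testbench (e.g. with `vvp`), mirroring the execution-grounded verification described above.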
---
## Usage
### Installation
```bash
pip install transformers accelerate
```
### Basic Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "Multilingual-Multimodal-NLP/IndustrialCoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto"
)
prompt = """Write a synthesizable Verilog module for a UART transmitter (8N1 protocol).
The module should accept 8-bit parallel data and serialize it onto a TX line."""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=1024,
temperature=0.2,
do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
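Note that `generate` returns the prompt ids followed by the new ids, so the decoded string above includes the prompt. To print only the completion, slice off the prompt length before decoding; the idea, shown with toy token ids (in the snippet above this would be `outputs[0][inputs["input_ids"].shape[-1]:]`):

```python
# Toy stand-ins for tokenizer/model output: generate() echoes the prompt.
prompt_ids = [101, 2023, 2003]             # tokenized prompt
output_ids = [101, 2023, 2003, 7592, 999]  # prompt + generated tokens

# Keep only the newly generated ids, then decode those.
new_ids = output_ids[len(prompt_ids):]
print(new_ids)  # the completion ids only
```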
### Fill-in-the-Middle (FIM)
InCoder-32B supports FIM completion for code infilling tasks:
```python
prefix = """// CUDA kernel for RMS Normalization
__global__ void rms_norm_kernel(float* output, const float* input,
const float* weight, int N, float eps) {
int idx = blockIdx.x;
"""
suffix = """
output[idx * N + tid] = normalized * weight[tid];
}"""
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
inputs = tokenizer(fim_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
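Because the model echoes the FIM prompt, the infilled span is whatever follows the `<fim_middle>` sentinel in the raw output (decode with `skip_special_tokens=False` to keep the sentinels visible). A small helper to reassemble the full source; sentinel names follow the prompt format above, and any end-of-sequence token the model appends would still need stripping:

```python
def stitch_fim(decoded: str, prefix: str, suffix: str,
               middle_token: str = "<fim_middle>") -> str:
    """Reassemble prefix + infilled middle + suffix from a decoded
    FIM generation that still contains the sentinel tokens."""
    # generate() echoes the FIM prompt, so the infilled span is
    # everything the model emitted after the <fim_middle> sentinel.
    middle = decoded.split(middle_token, 1)[-1]
    return prefix + middle + suffix
```

For example, `stitch_fim("<fim_prefix>int a = <fim_suffix>;<fim_middle>42", "int a = ", ";")` yields the complete statement `int a = 42;`.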
---
## Limitations & Disclaimers
Based on failure analysis, the model may struggle with:
- **API Knowledge**: Linker errors from undefined HAL/CMSIS functions in embedded C.
- **Functional Semantics**: Producing compilable but functionally incorrect RTL under complex logic scenarios.
- **Optimization**: Correct but sub-optimal GPU kernel performance.
Always review and test generated code in a sandboxed environment. Industrial code (RTL, embedded firmware) requires expert review before deployment.
---
## Citation
```bibtex
@article{yang2026incoder,
title={InCoder-32B: Code Foundation Model for Industrial Scenarios},
author={Yang, Jian and Zhang, Wei and Wu, Jiajun and Cheng, Junhang and Guo, Shawn
and Wang, Haowen and Gu, Weicheng and Du, Yaxin and Li, Joseph and Xu, Fanglin
and others},
journal={arXiv preprint arXiv:2603.16790},
year={2026}
}
``` |