---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- code
- industrial-code
- verilog
- cuda
- triton
- chip-design
- cad
---

# InCoder-32B: Code Foundation Model for Industrial Scenarios

<div align="center">

[![HuggingFace](https://img.shields.io/badge/🤗-Model%20Hub-yellow)](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder)
[![GitHub](https://img.shields.io/badge/GitHub-Industrial--Coder-blue)](https://github.com/CSJianYang/Industrial-Coder)
[![arXiv](https://img.shields.io/badge/arXiv-2603.16790-red)](https://huggingface.co/papers/2603.16790)
[![License](https://img.shields.io/badge/License-Apache%202.0-green)](LICENSE)

</div>

## Model Summary

**InCoder-32B** (Industrial-Coder-32B) is the first 32B-parameter code foundation model purpose-built for industrial code intelligence. While general-purpose code LLMs excel at mainstream software tasks, they often struggle with the unique demands of industrial programming: hardware semantics, specialized language constructs, strict resource constraints, and domain-specific correctness verification.

Presented in the paper [InCoder-32B: Code Foundation Model for Industrial Scenarios](https://huggingface.co/papers/2603.16790), InCoder-32B unifies code intelligence across five industrial domains:

| Domain | Languages & Frameworks |
|---|---|
| 🔧 **Chip Design** | Verilog, SystemVerilog, RTL |
| ⚡ **GPU Kernel Optimization** | CUDA, Triton |
| 🖥️ **Embedded Systems** | C/C++, ARM Cortex-M4, STM32 |
| 🔨 **Compiler Optimization** | x86-64 ASM, C/C++, LLVM-IR |
| 📐 **3D Modeling / CAD** | CadQuery, OpenCascade, Python |

InCoder-32B achieves highly competitive performance on general tasks while establishing the strongest open-source baselines across all evaluated industrial domains.

---

## Key Results

### General Code Benchmarks

| Benchmark | InCoder-32B |
|---|---|
| SWE-bench Verified | **74.8%** |
| LiveCodeBench (Pass@1) | **49.14%** |
| BFCL v3 | **60.99%** |
| HumanEval+ | **89.6%** |
| MBPP+ | **78.3%** |
| BigCodeBench (Full) | **49.8%** |

### Industrial Code Benchmarks

| Benchmark | Domain | InCoder-32B | Best Competing Open-Weight |
|---|---|---|---|
| VeriScope Score | Chip Design | **80.7** | 83.2 (GLM-5) |
| CAD-Coder Compile | 3D Modeling | **82.0%** | 48.0% (Kimi-K2-Thinking) |
| KernelBench L1 | GPU Optimization | **22.2%** | 16.2% (GLM-5) |
| KernelBench L2 | GPU Optimization | **36.0%** | 28.0% |

> InCoder-32B leads all open-weight baselines on CAD-Coder and KernelBench (all three levels), and even surpasses proprietary models like Claude-Sonnet-4.6 on CAD-Coder IoU and KernelBench L1/L2/L3.

---

## Model Architecture

InCoder-32B adopts a standard decoder-only Transformer architecture with the following configuration:

| Hyperparameter | Value |
|---|---|
| Parameters | ~32B |
| Layers | 64 |
| Hidden Size | 5,120 |
| Max Context Length | 131,072 (128K) |
| Positional Encoding | RoPE (ΞΈ = 500,000) |
| Precision | BFloat16 |

---

## Training Pipeline: Code-Flow

InCoder-32B is trained through a three-stage **Code-Flow** pipeline:

### Stage 1 β€” Pre-training & Annealing
- **Industrial Recall**: Data pipeline using rule-based filtering, FastText classifiers, and semantic retrieval for Verilog, CUDA, firmware C, and CadQuery.
- **Refinement**: OCR extraction from technical manuals, multi-level deduplication, and repository-level fork consolidation.
- **Training**: 15T total tokens using Autoregressive LM + Fill-in-the-Middle (FIM) objectives.
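
The recall step can be pictured with a toy pre-filter. This is a hedged sketch in the spirit of the rule-based filtering stage, not the paper's actual pipeline; the keyword sets and threshold are illustrative assumptions:

```python
# Toy rule-based pre-filter in the spirit of the "Industrial Recall" stage:
# cheap keyword heuristics route raw files toward per-domain classifiers.
# The keyword sets and min_hits threshold are illustrative, not the
# paper's actual rules.
DOMAIN_KEYWORDS = {
    "verilog": ("module", "endmodule", "always @"),
    "cuda": ("__global__", "blockIdx", "threadIdx"),
    "cadquery": ("import cadquery", "cq.Workplane"),
}

def candidate_domains(source: str, min_hits: int = 2) -> list[str]:
    """Return domains for which at least `min_hits` keywords appear."""
    return [
        domain
        for domain, keywords in DOMAIN_KEYWORDS.items()
        if sum(kw in source for kw in keywords) >= min_hits
    ]
```

For example, `candidate_domains("module uart_tx;\nendmodule")` flags the file as a Verilog candidate for a heavier classifier pass.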

### Stage 2 β€” Mid-Training (Context Extension)
Context window extended progressively from 8K to 128K tokens:
- **8K β†’ 32K**: Targets file-level tasks like completing RTL modules or kernel functions.
- **32K β†’ 128K**: Unlocks long-context capabilities for extended debugging and cross-module projects.

### Stage 3 β€” Post-Training
2.5M supervised fine-tuning (SFT) samples constructed from real industrial tasks with execution-grounded verification using toolchains like Icarus Verilog, `nvcc`, and Renode (STM32 simulator).
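
The verification step can be sketched as a small harness around one of the named toolchains. This is an illustrative sketch, not the paper's pipeline; the file names and the `-g2012` flag are my choices, and running `compiles()` requires Icarus Verilog on PATH:

```python
# Minimal sketch of an execution-grounded check in the spirit of the
# verification step above, using Icarus Verilog as the compiler.
import os
import shutil
import subprocess
import tempfile

def iverilog_command(src_path: str, out_path: str) -> list[str]:
    """Build the Icarus Verilog compile command for one source file."""
    return ["iverilog", "-g2012", "-o", out_path, src_path]

def compiles(verilog_source: str) -> bool:
    """Return True if the generated Verilog compiles cleanly."""
    if shutil.which("iverilog") is None:
        raise RuntimeError("iverilog not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "dut.v")
        with open(src, "w") as f:
            f.write(verilog_source)
        result = subprocess.run(
            iverilog_command(src, os.path.join(tmp, "dut.out")),
            capture_output=True,
            text=True,
        )
        return result.returncode == 0
```

A compile-only gate like this catches syntax and elaboration errors; functional correctness would additionally need a testbench run (e.g. via `vvp`).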

---

## Usage

### Installation

```bash
pip install transformers accelerate
```

### Basic Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Multilingual-Multimodal-NLP/IndustrialCoder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = """Write a synthesizable Verilog module for a UART transmitter (8N1 protocol).
The module should accept 8-bit parallel data and serialize it onto a TX line."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.2,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Fill-in-the-Middle (FIM)

InCoder-32B supports FIM completion for code infilling tasks:

```python
prefix = """// CUDA kernel for RMS Normalization
__global__ void rms_norm_kernel(float* output, const float* input, 
                                 const float* weight, int N, float eps) {
    int idx = blockIdx.x;
"""
suffix = """
    output[idx * N + tid] = normalized * weight[tid];
}"""

fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
inputs = tokenizer(fim_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Limitations & Disclaimers

Based on failure analysis, the model may struggle with:
- **API Knowledge**: Linker errors from undefined HAL/CMSIS functions in embedded C.
- **Functional Semantics**: Producing compilable but functionally incorrect RTL under complex logic scenarios.
- **Optimization**: Correct but sub-optimal GPU kernel performance.

Always review and test generated code in a sandboxed environment. Industrial code (RTL, embedded firmware) requires expert review before deployment.

---

## Citation

```bibtex
@article{yang2026incoder,
  title={InCoder-32B: Code Foundation Model for Industrial Scenarios},
  author={Yang, Jian and Zhang, Wei and Wu, Jiajun and Cheng, Junhang and Guo, Shawn 
          and Wang, Haowen and Gu, Weicheng and Du, Yaxin and Li, Joseph and Xu, Fanglin 
          and others},
  journal={arXiv preprint arXiv:2603.16790},
  year={2026}
}
```