Instructions to use Multilingual-Multimodal-NLP/IndustrialCoder with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Multilingual-Multimodal-NLP/IndustrialCoder with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Multilingual-Multimodal-NLP/IndustrialCoder", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Multilingual-Multimodal-NLP/IndustrialCoder", trust_remote_code=True, dtype="auto")
```

Local Apps

vLLM

How to use Multilingual-Multimodal-NLP/IndustrialCoder with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Multilingual-Multimodal-NLP/IndustrialCoder"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Multilingual-Multimodal-NLP/IndustrialCoder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
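
You can also call the same OpenAI-compatible endpoint from Python using only the standard library. This is a minimal sketch, not an official client. It assumes the vLLM server above is running on its default port 8000. `build_chat_request` and `chat` are hypothetical helper names, and the `max_tokens`/`temperature` values are illustrative choices for standard OpenAI-style request fields.

```python
import json
import urllib.request

# Assumption: `vllm serve` is running locally on its default port (8000).
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "Multilingual-Multimodal-NLP/IndustrialCoder"


def build_chat_request(prompt: str, max_tokens: int = 512, temperature: float = 0.2) -> dict:
    """Build an OpenAI-style /chat/completions body (same shape as the curl example)."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST a single-turn chat request and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-compatible servers return the reply in choices[0].message.content
    return payload["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat("Write a synthesizable Verilog module for a 4-bit up counter."))
```

For the SGLang server described below, pass `base_url="http://localhost:30000/v1"` instead.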

Use Docker

```bash
docker model run hf.co/Multilingual-Multimodal-NLP/IndustrialCoder
```

SGLang

How to use Multilingual-Multimodal-NLP/IndustrialCoder with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Multilingual-Multimodal-NLP/IndustrialCoder" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Multilingual-Multimodal-NLP/IndustrialCoder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Multilingual-Multimodal-NLP/IndustrialCoder" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Multilingual-Multimodal-NLP/IndustrialCoder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use Multilingual-Multimodal-NLP/IndustrialCoder with Docker Model Runner:
```
docker model run hf.co/Multilingual-Multimodal-NLP/IndustrialCoder
```

Commit 6cd76ac (verified, parent a1ab353) by csjiaya on Mar 18: Create README.md


# InCoder-32B: Code Foundation Model for Industrial Scenarios

<div align="center">

[![HuggingFace](https://img.shields.io/badge/🤗-Model%20Hub-yellow)](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder)
[![GitHub](https://img.shields.io/badge/GitHub-Industrial--Coder-blue)](https://github.com/CSJianYang/Industrial-Coder)
[![arXiv](https://img.shields.io/badge/arXiv-2603.16790-red)](https://arxiv.org/abs/2603.16790)
[![License](https://img.shields.io/badge/License-Apache%202.0-green)](LICENSE)

</div>

## Model Summary

**InCoder-32B** (Industrial-Coder-32B) is the first 32B-parameter code foundation model purpose-built for industrial code intelligence. General-purpose code LLMs excel at mainstream software tasks, but they struggle with the unique demands of industrial programming: hardware semantics, specialized language constructs, strict resource constraints, and domain-specific correctness verification. InCoder-32B unifies code intelligence across five industrial domains:

| Domain | Languages, Frameworks & Targets |
|---|---|
| 🔧 **Chip Design** | Verilog, SystemVerilog, RTL |
| ⚡ **GPU Kernel Optimization** | CUDA, Triton |
| 🖥️ **Embedded Systems** | C/C++, ARM Cortex-M4, STM32 |
| 🔨 **Compiler Optimization** | x86-64 ASM, C/C++, LLVM-IR |
| 📐 **3D Modeling / CAD** | CadQuery, OpenCascade, Python |
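
InCoder-32B achieves competitive general-purpose performance while setting strong open-source baselines across the evaluated industrial domains. Among the industrial benchmarks below, it outperforms the best competing model listed on RealBench, ArchXBench, CAD-Coder, and all three KernelBench levels.

---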

## Key Results

### General Code Benchmarks

| Benchmark | InCoder-32B |
|---|---|
| SWE-bench Verified | **74.8%** |
| LiveCodeBench (Pass@1) | **49.14%** |
| BFCL v3 | **60.99%** |
| HumanEval+ | **89.6%** |
| MBPP+ | **78.3%** |
| BigCodeBench (Full) | **49.8%** |
| τ²-bench (Retail) | **85.1%** |
| τ²-bench (Telecom) | **86.8%** |

### Industrial Code Benchmarks

| Benchmark | Domain | InCoder-32B | Best Competing Model |
|---|---|---|---|
| VeriScope Score | Chip Design | **80.7** | 83.2 (GLM-5) |
| VeriRepair Fix | Chip Design | **80.0%** | 90.0% (GLM-5) |
| RealBench Syn@1 (Module) | Chip Design | **74.8%** | 50.1% (Kimi-K2-Instruct) |
| ArchXBench (n) | Chip Design | **51.0** | 50.0 (Claude-Sonnet-4.6) |
| CAD-Coder Compile | 3D Modeling | **82.0%** | 48.0% (Kimi-K2-Thinking) |
| CAD-Coder IoU | 3D Modeling | **53.5** | 20.0 (Kimi-K2-Thinking) |
| EmbedCGen Main | Embedded Systems | **35.2%** | 90.2% (GLM-5) |
| KernelBench L1 | GPU Optimization | **22.2%** | 16.2% (GLM-5) |
| KernelBench L2 | GPU Optimization | **36.0%** | 28.0% |
| KernelBench L3 | GPU Optimization | **14.0%** | 8.0% (MiniMax-M2.5) |
| TritonBench G-call | GPU Optimization | **18.5%** | 28.8% (Claude-Sonnet-4.6) |

> InCoder-32B leads all open-weight baselines on CAD-Coder and KernelBench (all three levels), and even surpasses the proprietary Claude-Sonnet-4.6 on CAD-Coder IoU and KernelBench L1/L2/L3.

---

## Model Architecture

InCoder-32B adopts a standard decoder-only Transformer architecture with the following configuration:

| Hyperparameter | Value |
|---|---|
| Parameters | ~32B |
| Layers | 64 |
| Hidden Size | 5,120 |
| Intermediate Size | 27,648 |
| Attention Heads | 40 |
| KV Heads (GQA) | 8 |
| Head Dimension | 128 |
| Vocabulary Size | 76,800 |
| Max Context Length | 131,072 (128K) |
| Activation | SiLU |
| Positional Encoding | RoPE (θ = 500,000) |
| Precision | BFloat16 |
| Tie Embeddings | No |
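
The ~32B figure in the table can be checked against the other hyperparameters. The sketch below assumes a layout that the card does not spell out: a gated (SwiGLU-style) MLP with three weight matrices, bias-free attention projections, and two RMSNorm weight vectors per layer plus a final norm. It also estimates the BF16 KV cache. That is where the 8 KV heads of grouped-query attention matter at the 128K context length.

```python
# Hyperparameters from the architecture table above.
CONFIG = {
    "layers": 64,
    "hidden": 5120,
    "intermediate": 27648,
    "heads": 40,
    "kv_heads": 8,
    "head_dim": 128,
    "vocab": 76800,
}


def param_count(c: dict) -> int:
    """Approximate parameter count under an assumed Llama-style layout."""
    h = c["hidden"]
    q_dim = c["heads"] * c["head_dim"]      # 40 * 128 = 5,120
    kv_dim = c["kv_heads"] * c["head_dim"]  # 8 * 128 = 1,024 (GQA)
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # Q, K, V, O; no biases (assumed)
    mlp = 3 * h * c["intermediate"]         # gate, up, down projections (assumed SwiGLU-style)
    norms = 2 * h                           # pre-attention and pre-MLP RMSNorm (assumed)
    embeddings = 2 * c["vocab"] * h         # untied input and output embeddings
    return c["layers"] * (attention + mlp + norms) + embeddings + h  # + final norm


def kv_cache_bytes(c: dict, tokens: int, bytes_per_value: int = 2) -> int:
    """K and V tensors for every layer and KV head; BF16 uses 2 bytes per value."""
    return 2 * c["layers"] * c["kv_heads"] * c["head_dim"] * bytes_per_value * tokens


total = param_count(CONFIG)
print(f"parameters: {total / 1e9:.2f}B")                                  # 31.99B
print(f"BF16 weights: {2 * total / 2**30:.1f} GiB")
print(f"KV cache per token: {kv_cache_bytes(CONFIG, 1) // 1024} KiB")       # 256 KiB
print(f"KV cache at 131,072 tokens: {kv_cache_bytes(CONFIG, 131_072) / 2**30:.0f} GiB")  # 32 GiB
```

Under these assumptions the count lands at about 31.99B, consistent with "~32B". A single full 128K-token sequence needs about 32 GiB of KV cache. With 40 KV heads (no GQA), it would need five times that.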

---

## Training Pipeline

InCoder-32B is trained through a three-stage **Code-Flow** pipeline:

### Stage 1: Pre-training & Annealing

Industrial code corpora (Verilog, CUDA, firmware C, CadQuery scripts) are severely underrepresented in existing datasets such as The Stack v2. We construct a dedicated data pipeline using:

- **Three-step domain recall**: rule-based filtering (file extensions, keywords like `endmodule`, `__global__`), a FastText classifier, and semantic encoder retrieval
- **OCR extraction** from technical books and hardware reference manuals
- **Multi-level deduplication**: exact hash, MinHash LSH, repository-level fork consolidation, and cross-source deduplication
- **Domain-specific validation**: AST comparison, re-compilation, and synthesis checks
- **Data refinement**: normalized formatting, cross-file dependency resolution, and code-text alignment annotations
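
A small sketch can illustrate the recall and deduplication steps: a rule-based recall pass, then exact-hash and MinHash near-duplicate filtering. It is not the authors' pipeline. The extension map, the CadQuery keyword, 5-token shingles, 64 hash functions, and the 0.8 threshold are all illustrative assumptions. The FastText, encoder-retrieval, fork-consolidation, and cross-source steps are omitted.

```python
import hashlib
import os

# Hypothetical recall rules. Only the `endmodule` and `__global__` keywords come from
# the description above; the extensions and the CadQuery keyword are assumptions.
DOMAIN_RULES = {
    "chip_design": ({".v", ".sv"}, ("endmodule",)),
    "gpu_kernel": ({".cu", ".cuh"}, ("__global__",)),
    "cad": (set(), ("import cadquery",)),
}


def rule_based_recall(path: str, text: str) -> set:
    """Step 1 of domain recall: tag a file when its extension or a keyword matches."""
    ext = os.path.splitext(path)[1].lower()
    return {
        domain
        for domain, (exts, keywords) in DOMAIN_RULES.items()
        if ext in exts or any(k in text for k in keywords)
    }


def exact_hash(text: str) -> str:
    """Exact dedup key: SHA-256 of the whitespace-normalized file."""
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()


def minhash(text: str, num_perm: int = 64, k: int = 5) -> list:
    """MinHash signature over k-token shingles, one seeded hash per 'permutation'."""
    tokens = text.split()
    shingles = {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}
    return [
        min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode("utf-8"), digest_size=8).digest(), "big")
            for s in shingles
        )
        for seed in range(num_perm)
    ]


def deduplicate(docs: list, threshold: float = 0.8) -> list:
    """Drop exact duplicates, then near-duplicates whose estimated Jaccard >= threshold."""
    seen, kept, signatures = set(), [], []
    for doc in docs:
        key = exact_hash(doc)
        if key in seen:
            continue
        seen.add(key)
        sig = minhash(doc)
        # The fraction of matching MinHash slots estimates Jaccard similarity of shingle sets.
        # This pairwise scan is O(n^2); LSH banding would replace it at corpus scale.
        if any(sum(a == b for a, b in zip(sig, other)) / len(sig) >= threshold for other in signatures):
            continue
        kept.append(doc)
        signatures.append(sig)
    return kept
```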

Training details:

- **Hardware**: 4,096 GPUs
- **Objectives**: autoregressive language modeling + Fill-in-the-Middle (FIM)
- **Learning rate**: 3 × 10⁻⁴ (constant)
- **Global batch size**: 2,048
- **Total tokens**: 15T
- **Curriculum**: function-level → file-level → multi-file/project-level
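
FIM training data is typically produced by splitting a document into prefix, middle, and suffix, then reordering the pieces with sentinel tokens. The sketch below assumes the PSM (prefix-suffix-middle) layout. It also assumes the `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` strings from the FIM usage example later in this README. The card does not state the FIM rate or how split points are chosen, so `fim_rate=0.5` and uniform character-level cuts are assumptions.

```python
import random

# Sentinel strings as used in this README's FIM usage example (assumed to match training).
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"


def to_fim_sample(code: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, rewrite a document into PSM order; otherwise keep it as plain LM data."""
    if len(code) < 2 or rng.random() >= fim_rate:
        return code  # ordinary left-to-right language-modeling sample
    # Two distinct cut points, chosen uniformly at character level (an assumption)
    a, b = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:a], code[a:b], code[b:]
    # The model conditions on prefix and suffix and learns to emit the middle last
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


rng = random.Random(0)
print(to_fim_sample("assign y = a & b;", rng, fim_rate=1.0))
```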

### Stage 2: Mid-Training (Context Extension)

The context window is extended progressively from 8K to 128K tokens across two sub-stages, combined with domain-specific data synthesis:

**Stage 2.1 (8K → 32K):**

- Targets file-level tasks: completing RTL modules, infilling kernel functions, generating testbenches
- Data mix: reasoning QA (40%), agent trajectories (20%), commits (15%), industrial artifacts (15%), FIM (10%)

**Stage 2.2 (32K → 128K):**

- Unlocks long-context capabilities: extended debugging sessions, cross-module projects
- Graduated warm-up: long sequences start at 10% of the mix and increase linearly to 50%
- Data mix shifts toward long-context data, including agent trajectories (30%), FIM (25%), and reasoning QA (25%)
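
The graduated warm-up and the data mixes translate directly into a sampling schedule. This sketch assumes the Stage 2.2 ramp is linear in optimizer steps; the card says only that it increases linearly. It also assumes each sequence's data source is drawn independently from the listed proportions. The mix sampler uses the Stage 2.1 proportions because the Stage 2.2 list covers only 80% of the data. All function names are hypothetical.

```python
import random


def long_seq_fraction(step: int, total_steps: int, start: float = 0.10, end: float = 0.50) -> float:
    """Share of long-context sequences at a given step: a linear ramp from 10% to 50%."""
    if total_steps <= 1:
        return end
    t = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * t


# Stage 2.1 data mix as listed above (sums to 100%).
STAGE_2_1_MIX = {
    "reasoning_qa": 0.40,
    "agent_trajectories": 0.20,
    "commits": 0.15,
    "industrial_artifacts": 0.15,
    "fim": 0.10,
}


def sample_source(mix: dict, rng: random.Random) -> str:
    """Draw one data source with probability proportional to its share of the mix."""
    names = list(mix)
    return rng.choices(names, weights=[mix[n] for n in names], k=1)[0]


def sample_is_long(step: int, total_steps: int, rng: random.Random) -> bool:
    """Decide whether the next Stage 2.2 sequence should be a long-context one."""
    return rng.random() < long_seq_fraction(step, total_steps)
```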

**Synthetic Industrial QA Pipeline:**

1. *Scenario specification*: scenarios are identified together with practicing hardware engineers
2. *Seed code generation*: realistic RTL patterns, CUDA memory access idioms, interrupt-driven firmware
3. *QA synthesis with automated verification*: code execution validation, static analysis, logical consistency checks
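
Step 3 filters synthesized QA pairs by checking the answer code automatically. The sketch below shows that filter for Python answers only, an assumption made to keep it self-contained; the card's industrial QA also covers RTL, CUDA, and firmware. It runs a static check with `ast.parse`, then executes the answer together with a test snippet in a fresh interpreter. The logical-consistency check is not shown, and `keep_qa_pair` is a hypothetical name.

```python
import ast
import os
import subprocess
import sys
import tempfile


def static_check(code: str) -> bool:
    """Minimal static analysis: reject answers that do not even parse."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


def execution_check(code: str, test: str, timeout: float = 10.0) -> bool:
    """Run answer + test in a separate Python process; pass iff it exits with status 0."""
    # Not a sandbox: generated code should run in an isolated environment in practice.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test + "\n")
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)


def keep_qa_pair(answer_code: str, test_code: str) -> bool:
    """Keep a synthesized QA pair only if its answer passes both checks."""
    return static_check(answer_code) and execution_check(answer_code, test_code)
```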

### Stage 3: Post-Training

A total of 2.5M supervised fine-tuning (SFT) samples are constructed directly from real industrial coding tasks, with execution-grounded verification across four environments:

| Environment | Toolchain |
|---|---|
| Chip Design | Icarus Verilog, Verilator, Yosys |
| GPU Optimization | NVIDIA A100, nvcc, Triton compiler |
| 3D Modeling | CadQuery, OpenCascade |
| Embedded Systems | arm-none-eabi-gcc, Renode simulator (STM32F407) |
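
For the Chip Design environment, you can approximate execution-grounded verification by compiling a design and its testbench with Icarus Verilog and running the result with `vvp`. This is an assumed sketch, not the authors' harness. It uses only the standard `iverilog -g2012 -o` and `vvp` invocations and treats any non-zero exit status as failure. It does not parse PASS/FAIL messages printed by the testbench, and Verilator and Yosys steps are omitted.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def iverilog_commands(sources: list, sim_binary: str) -> list:
    """Compile with Icarus Verilog (IEEE 1800-2012 mode), then simulate with vvp."""
    return [
        ["iverilog", "-g2012", "-o", sim_binary, *sources],
        ["vvp", sim_binary],
    ]


def simulate(design: str, testbench: str, timeout: float = 60.0) -> tuple:
    """Return (passed, log); 'passed' only means both steps exited with status 0."""
    if shutil.which("iverilog") is None or shutil.which("vvp") is None:
        return False, "Icarus Verilog (iverilog/vvp) not found on PATH"
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        (work / "design.v").write_text(design)
        (work / "tb.v").write_text(testbench)
        sources = [str(work / "design.v"), str(work / "tb.v")]
        log = []
        for cmd in iverilog_commands(sources, str(work / "sim.vvp")):
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            log.append(result.stdout + result.stderr)
            if result.returncode != 0:
                return False, "".join(log)  # compile error or simulation failure
        return True, "".join(log)
```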

SFT samples are organized into three categories:

- **Direct solution**: requirement-to-implementation
- **Defect repair**: failure-feedback-fix loop with closed-loop repair trajectories
- **Performance & structural optimization**: improving correct code for efficiency, readability, or architecture
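
The defect-repair category relies on closed-loop trajectories: generate, verify, feed the failure back, and try again. A minimal sketch of that loop follows. `generate` and `verify` are hypothetical callables, for example a model call and a compiler or simulator run from the toolchain table above. The feedback prompt wording, `max_rounds=3`, and the rule of keeping only trajectories that end in a verified fix after at least one failure are all assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: generate(prompt) -> candidate code;
# verify(candidate) -> (passed, feedback), e.g. compiler or simulator output.
Generate = Callable[[str], str]
Verify = Callable[[str], Tuple[bool, str]]


def repair_loop(task: str, generate: Generate, verify: Verify, max_rounds: int = 3) -> List[dict]:
    """Record a failure -> feedback -> fix trajectory, stopping at the first verified candidate."""
    trajectory = []
    prompt = task
    for _ in range(max_rounds):
        candidate = generate(prompt)
        passed, feedback = verify(candidate)
        trajectory.append({"prompt": prompt, "candidate": candidate, "passed": passed, "feedback": feedback})
        if passed:
            break
        # Feed the verifier output back so the next attempt can target the failure
        prompt = (
            f"{task}\n\nPrevious attempt:\n{candidate}\n\n"
            f"Verifier feedback:\n{feedback}\n\nFix the code."
        )
    return trajectory


def is_usable_sft_trajectory(trajectory: List[dict]) -> bool:
    """Keep only trajectories that end in a verified fix after at least one failure."""
    return len(trajectory) > 1 and trajectory[-1]["passed"]
```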

---

## Benchmarks

### Industrial Benchmarks (New)

This release introduces three new industrial benchmarks:

- **VeriScope**: 568 Verilog generation problems across 5 difficulty levels (from combinational logic up to a dual-core out-of-order RISC-V SoC with cache coherence), graded via RTL simulation.
- **VeriRepair**: ~22K training / 300 test Verilog bug-fix samples covering 4 major error categories and 20 error types.
- **EmbedCGen**: 500 bare-metal embedded C generation problems for the STM32F407 (ARM Cortex-M4), evaluated via cross-compilation and Renode simulation.
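
You can approximate EmbedCGen's cross-compilation step with the `arm-none-eabi-gcc` toolchain listed for the Embedded Systems environment. The flags below are the usual ones for a Cortex-M4 with a single-precision FPU, such as the STM32F407. The card does not publish EmbedCGen's actual flags, startup code, linker script, or Renode setup, so this whole sketch is an assumption. It is compile-only, so it cannot catch the linker errors discussed under Limitations. The include path in the example is hypothetical.

```python
import shutil
import subprocess

# Typical GCC flags for a Cortex-M4F target such as the STM32F407 (assumed, not from EmbedCGen).
CORTEX_M4F_FLAGS = ["-mcpu=cortex-m4", "-mthumb", "-mfpu=fpv4-sp-d16", "-mfloat-abi=hard"]


def compile_cmd(src: str, obj: str, include_dirs=()) -> list:
    """Build a compile-only (-c) command; linking against startup code is not shown."""
    return [
        "arm-none-eabi-gcc",
        *CORTEX_M4F_FLAGS,
        "-DSTM32F407xx",  # device macro expected by the CMSIS device headers
        *[f"-I{d}" for d in include_dirs],
        "-O2",
        "-Wall",
        "-c",
        src,
        "-o",
        obj,
    ]


def try_compile(src: str, obj: str, include_dirs=()) -> tuple:
    """Return (ok, compiler diagnostics); fails gracefully if the toolchain is absent."""
    if shutil.which("arm-none-eabi-gcc") is None:
        return False, "arm-none-eabi-gcc not found on PATH"
    result = subprocess.run(compile_cmd(src, obj, include_dirs), capture_output=True, text=True)
    return result.returncode == 0, result.stderr


# Example (hypothetical paths):
# ok, log = try_compile("main.c", "main.o", ["Drivers/CMSIS/Include"])
```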

### General Benchmarks Evaluated

| Category | Benchmarks |
|---|---|
| Code Generation | EvalPlus (HumanEval, MBPP), BigCodeBench, FullStackBench |
| Code Reasoning | CRUXEval, LiveCodeBench |
| Code Efficiency | Mercury |
| Text-to-SQL | Spider, BIRD |
| Agentic Coding | Terminal-Bench v1/v2, SWE-bench Verified |
| Tool Use | Mind2Web, BFCL v3, τ²-bench |
| Industrial | VeriScope, VeriRepair, RealBench, ArchXBench, CAD-Coder, EmbedCGen, SuperCoder, TritonBench, KernelBench |
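
Several results above are reported as Pass@1 or Syn@1. The card does not say how these are computed. A common choice is the unbiased pass@k estimator from the Codex paper (Chen et al., 2021). Given n samples per problem, c of them correct, it estimates the probability that at least one of k samples passes. A minimal sketch, with hypothetical function names:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list, k: int) -> float:
    """Average pass@k over problems, each given as a (num_samples, num_correct) pair."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(mean_pass_at_k([(10, 3), (10, 0), (10, 10)], k=1))
```

With k = 1 the estimator reduces to c / n, the fraction of correct samples for each problem.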

---

## Usage

### Installation

```bash
pip install torch transformers accelerate
```

### Basic Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Multilingual-Multimodal-NLP/IndustrialCoder"
# trust_remote_code=True mirrors the Transformers snippet generated for this repository
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # spread layers across available GPUs (needs accelerate)
    trust_remote_code=True,
)

prompt = """Write a synthesizable Verilog module for a UART transmitter (8N1 protocol).
The module should accept 8-bit parallel data and serialize it onto a TX line."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.2,
    do_sample=True,  # temperature only takes effect when sampling is enabled
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Fill-in-the-Middle (FIM)

InCoder-32B supports FIM completion for code infilling tasks:

```python
# Reuses `tokenizer` and `model` from the Basic Inference example above.
prefix = """// CUDA kernel for RMS Normalization
__global__ void rms_norm_kernel(float* output, const float* input,
                                 const float* weight, int N, float eps) {
    int idx = blockIdx.x;
"""
suffix = """
    output[idx * N + tid] = normalized * weight[tid];
}"""

# Prefix-suffix-middle order: the model generates the missing middle section
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
inputs = tokenizer(fim_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, then reassemble the completed kernel
middle = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(prefix + middle + suffix)
```

### Chat / Instruction Format

```python
# Reuses `tokenizer` and `model` from the Basic Inference example above.
messages = [
    {
        "role": "user",
        "content": "Optimize this CUDA matrix multiplication kernel for an NVIDIA A100 using shared memory tiling with TILE_SIZE=32."
    }
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.1, do_sample=True)
# Slice off the prompt tokens so only the assistant's reply is printed
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

---

## Limitations & Known Failure Modes

Based on an analysis of 1,882 failure cases across 9 industrial benchmarks:

- **Compilation & syntax errors**: dominant in Verilog tasks; 71% of RealBench failures involve malformed literals, incorrect port declarations, or bit-width mismatches.
- **Incomplete API knowledge**: 47% of EmbedCGen failures are linker errors from undefined or incorrectly typed HAL/CMSIS functions; 33% of TritonBench failures are NameErrors from incorrect Triton API usage.
- **Format compliance**: 46% of VeriScope failures are unparseable structured outputs where the required format is ignored entirely.
- **Functional correctness under precise semantics**: 79% of VeriRepair failures produce compilable but functionally incorrect code; most CAD-Coder failures stem from systematic misinterpretation of the Euler angle convention.
- **Optimization gap**: 33% of KernelBench failures produce functionally correct but insufficiently fast GPU kernels; 83% of SuperCoder failures result from the model copying the input assembly without modification.

---

## Ablation Findings

- **Repository transition data** outperforms static snapshots for planning tasks
- **Mid-training reasoning trajectories** improve robustness under distribution shift
- **Thinking paths** unlock emergent capabilities absent in standard instruction tuning
- **Scaling industrial SFT data** is a reliable performance driver across all 9 industrial benchmarks (scaling from 83M to 167M to 250M tokens yields consistent improvement)

---

## Citation

```bibtex
@article{yang2026incoder,
  title={InCoder-32B: Code Foundation Model for Industrial Scenarios},
  author={Yang, Jian and Zhang, Wei and Wu, Jiajun and Cheng, Junhang and Guo, Shawn
          and Wang, Haowen and Gu, Weicheng and Du, Yaxin and Li, Joseph and Xu, Fanglin
          and others},
  journal={arXiv preprint arXiv:2603.16790},
  year={2026}
}
```

---

## Model Card Authors

Jian Yang, Wei Zhang, Jiajun Wu, Junhang Cheng, Shawn Guo, Haowen Wang, Weicheng Gu, Yaxin Du, Joseph Li, Fanglin Xu, Yizhi Li, Lin Jing, Yuanbo Wang, Yuhan Gao, Ruihao Gong, Chuan Hao, Ran Tao, Aishan Liu, Tuney Zheng, Ganqu Cui, Zhoujun Li, Mingjie Tang, Chenghua Lin, Wayne Xin Zhao, Xianglong Liu, Ming Zhou, Bryan Dai, Weifeng Lv

Affiliations: Beihang University, IQuest Research, Shanghai Jiao Tong University, ELLIS, University of Manchester, Shanghai Artificial Intelligence Laboratory, Sichuan University, Gaoling School of Artificial Intelligence (Renmin University of China), Langboat

---

## License

This model is released under the [Apache 2.0 License](LICENSE).