Add model card for InCoder-32B (#1)
Opened by nielsr (HF Staff)

README.md (added)
@@ -0,0 +1,95 @@
---
library_name: transformers
pipeline_tag: text-generation
license: other
tags:
- code
- industrial-code
- long-context
---

# InCoder-32B: Industrial Code Foundation Model

**InCoder-32B** (Industrial-Coder-32B) is the first 32B-parameter code foundation model purpose-built for industrial code intelligence. While general code LLMs excel at standard programming tasks, their performance often degrades in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints.

InCoder-32B unifies code intelligence across:
- **Chip Design** (Verilog / RTL)
- **GPU Kernel Optimization** (CUDA / Triton)
- **Embedded Systems** (ARM Cortex-M, STM32)
- **Compiler Optimization** (x86-64 assembly, LLVM)
- **3D Modeling** (CAD/CAM via CadQuery / OpenCascade)

The model supports a native long-context window of up to **128K tokens**.

- **Paper:** [InCoder-32B: Code Foundation Model for Industrial Scenarios](https://huggingface.co/papers/2603.16790)
- **Repository:** [GitHub - Industrial-Coder](https://github.com/CSJianYang/Industrial-Coder)
- **Project Page:** [IndustrialCoder](https://huggingface.co/Multilingual-Multimodal-NLP/IndustrialCoder)

## Performance

### General Code Benchmarks
| Model | Size | HumanEval | HumanEval+ | MBPP | MBPP+ | BCB Full | BCB Hard |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Qwen2.5-Coder-32B-Instruct | 32B | 93.3 | 86.6 | 90.2 | 77.8 | 48.0 | 24.3 |
| **InCoder-32B** | **32B** | **94.5** | **89.6** | **91.8** | **78.3** | **49.8** | **31.1** |

### Industrial Code Benchmarks
| Domain | Benchmark | InCoder-32B | Claude-Sonnet-4.6 | Qwen3.5-397B-A17B |
|---|---|:---:|:---:|:---:|
| **Chip Design** | VeriScope Score | 80.7 | **87.7** | 73.1 |
| **GPU Optim.** | KernelBench L1/L2/L3 | **22.2/36.0/14.0** | 11.1/28.0/2.0 | 4.0/10.0/0.0 |
| **3D Modeling** | CAD-Coder Compile (%) | **82.0** | 77.0 | 79.0 |

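To make the margins in the general code benchmarks explicit, the sketch below subtracts the Qwen2.5-Coder-32B-Instruct scores from the InCoder-32B scores. The numbers are copied from the table in this card; the variable and function names are illustrative only.

```python
# Scores copied from the "General Code Benchmarks" table in this card.
BENCHMARKS = ["HumanEval", "HumanEval+", "MBPP", "MBPP+", "BCB Full", "BCB Hard"]
QWEN25_CODER_32B = [93.3, 86.6, 90.2, 77.8, 48.0, 24.3]
INCODER_32B = [94.5, 89.6, 91.8, 78.3, 49.8, 31.1]


def score_deltas(ours, baseline):
    """Return per-benchmark differences (ours - baseline), rounded to one decimal."""
    return [round(a - b, 1) for a, b in zip(ours, baseline)]


deltas = dict(zip(BENCHMARKS, score_deltas(INCODER_32B, QWEN25_CODER_32B)))
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
```

With these numbers, the largest gap is on BCB Hard (6.8 points) and the smallest is on MBPP+ (0.5 points).
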
## Training Pipeline: Code-Flow

InCoder-32B was developed using a three-stage **Code-Flow** pipeline:
1. **Pre-training & Annealing:** Pre-training on curated industrial code mixed with general code, with multi-level deduplication.
2. **Mid-training (Context Extension):** Progressive context extension from 8K to 128K tokens using synthetic industrial reasoning data and agent trajectories.
3. **Post-training:** Execution-grounded SFT across hardware design, GPU kernels, and systems programming.

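The first stage mentions multi-level deduplication without saying which levels are used. Purely as an illustration, the sketch below assumes two hypothetical levels: exact matching on raw file content, then matching after light whitespace normalization. The function names, the normalization rule, and the choice of SHA-256 are assumptions, not details taken from the paper.

```python
import hashlib


def _digest(text: str) -> str:
    """Hash a string so duplicates can be detected with a set lookup."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _normalize(text: str) -> str:
    # Assumed normalization: drop trailing whitespace and blank lines.
    lines = (line.rstrip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def deduplicate(files: list[str]) -> list[str]:
    """Keep the first occurrence of each file under two assumed dedup levels."""
    seen_exact: set[str] = set()
    seen_normalized: set[str] = set()
    kept = []
    for content in files:
        exact = _digest(content)
        if exact in seen_exact:
            continue  # level 1: identical content
        normalized = _digest(_normalize(content))
        if normalized in seen_normalized:
            continue  # level 2: differs only in whitespace
        seen_exact.add(exact)
        seen_normalized.add(normalized)
        kept.append(content)
    return kept


corpus = [
    "module a;\nendmodule\n",
    "module a;\nendmodule\n",      # exact duplicate
    "module a;  \n\nendmodule\n",  # whitespace-only variant
    "module b;\nendmodule\n",
]
print(len(deduplicate(corpus)))
```

A production pipeline would likely add a near-duplicate level (for example, MinHash over token shingles), which is omitted here to keep the sketch short.
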
## Quickstart

### Installation
```bash
pip install -U "transformers>=4.57.1" accelerate safetensors
```

### Usage with Transformers
Note: The model uses a custom architecture, so `trust_remote_code=True` is required.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Multilingual-Multimodal-NLP/IndustrialCoder"

# trust_remote_code=True loads the custom architecture shipped with the repo.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Optimize this CUDA kernel for better memory coalescing."}]
# Render the chat template to a prompt string, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

with torch.no_grad():
    # do_sample=True makes the sampling settings explicit; without sampling
    # enabled, temperature/top_p/top_k are ignored.
    out = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.6,
        top_p=0.85,
        top_k=20,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

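Chat-tuned code models commonly wrap generated code in Markdown fences. Assuming that also holds for this model (the card does not say), a small helper such as the hypothetical `extract_code_blocks` below can pull the code out of the decoded response so it can be reviewed or tested separately.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this example readable

# Captures an optional language tag and the fenced body.
_BLOCK = re.compile(FENCE + r"([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(response: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for every fenced block in `response`."""
    return [(lang, code.rstrip("\n")) for lang, code in _BLOCK.findall(response)]


sample = f"Here is the kernel:\n{FENCE}cuda\n__global__ void k() {{}}\n{FENCE}\nDone."
print(extract_code_blocks(sample))
```

Keeping the language tag alongside each block makes it easier to route the output to the right toolchain (for example, a Verilog simulator versus a CUDA compiler) before any manual review.
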
## Citation
```bibtex
@article{yang2026incoder,
  title={InCoder-32B: Code Foundation Model for Industrial Scenarios},
  author={Yang, Jian and Zhang, Wei and Wu, Jiajun and Cheng, Junhang and Guo, Shawn and Wang, Haowen and Gu, Weicheng and Du, Yaxin and Li, Joseph and Xu, Fanglin and others},
  journal={arXiv preprint arXiv:2603.16790},
  year={2026}
}
```

## Disclaimer
The model may generate incorrect or unsafe code. Always review and test outputs in a sandboxed environment before production use. Industrial code (RTL, embedded firmware, GPU kernels) requires expert human review before deployment.