# InCoder-32B-Thinking: Reasoning Code Model for Industrial Scenarios
## Model Summary
InCoder-32B-Thinking is the reasoning variant of the InCoder family. It extends InCoder-32B with chain-of-thought reasoning via `<think>...</think>` tags, enabling step-by-step problem decomposition before code generation. This is particularly effective for complex industrial tasks that require multi-step reasoning, such as debugging RTL modules, optimizing GPU kernels, or diagnosing embedded firmware issues.
For the instruction-tuned variant (without thinking), see IndustrialCoder. For the pre-trained base model, see IndustrialCoder-Base.
## Key Results

### General Code Benchmarks
| Benchmark | InCoder-32B | InCoder-32B-Thinking |
|---|---|---|
| HumanEval+ | 89.6 | 91.5 |
| MBPP+ | 78.3 | 80.1 |
| BigCodeBench (Full) | 49.8 | 51.2 |
| LiveCodeBench (Pass@1) | 49.1 | 52.3 |
### Industrial Code Benchmarks
| Benchmark | Domain | InCoder-32B | InCoder-32B-Thinking |
|---|---|---|---|
| VeriScope Score | Chip Design | 80.7 | 82.3 |
| CAD-Coder Compile (%) | 3D Modeling | 82.0 | 84.0 |
| KernelBench L1 (%) | GPU Optimization | 22.2 | 24.0 |
The thinking variant shows consistent improvements across both general and industrial benchmarks, with the largest gains on tasks requiring multi-step reasoning.
## Model Architecture
Same architecture as InCoder-32B, with thinking-aware post-training:
| Hyperparameter | Value |
|---|---|
| Parameters | ~32B |
| Layers | 64 |
| Hidden Size | 5,120 |
| Attention Heads | 40 (8 KV heads, GQA) |
| Max Context Length | 131,072 (128K) |
| Positional Encoding | RoPE (θ = 500,000) |
| Precision | BFloat16 |
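The rotary base θ = 500,000 in the table above sets the per-dimension rotation frequencies that support the 128K context window. A minimal sketch of the standard RoPE inverse-frequency computation (illustrative only, not the model's actual implementation; the 128-dim head size follows from 5,120 hidden / 40 heads):

```python
def rope_inv_freq(head_dim: int, theta: float = 500_000.0) -> list[float]:
    # Standard RoPE: inv_freq_i = theta^(-2i / head_dim) for i in [0, head_dim // 2).
    # Larger theta stretches the longest wavelengths, which helps long-context attention.
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

freqs = rope_inv_freq(128)
print(len(freqs), freqs[0])  # 64 frequencies; the first is always 1.0
```

The frequencies decay monotonically from 1.0 toward θ⁻¹, so low dimensions rotate quickly (local position) and high dimensions rotate slowly (long-range position).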
## How Thinking Mode Works
InCoder-32B-Thinking generates a reasoning trace inside <think>...</think> tags before producing the final answer. This allows the model to:
- Decompose complex problems into sub-tasks
- Reason about constraints, edge cases, and hardware semantics
- Plan the solution structure before writing code
Example output:
```text
<think>
The user wants a UART transmitter module. Let me think through the design:
1. Need a state machine: IDLE -> START_BIT -> DATA_BITS -> STOP_BIT
2. 8N1 means: 8 data bits, no parity, 1 stop bit
3. Need a baud rate counter derived from the clock frequency
4. Shift register to serialize the 8-bit data LSB first
</think>
module uart_tx (
    input wire clk,
    ...
```
You can disable thinking mode to get direct answers (behaves like the instruct variant):
```python
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
    enable_thinking=False
)
```
## Usage

### Installation

```shell
pip install transformers accelerate
```

### Thinking Mode (default)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Multilingual-Multimodal-NLP/IndustrialCoder-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Optimize this CUDA kernel for better memory coalescing:\n__global__ void add(float *a, float *b, float *c, int N) {\n    int i = threadIdx.x;\n    if (i < N) c[i] = a[i] + b[i];\n}"}
]

# Thinking mode (default): the model reasons before answering
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs, max_new_tokens=4096,
        do_sample=True, temperature=0.6, top_p=0.85, top_k=20,
    )
output = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False)

# Split the reasoning trace from the final response
if "</think>" in output:
    thinking, response = output.split("</think>", 1)
    thinking = thinking.replace("<think>", "").strip()
    print(f"Thinking:\n{thinking}\n\nResponse:\n{response.strip()}")
else:
    print(output)
```
### Non-Thinking Mode

```python
# Disable thinking: direct answer without a reasoning trace
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
    enable_thinking=False
)
```
### With Tool Calls

```python
tools = [{
    "type": "function",
    "function": {
        "name": "run_verilog_sim",
        "description": "Run Verilog simulation with Icarus Verilog",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Verilog source code"},
                "testbench": {"type": "string", "description": "Testbench code"}
            }
        }
    }
}]

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, tools=tools
)
```
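When the model decides to call a tool, the call must be parsed out of the generated text. A sketch assuming tool calls are emitted as `<tool_call>{...}</tool_call>` JSON blobs, a common convention; the exact serialization depends on this model's chat template, so verify against real output:

```python
import json
import re

def parse_tool_calls(output: str) -> list[dict]:
    # ASSUMPTION: calls appear as <tool_call>{JSON}</tool_call> spans.
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    return [json.loads(m) for m in re.findall(pattern, output, re.DOTALL)]

sample = ('<tool_call>{"name": "run_verilog_sim", '
          '"arguments": {"code": "module t; endmodule", "testbench": ""}}</tool_call>')
calls = parse_tool_calls(sample)
print(calls[0]["name"])  # run_verilog_sim
```

Each parsed dict can then be dispatched to the matching function and its result appended to `messages` as a tool-role turn.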
## Deployment with vLLM

```shell
vllm serve Multilingual-Multimodal-NLP/IndustrialCoder-Thinking \
    --tensor-parallel-size 4 --max-model-len 32768 --trust-remote-code
```
## Recommended Sampling Parameters
| Use case | temperature | top_p | top_k | max_new_tokens |
|---|---|---|---|---|
| Thinking (default) | 0.6 | 0.85 | 20 | 8192 |
| Non-thinking / precise | 0.2 | 0.95 | — | 4096 |
## Model Family
| Model | Type | HuggingFace |
|---|---|---|
| InCoder-32B-Base | Pre-trained | 🤗 IndustrialCoder-Base |
| InCoder-32B | Instruct | 🤗 IndustrialCoder |
| InCoder-32B-Thinking | Reasoning | 🤗 IndustrialCoder-Thinking |
| InCoder-32B-FP8 | FP8 Quantized | 🤗 IndustrialCoder-32B-FP8 |
| InCoder-32B-AWQ-INT4 | AWQ INT4 | 🤗 IndustrialCoder-32B-AWQ-INT4 |
| InCoder-32B-GPTQ-INT4 | GPTQ INT4 | 🤗 IndustrialCoder-32B-GPTQ-INT4 |
## Limitations & Disclaimers

- The thinking trace may occasionally contain reasoning errors or hallucinated constraints; always verify the final code output.
- For simple tasks, thinking mode adds latency; use `enable_thinking=False` for straightforward generation.
- Based on failure analysis, the model may struggle with:
  - API knowledge: linker errors from undefined HAL/CMSIS functions in embedded C.
  - Functional semantics: producing compilable but functionally incorrect RTL under complex logic scenarios.
  - Optimization: correct but sub-optimal GPU kernel performance.
Always review and test generated code in a sandboxed environment. Industrial code (RTL, embedded firmware, GPU kernels) requires expert review before deployment.
## Citation

```bibtex
@article{yang2026incoder,
  title={InCoder-32B: Code Foundation Model for Industrial Scenarios},
  author={Yang, Jian and Zhang, Wei and Wu, Jiajun and Cheng, Junhang and Guo, Shawn
          and Wang, Haowen and Gu, Weicheng and Du, Yaxin and Li, Joseph and Xu, Fanglin
          and others},
  journal={arXiv preprint arXiv:2603.16790},
  year={2026}
}
```