---
language: 
- en
- zh 
- ja 
- ko 
- fr 
- es 
- pt 
- de 
- it 
- ru 
- ar 
- vi 
- th 
tags:
- code
- coding
- qwen3.0
- onnx
- int4
- web-ui
license: unknown
---

# JiRack Coder Reasoning 32B INT4

A fast and efficient coding assistant with a clean built-in web UI, built on the Qwen3.0-Coder-32B-Instruct base model and optimized with Microsoft ONNX Runtime.

- JiRack is a cloud model intended to reduce cloud costs. It can also serve as an expert model in cloud RAG setups, with the ONNX JiRack Java server as an alternative.
- Subscription: $1 per month per user under the updated license (for users who are not companies).

## Quick Start
Watch the JiRack Coder 32B in action:
**DEMO**: [JiRack Coder Reasoning 32B Web UI](https://youtu.be/mq1DxIov7Bw)

### Run with Docker

**Default (CPU)**

    docker run -d \
      --name jirack_coder_reasoning_32b \
      -p 7869:7869 \
      --restart unless-stopped \
      cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest

**Multi-CPU** (48 GB memory limit, 16 CPUs)

    docker run -d \
      --name jirack_coder_reasoning_32b \
      -p 7869:7869 \
      --restart unless-stopped \
      --memory=48g \
      --cpus=16 \
      cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest

**GPU** (coming soon)

    docker run -d \
      --name jirack_coder_reasoning_32b \
      -p 7869:7869 \
      --gpus all \
      --restart unless-stopped \
      cmsmanhattan/jirack_coder_32b_int4_gpu_qwenbase:latest

---

**Docker Compose**

    services:
      # The service key was missing in the original; it is named here after container_name.
      jirack_onnx_service:
        image: cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest
        container_name: jirack_onnx_service
        ports:
          - "7869:7869"
        volumes:
          - .:/app
          - ./web:/app/web
        environment:
          - MAX_TOKENS=1024
          - TEMPERATURE=0.7
          - TOP_P=0.9
          - DEFAULT_STREAM=False
          - INTRA_THREADS=4
          - USE_ENV_ALLOCATOR=1
        deploy:
          resources:
            limits:
              memory: 48g
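
The environment variables in the Compose file above are plain strings, so whatever reads them has to convert types. How the JiRack container actually consumes them is not documented here. The following is only a minimal sketch of one way to do it, and `parse_bool` and `load_settings` are hypothetical names. It mainly illustrates why `DEFAULT_STREAM=False` needs explicit parsing: in Python, `bool("False")` is `True`.

```python
import os


def parse_bool(value: str) -> bool:
    """Interpret common textual booleans; plain bool("False") would be True."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None) -> dict:
    """Read the settings named in the Compose file, using its values as defaults.

    Hypothetical helper: the real container may read these variables differently.
    """
    env = os.environ if env is None else env
    return {
        "max_tokens": int(env.get("MAX_TOKENS", "1024")),
        "temperature": float(env.get("TEMPERATURE", "0.7")),
        "top_p": float(env.get("TOP_P", "0.9")),
        "default_stream": parse_bool(env.get("DEFAULT_STREAM", "False")),
        "intra_threads": int(env.get("INTRA_THREADS", "4")),
        "use_env_allocator": parse_bool(env.get("USE_ENV_ALLOCATOR", "1")),
    }


if __name__ == "__main__":
    # Override two values the way a Compose `environment:` entry would.
    print(load_settings({"MAX_TOKENS": "2048", "DEFAULT_STREAM": "False"}))
```

Passing a dictionary instead of relying on `os.environ` keeps the sketch easy to try without touching the shell environment.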

## Access the UI
Once the container is running, open your browser and navigate to:

**`http://localhost:7869`**

This opens the **JiRack Coder UI** — a clean web interface designed for coding.
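
The container may need some time after start before the port answers. A minimal sketch for waiting until the UI is reachable, using only the Python standard library, is shown below. The URL is the one given above; the function name `wait_for_ui` and the timeout values are illustrative assumptions, not part of JiRack.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_ui(url: str = "http://localhost:7869",
                timeout_s: float = 300.0,
                interval_s: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status, so it is listening.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # Not reachable yet; retry until the deadline.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("UI reachable" if wait_for_ui(timeout_s=10) else "UI not reachable")
```

If the port was changed in the Settings panel or in the `-p` mapping, pass the matching URL instead of the default.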

## Changing the Port
The listening port can be easily modified directly from the **Settings** panel within the JiRack Coder UI.

## Licensing
- The **JiRack Coder 32B model** is provided under a **commercial enterprise license**.
- All **JiRack UI clients** are provided under a commercial license.
- However, the UI clients can be used for free when running together with the official JiRack Docker containers, as long as they are not redistributed separately.

**JiRack Coder 14B** is available under a lighter commercial license (~$12 per user/year).

For commercial licensing, cluster deployment, or enterprise use of the JiRack Coder 32B and JiRack Coder 14B, please contact us.

- JiRack desktop chat client for Windows 11 (with Ollama API setup): https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/jirack-chat.zip
- Live email chat with the model: support@cmsmanhattan.com

## Hardware Recommendations for AMD Systems
This model is significantly heavier than JiRack Coder 14B INT4.

### Recommended Hardware for JiRack Coder Reasoning 32B INT4

The model runs as a single Docker container.

| Use Case              | CPU                              | GPU (ROCm)                        | VRAM / RAM      | Expected Speed      | Recommendation     |
|-----------------------|----------------------------------|-----------------------------------|-----------------|---------------------|--------------------|
| **Recommended**       | Ryzen 9 7950X / 9950X            | RX 7900 XTX / 2x RX 7900 XT       | 48GB+ VRAM      | 35-55 tokens/s      | Best choice        |
| **High Performance**  | Ryzen 9 9950X / Threadripper     | 2x RX 7900 XTX                    | 48-64GB VRAM    | 50-75 tokens/s      | Excellent          |
| **Enterprise**        | EPYC 7003/9004 series            | MI300X or 4x RX 7900 XTX          | 96GB+ VRAM      | 70-110 tokens/s     | Best for production|
| **Budget Option**     | Ryzen 7 7700 / 9700X             | RX 7900 XTX (24GB)                | 24GB+ VRAM      | 25-40 tokens/s      | Acceptable         |

### Important Memory Notes

Even though the 32B INT4 model itself takes approximately **12–14 GB**, we recommend **at least 48GB VRAM** for the following reasons:

- KV-cache consumption during generation (especially with long context)
- ONNX Runtime overhead and temporary buffers
- Headroom for system stability and to avoid out-of-memory errors
- Room for larger context windows

**Minimum recommended:** 48GB VRAM (dual RX 7900 series or MI300X)  
**Ideal:** 48–64GB VRAM

For pure CPU inference (no GPU), we recommend at least **128GB system RAM** (Ryzen 9 7950X/9950X or better).
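
The KV-cache item in the notes above can be estimated with simple arithmetic: two tensors (keys and values) per layer, each holding one vector per KV head per token. The sketch below is a rough estimate only. The default hyperparameters (64 layers, 8 KV heads, head dimension 128, 2 bytes per element) are illustrative assumptions, not values confirmed for this model; read the real ones from the model configuration.

```python
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 64,       # assumed, not confirmed for this model
                   n_kv_heads: int = 8,      # assumed
                   head_dim: int = 128,      # assumed
                   bytes_per_elem: int = 2,  # assumed 16-bit cache entries
                   batch: int = 1) -> int:
    """Approximate KV-cache size: 2 (K and V) * layers * KV heads * head dim * tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch


if __name__ == "__main__":
    for ctx in (4096, 32768, 131072):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"{ctx:>7} tokens -> {gib:.1f} GiB")
```

Under these assumed values the cache costs 256 KiB per token, so a 32,768-token context needs about 8 GiB on top of the weights, and the requirement grows linearly with context length and batch size.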

---

We will quantize from the default model in full FP32 precision, which lets us find the best balance between model size and performance.

## 📧 Contact & Licensing
For joint venture opportunities, hardware integration, or licensing inquiries:
- **Email:** [grabko@cmsmanhattan.com](mailto:grabko@cmsmanhattan.com)
- **Phone:** +1 (516) 777-0945
- **Location:** New York, USA