---
language:
- en
- zh
- ja
- ko
- fr
- es
- pt
- de
- it
- ru
- ar
- vi
- th
tags:
- code
- coding
- qwen3.0
- onnx
- int8
- web-ui
license: unknown
---
# JiRack Coder Reasoning 8B INT4
A fast and efficient coding assistant with a clean built-in web UI, powered by the Qwen3.0-Coder-8B-Instruct base model and optimized with Microsoft ONNX Runtime.
- JiRack is built for the cloud: it helps reduce cloud costs and can serve as an expert model in cloud RAG pipelines, with the ONNX-based JiRack Java server as an alternative.
- Subscription: $1 per month per user under the updated license (for non-company users).
## Quick Start
Watch JiRack Coder 8B in action:
**DEMO**: [JiRack Coder Reasoning 8B Web UI](https://youtu.be/mq1DxIov7Bw)
### Run with Docker
**Default (CPU)**

```bash
docker run -d \
  --name jirack_coder_reasoing_8b \
  -p 7869:7869 \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_8b_int4_qwenbase:latest
```

**Multi-CPU (with memory and CPU limits)**

```bash
docker run -d \
  --name jirack_coder_reasoing_8b \
  -p 7869:7869 \
  --restart unless-stopped \
  --memory=20g \
  --cpus=12 \
  cmsmanhattan/jirack_coder_8b_int4_qwenbase:latest
```

**GPU (coming soon)**

```bash
docker run -d \
  --name jirack_coder_reasoing_8b \
  -p 7869:7869 \
  --gpus all \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_8b_int4_gpu_qwenbase:latest
```
---
**Docker Compose**

```yaml
services:
  jirack:  # service key not given in the original listing; the name is arbitrary
    image: cmsmanhattan/jirack_coder_8b_int4_qwenbase:latest
    container_name: jirack_onnx_service
    ports:
      - "7869:7869"
    volumes:
      - .:/app
      - ./web:/app/web
    environment:
      - MAX_TOKENS=1024
      - TEMPERATURE=0.7
      - TOP_P=0.9
      - DEFAULT_STREAM=False
      - INTRA_THREADS=4
      - USE_ENV_ALLOCATOR=1
    deploy:
      resources:
        limits:
          memory: 16g
```
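The compose file sets `TEMPERATURE=0.7` and `TOP_P=0.9`. This card does not document how the container's sampler uses them. The following minimal Python sketch assumes the usual approach: temperature scaling followed by nucleus (top-p) sampling. It shows what the two values control. The function names `top_p_filter` and `sample_top_p` are hypothetical and are not part of the JiRack service.

```python
import math
import random


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return (token_index, probability) pairs in the nucleus, renormalised to sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk tokens from most to least likely until cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalise so the kept probabilities sum to 1.
    kept = sum(probs[i] for i in nucleus)
    return [(i, probs[i] / kept) for i in nucleus]


def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=None):
    """Draw one token index using temperature plus nucleus (top-p) sampling."""
    rng = rng or random.Random()
    candidates = top_p_filter(logits, temperature, top_p)
    indices = [i for i, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(indices, weights=weights, k=1)[0]
```

Lower `TOP_P` values keep fewer candidate tokens, and lower `TEMPERATURE` values concentrate probability on the top candidates. Both settings make output more deterministic, which is often preferable for code.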
## Access the UI
Once the container is running, open your browser and navigate to:
**`http://localhost:7869`**
This opens the **JiRack Coder UI** — a clean web interface designed for coding.
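The container may need some time to load the model after `docker run` returns. A script can therefore wait for the port to answer before opening the browser. This minimal Python sketch uses only the standard library. The function name `wait_for_ui` and its default timeout values are hypothetical. It polls `http://localhost:7869` until the server returns any HTTP response.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url="http://localhost:7869", timeout=120.0, interval=2.0):
    """Poll url until the server answers any HTTP request; return the status code."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                return response.status  # server is up and answered normally
        except urllib.error.HTTPError as err:
            return err.code  # server is up but answered with an error status
        except OSError:
            # Connection refused, reset, or timed out: container still starting.
            time.sleep(interval)
    raise TimeoutError(f"no response from {url} within {timeout} seconds")
```

For example, call `wait_for_ui()` right after starting the container and open the browser only once it returns.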
## Changing the Port
The listening port can be easily modified directly from the **Settings** panel within the JiRack Coder UI.
## Licensing
- The **JiRack Coder 8B model** is provided under a commercial license, costing about $12 per year per user.
- All **JiRack UI clients** are provided under a commercial license.
- However, the UI clients can be used for free when running together with the official JiRack Docker containers, as long as they are not redistributed separately.
**JiRack Coder 32B** is available exclusively under a commercial enterprise license.
For commercial licensing, cluster deployment, or enterprise use of JiRack Coder 32B and JiRack Coder 14B, please contact us.
- JiRack desktop chat client for Microsoft Windows 11 (with Ollama API setup): https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/jirack-chat.zip
- Live chat with the model by email: support@cmsmanhattan.com
## Hardware Recommendations for AMD Systems
This model is heavier than JiRack Coder 7B INT8.
### Recommended Hardware for JiRack Coder Reasoning 8B INT4 (single Docker container)
| Use Case | CPU | GPU (ROCm) | VRAM / RAM | Expected Speed | Recommendation |
|-----------------------|----------------------------------|-----------------------------------|----------------|---------------------|--------------------|
| **Recommended** | Ryzen 7 7700 / 9700X | RX 7900 XTX / 7900 XT | 24GB VRAM | 50-75 tokens/s | Best choice |
| **High Performance** | Ryzen 9 7950X / 9950X | RX 7900 XTX | 24GB+ VRAM | 65-90 tokens/s | Excellent |
| **Enterprise** | EPYC 7003/9004 series | MI300X or 2x RX 7900 XTX | 48GB+ VRAM | 90-140 tokens/s | For 32B model |
| **Budget Option** | Ryzen 5 7600 / 9600X | RX 7800 XT (16GB) | 16GB VRAM | 35-50 tokens/s | Acceptable |
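The speeds above translate directly into response times. Two assumptions apply here: the speeds are generation (decode) throughput, and a reply runs all the way to the `MAX_TOKENS=1024` limit from the Docker Compose example. Under those assumptions, the worst-case generation time is the token count divided by the speed. For the Recommended tier (50 to 75 tokens/s), that works out to roughly:

$$
t = \frac{N_{\text{tokens}}}{r} = \frac{1024}{75}\,\text{s} \approx 13.7\,\text{s} \quad\text{to}\quad \frac{1024}{50}\,\text{s} \approx 20.5\,\text{s}
$$

Prompt processing time comes on top of this.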
### Important Memory Notes
Even though the 8B INT4 model itself takes approximately **5–6 GB**, we recommend **at least 24GB VRAM** for the following reasons:
- KV-cache consumption during generation (especially with long context)
- ONNX Runtime overhead and temporary buffers
- System stability and avoiding out-of-memory (OOM) errors
- Room for larger context windows
- **Minimum recommended:** 24GB VRAM (RX 7900 series)
- **Ideal:** 24–32GB VRAM
For pure CPU inference (no GPU), we recommend at least **64GB system RAM** (Ryzen 9 7950X/9950X).
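These notes cite KV-cache growth as the main reason for extra headroom, and a back-of-the-envelope estimate shows why. The weights take roughly params × bits / 8 bytes. The KV cache takes 2 (keys and values) × layers × KV heads × head dim × tokens × bytes per value. This Python sketch assumes 36 layers, 8 KV heads, a head dimension of 128, and an FP16 cache. These values are typical of Qwen3-8B-class models but are not stated on this card, so check the model's config. The helper names are hypothetical.

```python
GIB = 1024 ** 3


def weight_bytes(num_params, bits_per_weight):
    """Approximate size of the quantized weights alone (no runtime overhead)."""
    return num_params * bits_per_weight / 8


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2, batch_size=1):
    """Keys and values (factor 2) cached for every layer, KV head, and token."""
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_value * batch_size)


# Assumed architecture values for an 8B Qwen3-class model; verify against the real config.
LAYERS, KV_HEADS, HEAD_DIM = 36, 8, 128

print(f"INT4 weights: {weight_bytes(8e9, 4) / GIB:.2f} GiB")
for tokens in (4096, 32768):
    cache = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, tokens)
    print(f"KV cache at {tokens} tokens (FP16): {cache / GIB:.2f} GiB")
```

Under these assumptions, a 32K-token context adds about 4.5 GiB of KV cache on top of the weights. Actual usage is higher still because of ONNX Runtime buffers and any layers kept at higher precision.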
---
I will also provide the default model in full FP32 precision for quantization, allowing users to find the optimal balance between model size and performance.
## 📧 Contact & Licensing
For joint venture opportunities, hardware integration, or licensing inquiries:
- **Email:** [grabko@cmsmanhattan.com](mailto:grabko@cmsmanhattan.com)
- **Phone:** +1 (516) 777-0945
- **Location:** New York, USA