---
language:
- en
- zh
- ja
- ko
- fr
- es
- pt
- de
- it
- ru
- ar
- vi
- th
tags:
- code
- coding
- qwen3.0
- onnx
- int8
- web-ui
license: unknown
---
# JiRack Coder Reasoning 8B INT4
A fast, efficient coding assistant with a clean built-in web UI, built on the Qwen3.0-Coder-8B-Instruct base model and optimized with Microsoft ONNX Runtime.
- JiRack is a cloud-ready model that helps cut cloud costs. It can serve as an expert model in a cloud RAG pipeline, with the ONNX-based JiRack Java server as an alternative deployment option.
- Subscription: $1 per month per user under the updated license (non-company users).
## Quick Start
Watch JiRack Coder 8B in action:
**DEMO**: [JiRack Coder Reasoning 8B Web UI](https://youtu.be/mq1DxIov7Bw)
### Run with Docker
**Default (CPU)**

```bash
docker run -d \
  --name jirack_coder_reasoing_8b \
  -p 7869:7869 \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_8b_int4_qwenbase:latest
```

**Multi-CPU (with memory and CPU limits)**

```bash
docker run -d \
  --name jirack_coder_reasoing_8b \
  -p 7869:7869 \
  --restart unless-stopped \
  --memory=20g \
  --cpus=12 \
  cmsmanhattan/jirack_coder_8b_int4_qwenbase:latest
```

**GPU** (coming soon)

```bash
docker run -d \
  --name jirack_coder_reasoing_8b \
  -p 7869:7869 \
  --gpus all \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_8b_int4_gpu_qwenbase:latest
```
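### Run with Docker Compose

Save the following as `docker-compose.yml` and start it with `docker compose up -d`:

```yaml
services:
  jirack_onnx_service:  # service name is arbitrary; any key works
    image: cmsmanhattan/jirack_coder_8b_int4_qwenbase:latest
    container_name: jirack_onnx_service
    ports:
      - "7869:7869"
    volumes:
      - .:/app
      - ./web:/app/web
    environment:
      - MAX_TOKENS=1024
      - TEMPERATURE=0.7
      - TOP_P=0.9
      - DEFAULT_STREAM=False
      - INTRA_THREADS=4
      - USE_ENV_ALLOCATOR=1
    deploy:
      resources:
        limits:
          memory: 16g
```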
## Access the UI
Once the container is running, open your browser and navigate to:
**`http://localhost:7869`**
This opens the **JiRack Coder UI**, a clean web interface designed for coding.
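If you script the startup (for example in CI or a provisioning script), it can help to wait until the web UI actually answers before opening it. The following is a minimal sketch using only the Python standard library. It assumes nothing about JiRack beyond the UI being served at `http://localhost:7869`; the function name `wait_for_ui` and its timeout values are illustrative choices, not part of JiRack.

```python
import http.client
import time
import urllib.request

# Ignore HTTP(S)_PROXY environment settings: the container runs locally.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_ui(url: str = "http://localhost:7869",
                timeout: float = 120.0,
                interval: float = 2.0) -> bool:
    """Poll `url` until it returns a 2xx response or `timeout` seconds elapse.

    Returns True as soon as the server answers successfully, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, socket timeout, or an HTTP error status
            # (urllib raises HTTPError, an OSError subclass, for 4xx/5xx).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call `wait_for_ui()` right after `docker run -d ...`; it returns `True` once the page loads. If you changed the port in **Settings**, pass the new URL instead.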
## Changing the Port
The listening port can be easily modified directly from the **Settings** panel within the JiRack Coder UI.
## Licensing
- The **JiRack Coder 8B model** is provided under a commercial license, at about $12 per year per user.
- All **JiRack UI clients** are provided under a commercial license.
- However, the UI clients can be used for free when running together with the official JiRack Docker containers, as long as they are not redistributed separately.
**JiRack Coder 32B** is available exclusively under a commercial enterprise license.
For commercial licensing, cluster deployment, or enterprise use of JiRack Coder 32B and JiRack Coder 14B, please contact us.
- JiRack desktop chat client for Microsoft Windows 11 (with Ollama API setup): https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/jirack-chat.zip
- Chat with the model live via email at support@cmsmanhattan.com
## Hardware Recommendations for AMD Systems
This model is heavier than JiRack Coder 7B INT8.
### Recommended Hardware for JiRack Coder Reasoning 8B INT4 (single Docker container)
| Use Case | CPU | GPU (ROCm) | VRAM / RAM | Expected Speed | Recommendation |
|-----------------------|----------------------------------|-----------------------------------|----------------|---------------------|--------------------|
| **Recommended** | Ryzen 7 7700 / 9700X | RX 7900 XTX / 7900 XT | 24GB VRAM | 50-75 tokens/s | Best choice |
| **High Performance** | Ryzen 9 7950X / 9950X | RX 7900 XTX | 24GB+ VRAM | 65-90 tokens/s | Excellent |
| **Enterprise** | EPYC 7003/9004 series | MI300X or 2x RX 7900 XTX | 48GB+ VRAM | 90-140 tokens/s | For 32B model |
| **Budget Option** | Ryzen 5 7600 / 9600X | RX 7800 XT (16GB) | 16GB VRAM | 35-50 tokens/s | Acceptable |
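
The **Expected Speed** column can be turned into a rough response time by dividing the number of generated tokens by the decode rate. This sketch assumes the table's figures are sustained generation throughput and ignores prompt processing and model load time, so treat the results as lower bounds. The helper name `generation_seconds` is illustrative.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Rough decode time for `num_tokens` at a steady `tokens_per_second`.

    Excludes prompt processing (prefill) and model load time.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# 1024 matches MAX_TOKENS in the Docker Compose example; the speed ranges
# come from the table's Recommended and Budget Option rows.
for tier, (slow, fast) in {"Recommended": (50, 75), "Budget Option": (35, 50)}.items():
    best = generation_seconds(1024, fast)
    worst = generation_seconds(1024, slow)
    print(f"{tier}: {best:.1f}-{worst:.1f} s for a 1024-token reply")
```

At the Recommended tier this works out to roughly 14 to 20 seconds of decode time for a full 1024-token reply.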
### Important Memory Notes
Even though the 8B INT4 model itself takes approximately **5–6 GB**, we recommend **at least 24GB VRAM** for the following reasons:
- KV-cache consumption during generation (especially with long context)
- ONNX Runtime overhead and temporary buffers
- System stability and headroom to avoid out-of-memory (OOM) errors
- Room for larger context windows

**Minimum recommended:** 24GB VRAM (RX 7900 series)

**Ideal:** 24–32GB VRAM

For pure CPU inference (no GPU), we recommend at least **64GB system RAM** (Ryzen 9 7950X/9950X).
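
The KV-cache bullet above can be made concrete with the usual back-of-the-envelope formula for decoder-only transformers: the cache stores one key and one value vector per layer, per KV head, per token. This sketch is illustrative only. The layer count, KV-head count, head size, FP16 cache precision, and the nominal 8 billion parameter count are assumptions for a generic 8B-class model with grouped-query attention, not published JiRack values, and the weight estimate ignores quantization scales and ONNX Runtime buffers.

```python
GIB = 1024 ** 3


def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Raw size of the quantized weights only (no scales, zero points, or buffers)."""
    return num_params * bits_per_weight / 8


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """KV-cache size of a decoder-only transformer.

    The leading 2 counts keys and values; bytes_per_elem=2 assumes an FP16 cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch


# HYPOTHETICAL architecture values for a generic 8B-class GQA model;
# replace them with the real numbers from the model's config.
LAYERS, KV_HEADS, HEAD_DIM = 36, 8, 128

print(f"INT4 weights (8e9 params): ~{weight_bytes(8e9, 4) / GIB:.1f} GiB")
for ctx in (4096, 32768):
    kv = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx)
    print(f"KV cache at {ctx} tokens: {kv / GIB:.2f} GiB")
```

With these assumed values the cache costs 144 KiB per token, or about 4.5 GiB at a 32K context on top of roughly 3.7 GiB of raw INT4 weights, before any ONNX Runtime overhead. That shows how quickly long contexts add up.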

---

I will provide the default model in full FP32 precision for quantization, allowing us to find the optimal balance between model size and performance.
## 📧 Contact & Licensing
For joint venture opportunities, hardware integration, or licensing inquiries:
- **Email:** [grabko@cmsmanhattan.com](mailto:grabko@cmsmanhattan.com)
- **Phone:** +1 (516) 777-0945
- **Location:** New York, USA