kgrabko committed d54f345 (verified) · 1 parent: e5254c0

Update README.md

Files changed (1): README.md +107 -0
---
language: en
tags:
- code
- coding
- qwen3.0
- onnx
- int4
- web-ui
license: unknown
---

# JiRack Coder Reasoning 32B INT4

A fast, efficient coding assistant with a clean built-in web UI, built on the Qwen3.0-Coder-32B-Instruct base model and optimized with Microsoft ONNX Runtime.

## Quick Start
Watch JiRack Coder 32B in action:
**DEMO**: [JiRack Coder Reasoning 32B Web UI](https://youtu.be/mq1DxIov7Bw)

### Run with Docker

---

#### Default (CPU)

```bash
docker run -d \
  --name jirack_coder_reasoning_32b \
  -p 7869:7869 \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest
```

#### Multi-CPU

This variant caps the container at 48 GB of memory and 16 CPUs; adjust `--memory` and `--cpus` to match your machine.

```bash
docker run -d \
  --name jirack_coder_reasoning_32b \
  -p 7869:7869 \
  --restart unless-stopped \
  --memory=48g \
  --cpus=16 \
  cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest
```

#### GPU (coming soon)

```bash
docker run -d \
  --name jirack_coder_reasoning_32b \
  -p 7869:7869 \
  --gpus all \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_32b_int4_gpu_qwenbase:latest
```

---

## Access the UI
Once the container is running, open your browser and go to:

**`http://localhost:7869`**

This opens the **JiRack Coder UI**, a clean web interface designed for coding.
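
Because `docker run -d` returns as soon as the container starts, the page may not respond right away. A minimal sketch, assuming `curl` is installed and the default port 7869, that polls the UI until it answers. The function name `wait_for_ui` and its timeout argument are illustrative, not part of the JiRack tooling.

```bash
#!/usr/bin/env bash
# Readiness check for the JiRack Coder UI (sketch; assumes curl is installed).
# wait_for_ui [URL] [TIMEOUT_SECONDS]
# Returns 0 once URL answers with a non-error HTTP status, 1 after TIMEOUT_SECONDS.
wait_for_ui() {
  local url="${1:-http://localhost:7869}"
  local timeout="${2:-300}"
  local waited=0
  # -f: treat HTTP status >= 400 as failure; -sS: silent except for errors;
  # --max-time keeps a single attempt from hanging.
  until curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; do
    if (( waited >= timeout )); then
      echo "UI not reachable at $url after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$(( waited + 2 ))
  done
  echo "UI is up at $url"
}
```

For example, `wait_for_ui http://localhost:7869 600` waits up to ten minutes before giving up.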

## Changing the Port
The listening port can be changed directly from the **Settings** panel in the JiRack Coder UI. If you change it, make sure the `-p` mapping in your `docker run` command publishes the new port.

## Licensing
- The **JiRack Coder 32B model** is provided under a **commercial enterprise license**.
- All **JiRack UI clients** are provided under a commercial license.
- The UI clients may, however, be used free of charge together with the official JiRack Docker containers, provided they are not redistributed separately.

**JiRack Coder 14B** is available under a lighter commercial license (about $12 per user per year).

For commercial licensing, cluster deployment, or enterprise use of JiRack Coder 32B or JiRack Coder 14B, please contact us.

- JiRack desktop chat client for Microsoft Windows 11 (with Ollama API setup): https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/jirack-chat.zip
- Live email chat with the model: support@cmsmanhattan.com

## Hardware Recommendations for AMD Systems
JiRack Coder Reasoning 32B INT4 is significantly heavier than JiRack Coder 14B INT4.

### Recommended Hardware for JiRack Coder Reasoning 32B INT4 (single Docker container)

| Use Case             | CPU                          | GPU (ROCm)                  | VRAM / RAM    | Expected Speed   | Recommendation      |
|----------------------|------------------------------|-----------------------------|---------------|------------------|---------------------|
| **Recommended**      | Ryzen 9 7950X / 9950X        | RX 7900 XTX / 2x RX 7900 XT | 48 GB+ VRAM   | 35–55 tokens/s   | Best choice         |
| **High Performance** | Ryzen 9 9950X / Threadripper | 2x RX 7900 XTX              | 48–64 GB VRAM | 50–75 tokens/s   | Excellent           |
| **Enterprise**       | EPYC 7003/9004 series        | MI300X or 4x RX 7900 XTX    | 96 GB+ VRAM   | 70–110 tokens/s  | Best for production |
| **Budget Option**    | Ryzen 7 7700 / 9700X         | RX 7900 XTX (24 GB)         | 24 GB+ VRAM   | 25–40 tokens/s   | Acceptable          |

### Important Memory Notes

Although the 32B INT4 model itself takes only about **12–14 GB**, we recommend **at least 48 GB of VRAM** because of:

- KV-cache growth during generation (especially with long contexts)
- ONNX Runtime overhead and temporary buffers
- headroom for system stability and to avoid out-of-memory errors
- room for larger context windows

**Recommended minimum:** 48 GB VRAM (dual RX 7900 series or MI300X)

**Ideal:** 48–64 GB VRAM

For pure CPU inference (no GPU), we recommend at least **128 GB of system RAM** (Ryzen 9 7950X/9950X or better).
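
The KV-cache point can be made concrete with the usual estimate: 2 (keys and values) × layers × KV heads × head dimension × context tokens × bytes per value. The README does not state the model's architecture, so the shape used below (64 layers, 8 KV heads, head dimension 128, FP16 cache) is a hypothetical 32B-class configuration; substitute the real values from the model's config.

```bash
#!/usr/bin/env bash
# Rough KV-cache size estimate (sketch). The model-shape numbers passed in are
# assumptions; read the real ones from the model's configuration.

# kv_cache_bytes LAYERS KV_HEADS HEAD_DIM CONTEXT_TOKENS [BYTES_PER_VALUE]
# Keys and values (factor 2) are stored for every layer, KV head and token.
# BYTES_PER_VALUE defaults to 2 (an FP16 cache).
kv_cache_bytes() {
  local layers=$1 kv_heads=$2 head_dim=$3 ctx=$4 bytes=${5:-2}
  echo $(( 2 * layers * kv_heads * head_dim * ctx * bytes ))
}

# to_gib BYTES: format a byte count as GiB with one decimal (awk does the float math).
to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.1f GiB\n", b / (1024 ^ 3) }'
}

# Hypothetical 32B-class shape: 64 layers, 8 KV heads, head_dim 128, FP16 cache.
for ctx in 8192 32768; do
  echo "context ${ctx}: $(to_gib "$(kv_cache_bytes 64 8 128 "$ctx")")"
done
```

Under those assumed values, one 8,192-token sequence needs 2.0 GiB of KV cache and one 32,768-token sequence needs 8.0 GiB, on top of the weights; the figure grows linearly with context length and with the number of concurrent sequences.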

---

We will quantize starting from the default model in full FP32 precision, which lets us find the best balance between model size and performance.
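
The README does not describe the quantization pipeline itself. To make the size/accuracy trade-off concrete, here is a hypothetical sketch of one common scheme, symmetric blockwise INT4 quantization with one FP16 scale per block; it is not necessarily the method or tooling used for this model. Smaller blocks follow the weights more closely but store more scales, so the average cost per weight rises above 4 bits.

```bash
#!/usr/bin/env bash
# Generic blockwise INT4 illustration (sketch). A common symmetric scheme,
# not necessarily the exact one used to build the JiRack model.

# effective_millibits BLOCK_SIZE [SCALE_BITS] [WEIGHT_BITS]
# Average storage per weight in thousandths of a bit:
#   WEIGHT_BITS + SCALE_BITS / BLOCK_SIZE   (one scale shared per block)
effective_millibits() {
  local block=$1 scale_bits=${2:-16} weight_bits=${3:-4}
  echo $(( weight_bits * 1000 + scale_bits * 1000 / block ))
}

# quantize_block X1 X2 ...: symmetric INT4 quantization of one block.
# scale = max|x| / 7, code = round(x / scale); prints the scale and the codes.
quantize_block() {
  printf '%s\n' "$@" | awk '
    { x[NR] = $1; a = ($1 < 0) ? -$1 : $1; if (a > amax) amax = a }
    END {
      scale = (amax > 0) ? amax / 7 : 1
      printf "scale=%.6g codes=", scale
      for (i = 1; i <= NR; i++) {
        q = x[i] / scale
        q = (q < 0) ? int(q - 0.5) : int(q + 0.5)  # round half away from zero
        if (q > 7) q = 7                            # clamp to the signed 4-bit range;
        if (q < -8) q = -8                          # symmetric scaling keeps |q| <= 7 anyway
        sep = (i > 1) ? "," : ""
        printf "%s%d", sep, q
      }
      printf "\n"
    }'
}

for b in 32 64 128; do
  echo "block ${b}: $(effective_millibits "$b") millibits per weight"
done
quantize_block 3.5 -1.2 0.3
```

With 16-bit scales, blocks of 32, 64 and 128 weights cost 4.5, 4.25 and 4.125 bits per weight on average, and the sample block quantizes to `scale=0.5 codes=7,-2,1`. Asymmetric variants that also store a per-block zero point add a few more bits per block.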

## 📧 Contact & Licensing
For joint venture opportunities, hardware integration, or licensing inquiries:
- **Email:** [grabko@cmsmanhattan.com](mailto:grabko@cmsmanhattan.com)
- **Phone:** +1 (516) 777-0945
- **Location:** New York, USA