---
language: en
tags:
- code
- coding
- qwen2.5
- onnx
- int8
- web-ui
license: gpl-3.0
---

# JiRack Coder 7B INT8

A fast, efficient coding assistant with a clean built-in web UI, built on the Qwen2.5-Coder-7B-Instruct base model and optimized with Microsoft ONNX Runtime.

## Quick Start

Watch JiRack Coder 7B in action:

**Demo:** [JiRack Coder 7B Web UI](https://youtu.be/I8AAITUiI64)


### Run with Docker

---
**Default (CPU)**

    docker run -d \
      --name jirack_coder_7b \
      -p 7869:7869 \
      --restart unless-stopped \
      cmsmanhattan/jirack_coder_7b_int8_qwenbase:latest

**Multi-CPU (12 cores, 20 GB memory limit)**

    docker run -d \
      --name jirack_coder_7b \
      -p 7869:7869 \
      --restart unless-stopped \
      --memory=20g \
      --cpus=12 \
      cmsmanhattan/jirack_coder_7b_int8_qwenbase:latest

**GPU**

    docker run -d \
      --name jirack_coder_7b \
      -p 7869:7869 \
      --gpus all \
      --restart unless-stopped \
      cmsmanhattan/jirack_coder_7b_int8:1.0.2

---


## Access the UI

Once the container is running, open your browser and navigate to:

**`http://localhost:7869`**

This opens the **JiRack Coder UI**, a clean web interface designed for coding.
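
If the page does not load, a quick reachability check can tell you whether the container is answering at all. The following is a minimal sketch using only the Python standard library; the helper name `ui_is_reachable` is hypothetical, and the default URL assumes the `-p 7869:7869` mapping shown in the Docker commands.

```python
import urllib.error
import urllib.request


def ui_is_reachable(url: str = "http://localhost:7869", timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar errors.
        return False


if __name__ == "__main__":
    if ui_is_reachable():
        print("JiRack Coder UI is reachable")
    else:
        print("No response; check the container status and port mapping")
```

A `False` result usually means the container is not running yet or the port mapping differs from the assumed one; `docker ps` and `docker logs jirack_coder_7b` are the natural next steps.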

## Changing the Port

The listening port can be changed from the **Settings** panel in the JiRack Coder UI.

## Licensing

- The **JiRack Coder 7B model** is released under the **GNU General Public License v3.0 (GPL-3.0)**.
- All **JiRack UI clients** are provided under a commercial license.
- However, the UI clients can be used for free when running together with the official JiRack Docker containers, as long as they are not redistributed separately.

**JiRack Coder 32B** is available exclusively under a commercial enterprise license.

For commercial licensing, cluster deployment, or enterprise use of JiRack Coder 32B and JiRack Coder 14B, please contact us.

- JiRack desktop chat client for Windows 11, with Ollama API setup: https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/jirack-chat.zip
- Live email chat with the model: support@cmsmanhattan.com


## Hardware Recommendations for AMD Systems

### Recommended Hardware for JiRack Coder 7B INT8

The model runs as a single Docker container.

| Use Case              | CPU                              | GPU (ROCm)                        | VRAM / RAM     | Expected Speed      | Recommendation     |
|-----------------------|----------------------------------|-----------------------------------|----------------|---------------------|--------------------|
| **Recommended**       | Ryzen 7 7700 / 9700X             | RX 7900 XTX / 7900 XT             | 24GB VRAM      | 50-75 tokens/s      | Best choice        |
| **High Performance**  | Ryzen 9 7950X / 9950X            | RX 7900 XTX                       | 24GB+ VRAM     | 65-90 tokens/s      | Excellent          |
| **Enterprise**        | EPYC 7003/9004 series            | MI300X or 2x RX 7900 XTX          | 48GB+ VRAM     | 90-140 tokens/s     | For 32B model      |
| **Budget Option**     | Ryzen 5 7600 / 9600X             | RX 7800 XT (16GB)                 | 16GB VRAM      | 35-50 tokens/s      | Acceptable         |

### Important Memory Notes

Even though the 7B INT8 model itself takes approximately **8–9 GB**, we recommend **at least 24GB VRAM** for the following reasons:

- KV-cache consumption during generation (especially with long context)
- ONNX Runtime overhead and temporary buffers
- Headroom for system stability and to avoid out-of-memory errors
- Room for larger context windows

**Minimum recommended:** 24GB VRAM (RX 7900 series)  
**Ideal:** 24–32GB VRAM
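
To get a feel for the KV-cache item in the list above, the cache size can be estimated from the model architecture. This is a rough sketch, not a measurement: the layer count, KV-head count, and head dimension below are assumptions based on the published Qwen2.5-7B configuration, and the 16-bit cache precision is also an assumption. Check the model's own `config.json` and the runtime's cache data type before relying on the numbers.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   bytes_per_value: int, context_tokens: int,
                   batch_size: int = 1) -> int:
    """Estimate KV-cache size in bytes for a decoder-only transformer."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return (2 * num_layers * num_kv_heads * head_dim
            * bytes_per_value * context_tokens * batch_size)


# Assumed architecture values (not taken from this README).
ASSUMED_CONFIG = {
    "num_layers": 28,      # assumption: transformer layers
    "num_kv_heads": 4,     # assumption: grouped-query attention KV heads
    "head_dim": 128,       # assumption: dimension per attention head
    "bytes_per_value": 2,  # assumption: 16-bit cache entries
}

if __name__ == "__main__":
    for context in (4096, 32768):
        size = kv_cache_bytes(context_tokens=context, **ASSUMED_CONFIG)
        print(f"{context:>6} tokens: {size / 2**30:.2f} GiB")
```

The estimate covers only the cache itself; model weights, ONNX Runtime buffers, and the other items in the list come on top of it.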

For pure CPU inference (no GPU), we recommend at least **64GB system RAM** (Ryzen 9 7950X/9950X).

---
The default model in full FP32 precision (approximately 62 GB) has also been added. It serves as the base for quantization, which lets us find the best balance between model size and performance.


## 📧 Contact & Licensing
For joint venture opportunities, hardware integration, or licensing inquiries:
- **Email:** [grabko@cmsmanhattan.com](mailto:grabko@cmsmanhattan.com)
- **Phone:** +1 (516) 777-0945
- **Location:** New York, USA