---
language:
- en
- zh
- ja
- ko
- fr
- es
- pt
- de
- it
- ru
- ar
- vi
- th
tags:
- code
- coding
- qwen3.0
- onnx
- int8
- web-ui
license: unknown
---

# JiRack Coder Reasoning 8B INT4

A fast, efficient coding assistant with a clean built-in web UI, built on a Qwen3.0-Coder-8B-Instruct base and optimized with Microsoft ONNX Runtime.

- JiRack is built for cloud deployment to help reduce cloud costs. It can also serve as an expert model in cloud-hosted RAG pipelines, with the ONNX JiRack Java server available as an alternative backend.
- Subscription: $1 per month per user under the updated license (for individual, non-company users).

## Quick Start

Watch JiRack Coder 8B in action:
**DEMO**: [JiRack Coder Reasoning 8B Web UI](https://youtu.be/mq1DxIov7Bw)

### Run with Docker

---

#### Default (CPU)

```shell
docker run -d \
  --name jirack_coder_reasoing_8b \
  -p 7869:7869 \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_8b_int4_qwenbase:latest
```

#### Multi-CPU

```shell
docker run -d \
  --name jirack_coder_reasoing_8b \
  -p 7869:7869 \
  --restart unless-stopped \
  --memory=20g \
  --cpus=12 \
  cmsmanhattan/jirack_coder_8b_int4_qwenbase:latest
```

#### GPU (coming soon)

```shell
docker run -d \
  --name jirack_coder_reasoing_8b \
  -p 7869:7869 \
  --gpus all \
  --restart unless-stopped \
  cmsmanhattan/jirack_coder_8b_int4_gpu_qwenbase:latest
```

---

#### Docker Compose

```yaml
services:
  jirack:  # illustrative service name; the original listing omits it
    image: cmsmanhattan/jirack_coder_8b_int4_qwenbase:latest
    container_name: jirack_onnx_service
    ports:
      - "7869:7869"
    volumes:
      - .:/app          # bind mount: hides the image's own /app contents
      - ./web:/app/web
    environment:
      - MAX_TOKENS=1024
      - TEMPERATURE=0.7
      - TOP_P=0.9
      - DEFAULT_STREAM=False
      - INTRA_THREADS=4
      - USE_ENV_ALLOCATOR=1
    deploy:
      resources:
        limits:
          memory: 16g
```

## Access the UI

Once the container is running, open your browser and navigate to:

**`http://localhost:7869`**

This opens the **JiRack Coder UI**, a clean web interface designed for coding.
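
If you script the startup (for example in CI or a provisioning step), it can help to wait until the UI actually answers before opening it. The sketch below is an illustrative helper, not part of the JiRack image: it assumes `curl` is installed on the host and that the UI responds on the root URL shown above. The function name `wait_for_ui`, the retry count, and the 2-second delay are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: poll the JiRack web UI until it responds or give up.
# Assumes curl is available; the default URL matches this README.
wait_for_ui() {
  url="${1:-http://localhost:7869}"   # UI address
  tries="${2:-30}"                    # attempts before failing
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f: treat HTTP errors (>= 400) as failures; -sS: quiet, but show errors
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "JiRack UI is up at $url"
      return 0
    fi
    # Wait between attempts, but not after the last one
    if [ "$i" -lt "$tries" ]; then
      sleep 2
    fi
    i=$((i + 1))
  done
  echo "No response from $url after $tries attempts" >&2
  return 1
}
```

For example, run `wait_for_ui http://localhost:7869 30` right after `docker run -d ...`; it returns a non-zero status when the UI never answers, so it can gate any later steps in a script.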

## Changing the Port

The listening port can be changed directly from the **Settings** panel in the JiRack Coder UI.

## Licensing

- The **JiRack Coder 8B model** is provided under a commercial license at about $12 per year per user.
- All **JiRack UI clients** are provided under a commercial license.
- However, the UI clients can be used free of charge together with the official JiRack Docker containers, as long as they are not redistributed separately.

**JiRack Coder 32B** is available exclusively under a commercial enterprise license.

For commercial licensing, cluster deployment, or enterprise use of JiRack Coder 32B and JiRack Coder 14B, please contact us.

- JiRack chat client for Microsoft Windows 11 desktop, with Ollama API setup: https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/jirack-chat.zip
- Live email chat with the model via support@cmsmanhattan.com

## Hardware Recommendations for AMD Systems

This model is heavier than JiRack Coder 7B INT8.

### Recommended Hardware for JiRack Coder Reasoning 8B INT4 (single Docker container)

| Use Case | CPU | GPU (ROCm) | VRAM / RAM | Expected Speed | Recommendation |
|---|---|---|---|---|---|
| **Recommended** | Ryzen 7 7700 / 9700X | RX 7900 XTX / 7900 XT | 24GB VRAM | 50-75 tokens/s | Best choice |
| **High Performance** | Ryzen 9 7950X / 9950X | RX 7900 XTX | 24GB+ VRAM | 65-90 tokens/s | Excellent |
| **Enterprise** | EPYC 7003/9004 series | MI300X or 2x RX 7900 XTX | 48GB+ VRAM | 90-140 tokens/s | For 32B model |
| **Budget Option** | Ryzen 5 7600 / 9600X | RX 7800 XT (16GB) | 16GB VRAM | 35-50 tokens/s | Acceptable |

### Important Memory Notes

Even though the 8B INT4 model itself takes approximately **5–6 GB**, we recommend **at least 24GB VRAM** for the following reasons:

- KV-cache consumption during generation (especially with long contexts)
- ONNX Runtime overhead and temporary buffers
- System stability and avoiding out-of-memory errors
- Room for larger context windows

**Minimum recommended:** 24GB VRAM (RX 7900 series)

**Ideal:** 24–32GB VRAM

For pure CPU inference (no GPU), we recommend at least **64GB system RAM** (e.g., Ryzen 9 7950X/9950X).
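
Of the costs listed above, the KV cache is the easiest to estimate, because it grows linearly with context length. The back-of-the-envelope sketch below is illustrative only: it assumes Qwen3-8B-style attention geometry (36 layers, 8 key/value heads, head dimension 128), an FP16 KV cache, batch size 1, and a 32K-token context. None of these values come from this model card, so check them against the model's actual config before relying on the numbers.

```shell
#!/bin/sh
# Rough memory estimate for an 8B INT4 model plus its KV cache.
# All architecture values below are ASSUMPTIONS (Qwen3-8B-like geometry);
# replace them with the values from the model's own config.
PARAMS=8000000000      # nominal parameter count (8B)
WEIGHT_BITS=4          # INT4 weights
LAYERS=36              # assumed number of transformer layers
KV_HEADS=8             # assumed number of key/value heads (GQA)
HEAD_DIM=128           # assumed per-head dimension
KV_ELEM_BYTES=2        # FP16 KV cache; use 4 if the runtime keeps it in FP32
CONTEXT=32768          # tokens held in the cache (batch size 1)

# Weights: params * bits / 8 (ignores quantization scales and embeddings)
WEIGHT_BYTES=$(( PARAMS * WEIGHT_BITS / 8 ))

# KV cache per token: one K and one V tensor per layer
KV_PER_TOKEN=$(( 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_ELEM_BYTES ))
KV_TOTAL_BYTES=$(( KV_PER_TOKEN * CONTEXT ))

MIB=1048576
echo "weights:            $(( WEIGHT_BYTES / MIB )) MiB"
echo "KV per token:       $(( KV_PER_TOKEN / 1024 )) KiB"
echo "KV at $CONTEXT tokens: $(( KV_TOTAL_BYTES / MIB )) MiB"
```

Under these assumptions the raw INT4 weights come to about 3.7 GiB (real model files are larger because of scales and higher-precision layers), and a full 32K-token FP16 cache adds about 4.5 GiB. An FP32 cache doubles that figure, and ONNX Runtime buffers come on top, which is why headroom well above the model file size is recommended.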

---

I will provide the default model in full FP32 precision for quantization, allowing you to find the optimal balance between model size and performance.

## 📧 Contact & Licensing

For joint venture opportunities, hardware integration, or licensing inquiries:
- **Email:** [grabko@cmsmanhattan.com](mailto:grabko@cmsmanhattan.com)
- **Phone:** +1 (516) 777-0945
- **Location:** New York, USA