---
language:
- en
- zh
- ja
- ko
- fr
- es
- pt
- de
- it
- ru
- ar
- vi
- th
tags:
- code
- coding
- qwen3.0
- onnx
- int4
- web-ui
license: unknown
---

# JiRack Coder Reasoning 32B INT4

A fast and efficient coding assistant with a clean built-in web UI, powered by the Qwen3.0-Coder-32B-Instruct base model and optimized with Microsoft ONNX Runtime.

- JiRack is a cloud-oriented model intended to reduce cloud costs. It can also serve as an expert model in cloud RAG setups, with the ONNX JiRack Java server as an alternative.
- Under the updated license, a subscription costs $1 per user per month for non-company users.

## Quick Start

Watch the JiRack Coder 32B in action:

**DEMO**: [JiRack Coder Reasoning 32B Web UI](https://youtu.be/mq1DxIov7Bw)

### Run with Docker

---

**Default CPU**

    docker run -d \
      --name jirack_coder_reasoning_32b \
      -p 7869:7869 \
      --restart unless-stopped \
      cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest

**Multi CPU**

    docker run -d \
      --name jirack_coder_reasoning_32b \
      -p 7869:7869 \
      --restart unless-stopped \
      --memory=48g \
      --cpus=16 \
      cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest

**GPU** (coming soon)

    docker run -d \
      --name jirack_coder_reasoning_32b \
      -p 7869:7869 \
      --gpus all \
      --restart unless-stopped \
      cmsmanhattan/jirack_coder_32b_int4_gpu_qwenbase:latest

---

**Docker Compose**

    services:
      jirack_onnx_service:  # service name is an assumption; rename as needed
        image: cmsmanhattan/jirack_coder_32b_int4_qwenbase:latest
        container_name: jirack_onnx_service
        ports:
          - "7869:7869"
        volumes:
          - .:/app
          - ./web:/app/web
        environment:
          - MAX_TOKENS=1024
          - TEMPERATURE=0.7
          - TOP_P=0.9
          - DEFAULT_STREAM=False
          - INTRA_THREADS=4
          - USE_ENV_ALLOCATOR=1
        deploy:
          resources:
            limits:
              memory: 48g

## Access the UI

Once the container is running, open your browser and navigate to:

**`http://localhost:7869`**

This opens the **JiRack Coder UI**, a clean web interface designed for coding.

## Changing the Port

The listening port can be changed from the **Settings** panel within the JiRack Coder UI.

## Licensing

- The **JiRack Coder 32B model** is provided under a **commercial enterprise license**.
- All **JiRack UI clients** are provided under a commercial license.
- However, the UI clients can be used for free when run together with the official JiRack Docker containers, as long as they are not redistributed separately.

**JiRack Coder 14B** is available under a lighter commercial license (~$12 per user/year).

For commercial licensing, cluster deployment, or enterprise use of JiRack Coder 32B and JiRack Coder 14B, please contact us.

- JiRack MS Windows 11 desktop chat client with Ollama API setup: https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/jirack-chat.zip
- Live email chat with the model: support@cmsmanhattan.com

## Hardware Recommendations for AMD Systems

This model is significantly heavier than JiRack Coder 14B INT4.

### Recommended Hardware for JiRack Coder Reasoning 32B INT4

The model runs as a single Docker container.

| Use Case             | CPU                          | GPU (ROCm)                  | VRAM         | Expected Speed  | Recommendation      |
|----------------------|------------------------------|-----------------------------|--------------|-----------------|---------------------|
| **Recommended**      | Ryzen 9 7950X / 9950X        | RX 7900 XTX / 2x RX 7900 XT | 48GB+ VRAM   | 35-55 tokens/s  | Best choice         |
| **High Performance** | Ryzen 9 9950X / Threadripper | 2x RX 7900 XTX              | 48-64GB VRAM | 50-75 tokens/s  | Excellent           |
| **Enterprise**       | EPYC 7003/9004 series        | MI300X or 4x RX 7900 XTX    | 96GB+ VRAM   | 70-110 tokens/s | Best for production |
| **Budget Option**    | Ryzen 7 7700 / 9700X         | RX 7900 XTX (24GB)          | 24GB+ VRAM   | 25-40 tokens/s  | Acceptable          |

### Important Memory Notes

Even though the 32B INT4 model itself takes approximately **12–14 GB**, we recommend **at least 48GB VRAM** for the following reasons:

- KV-cache consumption during generation (especially with long context)
- ONNX Runtime overhead and temporary buffers
- System stability and avoiding out-of-memory errors
- Room for larger context windows

**Minimum recommended:** 48GB VRAM (dual RX 7900 series or MI300X)

**Ideal:** 48–64GB
VRAM

For pure CPU inference (no GPU), we recommend at least **128GB system RAM** (Ryzen 9 7950X/9950X or better).

---

We will quantize starting from the default model in full FP32 precision, which lets us find the optimal balance between model size and performance.

## 📧 Contact & Licensing

For joint venture opportunities, hardware integration, or licensing inquiries:

- **Email:** [grabko@cmsmanhattan.com](mailto:grabko@cmsmanhattan.com)
- **Phone:** +1 (516) 777-0945
- **Location:** New York, USA
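
## Appendix: Estimating KV-Cache Memory

The memory notes in the hardware section list KV-cache consumption as a reason to provision more memory than the weights alone need. The sketch below shows the standard back-of-the-envelope formula for that cache. The architecture values (`layers`, `kv_heads`, `head_dim`) and the 2-byte cache precision are illustrative assumptions, not published figures for this model; substitute the values from the actual model configuration.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2, batch=1):
    """Return the bytes needed to cache keys and values for `tokens` positions."""
    # The leading 2 covers one key tensor plus one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens * batch


# ASSUMED architecture values, for illustration only (not taken from the model card).
ASSUMED_ARCH = {"layers": 64, "kv_heads": 8, "head_dim": 128}


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / 2**30


if __name__ == "__main__":
    for context in (4096, 32768, 131072):
        size = to_gib(kv_cache_bytes(context, **ASSUMED_ARCH))
        print(f"{context:>7} tokens: {size:.1f} GiB")
```

Under these assumed values the cache costs 256 KiB per token, so a 32,768-token context needs about 8 GiB on top of the weights. The cost grows linearly with context length and batch size, which is why long contexts drive the memory recommendation.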