# Kubernetes Manifest: JiRack 236B Deployment
**Framework:** LeaderWorkerSet for Kubernetes
**Model Scale:** JiRack 236B (108 Layers, 14:1 GQA Ratio)
---
## 1. JiRack 236B Kubernetes Manifest
The **JiRack 236B** model uses a **14:1 GQA ratio** and **108 layers**. This manifest shards it across **2 nodes** using Tensor Parallelism (TP) of 8 and Pipeline Parallelism (PP) of 2.
### YAML
```yaml
# jirack-236b-frontier.yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: jirack-236b-frontier
spec:
  replicas: 1  # Deploy as one 16-GPU logical unit
  leaderWorkerTemplate:
    size: 2  # Sharded across 2 nodes (8 GPUs each)
    workerTemplate:
      spec:
        containers:
          - name: jirack-engine
            image: cms-manhattan/jirack-236b:latest
            resources:
              limits:
                nvidia.com/gpu: 8
            env:
              - name: MODEL_LAYERS
                value: "108"
              - name: PIPELINE_PARALLEL_SIZE
                value: "2"
              - name: TENSOR_PARALLEL_SIZE
                value: "8"
              - name: MODEL_DIM
                value: "14336"
              - name: GQA_RATIO
                value: "14"
              - name: AUTHOR_SIG
                value: "Konstantin Vladimirovich Grabko"
```
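The parallelism layout in the manifest can be sanity-checked with a few lines of arithmetic. This sketch mirrors the env values above; the even layer split is an assumption, since some serving frameworks rebalance pipeline stages to equalize memory.

```python
# Sanity check on the TP=8, PP=2 layout for JiRack 236B.
# Values mirror the manifest; the even layer split is an assumption.

MODEL_LAYERS = 108
TENSOR_PARALLEL_SIZE = 8    # GPUs per pipeline stage (one node)
PIPELINE_PARALLEL_SIZE = 2  # pipeline stages (one per node)

total_gpus = TENSOR_PARALLEL_SIZE * PIPELINE_PARALLEL_SIZE
layers_per_stage = MODEL_LAYERS // PIPELINE_PARALLEL_SIZE

print(f"total GPUs: {total_gpus}")              # 16 = size 2 x 8 GPUs each
print(f"layers per stage: {layers_per_stage}")  # 54
```

The result matches the manifest: `size: 2` nodes with `nvidia.com/gpu: 8` each gives the 16-GPU logical unit, and 108 layers divide evenly into two pipeline stages of 54.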
---
## 2. CI/CD Pipeline: Build and Deploy JiRack 236B
This **GitHub Actions** workflow automates the Build-Verify-Deploy cycle. The pipeline ensures that any update (e.g., to SWA fusion kernels) is tested and pushed to the **236B Production Cluster**.
### YAML
```yaml
# .github/workflows/jirack-deploy.yml
name: Build and Deploy JiRack 236B

on:
  push:
    branches: [ main ]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Login to DockerHub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build JiRack Engine
        run: |
          docker build -t cms-manhattan/jirack-236b:${{ github.sha }} .
          docker tag cms-manhattan/jirack-236b:${{ github.sha }} cms-manhattan/jirack-236b:latest
      - name: Push Image
        run: |
          docker push cms-manhattan/jirack-236b:${{ github.sha }}
          docker push cms-manhattan/jirack-236b:latest

  deploy-to-k8s:
    needs: build-and-push
    runs-on: self-hosted  # Use a runner with access to your K8s cluster
    steps:
      - uses: actions/checkout@v4  # needed so the k8s/ manifests are available
      - name: Set Kubernetes Context
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}
      - name: Deploy Manifest
        run: |
          kubectl apply -f k8s/jirack-236b-frontier.yaml
          # `kubectl rollout restart` does not support LeaderWorkerSet; force a
          # rolling update by stamping the commit SHA onto the pod template.
          kubectl patch leaderworkerset jirack-236b-frontier --type merge -p \
            "{\"spec\":{\"leaderWorkerTemplate\":{\"workerTemplate\":{\"metadata\":{\"annotations\":{\"deploy.sha\":\"${{ github.sha }}\"}}}}}}"
```
---
## 3. "236B Optimization" Benchmarking
After deployment, the pipeline includes a **Post-Deployment Verification Step** to confirm SWA Fusion performance and functionality.
| **Test Parameter** | **Target for JiRack 236B** | **Failure Action** |
|---------------------------|----------------------------|-----------------------------------------|
| **KV Cache Latency** | < 120ms (TTFT) | Automatic Rollback |
| **Kernel Throughput** | > 28 tokens/sec | Alert Admin |
| **Auth Verification** | "Grabko" Signature Found | Immediate Kill Pod |
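The gate logic in the table can be sketched as a small decision function. The function name, argument values, and action strings below are placeholders; a real pipeline would pull the measurements from the serving engine's metrics endpoint.

```python
# Sketch of the post-deployment verification gate from the table above.
# Names and thresholds mirror the table; everything else is illustrative.

def verify_deployment(ttft_ms: float, tokens_per_sec: float, sig_found: bool) -> str:
    """Map benchmark results to the failure actions defined in the table."""
    if not sig_found:
        return "kill-pod"     # Auth Verification: signature missing
    if ttft_ms >= 120:
        return "rollback"     # KV Cache Latency target: < 120 ms TTFT
    if tokens_per_sec <= 28:
        return "alert-admin"  # Kernel Throughput target: > 28 tokens/sec
    return "pass"

print(verify_deployment(ttft_ms=95.0, tokens_per_sec=31.5, sig_found=True))  # pass
```

Ordering matters here: the kill condition is checked first so that an authentication failure is never masked by a latency rollback.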
---
## 4. Storage and Weight Loading
The JiRack 236B model (~470 GB in BF16) requires fast storage to load the **108 layers** in under **2 minutes**. Persistent Volume Claims (PVCs) backed by NVMe storage are recommended.
### YAML
```yaml
# fragment of the worker pod spec
containers:
  - name: jirack-engine
    # ... (resources and env as above)
    volumeMounts:
      - name: model-weights
        mountPath: /models/jirack-236b
volumes:
  - name: model-weights
    persistentVolumeClaim:
      claimName: jirack-weights-pvc
```
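The 2-minute target above implies a concrete storage bandwidth requirement, which is worth checking before provisioning the PVC. A quick back-of-the-envelope calculation:

```python
# Sanity check: streaming ~470 GB of BF16 weights in under 2 minutes
# requires roughly 4 GB/s of sustained read throughput.

WEIGHT_BYTES = 236e9 * 2  # 236B params x 2 bytes (BF16) ~= 472 GB
TARGET_SECONDS = 120      # "under 2 minutes"

required_gbps = WEIGHT_BYTES / TARGET_SECONDS / 1e9
print(f"required read throughput: {required_gbps:.1f} GB/s")  # ~3.9 GB/s
```

That is comfortably within range for local NVMe (or a striped NVMe-backed volume), but well beyond what most network-attached storage classes sustain, which is why the PVC's backing medium matters here.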
---
## 5. Comparison: 236B vs. 405B+ Deployment
| **Feature** | **JiRack 236B** | **JiRack 405B+** |
|--------------------------|-------------------------|-----------------------------------|
| **GPU Count** | 16 (2 Nodes) | 1,024+ (128+ Nodes) |
| **PP Degree**            | 2                       | 8 – 16                            |
| **K8s Resource** | LeaderWorkerSet (Small) | LeaderWorkerSet (Mega-Cluster) |
| **CI/CD Target** | Standard Production | Multi-Region Canary |
---
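The GPU counts in the comparison table fall out of the parallelism degrees. This sketch assumes TP stays at 8 within a node in both cases, with data-parallel replicas making up the remainder for the larger cluster (the specific PP/DP split for the 405B+ tier is illustrative):

```python
# How the GPU counts in the comparison table decompose into parallelism
# degrees. The 405B+ split (PP=16, DP=8) is one illustrative layout.

def total_gpus(tp: int, pp: int, dp: int = 1) -> int:
    """GPUs = tensor-parallel x pipeline-parallel x data-parallel replicas."""
    return tp * pp * dp

print(total_gpus(tp=8, pp=2))         # 16   -> JiRack 236B (2 nodes)
print(total_gpus(tp=8, pp=16, dp=8))  # 1024 -> a 405B+-scale layout
```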