# Kubernetes Manifest: JiRack 236B Deployment  

**Framework:** LeaderWorkerSet for Kubernetes  
**Model Scale:** JiRack 236B (108 Layers, 14:1 GQA Ratio)  

---

## 1. JiRack 236B Kubernetes Manifest  

The **JiRack 236B** model uses a **14:1 GQA ratio** and **108 layers**. This manifest shards it across **2 nodes** of 8 GPUs each (16 GPUs in total), using a Tensor Parallelism (TP) degree of 8 and a Pipeline Parallelism (PP) degree of 2.  

### YAML
```yaml
# jirack-236b-frontier.yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: jirack-236b-frontier
spec:
  replicas: 1  # Deploy as one 16-GPU logical unit
  leaderWorkerTemplate:
    size: 2    # Sharded across 2 nodes (8 GPUs each)
    workerTemplate:
      spec:
        containers:
          - name: jirack-engine
            image: cms-manhattan/jirack-236b:latest
            resources:
              limits:
                nvidia.com/gpu: 8
            env:
              - name: MODEL_LAYERS
                value: "108"
              - name: PIPELINE_PARALLEL_SIZE
                value: "2"
              - name: TENSOR_PARALLEL_SIZE
                value: "8"
              - name: MODEL_DIM
                value: "14336"
              - name: GQA_RATIO
                value: "14"
              - name: AUTHOR_SIG
                value: "Konstantin Vladimirovich Grabko"
```
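
As a quick sanity check on the values in the manifest, the sketch below derives the GPU count, node count, and layers per pipeline stage from the TP and PP settings. The function name `parallel_layout` is hypothetical, and the even split of layers across pipeline stages is an assumption; the actual engine may partition layers differently.

```python
def parallel_layout(layers: int, tp: int, pp: int, gpus_per_node: int) -> dict:
    """Derive cluster sizing from the parallelism settings (illustrative only)."""
    # Assumption: layers are divided evenly across pipeline stages.
    if layers % pp != 0:
        raise ValueError("layers must divide evenly across pipeline stages")
    total_gpus = tp * pp
    if total_gpus % gpus_per_node != 0:
        raise ValueError("total GPU count must be a multiple of gpus_per_node")
    return {
        "total_gpus": total_gpus,
        "nodes": total_gpus // gpus_per_node,
        "layers_per_stage": layers // pp,
    }


# Values taken from the manifest: 108 layers, TP=8, PP=2, 8 GPUs per node.
layout = parallel_layout(layers=108, tp=8, pp=2, gpus_per_node=8)
print(layout)  # {'total_gpus': 16, 'nodes': 2, 'layers_per_stage': 54}
```

The result matches the manifest: `size: 2` nodes with `nvidia.com/gpu: 8` each, and 54 layers per pipeline stage under the even-split assumption.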

---

## 2. CI/CD Pipeline: Build and Deploy JiRack 236B  

This **GitHub Actions** workflow automates the Build-Verify-Deploy cycle. The pipeline ensures that any update (e.g., to SWA fusion kernels) is tested and pushed to the **236B Production Cluster**.  

### YAML
```yaml
# .github/workflows/jirack-deploy.yml
name: Build and Deploy JiRack 236B

on:
  push:
    branches: [ main ]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Login to DockerHub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build JiRack Engine
        run: |
          docker build -t cms-manhattan/jirack-236b:${{ github.sha }} .
          docker tag cms-manhattan/jirack-236b:${{ github.sha }} cms-manhattan/jirack-236b:latest

      - name: Push Image
        # Push the commit-SHA tag as well as `latest` so every build stays traceable
        run: |
          docker push cms-manhattan/jirack-236b:${{ github.sha }}
          docker push cms-manhattan/jirack-236b:latest

  deploy-to-k8s:
    needs: build-and-push
    runs-on: self-hosted # Use a runner with access to your K8s cluster
    steps:
      # The manifest lives in the repository, so this job needs its own checkout
      - uses: actions/checkout@v4

      - name: Set Kubernetes Context
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Deploy Manifest
        run: |
          kubectl apply -f k8s/jirack-236b-frontier.yaml
          # NOTE: `kubectl rollout restart` targets built-in workloads (Deployment,
          # DaemonSet, StatefulSet); confirm it works for LeaderWorkerSet in your
          # cluster, or trigger the rollout by changing the pod template instead.
          kubectl rollout restart leaderworkerset/jirack-236b-frontier
```

---

## 3. The "236B Optimization" Benchmarking  

After deployment, the pipeline runs a **Post-Deployment Verification Step** that checks SWA Fusion performance and functionality against the targets below. Each failed check triggers the listed failure action.  

| **Test Parameter**        | **Target for JiRack 236B** | **Failure Action**                      |
|---------------------------|----------------------------|-----------------------------------------|
| **KV Cache Latency**      | < 120 ms (TTFT)            | Automatic Rollback                      |
| **Kernel Throughput**     | > 28 tokens/sec            | Alert Admin                             |
| **Auth Verification**     | "Grabko" Signature Found   | Kill Pod Immediately                    |
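
The checks in the table can be sketched as a small decision function. This is a minimal illustration, not the pipeline's actual verification code: the `Metrics` class, the `failure_actions` function, and the action identifiers (`rollback`, `alert_admin`, `kill_pod`) are hypothetical names, and how the metrics are measured is left out.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    ttft_ms: float          # time to first token, in milliseconds
    tokens_per_sec: float   # measured generation throughput
    signature_found: bool   # whether the "Grabko" signature check passed


def failure_actions(m: Metrics) -> list[str]:
    """Return the failure actions triggered by the measured metrics."""
    actions = []
    if not m.ttft_ms < 120:         # target: < 120 ms (TTFT)
        actions.append("rollback")
    if not m.tokens_per_sec > 28:   # target: > 28 tokens/sec
        actions.append("alert_admin")
    if not m.signature_found:       # target: signature found
        actions.append("kill_pod")
    return actions


# A deployment that is too slow to first token but otherwise healthy.
slow = Metrics(ttft_ms=135.0, tokens_per_sec=31.0, signature_found=True)
print(failure_actions(slow))  # ['rollback']
```

An empty list means every target was met. Returning all triggered actions, rather than stopping at the first failure, leaves it to the caller to decide which action takes precedence.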

---

## 4. Storage and Weight Loading  

The JiRack 236B model (~470 GB in BF16) requires fast storage to load all **108 layers** in under **2 minutes**. PersistentVolumeClaims (PVCs) backed by NVMe storage are recommended.  

### YAML
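```yaml
# Fragment of the pod spec (workerTemplate.spec in the manifest above)

# Goes under the jirack-engine container:
volumeMounts:
  - name: model-weights
    mountPath: /models/jirack-236b

# Goes at the pod spec level, alongside containers:
volumes:
  - name: model-weights
    persistentVolumeClaim:
      claimName: jirack-weights-pvc
```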
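
To see what the two-minute load target implies for storage, the sketch below computes the sustained read throughput required. The function name `required_read_gbps` is hypothetical. The per-node figure assumes each of the 2 nodes reads only its own half of the weights, which the source does not state.

```python
def required_read_gbps(weights_gb: float, load_seconds: float, nodes: int = 1) -> float:
    """Sustained read throughput (GB/s) each node needs to load its share in time."""
    return weights_gb / nodes / load_seconds


# ~470 GB of BF16 weights, loaded in 2 minutes (120 s).
aggregate = required_read_gbps(470, 120)
# Assumption: each of the 2 nodes reads only its half of the weights.
per_node = required_read_gbps(470, 120, nodes=2)

print(f"aggregate: {aggregate:.2f} GB/s")  # aggregate: 3.92 GB/s
print(f"per node:  {per_node:.2f} GB/s")   # per node:  1.96 GB/s
```

These are lower bounds on sustained sequential reads. They leave out filesystem overhead, deserialization, and host-to-GPU transfer time.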

---

## 5. Comparison: 236B vs. 405B+ Deployment  

| **Feature**              | **JiRack 236B**         | **JiRack 405B+**                  |
|--------------------------|-------------------------|-----------------------------------|
| **GPU Count**            | 16 (2 Nodes)            | 1,024+ (128+ Nodes)               |
| **PP Degree**            | 2                       | 8 - 16K                           |
| **K8s Resource**         | LeaderWorkerSet (Small) | LeaderWorkerSet (Mega-Cluster)    |
| **CI/CD Target**         | Standard Production     | Multi-Region Canary               |

---