Kubernetes Manifest: JiRack 236B Deployment
Framework: LeaderWorkerSet for Kubernetes
Model Scale: JiRack 236B (108 Layers, 14:1 GQA Ratio)
1. JiRack 236B Kubernetes Manifest
The JiRack 236B model uses a 14:1 GQA ratio and 108 layers. This manifest shards it across 2 nodes using Tensor Parallelism (TP) of degree 8 within each node and Pipeline Parallelism (PP) of degree 2 across nodes, for 16 GPUs total.
```yaml
# jirack-236b-frontier.yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: jirack-236b-frontier
spec:
  replicas: 1 # Deploy as one 16-GPU logical unit
  leaderWorkerTemplate:
    size: 2 # Sharded across 2 nodes (8 GPUs each)
    workerTemplate:
      spec:
        containers:
          - name: jirack-engine
            image: cms-manhattan/jirack-236b:latest
            resources:
              limits:
                nvidia.com/gpu: 8
            env:
              - name: MODEL_LAYERS
                value: "108"
              - name: PIPELINE_PARALLEL_SIZE
                value: "2"
              - name: TENSOR_PARALLEL_SIZE
                value: "8"
              - name: MODEL_DIM
                value: "14336"
              - name: GQA_RATIO
                value: "14"
              - name: AUTHOR_SIG
                value: "Konstantin Vladimirovich Grabko"
```
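As a sanity check on the parallelism layout in the manifest, the numbers can be derived from the env vars. The figures below come from the manifest; the helper function itself is purely illustrative:

```python
# Sanity-check the JiRack 236B parallelism layout from the manifest above.
# All constants come from the env vars; the helper is hypothetical.

def shard_layout(layers: int, tp: int, pp: int, gpus_per_node: int = 8):
    world_size = tp * pp              # total GPUs in one logical replica
    nodes = world_size // gpus_per_node
    layers_per_stage = layers // pp   # layers held by each pipeline stage
    return world_size, nodes, layers_per_stage

world, nodes, per_stage = shard_layout(layers=108, tp=8, pp=2)
print(world, nodes, per_stage)  # → 16 2 54
```

With PP = 2, each pipeline stage holds 54 of the 108 layers, and TP = 8 spreads each stage across the 8 GPUs of one node.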
2. CI/CD Pipeline: Build and Deploy JiRack 236B
This GitHub Actions workflow automates the Build-Verify-Deploy cycle. The pipeline ensures that any update (e.g., to SWA fusion kernels) is tested and pushed to the 236B Production Cluster.
```yaml
# .github/workflows/jirack-deploy.yml
name: Build and Deploy JiRack 236B
on:
  push:
    branches: [ main ]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Login to DockerHub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build JiRack Engine
        run: |
          docker build -t cms-manhattan/jirack-236b:${{ github.sha }} .
          docker tag cms-manhattan/jirack-236b:${{ github.sha }} cms-manhattan/jirack-236b:latest
      - name: Push Image
        run: | # Push both the immutable SHA tag and the moving latest tag
          docker push cms-manhattan/jirack-236b:${{ github.sha }}
          docker push cms-manhattan/jirack-236b:latest
  deploy-to-k8s:
    needs: build-and-push
    runs-on: self-hosted # Use a runner with access to your K8s cluster
    steps:
      - name: Set Kubernetes Context
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}
      - name: Deploy Manifest
        run: |
          kubectl apply -f k8s/jirack-236b-frontier.yaml
          kubectl rollout restart leaderworkerset/jirack-236b-frontier
```
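After the deploy job completes, a quick manual check from any machine with cluster access can confirm the LeaderWorkerSet's pods came back up. This is a hedged usage sketch, not part of the pipeline; verify the label selector (the one the LWS controller applies to its pods) against your cluster before relying on it:

```shell
# Hypothetical post-deploy check; assumes kubectl is pointed at the cluster.
# The LWS controller labels its pods with leaderworkerset.sigs.k8s.io/name=<name>.
kubectl get leaderworkerset jirack-236b-frontier
kubectl get pods -l leaderworkerset.sigs.k8s.io/name=jirack-236b-frontier -o wide
```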
3. The "236B Optimization" Benchmarking
After deployment, the pipeline includes a Post-Deployment Verification Step to confirm SWA Fusion performance and functionality.
| Test Parameter | Target for JiRack 236B | Failure Action |
|---|---|---|
| KV Cache Latency | < 120ms (TTFT) | Automatic Rollback |
| Kernel Throughput | > 28 tokens/sec | Alert Admin |
| Auth Verification | "Grabko" Signature Found | Immediate Kill Pod |
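The table above maps each test parameter to an action; one possible shape for that verification gate is sketched below. The thresholds come from the table, but the function name and metric values are hypothetical:

```python
# Illustrative post-deployment gate implementing the table above.
# Thresholds come from the table; everything else is an assumption.

def verify(ttft_ms: float, tokens_per_sec: float, signature: str) -> str:
    """Return the action the pipeline should take, per the table."""
    if ttft_ms >= 120:
        return "rollback"        # KV Cache Latency target missed
    if tokens_per_sec <= 28:
        return "alert-admin"     # Kernel Throughput target missed
    if signature != "Grabko":
        return "kill-pod"        # Auth Verification failed
    return "pass"

print(verify(95.0, 31.5, "Grabko"))  # → pass
```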
4. Storage and Weight Loading
The JiRack 236B model (~470GB in BF16) requires fast storage to load the 108 layers in under 2 minutes. Persistent Volume Claims (PVC) backed by NVMe storage are recommended.
```yaml
# fragment of pod spec
volumeMounts:
  - name: model-weights
    mountPath: /models/jirack-236b
volumes:
  - name: model-weights
    persistentVolumeClaim:
      claimName: jirack-weights-pvc
```
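The 2-minute load target pins down the read throughput the PVC must sustain. A quick back-of-envelope check (model size and deadline from the text; the script itself is illustrative):

```python
# Back-of-envelope storage throughput needed to load JiRack 236B weights.
# 236B params * 2 bytes (BF16) = 472 GB, matching the ~470 GB in the text.

params_b = 236          # billions of parameters
bytes_per_param = 2     # BF16
weights_gb = params_b * bytes_per_param        # 472 GB
target_seconds = 120    # "under 2 minutes"

required_gbps = weights_gb / target_seconds    # aggregate read throughput
print(f"{weights_gb} GB / {target_seconds} s ≈ {required_gbps:.1f} GB/s")
# → 472 GB / 120 s ≈ 3.9 GB/s
```

Roughly 4 GB/s aggregate read throughput is comfortably within NVMe range but rules out most network file systems, which is why NVMe-backed PVCs are recommended.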
5. Comparison: 236B vs. 405B+ Deployment
| Feature | JiRack 236B | JiRack 405B+ |
|---|---|---|
| GPU Count | 16 (2 Nodes) | 1,024+ (128+ Nodes) |
| PP Degree | 2 | 8 - 16K |
| K8s Resource | LeaderWorkerSet (Small) | LeaderWorkerSet (Mega-Cluster) |
| CI/CD Target | Standard Production | Multi-Region Canary |
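The GPU counts in the comparison table follow from the product of the parallelism degrees. The 236B figures come from the manifest; the data-parallel (DP) degrees below are inferred for illustration, not stated in the table:

```python
# Relating the comparison table's GPU counts to parallelism degrees.
# DP degrees here are assumptions; only TP/PP for the 236B are from the text.

def total_gpus(tp: int, pp: int, dp: int) -> int:
    return tp * pp * dp

print(total_gpus(tp=8, pp=2, dp=1))   # → 16   (JiRack 236B: 2 nodes x 8 GPUs)
print(total_gpus(tp=8, pp=16, dp=8))  # → 1024 (one possible 405B+ layout)
```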