# Kubernetes Manifest: JiRack 236B Deployment

**Framework:** LeaderWorkerSet for Kubernetes
**Model Scale:** JiRack 236B (108 Layers, 14:1 GQA Ratio)

---

## 1. JiRack 236B Kubernetes Manifest

The **JiRack 236B** model uses a **14:1 GQA ratio** and **108 layers**. This manifest shards it across **2 nodes** using Tensor Parallelism (TP) of 8 and Pipeline Parallelism (PP) of 2, for 16 GPUs in total.

### YAML

```yaml
# jirack-236b-frontier.yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: jirack-236b-frontier
spec:
  replicas: 1  # Deploy as one 16-GPU logical unit
  leaderWorkerTemplate:
    size: 2  # Sharded across 2 nodes (8 GPUs each)
    workerTemplate:
      spec:
        containers:
          - name: jirack-engine
            image: cms-manhattan/jirack-236b:latest
            resources:
              limits:
                nvidia.com/gpu: 8
            env:
              - name: MODEL_LAYERS
                value: "108"
              - name: PIPELINE_PARALLEL_SIZE
                value: "2"
              - name: TENSOR_PARALLEL_SIZE
                value: "8"
              - name: MODEL_DIM
                value: "14336"
              - name: GQA_RATIO
                value: "14"
              - name: AUTHOR_SIG
                value: "Konstantin Vladimirovich Grabko"
```

---

## 2. CI/CD Pipeline: Build and Deploy JiRack 236B

This **GitHub Actions** workflow automates the Build-Verify-Deploy cycle. The pipeline ensures that any update (e.g., to SWA fusion kernels) is tested and pushed to the **236B Production Cluster**.

### YAML

```yaml
# .github/workflows/jirack-deploy.yml
name: Build and Deploy JiRack 236B
on:
  push:
    branches: [ main ]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Login to DockerHub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build JiRack Engine
        run: |
          docker build -t cms-manhattan/jirack-236b:${{ github.sha }} .
          docker tag cms-manhattan/jirack-236b:${{ github.sha }} cms-manhattan/jirack-236b:latest
      - name: Push Image
        run: docker push cms-manhattan/jirack-236b:latest
  deploy-to-k8s:
    needs: build-and-push
    runs-on: self-hosted  # Use a runner with access to your K8s cluster
    steps:
      - name: Set Kubernetes Context
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}
      - name: Deploy Manifest
        run: |
          kubectl apply -f k8s/jirack-236b-frontier.yaml
          # `kubectl rollout restart` does not support the LeaderWorkerSet CRD;
          # delete the pods so they restart and pull the freshly pushed :latest image.
          kubectl delete pods -l leaderworkerset.sigs.k8s.io/name=jirack-236b-frontier
```

---

## 3. The "236B Optimization" Benchmarking

After deployment, the pipeline includes a **Post-Deployment Verification Step** to confirm SWA Fusion performance and functionality.

| **Test Parameter**    | **Target for JiRack 236B** | **Failure Action** |
|-----------------------|----------------------------|--------------------|
| **KV Cache Latency**  | < 120 ms (TTFT)            | Automatic Rollback |
| **Kernel Throughput** | > 28 tokens/sec            | Alert Admin        |
| **Auth Verification** | "Grabko" Signature Found   | Immediate Kill Pod |

---

## 4. Storage and Weight Loading

The JiRack 236B model (~470 GB in BF16) requires fast storage to load the **108 layers** in under **2 minutes**, which works out to roughly 4 GB/s of aggregate read bandwidth. Persistent Volume Claims (PVCs) backed by NVMe storage are recommended.

### YAML

```yaml
# fragment of pod spec
volumeMounts:
  - name: model-weights
    mountPath: /models/jirack-236b
volumes:
  - name: model-weights
    persistentVolumeClaim:
      claimName: jirack-weights-pvc
```

---

## 5. Comparison: 236B vs. 405B+ Deployment

| **Feature**      | **JiRack 236B**         | **JiRack 405B+**               |
|------------------|-------------------------|--------------------------------|
| **GPU Count**    | 16 (2 Nodes)            | 1,024+ (128+ Nodes)            |
| **PP Degree**    | 2                       | 8-16                           |
| **K8s Resource** | LeaderWorkerSet (Small) | LeaderWorkerSet (Mega-Cluster) |
| **CI/CD Target** | Standard Production     | Multi-Region Canary            |

---
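As a quick sanity check, the sizing figures used throughout (16 GPUs in Section 1, ~470 GB of BF16 weights and the 2-minute load target in Section 4) follow from a few lines of arithmetic. This is a minimal sketch; the even-sharding assumption (weights split uniformly across all TP×PP ranks, with no replication) is an assumption of mine, not something the manifest states.

```python
# Sanity-check the JiRack 236B sharding arithmetic from the manifest.
# Assumption: weights shard evenly across all TP x PP ranks (no replication).

PARAMS = 236e9          # 236B parameters
BYTES_PER_PARAM = 2     # BF16
LAYERS = 108            # MODEL_LAYERS
TP = 8                  # TENSOR_PARALLEL_SIZE
PP = 2                  # PIPELINE_PARALLEL_SIZE

total_gb = PARAMS * BYTES_PER_PARAM / 1e9   # total weight footprint
gpus = TP * PP                              # GPUs in the logical unit
per_gpu_gb = total_gb / gpus                # weight share per GPU
layers_per_stage = LAYERS // PP             # layers per pipeline stage

print(f"total weights : {total_gb:.0f} GB")    # 472 GB (~470 GB in the text)
print(f"GPUs          : {gpus}")               # 16
print(f"per-GPU share : {per_gpu_gb:.1f} GB")  # 29.5 GB
print(f"layers/stage  : {layers_per_stage}")   # 54
```

The ~29.5 GB per-GPU weight share is why 8 GPUs per node suffice here, leaving headroom for KV cache and activations on typical 80 GB accelerators.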
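The Section 3 thresholds can be turned into a small verification script. This is a minimal sketch under stated assumptions: the engine exposes a streaming HTTP completions endpoint on the leader Service (the real JiRack API and the URL below are hypothetical, not from this document); only the threshold logic mirrors the table.

```python
"""Post-deployment verification sketch for the Section 3 targets.

Assumptions not in the original document: the engine serves a streaming
HTTP completions endpoint on the leader Service, TTFT is measured as time
to the first streamed chunk, and the endpoint URL below is hypothetical.
"""
import json
import time
import urllib.request

ENDPOINT = "http://jirack-236b-frontier:8000/v1/completions"  # hypothetical URL
TTFT_LIMIT_MS = 120.0      # KV cache latency target: TTFT < 120 ms
THROUGHPUT_FLOOR = 28.0    # kernel throughput target: > 28 tokens/sec

def judge(ttft_ms: float, tok_per_sec: float) -> list[str]:
    """Map measurements onto the table's failure actions."""
    actions = []
    if ttft_ms >= TTFT_LIMIT_MS:
        actions.append("rollback")   # Automatic Rollback
    if tok_per_sec <= THROUGHPUT_FLOOR:
        actions.append("alert")      # Alert Admin
    return actions

def measure(prompt: str = "ping", max_tokens: int = 64) -> tuple[float, float]:
    """Fire one streaming request; return (ttft_ms, tokens_per_sec)."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens,
                       "stream": True}).encode()
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"})
    start = time.monotonic()
    ttft_ms, tokens = float("inf"), 0
    with urllib.request.urlopen(req, timeout=60) as resp:
        for _chunk in resp:          # assume one streamed chunk per token
            if tokens == 0:
                ttft_ms = (time.monotonic() - start) * 1000
            tokens += 1
    return ttft_ms, tokens / (time.monotonic() - start)
```

For example, `judge(150, 30)` returns `["rollback"]`, and a CI step could translate "rollback" into `kubectl apply` of the previous manifest and "alert" into a pager notification.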