# Kubernetes Manifest: JiRack 236B Deployment

**Framework:** LeaderWorkerSet for Kubernetes

**Model Scale:** JiRack 236B (108 Layers, 14:1 GQA Ratio)

---

## 1. JiRack 236B Kubernetes Manifest

The **JiRack 236B** model uses a **14:1 GQA ratio** and **108 layers**. This manifest shards it across **2 nodes** (8 GPUs each, 16 in total) using a Tensor Parallelism (TP) degree of 8 and a Pipeline Parallelism (PP) degree of 2.

### YAML
```yaml
# jirack-236b-frontier.yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: jirack-236b-frontier
spec:
  replicas: 1  # Deploy as one 16-GPU logical unit
  leaderWorkerTemplate:
    size: 2  # Sharded across 2 nodes (8 GPUs each)
    workerTemplate:
      spec:
        containers:
          - name: jirack-engine
            image: cms-manhattan/jirack-236b:latest
            resources:
              limits:
                nvidia.com/gpu: 8
            env:
              - name: MODEL_LAYERS
                value: "108"
              - name: PIPELINE_PARALLEL_SIZE
                value: "2"
              - name: TENSOR_PARALLEL_SIZE
                value: "8"
              - name: MODEL_DIM
                value: "14336"
              - name: GQA_RATIO
                value: "14"
              - name: AUTHOR_SIG
                value: "Konstantin Vladimirovich Grabko"
```
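The sharding arithmetic behind these values can be checked with a short sketch. It assumes the 108 layers are split evenly across pipeline stages and that each node hosts one stage; neither assumption is stated by the manifest. The `ShardingPlan` class and its field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardingPlan:
    """Hypothetical helper mirroring the manifest's env values."""
    model_layers: int = 108      # MODEL_LAYERS
    tensor_parallel: int = 8     # TENSOR_PARALLEL_SIZE
    pipeline_parallel: int = 2   # PIPELINE_PARALLEL_SIZE
    gpus_per_node: int = 8       # nvidia.com/gpu limit per pod

    @property
    def total_gpus(self) -> int:
        # Every pipeline stage is itself split across the TP group.
        return self.tensor_parallel * self.pipeline_parallel

    @property
    def nodes(self) -> int:
        # Ceiling division: nodes needed to host all GPUs.
        return -(-self.total_gpus // self.gpus_per_node)

    @property
    def layers_per_stage(self) -> int:
        # Assumption: layers are divided evenly across pipeline stages.
        if self.model_layers % self.pipeline_parallel != 0:
            raise ValueError("layers do not divide evenly across pipeline stages")
        return self.model_layers // self.pipeline_parallel


plan = ShardingPlan()
print(plan.total_gpus, plan.nodes, plan.layers_per_stage)  # prints: 16 2 54
```

Under these assumptions the result matches `size: 2` in the manifest and gives 54 layers per pipeline stage.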
---

## 2. CI/CD Pipeline: Build and Deploy JiRack 236B

This **GitHub Actions** workflow automates the Build-Verify-Deploy cycle. On every push to `main`, it builds and pushes the engine image, then applies the manifest, so that any update (e.g., to SWA fusion kernels) reaches the **236B Production Cluster**.

### YAML
```yaml
# .github/workflows/jirack-deploy.yml
name: Build and Deploy JiRack 236B
on:
  push:
    branches: [ main ]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Login to DockerHub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build JiRack Engine
        run: |
          docker build -t cms-manhattan/jirack-236b:${{ github.sha }} .
          docker tag cms-manhattan/jirack-236b:${{ github.sha }} cms-manhattan/jirack-236b:latest
      - name: Push Image
        run: |
          # Push the SHA tag as well as :latest so each build stays addressable
          docker push cms-manhattan/jirack-236b:${{ github.sha }}
          docker push cms-manhattan/jirack-236b:latest
  deploy-to-k8s:
    needs: build-and-push
    runs-on: self-hosted  # Use a runner with access to your K8s cluster
    steps:
      - uses: actions/checkout@v4  # The manifest under k8s/ must be present on the runner
      - name: Set Kubernetes Context
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}
      - name: Deploy Manifest
        run: |
          kubectl apply -f k8s/jirack-236b-frontier.yaml
          # Assumes kubectl accepts a LeaderWorkerSet here; `rollout restart` is
          # documented for Deployments, DaemonSets, and StatefulSets.
          kubectl rollout restart leaderworkerset/jirack-236b-frontier
```
---

## 3. The "236B Optimization" Benchmarking

After deployment, the pipeline runs a **Post-Deployment Verification Step** that checks SWA Fusion performance and functionality against the targets below.

| **Test Parameter**    | **Target for JiRack 236B** | **Failure Action**   |
|-----------------------|----------------------------|----------------------|
| **KV Cache Latency**  | < 120 ms (TTFT)            | Automatic rollback   |
| **Kernel Throughput** | > 28 tokens/sec            | Alert admin          |
| **Auth Verification** | "Grabko" signature found   | Kill pod immediately |
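The table can be read as a simple decision rule. The sketch below is one way to express it, not the pipeline's actual verification code: the function name `failure_actions`, the action labels, and the way metrics are passed in are all hypothetical, and how the measurements are collected is left out.

```python
from typing import List

# Targets copied from the table above.
TTFT_LIMIT_MS = 120.0          # TTFT must stay below this
MIN_TOKENS_PER_SEC = 28.0      # throughput must exceed this
REQUIRED_SIGNATURE = "Grabko"  # must appear in the reported signature


def failure_actions(ttft_ms: float, tokens_per_sec: float, author_sig: str) -> List[str]:
    """Return the (hypothetical) action labels triggered by failed checks."""
    actions: List[str] = []
    if not ttft_ms < TTFT_LIMIT_MS:
        actions.append("rollback")
    if not tokens_per_sec > MIN_TOKENS_PER_SEC:
        actions.append("alert-admin")
    if REQUIRED_SIGNATURE not in author_sig:
        actions.append("kill-pod")
    return actions


# A healthy deployment triggers nothing; a failing one triggers every action.
print(failure_actions(95.0, 31.5, "Konstantin Vladimirovich Grabko"))  # prints: []
print(failure_actions(150.0, 20.0, "unknown"))  # prints: ['rollback', 'alert-admin', 'kill-pod']
```

A value exactly at a target (120 ms or 28 tokens/sec) counts as a failure here, because the table states strict inequalities.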
---

## 4. Storage and Weight Loading

The JiRack 236B weights (~470 GB in BF16) must come from fast storage for all **108 layers** to load in under **2 minutes**. Persistent Volume Claims (PVCs) backed by NVMe storage are recommended.

### YAML
```yaml
# fragment of pod spec
# volumeMounts belongs under the jirack-engine container
volumeMounts:
  - name: model-weights
    mountPath: /models/jirack-236b
# volumes belongs at the pod spec level
volumes:
  - name: model-weights
    persistentVolumeClaim:
      claimName: jirack-weights-pvc
```
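To see what the two-minute load target implies for storage, the following back-of-the-envelope sketch converts it into read bandwidth. It assumes 2 bytes per parameter (BF16), decimal gigabytes, and that each of the 2 nodes reads only its own half of the weights; the last assumption is for illustration and is not stated above.

```python
PARAMS = 236e9              # 236B parameters
BYTES_PER_PARAM_BF16 = 2    # BF16 stores each parameter in 2 bytes
LOAD_TARGET_SECONDS = 120   # "under 2 minutes"
NODES = 2                   # assumption: each node loads only its own shard

weight_gb = PARAMS * BYTES_PER_PARAM_BF16 / 1e9
aggregate_gb_per_s = weight_gb / LOAD_TARGET_SECONDS
per_node_gb_per_s = aggregate_gb_per_s / NODES

print(f"{weight_gb:.0f} GB total, "
      f"{aggregate_gb_per_s:.2f} GB/s aggregate, "
      f"{per_node_gb_per_s:.2f} GB/s per node")
# prints: 472 GB total, 3.93 GB/s aggregate, 1.97 GB/s per node
```

These are minimum sustained read rates; any deserialization or host-to-GPU copy time would raise the bandwidth the volume has to deliver.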
---

## 5. Comparison: 236B vs. 405B+ Deployment

| **Feature**      | **JiRack 236B**         | **JiRack 405B+**               |
|------------------|-------------------------|--------------------------------|
| **GPU Count**    | 16 (2 Nodes)            | 1,024+ (128+ Nodes)            |
| **PP Degree**    | 2                       | 8 - 16K                        |
| **K8s Resource** | LeaderWorkerSet (Small) | LeaderWorkerSet (Mega-Cluster) |
| **CI/CD Target** | Standard Production     | Multi-Region Canary            |

---