# Kubernetes Manifest: JiRack 236B Deployment  

**Framework:** LeaderWorkerSet for Kubernetes  
**Model Scale:** JiRack 236B (108 Layers, 14:1 GQA Ratio)  

---

## 1. JiRack 236B Kubernetes Manifest  

The **JiRack 236B** model uses a **14:1 GQA ratio** and **108 layers**. This manifest shards it across **2 nodes** (8 GPUs each) using Tensor Parallelism (TP) of 8 and Pipeline Parallelism (PP) of 2, for 8 × 2 = 16 GPUs per replica.  

### YAML
```yaml
# jirack-236b-frontier.yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: jirack-236b-frontier
spec:
  replicas: 1  # Deploy as one 16-GPU logical unit
  leaderWorkerTemplate:
    size: 2    # Sharded across 2 nodes (8 GPUs each)
    workerTemplate:
      spec:
        containers:
          - name: jirack-engine
            image: cms-manhattan/jirack-236b:latest
            resources:
              limits:
                nvidia.com/gpu: 8
            env:
              - name: MODEL_LAYERS
                value: "108"
              - name: PIPELINE_PARALLEL_SIZE
                value: "2"
              - name: TENSOR_PARALLEL_SIZE
                value: "8"
              - name: MODEL_DIM
                value: "14336"
              - name: GQA_RATIO
                value: "14"
              - name: AUTHOR_SIG
                value: "Konstantin Vladimirovich Grabko"
```
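
The parallelism settings in the manifest can be sanity-checked before deployment. The sketch below is illustrative only: `ParallelConfig` and `parse_parallel_config` are hypothetical helpers, not part of the JiRack engine, and the even split of layers across pipeline stages is an assumption (a real engine may balance stages differently).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """Hypothetical view of the parallelism-related env vars in the manifest."""
    layers: int
    tensor_parallel: int
    pipeline_parallel: int

    @property
    def world_size(self) -> int:
        # Each pipeline stage is one full tensor-parallel group.
        return self.tensor_parallel * self.pipeline_parallel

    @property
    def layers_per_stage(self) -> int:
        # Assumption: layers are split evenly across pipeline stages.
        return self.layers // self.pipeline_parallel


def parse_parallel_config(env: dict[str, str]) -> ParallelConfig:
    """Build a ParallelConfig from env-style string values and validate it."""
    cfg = ParallelConfig(
        layers=int(env["MODEL_LAYERS"]),
        tensor_parallel=int(env["TENSOR_PARALLEL_SIZE"]),
        pipeline_parallel=int(env["PIPELINE_PARALLEL_SIZE"]),
    )
    if cfg.layers % cfg.pipeline_parallel != 0:
        raise ValueError("MODEL_LAYERS must divide evenly across pipeline stages")
    return cfg


# Values copied from the manifest above.
manifest_env = {
    "MODEL_LAYERS": "108",
    "TENSOR_PARALLEL_SIZE": "8",
    "PIPELINE_PARALLEL_SIZE": "2",
}
cfg = parse_parallel_config(manifest_env)
print(cfg.world_size, cfg.layers_per_stage)  # 16 54
```

With the manifest's values this gives 16 GPUs (matching `size: 2` with `nvidia.com/gpu: 8` per pod) and, under the even-split assumption, 54 layers per pipeline stage.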

---

## 2. CI/CD Pipeline: Build and Deploy JiRack 236B  

This **GitHub Actions** workflow automates the build and deploy stages of the Build-Verify-Deploy cycle. On every push to `main` (e.g., an update to the SWA fusion kernels), it builds a new image, pushes it to the registry, and rolls it out to the **236B Production Cluster**.  

### YAML
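```yaml
# .github/workflows/jirack-deploy.yml
name: Build and Deploy JiRack 236B

on:
  push:
    branches: [ main ]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Login to DockerHub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build JiRack Engine
        run: |
          docker build -t cms-manhattan/jirack-236b:${{ github.sha }} .
          docker tag cms-manhattan/jirack-236b:${{ github.sha }} cms-manhattan/jirack-236b:latest

      - name: Push Image
        # Push the commit-specific tag as well, so each build stays traceable.
        run: |
          docker push cms-manhattan/jirack-236b:${{ github.sha }}
          docker push cms-manhattan/jirack-236b:latest

  deploy-to-k8s:
    needs: build-and-push
    runs-on: self-hosted # Use a runner with access to your K8s cluster
    steps:
      # The manifest lives in the repository, so this job needs its own checkout.
      - uses: actions/checkout@v4

      - name: Set Kubernetes Context
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Deploy Manifest
        # `kubectl rollout restart` only supports Deployments, DaemonSets, and
        # StatefulSets, so the LeaderWorkerSet pods are deleted and recreated instead.
        # The label key below is the one set by the LWS controller; verify it for your version.
        run: |
          kubectl apply -f k8s/jirack-236b-frontier.yaml
          kubectl delete pod -l leaderworkerset.sigs.k8s.io/name=jirack-236b-frontier
```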

---

## 3. The "236B Optimization" Benchmarking  

After deployment, the pipeline runs a **Post-Deployment Verification Step** that checks SWA Fusion performance and functionality against the targets below. Each missed target triggers the listed failure action.  

| **Test Parameter**        | **Target for JiRack 236B** | **Failure Action**                      |
|---------------------------|----------------------------|-----------------------------------------|
| **KV Cache Latency**      | < 120 ms (TTFT)            | Automatic Rollback                      |
| **Kernel Throughput**     | > 28 tokens/sec            | Alert Admin                             |
| **Auth Verification**     | "Grabko" Signature Found   | Kill Pod Immediately                    |
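
The table can be read as a simple decision rule. The following is a minimal sketch of that rule, not the pipeline's actual verification code: the function name, argument names, and action strings are hypothetical, and how the metrics are measured is left out.

```python
def failure_actions(ttft_ms: float, tokens_per_sec: float, signature: str) -> list[str]:
    """Return the failure actions from the table that a measurement would trigger.

    All names here are illustrative; only the thresholds come from the table.
    """
    actions = []
    if not ttft_ms < 120:            # KV Cache Latency target: < 120 ms (TTFT)
        actions.append("rollback")
    if not tokens_per_sec > 28:      # Kernel Throughput target: > 28 tokens/sec
        actions.append("alert_admin")
    if "Grabko" not in signature:    # Auth Verification: signature must be present
        actions.append("kill_pod")
    return actions


healthy = failure_actions(95.0, 31.5, "Konstantin Vladimirovich Grabko")
degraded = failure_actions(150.0, 20.0, "")
print(healthy)   # []
print(degraded)  # ['rollback', 'alert_admin', 'kill_pod']
```

An empty list means every target was met. The checks are independent, so one run can trigger several actions at once.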

---

## 4. Storage and Weight Loading  

The JiRack 236B model (~470 GB in BF16) requires fast storage to load all **108 layers** in under **2 minutes**. PersistentVolumeClaims (PVCs) backed by NVMe storage are recommended.  

### YAML
```yaml
# fragment of pod spec
volumeMounts:
  - name: model-weights
    mountPath: /models/jirack-236b
volumes:
  - name: model-weights
    persistentVolumeClaim:
      claimName: jirack-weights-pvc
```
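
The 2-minute loading target implies a minimum sustained read bandwidth, which the sketch below estimates. It is a back-of-the-envelope calculation, not a benchmark: the helper name is hypothetical, 1 GB is taken as 10^9 bytes, and the per-node figure assumes each of the 2 nodes reads only its own half of the weights.

```python
def required_read_gb_per_s(params: float, bytes_per_param: int, budget_s: float) -> float:
    """Sustained read bandwidth (GB/s, 1 GB = 1e9 bytes) needed to load all weights in budget_s."""
    total_gb = params * bytes_per_param / 1e9
    return total_gb / budget_s


PARAMS = 236e9       # 236B parameters
BF16_BYTES = 2       # bytes per parameter in BF16
LOAD_BUDGET_S = 120  # the 2-minute target
NODES = 2            # from the LeaderWorkerSet manifest (size: 2)

weights_gb = PARAMS * BF16_BYTES / 1e9
aggregate = required_read_gb_per_s(PARAMS, BF16_BYTES, LOAD_BUDGET_S)
per_node = aggregate / NODES  # assumption: each node loads only its own pipeline stage

print(f"{weights_gb:.0f} GB total, {aggregate:.2f} GB/s aggregate, {per_node:.2f} GB/s per node")
```

This comes to roughly 472 GB of weights, about 3.9 GB/s in aggregate, or about 2 GB/s per node under the stated assumption.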

---

## 5. Comparison: 236B vs. 405B+ Deployment  

| **Feature**              | **JiRack 236B**         | **JiRack 405B+**                  |
|--------------------------|-------------------------|-----------------------------------|
| **GPU Count**            | 16 (2 Nodes)            | 1,024+ (128+ Nodes)               |
| **PP Degree**            | 2                       | 8 - 16K                           |
| **K8s Resource**         | LeaderWorkerSet (Small) | LeaderWorkerSet (Mega-Cluster)    |
| **CI/CD Target**         | Standard Production     | Multi-Region Canary               |

---