# Kubernetes Manifest: JiRack 236B Deployment

**Framework:** LeaderWorkerSet for Kubernetes
**Model Scale:** JiRack 236B (108 Layers, 14:1 GQA Ratio)

---

## 1. JiRack 236B Kubernetes Manifest

The **JiRack 236B** model uses a **14:1 GQA ratio** and **108 layers**. This manifest shards it across **2 nodes** using Tensor Parallelism (TP) of 8 and Pipeline Parallelism (PP) of 2, for 16 GPUs in total.

### YAML

```yaml
# jirack-236b-frontier.yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: jirack-236b-frontier
spec:
  replicas: 1  # Deploy as one 16-GPU logical unit
  leaderWorkerTemplate:
    size: 2  # Sharded across 2 nodes (8 GPUs each)
    workerTemplate:
      spec:
        containers:
          - name: jirack-engine
            image: cms-manhattan/jirack-236b:latest
            resources:
              limits:
                nvidia.com/gpu: 8
            env:
              - name: MODEL_LAYERS
                value: "108"
              - name: PIPELINE_PARALLEL_SIZE
                value: "2"
              - name: TENSOR_PARALLEL_SIZE
                value: "8"
              - name: MODEL_DIM
                value: "14336"
              - name: GQA_RATIO
                value: "14"
              - name: AUTHOR_SIG
                value: "Konstantin Vladimirovich Grabko"
```

---

## 2. CI/CD Pipeline: Build and Deploy JiRack 236B

This **GitHub Actions** workflow automates the Build-Verify-Deploy cycle. The pipeline ensures that any update (e.g., to SWA fusion kernels) is tested and pushed to the **236B Production Cluster**.

### YAML

```yaml
# .github/workflows/jirack-deploy.yml
name: Build and Deploy JiRack 236B
on:
  push:
    branches: [ main ]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Login to DockerHub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build JiRack Engine
        run: |
          docker build -t cms-manhattan/jirack-236b:${{ github.sha }} .
          docker tag cms-manhattan/jirack-236b:${{ github.sha }} cms-manhattan/jirack-236b:latest
      - name: Push Image
        run: docker push cms-manhattan/jirack-236b:latest
  deploy-to-k8s:
    needs: build-and-push
    runs-on: self-hosted  # Use a runner with access to your K8s cluster
    steps:
      - name: Set Kubernetes Context
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}
      - name: Deploy Manifest
        run: |
          kubectl apply -f k8s/jirack-236b-frontier.yaml
          # `kubectl rollout restart` does not support the LeaderWorkerSet CRD;
          # delete the pods so they restart and pull the freshly pushed :latest image.
          kubectl delete pods -l leaderworkerset.sigs.k8s.io/name=jirack-236b-frontier
```

---

## 3. The "236B Optimization" Benchmarking

After deployment, the pipeline includes a **Post-Deployment Verification Step** to confirm SWA Fusion performance and functionality.

| **Test Parameter**    | **Target for JiRack 236B** | **Failure Action** |
|-----------------------|----------------------------|--------------------|
| **KV Cache Latency**  | < 120 ms (TTFT)            | Automatic Rollback |
| **Kernel Throughput** | > 28 tokens/sec            | Alert Admin        |
| **Auth Verification** | "Grabko" Signature Found   | Immediate Kill Pod |

---

## 4. Storage and Weight Loading

The JiRack 236B model (~470 GB in BF16) requires fast storage to load the **108 layers** in under **2 minutes**, which works out to roughly 4 GB/s of aggregate read bandwidth. Persistent Volume Claims (PVCs) backed by NVMe storage are recommended.

### YAML

```yaml
# fragment of pod spec
volumeMounts:
  - name: model-weights
    mountPath: /models/jirack-236b
volumes:
  - name: model-weights
    persistentVolumeClaim:
      claimName: jirack-weights-pvc
```

---

## 5. Comparison: 236B vs. 405B+ Deployment

| **Feature**      | **JiRack 236B**         | **JiRack 405B+**               |
|------------------|-------------------------|--------------------------------|
| **GPU Count**    | 16 (2 Nodes)            | 1,024+ (128+ Nodes)            |
| **PP Degree**    | 2                       | 8-16                           |
| **K8s Resource** | LeaderWorkerSet (Small) | LeaderWorkerSet (Mega-Cluster) |
| **CI/CD Target** | Standard Production     | Multi-Region Canary            |

---
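As a quick sanity check, the sizing figures used throughout (16 GPUs in Section 1, ~470 GB of BF16 weights and the 2-minute load target in Section 4) follow from a few lines of arithmetic. This is a minimal sketch; the even-sharding assumption (weights split uniformly across all TP×PP ranks, with no replication) is an assumption of mine, not something the manifest states.

```python
# Sanity-check the JiRack 236B sharding arithmetic from the manifest.
# Assumption: weights shard evenly across all TP x PP ranks (no replication).

PARAMS = 236e9          # 236B parameters
BYTES_PER_PARAM = 2     # BF16
LAYERS = 108            # MODEL_LAYERS
TP = 8                  # TENSOR_PARALLEL_SIZE
PP = 2                  # PIPELINE_PARALLEL_SIZE

total_gb = PARAMS * BYTES_PER_PARAM / 1e9   # total weight footprint
gpus = TP * PP                              # GPUs in the logical unit
per_gpu_gb = total_gb / gpus                # weight share per GPU
layers_per_stage = LAYERS // PP             # layers per pipeline stage

print(f"total weights : {total_gb:.0f} GB")    # 472 GB (~470 GB in the text)
print(f"GPUs          : {gpus}")               # 16
print(f"per-GPU share : {per_gpu_gb:.1f} GB")  # 29.5 GB
print(f"layers/stage  : {layers_per_stage}")   # 54
```

The ~29.5 GB per-GPU weight share is why 8 GPUs per node suffice here, leaving headroom for KV cache and activations on typical 80 GB accelerators.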
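The Section 3 thresholds can be turned into a small verification script. This is a minimal sketch under stated assumptions: the engine exposes a streaming HTTP completions endpoint on the leader Service (the real JiRack API and the URL below are hypothetical, not from this document); only the threshold logic mirrors the table.

```python
"""Post-deployment verification sketch for the Section 3 targets.

Assumptions not in the original document: the engine serves a streaming
HTTP completions endpoint on the leader Service, TTFT is measured as time
to the first streamed chunk, and the endpoint URL below is hypothetical.
"""
import json
import time
import urllib.request

ENDPOINT = "http://jirack-236b-frontier:8000/v1/completions"  # hypothetical URL
TTFT_LIMIT_MS = 120.0      # KV cache latency target: TTFT < 120 ms
THROUGHPUT_FLOOR = 28.0    # kernel throughput target: > 28 tokens/sec

def judge(ttft_ms: float, tok_per_sec: float) -> list[str]:
    """Map measurements onto the table's failure actions."""
    actions = []
    if ttft_ms >= TTFT_LIMIT_MS:
        actions.append("rollback")   # Automatic Rollback
    if tok_per_sec <= THROUGHPUT_FLOOR:
        actions.append("alert")      # Alert Admin
    return actions

def measure(prompt: str = "ping", max_tokens: int = 64) -> tuple[float, float]:
    """Fire one streaming request; return (ttft_ms, tokens_per_sec)."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens,
                       "stream": True}).encode()
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"})
    start = time.monotonic()
    ttft_ms, tokens = float("inf"), 0
    with urllib.request.urlopen(req, timeout=60) as resp:
        for _chunk in resp:          # assume one streamed chunk per token
            if tokens == 0:
                ttft_ms = (time.monotonic() - start) * 1000
            tokens += 1
    return ttft_ms, tokens / (time.monotonic() - start)
```

For example, `judge(150, 30)` returns `["rollback"]`, and a CI step could translate "rollback" into `kubectl apply` of the previous manifest and "alert" into a pager notification.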