# Kubernetes Manifest: JiRack 236B Deployment

Framework: LeaderWorkerSet for Kubernetes

Model Scale: JiRack 236B (108 layers, 14:1 GQA ratio)

---

## 1. JiRack 236B Kubernetes Manifest

The JiRack 236B model has 108 layers and a 14:1 GQA ratio. This manifest shards it across 2 nodes of 8 GPUs each, using a Tensor Parallelism (TP) degree of 8 and a Pipeline Parallelism (PP) degree of 2, for 16 GPUs in total.
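The layout implied by these numbers can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `parallel_layout` and the even split of layers across pipeline stages are assumptions, not something taken from the JiRack engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    total_gpus: int        # TP degree * PP degree
    nodes: int             # total_gpus / gpus_per_node
    layers_per_stage: int  # layers handled by each pipeline stage


def parallel_layout(layers: int, tp: int, pp: int, gpus_per_node: int) -> ParallelLayout:
    """Derive GPU, node, and per-stage layer counts for a TP x PP layout.

    Assumption: layers are split evenly across pipeline stages. A real
    engine may balance stages differently.
    """
    if layers % pp != 0:
        raise ValueError("layers must divide evenly across pipeline stages")
    total_gpus = tp * pp
    if total_gpus % gpus_per_node != 0:
        raise ValueError("GPU count must fill whole nodes")
    return ParallelLayout(
        total_gpus=total_gpus,
        nodes=total_gpus // gpus_per_node,
        layers_per_stage=layers // pp,
    )


# Values taken from this document: 108 layers, TP=8, PP=2, 8 GPUs per node.
layout = parallel_layout(layers=108, tp=8, pp=2, gpus_per_node=8)
print(layout)
```

With the values from this document, the result is 16 GPUs on 2 nodes and 54 layers per pipeline stage.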

### YAML
```yaml
# jirack-236b-frontier.yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: jirack-236b-frontier
spec:
  replicas: 1  # Deploy as one 16-GPU logical unit
  leaderWorkerTemplate:
    size: 2  # Sharded across 2 nodes (8 GPUs each)
    workerTemplate:
      spec:
        containers:
          - name: jirack-engine
            image: cms-manhattan/jirack-236b:latest
            resources:
              limits:
                nvidia.com/gpu: 8
            env:
              - name: MODEL_LAYERS
                value: "108"
              - name: PIPELINE_PARALLEL_SIZE
                value: "2"
              - name: TENSOR_PARALLEL_SIZE
                value: "8"
              - name: MODEL_DIM
                value: "14336"
              - name: GQA_RATIO
                value: "14"
              - name: AUTHOR_SIG
                value: "Konstantin Vladimirovich Grabko"
```
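The `MODEL_DIM`, `GQA_RATIO`, and `TENSOR_PARALLEL_SIZE` values interact: tensor parallelism typically splits attention heads across GPUs, so the head counts should divide evenly by the TP degree. The sketch below checks this under one loud assumption: a head dimension of 128, which this document does not state. It also reads the 14:1 ratio as 14 query heads per key/value head. The function name `heads_per_gpu` is hypothetical.

```python
def heads_per_gpu(model_dim: int, gqa_ratio: int, tp: int, head_dim: int = 128) -> tuple[int, int]:
    """Return (query_heads_per_gpu, kv_heads_per_gpu) for a GQA model under TP.

    Assumption: head_dim=128 is NOT given in the source document; it is a
    placeholder chosen for illustration.
    """
    if model_dim % head_dim != 0:
        raise ValueError("model_dim must be a multiple of head_dim")
    query_heads = model_dim // head_dim
    if query_heads % gqa_ratio != 0:
        raise ValueError("query heads must be a multiple of the GQA ratio")
    kv_heads = query_heads // gqa_ratio
    if query_heads % tp != 0 or kv_heads % tp != 0:
        raise ValueError("head counts must divide evenly by the TP degree")
    return query_heads // tp, kv_heads // tp


# MODEL_DIM=14336, GQA_RATIO=14, TENSOR_PARALLEL_SIZE=8 from the manifest.
q_per_gpu, kv_per_gpu = heads_per_gpu(model_dim=14336, gqa_ratio=14, tp=8)
print(q_per_gpu, kv_per_gpu)
```

Under the assumed head dimension, each GPU holds 14 query heads and 1 key/value head. A different head dimension changes these counts, so treat this as a consistency check rather than a statement of the model's actual architecture.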

---

## 2. CI/CD Pipeline: Build and Deploy JiRack 236B

This GitHub Actions workflow automates the Build-Verify-Deploy cycle. On every push to `main` (e.g., an update to the SWA fusion kernels), it builds and pushes a new image, then deploys it to the 236B production cluster. Verification runs after deployment (see Section 3).

### YAML
```yaml
# .github/workflows/jirack-deploy.yml
name: Build and Deploy JiRack 236B

on:
  push:
    branches: [ main ]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Login to DockerHub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build JiRack Engine
        run: |
          docker build -t cms-manhattan/jirack-236b:${{ github.sha }} .
          docker tag cms-manhattan/jirack-236b:${{ github.sha }} cms-manhattan/jirack-236b:latest

      - name: Push Image
        run: |
          # Push the immutable SHA tag too, so a deployed build can be traced and rolled back
          docker push cms-manhattan/jirack-236b:${{ github.sha }}
          docker push cms-manhattan/jirack-236b:latest

  deploy-to-k8s:
    needs: build-and-push
    runs-on: self-hosted  # Use a runner with access to your K8s cluster
    steps:
      # The manifest lives in the repository, so this job needs its own checkout
      - uses: actions/checkout@v4

      - name: Set Kubernetes Context
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Deploy Manifest
        run: |
          kubectl apply -f k8s/jirack-236b-frontier.yaml
          # `kubectl rollout restart` only supports Deployments, DaemonSets, and
          # StatefulSets, so recreate the LeaderWorkerSet pods by label instead;
          # the `latest` tag defaults to imagePullPolicy: Always.
          kubectl delete pod -l leaderworkerset.sigs.k8s.io/name=jirack-236b-frontier
```

---

## 3. The "236B Optimization" Benchmarking

After deployment, the pipeline runs a post-deployment verification step to confirm SWA fusion performance and functionality. Each check has a target and an action that is taken when the target is missed:

| Test Parameter    | Target for JiRack 236B   | Failure Action       |
|-------------------|--------------------------|----------------------|
| KV Cache Latency  | < 120 ms (TTFT)          | Automatic rollback   |
| Kernel Throughput | > 28 tokens/sec          | Alert admin          |
| Auth Verification | "Grabko" signature found | Kill pod immediately |
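The checks in the table above map naturally onto a small decision function. The sketch below is one possible shape for it, not the pipeline's actual verification code: the function name `failure_actions` and the action labels (`rollback`, `alert-admin`, `kill-pod`) are hypothetical, and how the measurements are collected is out of scope here.

```python
# Thresholds copied from the verification table.
TTFT_LIMIT_MS = 120.0          # time to first token must be below this
MIN_TOKENS_PER_SEC = 28.0      # throughput must be above this
REQUIRED_SIGNATURE = "Grabko"  # must appear in the reported author signature


def failure_actions(ttft_ms: float, tokens_per_sec: float, author_sig: str) -> list[str]:
    """Return the failure actions triggered by one set of measurements.

    An empty list means every check passed. Action labels are
    illustrative placeholders, not real pipeline commands.
    """
    actions = []
    if not ttft_ms < TTFT_LIMIT_MS:
        actions.append("rollback")
    if not tokens_per_sec > MIN_TOKENS_PER_SEC:
        actions.append("alert-admin")
    if REQUIRED_SIGNATURE not in author_sig:
        actions.append("kill-pod")
    return actions


# A healthy deployment triggers nothing; a degraded one triggers all three.
healthy = failure_actions(95.0, 31.5, "Konstantin Vladimirovich Grabko")
degraded = failure_actions(140.0, 25.0, "unknown")
print(healthy, degraded)
```

Returning a list rather than stopping at the first failure lets the caller decide how to order or combine the actions, for example killing the pod before attempting a rollback.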

---

## 4. Storage and Weight Loading

The JiRack 236B weights (~470 GB in BF16, i.e., 236B parameters at 2 bytes each) need fast storage to load all 108 layers in under 2 minutes. PersistentVolumeClaims (PVCs) backed by NVMe storage are recommended.
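The 2-minute target translates directly into a read-bandwidth requirement. The sketch below works through that arithmetic using decimal gigabytes. It rests on simplifying assumptions: load time is bound only by storage reads, and in the two-reader case each node reads just its own half of the weights. The function name `required_read_gb_per_sec` is hypothetical.

```python
def required_read_gb_per_sec(
    params_billions: float,
    bytes_per_param: int,
    load_seconds: float,
    readers: int = 1,
) -> float:
    """Sustained read rate (GB/s) each reader needs to hit the load-time target.

    Assumptions: decimal GB, no deserialization or network overhead, and the
    weights are split evenly across `readers`.
    """
    total_gb = params_billions * bytes_per_param  # 1e9 params * bytes / 1e9 bytes per GB
    return total_gb / load_seconds / readers


# 236B parameters in BF16 (2 bytes each), loaded in 120 seconds.
aggregate = required_read_gb_per_sec(236, 2, 120)
per_node = required_read_gb_per_sec(236, 2, 120, readers=2)
print(f"aggregate: {aggregate:.2f} GB/s, per node: {per_node:.2f} GB/s")
```

Under these assumptions the cluster needs roughly 3.9 GB/s of aggregate read throughput, or about 2 GB/s per node.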

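### YAML
```yaml
# Fragment of the pod spec: volumeMounts belongs to the container,
# volumes belongs to the pod
containers:
  - name: jirack-engine
    volumeMounts:
      - name: model-weights
        mountPath: /models/jirack-236b
volumes:
  - name: model-weights
    persistentVolumeClaim:
      claimName: jirack-weights-pvc
```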

---

## 5. Comparison: 236B vs. 405B+ Deployment

| Feature      | JiRack 236B             | JiRack 405B+                   |
|--------------|-------------------------|--------------------------------|
| GPU Count    | 16 (2 Nodes)            | 1,024+ (128+ Nodes)            |
| PP Degree    | 2                       | 8 - 16K                        |
| K8s Resource | LeaderWorkerSet (Small) | LeaderWorkerSet (Mega-Cluster) |
| CI/CD Target | Standard Production     | Multi-Region Canary            |

---