gary-boon (Claude Opus 4.5) committed
Commit e694533 · 1 parent: 5f122aa

Update plan: Phase 1 paused due to GB10 GPU support

Document the blocker with DGX Spark's GB10 GPU (sm_121 compute
capability) not being supported by current PyTorch releases.

- Mark Phase 0 and 0.5 as complete
- Mark Phase 1 as paused (not blocked permanently)
- Document what we tried and what we learned
- Add clear restart instructions for when PyTorch 2.9.x adds sm_121
- List infrastructure already in place on Spark

The Spark deployment makes no sense on CPU when Mac Studio and
HuggingFace Spaces are available. Wait for official GPU support.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Files changed (1)
  1. docs/devstral-spark-plan-phased.md +120 -3

docs/devstral-spark-plan-phased.md CHANGED
@@ -1814,11 +1814,128 @@ Before marking each phase complete, verify:
 
  ## Current Status
 
- - [ ] **Phase 0**: Secure GPU HF Space + verify basic routing
- - [ ] **Phase 0.5**: Fix critical API route routing (prove GPU routing works)
- - [ ] **Phase 1**: Deploy CodeGen to DGX Spark
+ - [x] **Phase 0**: Secure GPU HF Space + verify basic routing ✅ COMPLETE
+ - [x] **Phase 0.5**: Fix critical API route routing (prove GPU routing works) ✅ COMPLETE
+ - [ ] **Phase 1**: Deploy CodeGen to DGX Spark ⏸️ PAUSED (see blocker below)
  - [ ] **Phase 2**: Add Devstral backend support
  - [ ] **Phase 2b**: Frontend dynamic layer handling
  - [ ] **Phase 2c**: Wire Spark into frontend backend router + Deploy Devstral to GPU HF Space
  - [ ] **Phase 3**: Deploy Devstral to DGX Spark
  - [ ] **Phase 4**: Future enhancements (optional)
+
+ ---
+
+ ## Blocker: DGX Spark GB10 GPU Not Yet Supported by PyTorch
+
+ **Date:** December 2024
+
+ **Status:** ⏸️ Phase 1 paused pending PyTorch update
+
+ ### The Issue
+
+ The DGX Spark uses an NVIDIA GB10 GPU (Grace Blackwell architecture) with compute capability **sm_121**. Current PyTorch releases (including NGC containers up to 24.08) do not include pre-built CUDA kernels for sm_121.
+
+ **Error observed:**
+ ```
+ RuntimeError: CUDA error: no kernel image is available for execution on the device
+ CUDA kernel errors might be asynchronously reported at some other API call
+ ```
+
+ **Hardware details:**
+ - DGX Spark hostname: `spark-c691.local`
+ - GPU: NVIDIA GB10 (sm_121 compute capability)
+ - CUDA driver: 13.0
+ - Architecture: ARM64 (aarch64)
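As a quick sanity check, the mismatch can be confirmed by comparing the GPU's reported compute capability against the arch list a PyTorch build was compiled for. A minimal sketch (the `wheel_archs` list below is illustrative, not taken from any specific wheel):

```python
# Sketch: does a PyTorch build ship kernels for a given compute capability?
# torch.cuda.get_arch_list() returns strings like "sm_120"; GB10 reports (12, 1).
def build_supports_arch(arch_list, major, minor):
    """True if the build was compiled for compute capability major.minor."""
    return f"sm_{major}{minor}" in arch_list

# On the Spark itself, the live check would be (requires torch + CUDA):
#   import torch
#   ok = build_supports_arch(torch.cuda.get_arch_list(),
#                            *torch.cuda.get_device_capability(0))

# Offline illustration with a hypothetical wheel's arch list:
wheel_archs = ["sm_80", "sm_86", "sm_90", "sm_120"]
print(build_supports_arch(wheel_archs, 12, 1))  # sm_121 missing -> False
```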
+
+ ### What We Tried
+
+ 1. **NGC PyTorch container 24.08-py3** - does not include sm_121 kernels
+ 2. **NGC PyTorch container 24.11-py3** - Python 3.12 compatibility issues with dependencies
+ 3. **Standard PyTorch images** - no ARM64 + CUDA 13.0 support
+ 4. **CPU fallback** - works, but defeats the purpose of using Spark
+
+ ### What We Learned
+
+ From the [PyTorch forums](https://discuss.pytorch.org/t/nvidia-dgx-spark-support/223677/16):
+
+ 1. **sm_121 is binary compatible with sm_120** - the warning/error is overly cautious
+ 2. **A PR exists** to add sm_121 support, but it missed the PyTorch 2.9.0 release
+ 3. **A workaround exists** - building PyTorch from source with sm_121 support works, but it requires recompiling PyTorch, torchvision, and triton
+
+ ### Why We're Pausing (Not Working Around It)
+
+ Running CodeGen on CPU on the Spark provides no benefit over:
+ - Mac Studio (512GB RAM) for local development
+ - HuggingFace Spaces (CPU and GPU options available)
+
+ The Spark deployment only makes sense when we can leverage the GB10 GPU. Building PyTorch from source is too complex and fragile for a temporary workaround.
+
+ ### What's Ready on Spark
+
+ The following infrastructure is in place and ready to test once GPU support lands:
+
+ - [x] Docker infrastructure: `docker/compose.spark.yml`
+ - [x] Dockerfile: `docker/Dockerfile.spark` (using NGC container)
+ - [x] Environment template: `.env.spark.example`
+ - [x] SSH access configured with key-based auth
+ - [x] Git clone at `/srv/visualisable/backend`
+ - [x] Model cache directory: `/srv/models-cache/huggingface`
+ - [x] Backend code has DEVICE env var override (for CPU fallback if needed)
+ - [x] `/health`, `/ready`, `/debug/device` endpoints added
+
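The DEVICE override in the checklist above can be sketched roughly as follows; this is an assumed shape of the backend's device selection, not the actual code:

```python
# Sketch of the assumed DEVICE env var override: an explicit DEVICE value
# (e.g. DEVICE=cpu in .env.spark) wins; otherwise prefer CUDA when present.
import os

def resolve_device(cuda_available: bool) -> str:
    override = os.environ.get("DEVICE")
    if override:
        return override          # forced fallback, e.g. "cpu"
    return "cuda:0" if cuda_available else "cpu"

os.environ["DEVICE"] = "cpu"
print(resolve_device(cuda_available=True))   # -> cpu (override wins)
del os.environ["DEVICE"]
print(resolve_device(cuda_available=True))   # -> cuda:0
```

Removing (or commenting out) `DEVICE=cpu` in `.env.spark`, as in the restart instructions below, is what lets the backend fall through to CUDA.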
+ ### Restart Instructions
+
+ When PyTorch officially supports sm_121 (expected in a PyTorch 2.9.x patch release or in 2.10):
+
+ 1. **Check for an updated NGC container:**
+    ```bash
+    # Look for NGC PyTorch containers with sm_121 support
+    # https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
+    ```
+
+ 2. **Update Dockerfile.spark:**
+    ```dockerfile
+    # Update to the NGC container version with sm_121 support
+    FROM nvcr.io/nvidia/pytorch:XX.XX-py3
+    ```
+
+ 3. **On Spark, pull and rebuild:**
+    ```bash
+    ssh dgxspark@spark-c691.local
+    cd /srv/visualisable/backend
+    git pull
+
+    # Remove DEVICE=cpu from .env.spark (or comment it out)
+    vim .env.spark
+
+    # Rebuild with the new NGC container
+    docker compose -f docker/compose.spark.yml --env-file .env.spark up -d --build
+    ```
+
+ 4. **Verify the GPU is working:**
+    ```bash
+    # Should show cuda_available: true, model_device: cuda:0
+    curl -s http://spark-c691.local:8000/debug/device | python -m json.tool
+
+    # Test inference
+    curl -X POST http://spark-c691.local:8000/analyze/research/attention \
+      -H "Content-Type: application/json" \
+      -d '{"prompt": "def hello():", "max_tokens": 5}'
+    ```
+
+ 5. **Continue with the Phase 1 validation criteria**
+
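Step 4 can also be scripted end to end. A hypothetical helper that parses the `/debug/device` response; the `cuda_available` / `model_device` field names follow the expected output noted in the curl comment above:

```python
# Sketch: decide from /debug/device JSON whether the GB10 is actually in use.
import json

def gpu_ready(payload: str) -> bool:
    """True when the backend reports CUDA available and a cuda-placed model."""
    info = json.loads(payload)
    return bool(info.get("cuda_available")) and \
        str(info.get("model_device", "")).startswith("cuda")

# Against a live Spark, fetch the payload first, e.g.:
#   from urllib.request import urlopen
#   payload = urlopen("http://spark-c691.local:8000/debug/device").read()
print(gpu_ready('{"cuda_available": true, "model_device": "cuda:0"}'))   # True
print(gpu_ready('{"cuda_available": false, "model_device": "cpu"}'))     # False
```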
+ ### Monitoring PyTorch Progress
+
+ - PyTorch GitHub: watch for sm_121 PRs
+ - NGC container releases: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
+ - PyTorch forums: https://discuss.pytorch.org/t/nvidia-dgx-spark-support/223677
+
+ ### Pre-Devstral Tag
+
+ Before making these changes, both repos were tagged `pre-devstral-v1`.
+
+ To restore to this state if needed:
+ ```bash
+ git checkout pre-devstral-v1
+ ```