gary-boon (Claude Opus 4.5) committed
Commit e694533 · 1 parent: 5f122aa

Update plan: Phase 1 paused due to GB10 GPU support

Document the blocker with DGX Spark's GB10 GPU (sm_121 compute
capability) not being supported by current PyTorch releases.

- Mark Phase 0 and 0.5 as complete
- Mark Phase 1 as paused (not blocked permanently)
- Document what we tried and what we learned
- Add clear restart instructions for when PyTorch 2.9.x adds sm_121
- List infrastructure already in place on Spark

The Spark deployment makes no sense on CPU when Mac Studio and
HuggingFace Spaces are available. Wait for official GPU support.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Files changed (1)
  1. docs/devstral-spark-plan-phased.md +120 -3

docs/devstral-spark-plan-phased.md CHANGED
@@ -1814,11 +1814,128 @@ Before marking each phase complete, verify:
 
  ## Current Status
 
- - [ ] **Phase 0**: Secure GPU HF Space + verify basic routing
- - [ ] **Phase 0.5**: Fix critical API route routing (prove GPU routing works)
- - [ ] **Phase 1**: Deploy CodeGen to DGX Spark
+ - [x] **Phase 0**: Secure GPU HF Space + verify basic routing ✅ COMPLETE
+ - [x] **Phase 0.5**: Fix critical API route routing (prove GPU routing works) ✅ COMPLETE
+ - [ ] **Phase 1**: Deploy CodeGen to DGX Spark ⏸️ PAUSED (see blocker below)
  - [ ] **Phase 2**: Add Devstral backend support
  - [ ] **Phase 2b**: Frontend dynamic layer handling
  - [ ] **Phase 2c**: Wire Spark into frontend backend router + Deploy Devstral to GPU HF Space
  - [ ] **Phase 3**: Deploy Devstral to DGX Spark
  - [ ] **Phase 4**: Future enhancements (optional)
+
+ ---
+
+ ## Blocker: DGX Spark GB10 GPU Not Yet Supported by PyTorch
+
+ **Date:** December 2024
+
+ **Status:** ⏸️ Phase 1 paused pending PyTorch update
+
+ ### The Issue
+
+ The DGX Spark uses an NVIDIA GB10 GPU (Grace Blackwell architecture) with compute capability **sm_121**. Current PyTorch releases (including NGC containers up to 24.08) do not include pre-built CUDA kernels for sm_121.
+
+ **Error observed:**
+ ```
+ RuntimeError: CUDA error: no kernel image is available for execution on the device
+ CUDA kernel errors might be asynchronously reported at some other API call
+ ```
+
+ **Hardware details:**
+ - DGX Spark hostname: `spark-c691.local`
+ - GPU: NVIDIA GB10 (sm_121 compute capability)
+ - CUDA driver: 13.0
+ - Architecture: ARM64 (aarch64)
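As a quick sanity check, the mismatch can be confirmed by comparing the GPU's reported compute capability against the arch list a PyTorch build was compiled for. A minimal sketch (the `wheel_archs` list below is illustrative, not taken from any specific wheel):

```python
# Sketch: does a PyTorch build ship kernels for a given compute capability?
# torch.cuda.get_arch_list() returns strings like "sm_120"; GB10 reports (12, 1).
def build_supports_arch(arch_list, major, minor):
    """True if the build was compiled for compute capability major.minor."""
    return f"sm_{major}{minor}" in arch_list

# On the Spark itself, the live check would be (requires torch + CUDA):
#   import torch
#   ok = build_supports_arch(torch.cuda.get_arch_list(),
#                            *torch.cuda.get_device_capability(0))

# Offline illustration with a hypothetical wheel's arch list:
wheel_archs = ["sm_80", "sm_86", "sm_90", "sm_120"]
print(build_supports_arch(wheel_archs, 12, 1))  # sm_121 missing -> False
```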
+
+ ### What We Tried
+
+ 1. **NGC PyTorch container 24.08-py3** - does not include sm_121 kernels
+ 2. **NGC PyTorch container 24.11-py3** - Python 3.12 compatibility issues with dependencies
+ 3. **Standard PyTorch images** - no ARM64 + CUDA 13.0 support
+ 4. **CPU fallback** - works, but defeats the purpose of using Spark
+
+ ### What We Learned
+
+ From the [PyTorch forums](https://discuss.pytorch.org/t/nvidia-dgx-spark-support/223677/16):
+
+ 1. **sm_121 is binary compatible with sm_120** - the warning/error is overly cautious
+ 2. **A PR exists** to add sm_121 support, but it missed the PyTorch 2.9.0 release
+ 3. **A workaround exists** - building PyTorch from source with sm_121 support works, but it requires recompiling PyTorch, torchvision, and triton
+
+ ### Why We're Pausing (Not Working Around It)
+
+ Running CodeGen on CPU on the Spark provides no benefit over:
+ - Mac Studio (512GB RAM) for local development
+ - HuggingFace Spaces (CPU and GPU options available)
+
+ The Spark deployment only makes sense when we can leverage the GB10 GPU. Building PyTorch from source is too complex and fragile for a temporary workaround.
+
+ ### What's Ready on Spark
+
+ The following infrastructure is in place and ready to test once GPU support lands:
+
+ - [x] Docker infrastructure: `docker/compose.spark.yml`
+ - [x] Dockerfile: `docker/Dockerfile.spark` (using NGC container)
+ - [x] Environment template: `.env.spark.example`
+ - [x] SSH access configured with key-based auth
+ - [x] Git clone at `/srv/visualisable/backend`
+ - [x] Model cache directory: `/srv/models-cache/huggingface`
+ - [x] Backend code has DEVICE env var override (for CPU fallback if needed)
+ - [x] `/health`, `/ready`, `/debug/device` endpoints added
+
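The DEVICE override in the checklist above can be sketched roughly as follows; this is an assumed shape of the backend's device selection, not the actual code:

```python
# Sketch of the assumed DEVICE env var override: an explicit DEVICE value
# (e.g. DEVICE=cpu in .env.spark) wins; otherwise prefer CUDA when present.
import os

def resolve_device(cuda_available: bool) -> str:
    override = os.environ.get("DEVICE")
    if override:
        return override          # forced fallback, e.g. "cpu"
    return "cuda:0" if cuda_available else "cpu"

os.environ["DEVICE"] = "cpu"
print(resolve_device(cuda_available=True))   # -> cpu (override wins)
del os.environ["DEVICE"]
print(resolve_device(cuda_available=True))   # -> cuda:0
```

Removing (or commenting out) `DEVICE=cpu` in `.env.spark`, as in the restart instructions below, is what lets the backend fall through to CUDA.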
+ ### Restart Instructions
+
+ When PyTorch officially supports sm_121 (expected in a PyTorch 2.9.x patch release or in 2.10):
+
+ 1. **Check for an updated NGC container:**
+    ```bash
+    # Look for NGC PyTorch containers with sm_121 support
+    # https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
+    ```
+
+ 2. **Update Dockerfile.spark:**
+    ```dockerfile
+    # Update to the NGC container version with sm_121 support
+    FROM nvcr.io/nvidia/pytorch:XX.XX-py3
+    ```
+
+ 3. **On Spark, pull and rebuild:**
+    ```bash
+    ssh dgxspark@spark-c691.local
+    cd /srv/visualisable/backend
+    git pull
+
+    # Remove DEVICE=cpu from .env.spark (or comment it out)
+    vim .env.spark
+
+    # Rebuild with the new NGC container
+    docker compose -f docker/compose.spark.yml --env-file .env.spark up -d --build
+    ```
+
+ 4. **Verify the GPU is working:**
+    ```bash
+    # Should show cuda_available: true, model_device: cuda:0
+    curl -s http://spark-c691.local:8000/debug/device | python -m json.tool
+
+    # Test inference
+    curl -X POST http://spark-c691.local:8000/analyze/research/attention \
+      -H "Content-Type: application/json" \
+      -d '{"prompt": "def hello():", "max_tokens": 5}'
+    ```
+
+ 5. **Continue with the Phase 1 validation criteria**
+
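Step 4 can also be scripted end to end. A hypothetical helper that parses the `/debug/device` response; the `cuda_available` / `model_device` field names follow the expected output noted in the curl comment above:

```python
# Sketch: decide from /debug/device JSON whether the GB10 is actually in use.
import json

def gpu_ready(payload: str) -> bool:
    """True when the backend reports CUDA available and a cuda-placed model."""
    info = json.loads(payload)
    return bool(info.get("cuda_available")) and \
        str(info.get("model_device", "")).startswith("cuda")

# Against a live Spark, fetch the payload first, e.g.:
#   from urllib.request import urlopen
#   payload = urlopen("http://spark-c691.local:8000/debug/device").read()
print(gpu_ready('{"cuda_available": true, "model_device": "cuda:0"}'))   # True
print(gpu_ready('{"cuda_available": false, "model_device": "cpu"}'))     # False
```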
+ ### Monitoring PyTorch Progress
+
+ - PyTorch GitHub: watch for sm_121 PRs
+ - NGC container releases: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
+ - PyTorch forums: https://discuss.pytorch.org/t/nvidia-dgx-spark-support/223677
+
+ ### Pre-Devstral Tag
+
+ Before making these changes, both repos were tagged `pre-devstral-v1`.
+
+ To restore to this state if needed:
+ ```bash
+ git checkout pre-devstral-v1
+ ```