gary-boon, Claude Opus 4.5 committed
Commit a2bd186 · 1 Parent(s): ab4534a

Phase 1: DGX Spark infrastructure

- Add .env.spark.example template for Spark configuration
- Add docker/compose.spark.yml with GPU support and model cache mount
- Add runs/.gitkeep for runtime outputs
- Update .gitignore for .env.spark and runs/*
- Add /ready endpoint (503 until model loaded, then 200)
- Add /debug/device endpoint for GPU verification (no secrets)
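
The /ready endpoint described above can be used to block deployment scripts until the model has loaded. A minimal stdlib-only sketch of such a poller (the default URL, timeout values, and the injectable `probe` hook are illustrative assumptions, not part of this commit):

```python
import time
import urllib.error
import urllib.request


def wait_for_ready(url="http://localhost:8000/ready", timeout=120.0,
                   interval=2.0, probe=None):
    """Poll /ready until it returns 200 or the timeout expires.

    `probe` is injectable for testing; by default it issues an HTTP GET
    and returns the status code (503 while the model is still loading,
    None if the service is not accepting connections yet).
    """
    def http_probe():
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code          # e.g. 503 while initializing
        except urllib.error.URLError:
            return None              # service not up yet

    probe = probe or http_probe
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe() == 200:
            return True
        time.sleep(interval)
    return False
```

For example, `wait_for_ready(timeout=300)` before running a smoke test against the service.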

Paths follow DGX Spark deployment spec:
- /srv/projects/visualisable-ai-backend
- /srv/models-cache/huggingface (mounted)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

.env.spark.example ADDED
@@ -0,0 +1,21 @@
+# DGX Spark Environment Configuration
+# Copy this to .env.spark and fill in values
+
+# Service Configuration
+PORT=8000
+
+# Model Configuration (Phase 1: CodeGen, Phase 3: Devstral)
+DEFAULT_MODEL=codegen-350m
+# DEFAULT_MODEL=devstral-small  # Uncomment for Phase 3
+
+# API Security
+API_KEY=<your-api-key>
+
+# HuggingFace (required for gated models like Devstral)
+HF_TOKEN=<your-hf-token>
+
+# Model Settings
+MAX_CONTEXT=8192
+BATCH_SIZE=1
+TORCH_DTYPE=fp16
+# TORCH_DTYPE=bf16  # Use bf16 for Devstral (Phase 3)
.gitignore CHANGED
@@ -38,6 +38,11 @@ env/
 # Environment variables
 .env
 .env.local
+.env.spark
+
+# Spark runtime outputs
+runs/*
+!runs/.gitkeep
 
 # Testing
 .coverage
backend/model_service.py CHANGED
@@ -866,7 +866,7 @@ async def root():
 
 @app.get("/health")
 async def health():
-    """Detailed health check"""
+    """Detailed health check - always returns 200 for Docker healthcheck"""
     return {
         "status": "healthy" if manager.model else "initializing",
         "model_loaded": manager.model is not None,
@@ -875,6 +875,44 @@ async def health():
         "timestamp": datetime.now().isoformat()
     }
 
+@app.get("/ready")
+async def ready():
+    """Readiness check - returns 503 until model is loaded, then 200.
+
+    Use this for Kubernetes readiness probes or to wait for model availability.
+    Unlike /health, this returns an error status when not ready.
+    """
+    if manager.model is None:
+        raise HTTPException(
+            status_code=503,
+            detail="Model not loaded yet - service is initializing"
+        )
+    return {
+        "status": "ready",
+        "model_loaded": True,
+        "device": str(manager.device) if manager.device else "not set",
+        "timestamp": datetime.now().isoformat()
+    }
+
+@app.get("/debug/device")
+async def debug_device():
+    """Debug endpoint for GPU/device verification.
+
+    Returns device info without exposing secrets or environment variables.
+    Use this to verify the model is running on GPU.
+    """
+    import torch
+
+    return {
+        "cuda_available": torch.cuda.is_available(),
+        "cuda_device_count": torch.cuda.device_count() if torch.cuda.is_available() else 0,
+        "cuda_device_name": torch.cuda.get_device_name(0) if torch.cuda.is_available() and torch.cuda.device_count() > 0 else None,
+        "model_device": str(manager.device) if manager.device else "not set",
+        "model_loaded": manager.model is not None,
+        "model_dtype": str(manager.model.dtype) if manager.model and hasattr(manager.model, 'dtype') else None,
+        "timestamp": datetime.now().isoformat()
+    }
+
 @app.get("/model/info")
 async def model_info(authenticated: bool = Depends(verify_api_key)):
     """Get detailed information about the loaded model"""
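
The new /debug/device handler relies only on torch's CUDA introspection calls, so its core can be exercised outside the service. A hedged standalone sketch (field names mirror the endpoint in the diff above; the torch-absent fallback is an addition so the sketch runs on machines without a GPU stack):

```python
import importlib.util


def device_report():
    """Return a /debug/device-style summary of CUDA visibility.

    Falls back to an all-negative report when torch is not installed,
    so the sketch runs anywhere; on a DGX box with torch it reports
    the same CUDA fields as the endpoint.
    """
    if importlib.util.find_spec("torch") is None:
        return {"cuda_available": False, "cuda_device_count": 0,
                "cuda_device_name": None}
    import torch
    available = torch.cuda.is_available()
    return {
        "cuda_available": available,
        "cuda_device_count": torch.cuda.device_count() if available else 0,
        "cuda_device_name": torch.cuda.get_device_name(0)
        if available and torch.cuda.device_count() > 0 else None,
    }
```

Running `device_report()` inside the container should show `cuda_available: True` when the NVIDIA device reservation in the compose file is working.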
docker/compose.spark.yml ADDED
@@ -0,0 +1,51 @@
+# Docker Compose for DGX Spark deployment
+#
+# Usage:
+#   docker compose -f docker/compose.spark.yml --env-file .env.spark up -d --build
+#
+# Multi-instance (different branches):
+#   PORT=8001 docker compose -p visai-branch-a -f docker/compose.spark.yml --env-file .env.spark up -d --build
+
+services:
+  visualisable-ai-backend:
+    build:
+      context: ..
+      dockerfile: Dockerfile
+    ports:
+      - "${PORT:-8000}:${PORT:-8000}"
+    environment:
+      - PORT=${PORT:-8000}
+      - DEFAULT_MODEL=${DEFAULT_MODEL:-codegen-350m}
+      - TORCH_DTYPE=${TORCH_DTYPE:-fp16}
+      - MAX_CONTEXT=${MAX_CONTEXT:-8192}
+      - BATCH_SIZE=${BATCH_SIZE:-1}
+      - API_KEY=${API_KEY}
+      - HF_TOKEN=${HF_TOKEN}
+      # HuggingFace cache locations (inside container)
+      - TRANSFORMERS_CACHE=/models-cache
+      - HF_HOME=/models-cache
+      - HUGGINGFACE_HUB_CACHE=/models-cache
+    volumes:
+      # Persistent model cache (shared across instances)
+      - /srv/models-cache/huggingface:/models-cache
+      # Runtime outputs
+      - ./runs:/app/runs
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:${PORT:-8000}/health"]
+      interval: 30s
+      timeout: 3s
+      start_period: 10s
+      retries: 3
+    restart: unless-stopped
+    # Override entrypoint to use model_service on configurable port
+    command: >
+      uvicorn backend.model_service:app
+      --host 0.0.0.0
+      --port ${PORT:-8000}
runs/.gitkeep ADDED
File without changes