Space: visualisable-ai/api

gary-boon and Claude Opus 4.5 committed on Dec 14, 2025

Commit a2bd186

Parent: ab4534a

Phase 1: DGX Spark infrastructure

- Add .env.spark.example template for Spark configuration
- Add docker/compose.spark.yml with GPU support and model cache mount
- Add runs/.gitkeep for runtime outputs
- Update .gitignore for .env.spark and runs/*
- Add /ready endpoint (503 until model loaded, then 200)
- Add /debug/device endpoint for GPU verification (no secrets)

Paths follow DGX Spark deployment spec:
- /srv/projects/visualisable-ai-backend
- /srv/models-cache/huggingface (mounted)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Files changed (5)

.env.spark.example +21 -0
.gitignore +5 -0
backend/model_service.py +39 -1
docker/compose.spark.yml +51 -0
runs/.gitkeep +0 -0

.env.spark.example ADDED

@@ -0,0 +1,21 @@

+# DGX Spark Environment Configuration
+# Copy this to .env.spark and fill in values
+# Service Configuration
+PORT=8000
+# Model Configuration (Phase 1: CodeGen, Phase 3: Devstral)
+DEFAULT_MODEL=codegen-350m
+# DEFAULT_MODEL=devstral-small    # Uncomment for Phase 3
+# API Security
+API_KEY=<your-api-key>
+# HuggingFace (required for gated models like Devstral)
+HF_TOKEN=<your-hf-token>
+# Model Settings
+MAX_CONTEXT=8192
+BATCH_SIZE=1
+TORCH_DTYPE=fp16
+# TORCH_DTYPE=bf16               # Use bf16 for Devstral (Phase 3)
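The compose file falls back to a default for any of these variables that `.env.spark` leaves unset or empty. As a rough illustration of how the effective settings resolve, here is a minimal Python sketch that parses `KEY=VALUE` lines from a file shaped like the template above and applies the same defaults as the `${VAR:-default}` expressions in `docker/compose.spark.yml`. The names `load_spark_settings`, `SparkSettings`, and the `fp16`/`bf16` to dtype-name mapping are hypothetical; the backend's own configuration code is not part of this commit, and real `.env` parsing (quoting, `export`, inline comments) is richer than this.

```python
from dataclasses import dataclass

# Fallbacks copied from the ${VAR:-default} expressions in docker/compose.spark.yml.
DEFAULTS = {
    "PORT": "8000",
    "DEFAULT_MODEL": "codegen-350m",
    "TORCH_DTYPE": "fp16",
    "MAX_CONTEXT": "8192",
    "BATCH_SIZE": "1",
}

# Hypothetical mapping from the template's short names to torch dtype attribute
# names (torch.float16, torch.bfloat16); the backend's real mapping is not shown.
DTYPE_NAMES = {"fp16": "float16", "bf16": "bfloat16"}


@dataclass
class SparkSettings:
    port: int
    default_model: str
    torch_dtype: str
    max_context: int
    batch_size: int


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comment lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_spark_settings(text: str) -> SparkSettings:
    """Hypothetical loader: non-empty file values override the compose defaults."""
    env = dict(DEFAULTS)
    # Like ${VAR:-default}, an empty value (e.g. "PORT=") falls back to the default.
    env.update({k: v for k, v in parse_env_text(text).items() if v})
    dtype = env["TORCH_DTYPE"]
    if dtype not in DTYPE_NAMES:
        raise ValueError(f"unsupported TORCH_DTYPE: {dtype!r}")
    return SparkSettings(
        port=int(env["PORT"]),
        default_model=env["DEFAULT_MODEL"],
        torch_dtype=DTYPE_NAMES[dtype],
        max_context=int(env["MAX_CONTEXT"]),
        batch_size=int(env["BATCH_SIZE"]),
    )


example = """
# Model Configuration
DEFAULT_MODEL=codegen-350m
# DEFAULT_MODEL=devstral-small
TORCH_DTYPE=bf16
PORT=
"""
settings = load_spark_settings(example)
print(settings.port, settings.torch_dtype)  # 8000 bfloat16
```

Secrets such as `API_KEY` and `HF_TOKEN` are parsed but deliberately left out of `SparkSettings`, which keeps them out of anything that might later be logged or returned by a debug endpoint.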

.gitignore CHANGED

@@ -38,6 +38,11 @@ env/
 # Environment variables
 .env
 .env.local
+.env.spark
+# Spark runtime outputs
+runs/*
+!runs/.gitkeep
 # Testing
 .coverage
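The `runs/*` plus `!runs/.gitkeep` pair relies on two gitignore rules: the last matching pattern wins, and a negated pattern cannot re-include a file whose parent directory is itself excluded. Ignoring the directory's contents (`runs/*`) rather than the directory (`runs/`) is what lets `.gitkeep` be tracked. The Python sketch below is a deliberately simplified model of those two rules built on `fnmatch`; `is_ignored` is a hypothetical helper, not git's matcher (no basename-only matching, anchoring, or `**`). `git check-ignore -v <path>` gives the authoritative answer.

```python
from fnmatch import fnmatchcase


def _last_match(path: str, patterns: list[str], is_dir: bool) -> bool:
    """Apply patterns in order; the last one that matches decides."""
    ignored = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        body = pattern[1:] if negate else pattern
        dir_only = body.endswith("/")
        body = body.rstrip("/")
        if dir_only and not is_dir:
            continue  # "name/" patterns only match directories
        if fnmatchcase(path, body):
            ignored = not negate
    return ignored


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified gitignore model (not git's real matcher).

    Patterns are matched against the whole path. Parent directories are
    checked first because git cannot re-include a file whose parent
    directory is excluded.
    """
    parts = path.split("/")
    for depth in range(1, len(parts)):
        if _last_match("/".join(parts[:depth]), patterns, is_dir=True):
            return True
    return _last_match(path, patterns, is_dir=False)


rules = ["runs/*", "!runs/.gitkeep"]
print(is_ignored("runs/.gitkeep", rules))                        # False
print(is_ignored("runs/out.json", rules))                        # True
print(is_ignored("runs/.gitkeep", ["runs/", "!runs/.gitkeep"]))  # True
```

The same model also shows that the new `.env.spark` entry does not catch the committed template: `.env.spark.example` matches none of `.env`, `.env.local`, or `.env.spark`.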

backend/model_service.py CHANGED

@@ -866,7 +866,7 @@ async def root():
 @app.get("/health")
 async def health():
-    """Detailed health check"""
+    """Detailed health check - always returns 200 for Docker healthcheck"""
     return {
         "status": "healthy" if manager.model else "initializing",
         "model_loaded": manager.model is not None,
@@ -875,6 +875,44 @@ async def health():
         "timestamp": datetime.now().isoformat()
     }
+@app.get("/ready")
+async def ready():
+    """Readiness check - returns 503 until model is loaded, then 200.
+    Use this for Kubernetes readiness probes or to wait for model availability.
+    Unlike /health, this returns an error status when not ready.
+    """
+    if manager.model is None:
+        raise HTTPException(
+            status_code=503,
+            detail="Model not loaded yet - service is initializing"
+        )
+    return {
+        "status": "ready",
+        "model_loaded": True,
+        "device": str(manager.device) if manager.device else "not set",
+        "timestamp": datetime.now().isoformat()
+    }
+@app.get("/debug/device")
+async def debug_device():
+    """Debug endpoint for GPU/device verification.
+    Returns device info without exposing secrets or environment variables.
+    Use this to verify the model is running on GPU.
+    """
+    import torch
+    return {
+        "cuda_available": torch.cuda.is_available(),
+        "cuda_device_count": torch.cuda.device_count() if torch.cuda.is_available() else 0,
+        "cuda_device_name": torch.cuda.get_device_name(0) if torch.cuda.is_available() and torch.cuda.device_count() > 0 else None,
+        "model_device": str(manager.device) if manager.device else "not set",
+        "model_loaded": manager.model is not None,
+        "model_dtype": str(manager.model.dtype) if manager.model and hasattr(manager.model, 'dtype') else None,
+        "timestamp": datetime.now().isoformat()
+    }
 @app.get("/model/info")
 async def model_info(authenticated: bool = Depends(verify_api_key)):
     """Get detailed information about the loaded model"""

docker/compose.spark.yml ADDED

@@ -0,0 +1,51 @@

+# Docker Compose for DGX Spark deployment
+#
+# Usage:
+#   docker compose -f docker/compose.spark.yml --env-file .env.spark up -d --build
+#
+# Multi-instance (different branches):
+#   PORT=8001 docker compose -p visai-branch-a -f docker/compose.spark.yml --env-file .env.spark up -d --build
+services:
+  visualisable-ai-backend:
+    build:
+      context: ..
+      dockerfile: Dockerfile
+    ports:
+      - "${PORT:-8000}:${PORT:-8000}"
+    environment:
+      - PORT=${PORT:-8000}
+      - DEFAULT_MODEL=${DEFAULT_MODEL:-codegen-350m}
+      - TORCH_DTYPE=${TORCH_DTYPE:-fp16}
+      - MAX_CONTEXT=${MAX_CONTEXT:-8192}
+      - BATCH_SIZE=${BATCH_SIZE:-1}
+      - API_KEY=${API_KEY}
+      - HF_TOKEN=${HF_TOKEN}
+      # HuggingFace cache locations (inside container)
+      - TRANSFORMERS_CACHE=/models-cache
+      - HF_HOME=/models-cache
+      - HUGGINGFACE_HUB_CACHE=/models-cache
+    volumes:
+      # Persistent model cache (shared across instances)
+      - /srv/models-cache/huggingface:/models-cache
+      # Runtime outputs
+      - ./runs:/app/runs
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:${PORT:-8000}/health"]
+      interval: 30s
+      timeout: 3s
+      start_period: 10s
+      retries: 3
+    restart: unless-stopped
+    # Override entrypoint to use model_service on configurable port
+    command: >
+      uvicorn backend.model_service:app
+      --host 0.0.0.0
+      --port ${PORT:-8000}
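The `deploy.resources.reservations.devices` block asks Compose to expose the host's NVIDIA GPUs to the container; whether the model actually landed on one can then be confirmed through `/debug/device`. A small Python sketch, assuming the JSON field names returned by the endpoint in `backend/model_service.py` and a service on `localhost` at the default port; `fetch_device_report` and `gpu_problems` are hypothetical helper names. The checks are a pure function so they can be tested against a canned report.

```python
import json
import urllib.request
from typing import Optional


def fetch_device_report(base_url: str = "http://localhost:8000") -> dict:
    """GET /debug/device and decode its JSON body (the endpoint takes no API key)."""
    with urllib.request.urlopen(f"{base_url}/debug/device", timeout=5) as resp:
        return json.load(resp)


def gpu_problems(report: dict, expected_dtype: Optional[str] = None) -> list[str]:
    """Return human-readable problems; an empty list means the GPU setup looks right."""
    problems: list[str] = []
    if not report.get("cuda_available"):
        problems.append("CUDA is not available inside the container")
    if (report.get("cuda_device_count") or 0) < 1:
        problems.append("no CUDA devices are visible")
    if not report.get("model_loaded"):
        problems.append("model is not loaded yet")
    elif not str(report.get("model_device", "")).startswith("cuda"):
        problems.append(f"model is on {report.get('model_device')!r}, not a CUDA device")
    if expected_dtype is not None and report.get("model_dtype") != expected_dtype:
        problems.append(
            f"model dtype is {report.get('model_dtype')!r}, expected {expected_dtype!r}"
        )
    return problems
```

Once `/ready` returns 200, `gpu_problems(fetch_device_report(), expected_dtype="torch.float16")` should return an empty list for a `TORCH_DTYPE=fp16` deployment, assuming the backend maps `fp16` to `torch.float16` (whose `str()` is `"torch.float16"`). The script itself does not need PyTorch installed.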

runs/.gitkeep ADDED

File without changes