Data-Science-Agent / STORAGE.md
Pulastya B

Storage Strategy for Render Deployment

Current Status (Ephemeral Storage)

Render uses ephemeral /tmp storage - files are deleted on:

  • Container restart
  • New deployment
  • Service scaling

Current behavior:

  • Reports generated during analysis are accessible during the session
  • Files disappear after 10-30 minutes or on redeploy
  • Fine for hackathon demos where users view reports immediately

For Production (If Needed)

Option 1: Cloudflare R2 (Recommended)

Best for: Production deployment with persistent storage

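# Install boto3 (R2 exposes an S3-compatible API, so the AWS SDK works)
pip install boto3

# Configuration (set these as environment variables; the code reads them via os.getenv)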
R2_ENDPOINT = "https://<account-id>.r2.cloudflarestorage.com"
R2_ACCESS_KEY = "<access-key>"
R2_SECRET_KEY = "<secret-key>"
R2_BUCKET = "ds-agent-reports"

Code changes needed:

# In src/storage/artifact_store.py
import os

import boto3

def upload_to_r2(local_path: str, r2_key: str) -> str:
    """Upload a local file to the R2 bucket and return its public URL."""
    s3 = boto3.client(
        's3',
        endpoint_url=os.getenv('R2_ENDPOINT'),
        aws_access_key_id=os.getenv('R2_ACCESS_KEY'),
        aws_secret_access_key=os.getenv('R2_SECRET_KEY'),
    )
    s3.upload_file(local_path, os.getenv('R2_BUCKET'), r2_key)
    # Return public URL (assumes the bucket is exposed through a custom domain)
    return f"https://reports.yourdomain.com/{r2_key}"

Cost: ~$0.015/GB-month storage + $0.36/million Class B (read) operations (very cheap). Uploads are billed as Class A operations, which cost more per request.
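If the bucket is kept private instead of being exposed through a public domain, a time-limited link is one possible alternative to the hard-coded public URL returned above. The following is a minimal sketch, assuming the same R2_* environment variables. The names `make_r2_client` and `presigned_report_url` and the one-hour default expiry are hypothetical and not part of the existing code.

```python
import os


def make_r2_client():
    """Build an S3-compatible client for R2 from the R2_* environment variables."""
    import boto3  # imported lazily so the helper below works with any compatible client

    return boto3.client(
        "s3",
        endpoint_url=os.getenv("R2_ENDPOINT"),
        aws_access_key_id=os.getenv("R2_ACCESS_KEY"),
        aws_secret_access_key=os.getenv("R2_SECRET_KEY"),
    )


def presigned_report_url(s3_client, bucket: str, r2_key: str, expires_in: int = 3600) -> str:
    """Return a time-limited download URL for one object in a private bucket."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": r2_key},
        ExpiresIn=expires_in,  # lifetime of the link in seconds
    )
```

A caller would pass `make_r2_client()` and `os.getenv("R2_BUCKET")`. Taking the client as a parameter keeps the URL helper easy to exercise with a stub.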

Option 2: Render Persistent Disks

Best for: Simple persistent storage without external dependencies

  • Add persistent disk in Render dashboard
  • Mount at /data
  • Change OUTPUT_DIR to /data/outputs
  • Cost: $0.25/GB/month (more expensive than R2)
  • Limitation: Disk size is fixed, can't easily scale
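The OUTPUT_DIR change can be made switchable, so the same code runs on ephemeral storage and on a mounted disk. This is a minimal sketch, assuming OUTPUT_DIR is supplied as an environment variable. The helper name `get_output_dir` is hypothetical, and the `./outputs` fallback mirrors the path used by the current file-serving endpoint.

```python
import os
from pathlib import Path


def get_output_dir(default: str = "./outputs") -> Path:
    """Return the report directory, creating it if it does not exist yet.

    Reads OUTPUT_DIR from the environment (for example /data/outputs on a
    persistent disk) and falls back to the local default otherwise.
    """
    output_dir = Path(os.getenv("OUTPUT_DIR", default))
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir
```

With this shape, moving to a persistent disk would only require setting `OUTPUT_DIR=/data/outputs` in the Render service environment.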

Option 3: Browser-Side Download (Current + Enhancement)

Best for: Hackathon/Demo where users download immediately

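// Auto-download reports after generation
const downloadReport = async (reportPath: string) => {
  const response = await fetch(reportPath);
  if (!response.ok) {
    // The file may already be gone from ephemeral storage
    throw new Error(`Failed to fetch report: ${response.status}`);
  }
  const blob = await response.blob();
  const url = window.URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = reportPath.split('/').pop() || 'report.html';
  a.click();
  // Release the object URL once the download has been triggered
  window.URL.revokeObjectURL(url);
};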

Pros:

  • No storage costs
  • Works with ephemeral Render storage
  • User has permanent copy

Cons:

  • Large files (reports can be 5-50MB)
  • Can't re-access after browser close

Recommendation for DevSprint Hackathon

Keep current ephemeral storage because:

  1. ✅ No cost or setup complexity
  2. ✅ Reports accessible during demo session
  3. ✅ Judges can view reports immediately after generation
  4. ✅ If needed, add "Download Report" button for permanent copy

After hackathon (if going to production):

  • Use Cloudflare R2 for cost-effective persistent storage
  • Keep reports for 30 days with auto-cleanup
  • Estimated cost: ~$1-5/month for typical usage
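The 30-day retention could be implemented by selecting expired objects and deleting them on a schedule. This is a minimal sketch of the selection step only. The function name `expired_keys` is hypothetical, and the input is assumed to have the shape of the "Contents" entries returned by boto3's `list_objects_v2` (dicts with "Key" and a timezone-aware "LastModified").

```python
from datetime import datetime, timedelta, timezone


def expired_keys(objects, max_age_days=30, now=None):
    """Return the keys of objects last modified more than max_age_days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    # Keep input order so the result is easy to log or batch for deletion
    return [obj["Key"] for obj in objects if obj["LastModified"] < cutoff]
```

The returned keys could then be passed to a delete call on the bucket. A bucket-level lifecycle rule, where available, may make this code unnecessary.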

Current File Serving

Reports are served via FastAPI endpoint:

# src/api/app.py
@app.get("/outputs/{file_path:path}")
async def serve_output_file(file_path: str):
    file_full_path = Path(f"./outputs/{file_path}")
    return FileResponse(file_full_path, media_type="text/html")

This works for ephemeral storage during active sessions, as long as every served file is an HTML report (the media type is hard-coded to text/html).
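Because the endpoint builds the path directly from the URL, a request containing `..` could reach files outside `./outputs`. This is a minimal sketch of a path check that could run before the file is returned. The helper name `resolve_output_path` is hypothetical and not part of the existing app.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(base_dir: str, requested: str) -> Optional[Path]:
    """Return the existing file under base_dir named by `requested`, or None.

    None is returned when the resolved path escapes base_dir (for example
    "../secrets.txt") or does not point to an existing file.
    """
    base = Path(base_dir).resolve()
    candidate = (base / requested).resolve()
    # After resolving ".." and symlinks, the file must still live below base
    if base not in candidate.parents:
        return None
    return candidate if candidate.is_file() else None
```

In the endpoint, a `None` result would typically be turned into a 404 response, for instance by raising FastAPI's `HTTPException`.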