Tech-Spec 1: Fix Background Task and Job Tracking in Gunicorn Environment

Created: 2025-12-22
Status: Completed

Overview

Problem Statement

After migrating from Flask's development server to Gunicorn, job tracking for background post generation broke. Tasks are still submitted and processed successfully on the generation server, but job state is kept in the memory of the worker that accepted the request, so the other Gunicorn workers cannot see it. Status requests that land on a different worker return 404, and the SSE connection times out before any update reaches the frontend.
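
The failure mode can be reproduced with a toy simulation. This is only an illustration, not Gunicorn itself: the `Worker` class, the round-robin dispatch, and the status codes returned by `submit` are assumptions made for the sketch. In practice, whichever worker accepts the connection handles the request.

```python
import itertools


class Worker:
    """Stands in for one Gunicorn worker process with its own private memory."""

    def __init__(self, name):
        self.name = name
        self.jobs = {}  # in-memory job registry, invisible to other workers

    def submit(self, job_id):
        self.jobs[job_id] = "processing"
        return 202

    def status(self, job_id):
        # A worker that never saw the job can only answer 404.
        return 200 if job_id in self.jobs else 404


def simulate(worker_count, requests):
    """Dispatch requests round-robin and return (worker name, HTTP status) pairs."""
    workers = [Worker(f"worker-{i}") for i in range(worker_count)]
    rotation = itertools.cycle(workers)
    results = []
    for action, job_id in requests:
        worker = next(rotation)
        handler = worker.submit if action == "submit" else worker.status
        results.append((worker.name, handler(job_id)))
    return results


requests = [("submit", "job-1"), ("status", "job-1"), ("status", "job-1")]
single = simulate(1, requests)  # every request hits worker-0: 202, 200, 200
multi = simulate(2, requests)   # the second request hits worker-1 and gets 404
```

With one worker (the development server case) every status lookup succeeds. With two workers, the lookup that lands on the worker that did not create the job fails, which matches the intermittent 404s described above.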

Solution

Implemented a shared storage mechanism for job state management that works across multiple Gunicorn workers, ensuring consistent job tracking and proper SSE communication.

Scope (In/Out)

In Scope:

  • Implement shared job state storage (Redis, database, or other shared storage)
  • Update background task management to use shared storage instead of in-memory storage
  • Fix SSE connection handling in multi-worker environment using Flask's streaming features
  • Update job status polling endpoints to access shared storage
  • Ensure proper request context handling for streaming responses

Out of Scope:

  • Changing the Gradio API integration
  • Modifying the frontend UI components
  • Refactoring the core post generation logic

Context for Development

Codebase Patterns

  • Flask REST API backend
  • Background task processing with job tracking
  • Server-sent events for real-time updates
  • Gradio API integration for content generation
  • Use of stream_with_context for streaming responses

Files to Reference

  • app.py - Main Flask application
  • posts.py - Contains the job submission and tracking logic (around line 196 based on logs)
  • content_service.py - Gradio API integration
  • start_gunicorn.py - Gunicorn configuration
  • Frontend JavaScript files handling SSE connections

Technical Decisions

  • Use Redis for shared job state storage (fast, simple, good for temporary job data)
  • Implement proper job lifecycle management (created, processing, completed, failed)
  • Use Flask's stream_with_context for proper SSE handling
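
A minimal sketch of what a Redis-backed job store with the four lifecycle states could look like. The class name `JobStore`, the `job:` key prefix, the one-hour TTL, and the table of allowed transitions are assumptions for illustration; the spec names the states but not the edges between them. The `client` argument is expected to behave like a redis-py client created with `decode_responses=True`, and only the `hset`, `hgetall`, and `expire` commands are used.

```python
import time
import uuid

# The four lifecycle states named in the spec.
CREATED, PROCESSING, COMPLETED, FAILED = "created", "processing", "completed", "failed"

# Allowed transitions are an assumption; the spec lists the states but not the edges.
ALLOWED_TRANSITIONS = {
    CREATED: {PROCESSING, FAILED},
    PROCESSING: {COMPLETED, FAILED},
    COMPLETED: set(),
    FAILED: set(),
}


class JobStore:
    """Keeps each job in a Redis hash so any Gunicorn worker can read it."""

    def __init__(self, client, prefix="job:", ttl_seconds=3600):
        # `client` should behave like redis.Redis(decode_responses=True).
        self.client = client
        self.prefix = prefix
        self.ttl_seconds = ttl_seconds

    def _key(self, job_id):
        return f"{self.prefix}{job_id}"

    def _write(self, job_id, fields):
        key = self._key(job_id)
        fields["updated_at"] = str(time.time())
        self.client.hset(key, mapping=fields)
        # Refresh the expiry so finished jobs are eventually cleaned up.
        self.client.expire(key, self.ttl_seconds)

    def create(self):
        job_id = uuid.uuid4().hex
        self._write(job_id, {"status": CREATED})
        return job_id

    def get(self, job_id):
        # HGETALL returns an empty dict for a missing key.
        return self.client.hgetall(self._key(job_id)) or None

    def transition(self, job_id, new_status, **extra):
        job = self.get(job_id)
        if job is None:
            raise KeyError(f"unknown job: {job_id}")
        if new_status not in ALLOWED_TRANSITIONS[job["status"]]:
            raise ValueError(f"cannot move {job['status']} -> {new_status}")
        self._write(job_id, {"status": new_status, **extra})
```

The read-then-write in `transition` is not atomic. That is acceptable if only the worker running a job updates it, but concurrent writers would need a Redis transaction or a Lua script. Setting an expiry on every write also covers the cleanup concern for completed and failed jobs.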

Implementation Plan

Tasks

  • Task 1: Set up Redis connection and configure Flask-Session with Redis backend
  • Task 2: Update job creation and tracking to use Redis instead of in-memory storage
  • Task 3: Modify background task handlers to update job state in Redis
  • Task 4: Update job status polling endpoint to fetch from Redis
  • Task 5: Fix SSE connection handling using Flask's streaming capabilities with stream_with_context
  • Task 6: Test the complete flow from job submission to completion
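
Task 3 might look roughly like the sketch below. It assumes background work runs on a thread inside the worker that accepted the request, which this spec does not confirm. The names `run_job`, `submit_job`, and the `update_job(job_id, **fields)` callback are hypothetical; in the real code the callback would write to Redis.

```python
import threading


def run_job(job_id, work, update_job):
    """Run `work()` and record every lifecycle change through `update_job`."""
    update_job(job_id, status="processing")
    try:
        result = work()
    except Exception as exc:
        # Persist the failure so pollers see "failed" instead of waiting forever.
        update_job(job_id, status="failed", error=f"{type(exc).__name__}: {exc}")
        return
    update_job(job_id, status="completed", result=result)


def submit_job(job_id, work, update_job):
    """Record the job, then run it on a daemon thread of the current worker."""
    update_job(job_id, status="created")
    thread = threading.Thread(
        target=run_job, args=(job_id, work, update_job), daemon=True
    )
    thread.start()
    return thread
```

Because every state change goes through `update_job`, no worker needs to hold job state in its own memory. One caveat: a daemon thread dies with its worker, so a job whose worker is restarted would stay in "processing" until its Redis key expires.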

Acceptance Criteria

  • AC 1: Job state is accessible across all Gunicorn workers via Redis
  • AC 2: Job status polling endpoint returns correct job status (not 404)
  • AC 3: SSE connections receive real-time updates about job progress using proper streaming
  • AC 4: Generated posts are properly displayed on the frontend after completion
  • AC 5: No regressions in existing functionality
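
For AC 2, it may help to separate "job does not exist" from "job store is unreachable" so that a Redis outage is not reported as a 404. The helper below is a sketch under assumptions: the function name, the response bodies, and the choice of 503 are not from the existing code. With redis-py, `unavailable_errors` would be `(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError)`; the built-in exceptions are used as defaults only so the sketch runs without Redis installed.

```python
def job_status_response(get_job, job_id,
                        unavailable_errors=(ConnectionError, TimeoutError)):
    """Return a (body, HTTP status) pair, a shape Flask views may return directly."""
    try:
        job = get_job(job_id)
    except unavailable_errors:
        # The store is down: say so, rather than pretending the job is missing.
        return {"error": "job store unavailable"}, 503
    if job is None:
        return {"error": "job not found", "job_id": job_id}, 404
    return {"job_id": job_id, **job}, 200
```

A Flask view (Flask 1.1 or later) can return this tuple as-is, and the dict is serialized to JSON. After the fix, a 404 should only occur for job IDs that were never created or have already expired.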

Additional Context

Dependencies

  • Redis server for shared storage
  • Updated Flask application with Redis integration
  • Flask-Session with Redis backend
  • Gunicorn configuration with appropriate worker settings
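
As an illustration of "appropriate worker settings", a Gunicorn configuration module could look like the sketch below. The contents of start_gunicorn.py are not shown in this spec, so the file name, bind address, and all values here are assumptions. The setting names (`bind`, `workers`, `worker_class`, `threads`, `timeout`) are standard Gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file; the project itself uses start_gunicorn.py)
import multiprocessing

# Address and port are placeholders.
bind = "0.0.0.0:8000"

# Common starting point from the Gunicorn docs: (2 x cores) + 1 worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# Threaded workers let one process serve several requests at once, which helps
# when some requests are long-lived SSE streams.
worker_class = "gthread"
threads = 4

# Seconds of silence after which Gunicorn kills and restarts a worker.
timeout = 120
```

With the default sync worker class, each open SSE stream occupies an entire worker for its lifetime, so a threaded or asynchronous worker class is worth evaluating alongside the worker count.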

Testing Strategy

  • Unit tests for Redis job state management
  • Integration tests for job submission and status tracking
  • End-to-end test of the complete post generation flow
  • Test SSE connections with multiple Gunicorn workers

Notes

  • Consider job cleanup for completed/failed jobs to prevent Redis memory issues
  • Ensure proper error handling when Redis is unavailable
  • May need to adjust Gunicorn worker count to optimize for background tasks
  • Use Flask's stream_with_context so the request context stays available while the SSE generator runs; consistency across workers comes from the shared Redis store, not from the streaming helper
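
A sketch of how the SSE endpoint could combine a polling generator with stream_with_context. The route path, event names, polling interval, and timeout are assumptions, and `get_job` stands for any function that reads a job from the shared store. Flask is imported inside `create_app` only so that the generator itself stays framework-free and easy to unit test.

```python
import json
import time

TERMINAL = {"completed", "failed"}


def format_sse(data, event=None):
    """Encode one Server-Sent Events message; a blank line ends the message."""
    head = f"event: {event}\n" if event else ""
    return f"{head}data: {json.dumps(data)}\n\n"


def job_events(get_job, job_id, poll_interval=1.0, timeout=300.0, sleep=time.sleep):
    """Yield an SSE message whenever the stored job status changes."""
    deadline = time.monotonic() + timeout
    last_status = None
    while time.monotonic() < deadline:
        job = get_job(job_id)
        if job is None:
            yield format_sse({"error": "job not found"}, event="error")
            return
        if job["status"] != last_status:
            last_status = job["status"]
            yield format_sse(job, event="status")
        if last_status in TERMINAL:
            return
        sleep(poll_interval)
    yield format_sse({"error": "timed out"}, event="timeout")


def create_app(get_job):
    """Wire the generator into a Flask app (hypothetical app factory)."""
    from flask import Flask, Response, stream_with_context

    app = Flask(__name__)

    @app.route("/jobs/<job_id>/events")  # hypothetical route
    def stream(job_id):
        # stream_with_context keeps the request context alive while streaming.
        body = stream_with_context(job_events(get_job, job_id))
        headers = {"Cache-Control": "no-cache"}
        return Response(body, mimetype="text/event-stream", headers=headers)

    return app
```

Because the generator reads from the shared store on every poll, it does not matter which worker serves the stream. In this sketch the generator never touches `request`, so stream_with_context only becomes essential once the streaming code reads request-bound data.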