title: ISP Handbook Engine
emoji: 📘
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false

ISP Handbook Service: Python Migration

A Python/FastAPI service that generates the ISP (International Scholars Program) Handbook as PDF or HTML. This is a drop-in replacement for the PHP handbook generation pipeline, designed to be called over HTTP from the existing PHP application.

Architecture

python_service/
├── app/
│   ├── main.py              # FastAPI entry point
│   ├── api/
│   │   └── routes.py        # REST endpoints
│   ├── core/
│   │   ├── config.py        # Environment-based settings
│   │   ├── database.py      # SQLAlchemy engine (MySQL)
│   │   ├── fonts.py         # Century Gothic font management
│   │   └── logging.py       # Logging setup
│   ├── models/              # SQLAlchemy models (if needed)
│   ├── repositories/
│   │   └── handbook_repo.py # Direct DB access (fallback)
│   ├── schemas/
│   │   └── handbook.py      # Pydantic request/response models
│   └── services/
│       ├── data_fetcher.py   # Fetch data from external JSON APIs
│       ├── html_builder.py   # Build full handbook HTML
│       ├── pdf_service.py    # HTML -> PDF via WeasyPrint
│       ├── renderers.py      # TOC, sections, university renderers
│       └── utils.py          # Shared helpers (h, money format, etc.)
├── tests/
│   ├── test_api.py
│   └── test_renderers.py
├── fonts/                    # Century Gothic TTF files
├── images/                   # Handbook images (cover, header, etc.)
├── css/                      # Base stylesheet
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md

API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | /health | Health check |
| GET | /diagnostics/fonts | Font file diagnostics |
| GET | /api/v1/sections/global?catalog_id=0 | Fetch normalised global sections |
| GET | /api/v1/sections/universities | Fetch normalised university sections |
| GET | /api/v1/handbook/pdf?catalog_id=0 | Generate PDF (download) |
| POST | /api/v1/handbook/pdf | Generate PDF with JSON body |
| GET | /api/v1/handbook/html?catalog_id=0 | Generate HTML preview |
| POST | /api/v1/handbook/render | Generate PDF or HTML based on output_format |
| GET | /docs | Swagger UI |
| GET | /redoc | ReDoc UI |
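
As a rough illustration of how a Python client might address these endpoints, the sketch below builds the URL for GET /api/v1/handbook/pdf and the request for POST /api/v1/handbook/render using only the standard library. The helper names (handbook_url, render_request) and the BASE_URL value are assumptions made for this example; the payload keys are the ones used in the PHP integration example in this README.

```python
import json
import urllib.parse
import urllib.request

# Assumed local address; replace with your deployment URL.
BASE_URL = "http://localhost:7860"


def handbook_url(path: str, **params) -> str:
    """Join BASE_URL, an endpoint path, and optional query parameters."""
    query = urllib.parse.urlencode(params)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def render_request(**options) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/handbook/render."""
    payload = {
        "catalog_id": 0,
        "include_inactive_programs": False,
        "debug": False,
        "output_format": "pdf",
        **options,  # caller-supplied values override the defaults above
    }
    return urllib.request.Request(
        handbook_url("/api/v1/handbook/render"),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the service running, the request could be sent with urllib.request.urlopen(render_request(catalog_id=1), timeout=120). Building the request separately from sending it keeps the helpers easy to check without a live service.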

Local Development

Prerequisites

  • Python 3.11+
  • MySQL database (existing schema, unchanged)
  • Century Gothic font files in fonts/ directory

Setup

cd python_service

# Create virtualenv
python -m venv .venv
.venv\Scripts\activate    # Windows
# source .venv/bin/activate  # Linux/Mac

# Install dependencies
pip install -r requirements.txt

# Copy and configure environment
copy .env.example .env    # Windows
# cp .env.example .env    # Linux/Mac
# Edit .env with your database credentials and API URLs

Run

uvicorn app.main:app --reload --host 0.0.0.0 --port 7860

Visit http://localhost:7860/docs for the interactive API documentation.

Run Tests

pytest tests/ -v

Docker

Build

docker build -t isp-handbook-service .

Run

docker run -d \
  --name handbook-service \
  -p 7860:7860 \
  -e DB_HOST=host.docker.internal \
  -e DB_USER=root \
  -e DB_PASSWORD=secret \
  -e DB_NAME=handbook \
  -e API_BASE_URL=https://finsapdev.qhtestingserver.com \
  isp-handbook-service

Or with an env file:

docker run -d --name handbook-service -p 7860:7860 --env-file .env isp-handbook-service

Hugging Face Spaces Deployment

  1. Create a new Space on Hugging Face with Docker SDK
  2. Upload/push the python_service/ directory as the Space root
  3. Ensure fonts/, images/, and css/ directories are included
  4. Set environment variables (Secrets) in Space settings:
    • DB_HOST, DB_USER, DB_PASSWORD, DB_NAME
    • API_BASE_URL
    • PORT=7860 (default for HF Spaces)
  5. The Dockerfile is already configured for HF Spaces (port 7860, 0.0.0.0)

Important: Hugging Face Spaces may not allow outbound MySQL connections. If direct database access is blocked, rely on the external API approach, which is the default: the service fetches data from the PHP JSON APIs over HTTP rather than from the database.
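
The variables listed in step 4 could be read along the following lines. This is a minimal sketch, not the contents of app/core/config.py: the Settings dataclass, the load_settings function, and every default value except the port are assumptions made for illustration. Only the variable names come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_user: str
    db_password: str
    db_name: str
    api_base_url: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the service settings from an environment-like mapping."""
    return Settings(
        db_host=env.get("DB_HOST", "localhost"),  # assumed default
        db_user=env.get("DB_USER", ""),
        db_password=env.get("DB_PASSWORD", ""),
        db_name=env.get("DB_NAME", ""),
        # Strip trailing slashes so endpoint paths can be appended safely.
        api_base_url=env.get("API_BASE_URL", "").rstrip("/"),
        port=int(env.get("PORT", "7860")),  # 7860 per this README
    )
```

Accepting a mapping instead of reading os.environ directly makes the loader straightforward to exercise in tests with a plain dict.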

PHP Integration Example

The PHP application can call this Python service over HTTP using cURL:

<?php
/**
 * PHP client for the ISP Handbook Python Service.
 * Replace HANDBOOK_SERVICE_URL with your actual deployment URL.
 */

define('HANDBOOK_SERVICE_URL', 'http://localhost:7860');

/**
 * Check service health.
 */
function handbook_health(): array {
    $url = HANDBOOK_SERVICE_URL . '/health';
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 5,
    ]);
    $body = curl_exec($ch);
    $code = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code !== 200) {
        return ['ok' => false, 'error' => 'Service unreachable', 'http_code' => $code];
    }
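    // Guard against a 200 response whose body is not a JSON object/array.
    $data = json_decode($body, true);
    return is_array($data) ? $data : ['ok' => false, 'error' => 'Invalid response'];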
}

/**
 * Generate and download the handbook PDF.
 */
function handbook_download_pdf(int $catalogId = 0, bool $debug = false): void {
    $params = http_build_query([
        'catalog_id' => $catalogId,
        'debug'      => $debug ? 'true' : 'false',
    ]);
    $url = HANDBOOK_SERVICE_URL . '/api/v1/handbook/pdf?' . $params;

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 120,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    $body = curl_exec($ch);
    $code = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

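    // curl_exec() returns false on transport errors, and the content type may be null.
    if ($body === false || $code !== 200 || strpos((string)$contentType, 'application/pdf') === false) {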
        http_response_code(502);
        header('Content-Type: text/plain');
        echo "PDF generation failed (HTTP $code)";
        return;
    }

    header('Content-Type: application/pdf');
    header('Content-Disposition: attachment; filename="ISP_Handbook.pdf"');
    header('Content-Length: ' . strlen($body));
    echo $body;
}

/**
 * Fetch global sections via the Python service.
 */
function handbook_get_sections(int $catalogId = 0): array {
    $url = HANDBOOK_SERVICE_URL . '/api/v1/sections/global?catalog_id=' . $catalogId;
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 25,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
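    // curl_exec() returns false on transport errors; fall back to an empty list.
    $data = is_string($body) ? json_decode($body, true) : null;
    return is_array($data) ? $data : [];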
}

/**
 * Generate handbook via POST with custom options.
 */
function handbook_generate(array $options = []): string {
    $url = HANDBOOK_SERVICE_URL . '/api/v1/handbook/render';
    $payload = json_encode(array_merge([
        'catalog_id'                => 0,
        'include_inactive_programs' => false,
        'debug'                     => false,
        'output_format'             => 'pdf',
    ], $options));

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $payload,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_TIMEOUT        => 120,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
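    // curl_exec() returns false on failure; return an empty string in that case.
    return is_string($body) ? $body : '';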
}

Usage in PHP

// Health check
$status = handbook_health();
if (($status['status'] ?? null) === 'ok') {
    echo "Service is running\n";
}

// Stream PDF to browser
handbook_download_pdf(catalogId: 1);

// Get sections data
$sections = handbook_get_sections(catalogId: 1);
print_r($sections);

Migration Notes & Assumptions

What was migrated

| PHP Component | Python Equivalent | Notes |
|---------------|-------------------|-------|
| common.php (URL builder, HTTP client) | data_fetcher.py | Uses httpx instead of cURL |
| cors.php | FastAPI CORS middleware | Same origins preserved |
| helpers.php (h(), respondJson()) | Built into FastAPI + utils.py | |
| fetchers.php (global/uni data fetch) | data_fetcher.py | Identical normalisation logic |
| renderers.php (TOC, blocks, university) | renderers.py | All block types preserved |
| html_builder.php (buildHandbookHtml) | html_builder.py | Same HTML structure |
| pdf.php (Dompdf render) | pdf_service.py | WeasyPrint replaces Dompdf |
| images.php (image config) | pdf_service.py | _get_images_config() |
| font_diagnostics.php | GET /diagnostics/fonts | |
| db.php (mysqli) | database.py (SQLAlchemy) | Available but not primary path |
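
To give a sense of what one of these ports might look like, here is a possible Python counterpart to the h() helper. It assumes h() is an HTML-escaping wrapper in the style of PHP's htmlspecialchars, which this README does not spell out, so treat it as a sketch rather than the code in utils.py.

```python
import html


def h(value: object) -> str:
    """Escape a value for safe inclusion in HTML text or attribute values."""
    # None becomes an empty string, similar to how PHP casts null to "".
    text = "" if value is None else str(value)
    # quote=True also escapes double and single quotes.
    return html.escape(text, quote=True)
```

One difference worth checking against existing templates: html.escape writes a single quote as &#x27;, whereas htmlspecialchars with ENT_QUOTES writes &#039;. Browsers treat both the same way.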

Key differences

  1. PDF engine: WeasyPrint replaces Dompdf. Layout may differ slightly in edge cases (table widths, page breaks). Both support @font-face with base64 TTF and @page rules.

  2. TOC page numbers: The PHP code uses a 2-pass Dompdf render to inject exact TOC page numbers via named destinations. WeasyPrint doesn't expose named destinations the same way, so in the initial migration TOC page numbers are assigned sequentially rather than taken from the rendered layout. Exact page numbers can be added via a post-processing PDF pass if needed.

  3. No auth: The PHP code has no authentication. The Python service also has none. Add API key middleware if this service is exposed publicly.

  4. Data source: The service fetches data from the same two PHP JSON APIs over HTTP (not directly from the database). The repositories/handbook_repo.py provides a DB fallback if you want to bypass the PHP APIs entirely.

  5. SSL verification: Disabled for internal API calls (verify=False in httpx), matching the PHP behavior (CURLOPT_SSL_VERIFYPEER => false).
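
For the API key suggestion in item 3, the core check could look roughly like the sketch below. The header name x-api-key and the is_authorized function are hypothetical; nothing like this exists in the service today, and how the key is configured is left open.

```python
import hmac
from typing import Mapping, Optional

# Hypothetical header name; pick whatever the PHP client will send.
API_KEY_HEADER = "x-api-key"


def is_authorized(headers: Mapping[str, str], expected_key: Optional[str]) -> bool:
    """Return True when the request headers carry the expected API key."""
    if not expected_key:
        # No key configured: reject rather than silently allowing everyone.
        return False
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get(API_KEY_HEADER, "")
    # compare_digest avoids leaking the key through timing differences.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```

In a FastAPI app this check could run inside an HTTP middleware or a dependency that responds with 401 when it fails. Whether /health and /docs should stay open is a deployment decision.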

Risks

  • Font rendering: Century Gothic rendering may differ slightly between Dompdf (PHP) and WeasyPrint (Python). Test with actual fonts.
  • Page break behavior: Dompdf and WeasyPrint handle CSS page-break-* properties slightly differently.
  • Image embedding: Remote campus images are fetched at generation time. Network issues will result in placeholder cells (same as PHP behavior).
  • Memory: Large handbooks with many university images may require significant memory. The Dockerfile doesn't set memory limits; Hugging Face Spaces enforces its own.
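
The image-embedding fallback described above might be handled along these lines. This is a sketch under stated assumptions, not the service's actual code: it assumes images are inlined as base64 data URIs, and the function names, the JPEG default, and the img-placeholder class are invented for illustration.

```python
import base64
import urllib.request
from typing import Callable, Optional


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download a URL with the standard library (raises OSError on failure)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def image_data_uri(
    url: str,
    fetch: Callable[[str], bytes] = fetch_bytes,
    mime: str = "image/jpeg",  # assumed default type
) -> Optional[str]:
    """Return a base64 data URI for the image, or None if it cannot be fetched."""
    try:
        data = fetch(url)
    except (OSError, ValueError):
        # Network errors and timeouts are OSError subclasses; a malformed
        # URL raises ValueError. Both lead to the placeholder path.
        return None
    if not data:
        return None
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def image_cell(url: str, fetch: Callable[[str], bytes] = fetch_bytes) -> str:
    """Render an <img> cell, or an empty placeholder cell when the fetch fails."""
    uri = image_data_uri(url, fetch)
    if uri is None:
        return '<td class="img-placeholder"></td>'  # hypothetical class name
    return f'<td><img src="{uri}" alt=""></td>'
```

Passing the fetch function as a parameter lets a test simulate a network outage without touching the network.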