---
title: ISP Handbook Engine
emoji: 📘
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# ISP Handbook Service: Python Migration

A Python/FastAPI service that generates the ISP (International Scholars Program) Handbook as PDF or HTML.

This is a drop-in replacement for the PHP handbook generation pipeline, designed to be called over HTTP from the existing PHP application.

## Architecture

```
python_service/
├── app/
│   ├── main.py                  # FastAPI entry point
│   ├── api/
│   │   └── routes.py            # REST endpoints
│   ├── core/
│   │   ├── config.py            # Environment-based settings
│   │   ├── database.py          # SQLAlchemy engine (MySQL)
│   │   ├── fonts.py             # Century Gothic font management
│   │   └── logging.py           # Logging setup
│   ├── models/                  # SQLAlchemy models (if needed)
│   ├── repositories/
│   │   └── handbook_repo.py     # Direct DB access (fallback)
│   ├── schemas/
│   │   └── handbook.py          # Pydantic request/response models
│   └── services/
│       ├── data_fetcher.py      # Fetch data from external JSON APIs
│       ├── html_builder.py      # Build full handbook HTML
│       ├── pdf_service.py       # HTML -> PDF via WeasyPrint
│       ├── renderers.py         # TOC, sections, university renderers
│       └── utils.py             # Shared helpers (h, money format, etc.)
├── tests/
│   ├── test_api.py
│   └── test_renderers.py
├── fonts/                       # Century Gothic TTF files
├── images/                      # Handbook images (cover, header, etc.)
├── css/                         # Base stylesheet
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md
```

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Health check |
| `GET` | `/diagnostics/fonts` | Font file diagnostics |
| `GET` | `/api/v1/sections/global?catalog_id=0` | Fetch normalised global sections |
| `GET` | `/api/v1/sections/universities` | Fetch normalised university sections |
| `GET` | `/api/v1/handbook/pdf?catalog_id=0` | Generate PDF (download) |
| `POST` | `/api/v1/handbook/pdf` | Generate PDF with JSON body |
| `GET` | `/api/v1/handbook/html?catalog_id=0` | Generate HTML preview |
| `POST` | `/api/v1/handbook/render` | Generate PDF or HTML based on `output_format` |
| `GET` | `/docs` | Swagger UI |
| `GET` | `/redoc` | ReDoc UI |

## Local Development

### Prerequisites

- Python 3.11+
- MySQL database (existing schema, unchanged)
- Century Gothic font files in the `fonts/` directory

### Setup

```bash
cd python_service

# Create virtualenv
python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # Linux/Mac

# Install dependencies
pip install -r requirements.txt

# Copy and configure environment
copy .env.example .env          # Windows
# cp .env.example .env          # Linux/Mac
# Edit .env with your database credentials and API URLs
```

### Run

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```

Visit http://localhost:7860/docs for the interactive API documentation.

### Run Tests

```bash
pytest tests/ -v
```

## Docker

### Build

```bash
docker build -t isp-handbook-service .
```

### Run

```bash
docker run -d \
  --name handbook-service \
  -p 7860:7860 \
  -e DB_HOST=host.docker.internal \
  -e DB_USER=root \
  -e DB_PASSWORD=secret \
  -e DB_NAME=handbook \
  -e API_BASE_URL=https://finsapdev.qhtestingserver.com \
  isp-handbook-service
```

Or with an env file:

```bash
docker run -d --name handbook-service -p 7860:7860 --env-file .env isp-handbook-service
```

## Hugging Face Spaces Deployment

1. Create a new Space on Hugging Face with the **Docker** SDK
2. Upload/push the `python_service/` directory as the Space root
3. Ensure the `fonts/`, `images/`, and `css/` directories are included
4. Set environment variables (Secrets) in the Space settings:
   - `DB_HOST`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`
   - `API_BASE_URL`
   - `PORT=7860` (default for HF Spaces)
5. The `Dockerfile` is already configured for HF Spaces (port 7860, `0.0.0.0`)

**Important**: Hugging Face Spaces may not allow outbound MySQL connections. If the database cannot be reached from the Space, rely on the external API approach, which is the default: the service fetches data from the PHP JSON APIs over HTTP, not from the database directly.

## PHP Integration Example

The PHP application can call this Python service over HTTP using cURL:

```php
<?php

// Base URL of the Python service. Assumed local default; point this at your deployment.
define('HANDBOOK_SERVICE_URL', 'http://localhost:7860');

/**
 * Check that the Python service is reachable.
 */
function handbook_health(): array
{
    $ch = curl_init(HANDBOOK_SERVICE_URL . '/health');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 5,
    ]);
    $body = curl_exec($ch);
    $code = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code !== 200) {
        return ['ok' => false, 'error' => 'Service unreachable', 'http_code' => $code];
    }
    return json_decode($body, true) ?? ['ok' => false, 'error' => 'Invalid response'];
}

/**
 * Generate and download the handbook PDF.
 */
function handbook_download_pdf(int $catalogId = 0, bool $debug = false): void
{
    $params = http_build_query([
        'catalog_id' => $catalogId,
        'debug' => $debug ? 'true' : 'false',
    ]);
    $url = HANDBOOK_SERVICE_URL . '/api/v1/handbook/pdf?' . $params;

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 120,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    $body = curl_exec($ch);
    $code = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Cast to string: the content type is null when the request failed outright.
    $contentType = (string)curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

    if ($code !== 200 || strpos($contentType, 'application/pdf') === false) {
        http_response_code(502);
        header('Content-Type: text/plain');
        echo "PDF generation failed (HTTP $code)";
        return;
    }

    header('Content-Type: application/pdf');
    header('Content-Disposition: attachment; filename="ISP_Handbook.pdf"');
    header('Content-Length: ' . strlen($body));
    echo $body;
}

/**
 * Fetch global sections via the Python service.
 */
function handbook_get_sections(int $catalogId = 0): array
{
    $url = HANDBOOK_SERVICE_URL . '/api/v1/sections/global?catalog_id=' . $catalogId;

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 25,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);

    return json_decode($body, true) ?? [];
}

/**
 * Generate handbook via POST with custom options.
 */
function handbook_generate(array $options = []): string
{
    $url = HANDBOOK_SERVICE_URL . '/api/v1/handbook/render';
    $payload = json_encode(array_merge([
        'catalog_id' => 0,
        'include_inactive_programs' => false,
        'debug' => false,
        'output_format' => 'pdf',
    ], $options));

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $payload,
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
        CURLOPT_TIMEOUT => 120,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);

    return $body;
}
```

### Usage in PHP

```php
// Health check (the 'status' key is absent when the service is unreachable)
$status = handbook_health();
if (($status['status'] ?? null) === 'ok') {
    echo "Service is running\n";
}

// Stream PDF to browser
handbook_download_pdf(catalogId: 1);

// Get sections data
$sections = handbook_get_sections(catalogId: 1);
print_r($sections);
```

## Migration Notes & Assumptions

### What was migrated

| PHP Component | Python Equivalent | Notes |
|---|---|---|
| `common.php` (URL builder, HTTP client) | `data_fetcher.py` | Uses `httpx` instead of cURL |
| `cors.php` | FastAPI CORS middleware | Same origins preserved |
| `helpers.php` (`h()`, `respondJson()`) | Built into FastAPI + `utils.py` | |
| `fetchers.php` (global/uni data fetch) | `data_fetcher.py` | Identical normalisation logic |
| `renderers.php` (TOC, blocks, university) | `renderers.py` | All block types preserved |
| `html_builder.php` (`buildHandbookHtml`) | `html_builder.py` | Same HTML structure |
| `pdf.php` (Dompdf render) | `pdf_service.py` | **WeasyPrint** replaces Dompdf |
| `images.php` (image config) | `pdf_service.py` `_get_images_config()` | |
| `font_diagnostics.php` | `GET /diagnostics/fonts` | |
| `db.php` (mysqli) | `database.py` (SQLAlchemy) | Available but not primary path |

### Key differences

1. **PDF engine**: WeasyPrint replaces Dompdf. Layout may differ slightly in edge cases (table widths, page breaks). Both support `@font-face` with base64 TTF and `@page` rules.

2. **TOC page numbers**: The PHP code uses a 2-pass Dompdf render to inject exact TOC page numbers via named destinations. WeasyPrint doesn't expose named destinations the same way. TOC pages are assigned sequentially in the initial migration. Exact page numbers can be added via a post-processing PDF pass if needed.

3. **No auth**: The PHP code has no authentication. The Python service also has none. Add API key middleware if this service is exposed publicly.

4. **Data source**: The service fetches data from the same two PHP JSON APIs over HTTP (not directly from the database). The `repositories/handbook_repo.py` module provides a DB fallback if you want to bypass the PHP APIs entirely.

5. **SSL verification**: Disabled for internal API calls (`verify=False` in httpx), matching the PHP behavior (`CURLOPT_SSL_VERIFYPEER => false`).

### Risks

- **Font rendering**: Century Gothic rendering may differ slightly between Dompdf (PHP) and WeasyPrint (Python). Test with actual fonts.
- **Page break behavior**: Dompdf and WeasyPrint handle CSS `page-break-*` properties slightly differently.
- **Image embedding**: Remote campus images are fetched at generation time. Network issues will result in placeholder cells (same as PHP behavior).
- **Memory**: Large handbooks with many university images may require significant memory. The Dockerfile doesn't set memory limits; Hugging Face Spaces enforces its own.
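
### Example: API key middleware

The "No auth" note under Key differences suggests adding API key middleware before exposing the service publicly. The following is a minimal sketch of one way to do that, not part of the migrated code. It is written as a plain ASGI middleware so it needs only the standard library. The `X-API-Key` header name, the `ApiKeyMiddleware` class, the `PUBLIC_PATHS` set, and the `call` / `dummy_app` demo helpers are all assumptions made for illustration.

```python
import asyncio
import hmac

# Paths left open so health checks and the API docs keep working (an assumption).
PUBLIC_PATHS = {"/health", "/docs", "/redoc", "/openapi.json"}


class ApiKeyMiddleware:
    """Plain ASGI middleware: reject HTTP requests lacking a valid X-API-Key header."""

    def __init__(self, app, api_key: str):
        self.app = app
        self.api_key = api_key.encode()

    async def __call__(self, scope, receive, send):
        # Pass through non-HTTP traffic (e.g. lifespan events) and public paths.
        if scope["type"] != "http" or scope["path"] in PUBLIC_PATHS:
            await self.app(scope, receive, send)
            return

        # ASGI headers are (name, value) byte pairs with lowercased names.
        supplied = dict(scope["headers"]).get(b"x-api-key", b"")
        # compare_digest avoids leaking key contents through timing differences.
        if hmac.compare_digest(supplied, self.api_key):
            await self.app(scope, receive, send)
            return

        body = b'{"detail":"Invalid or missing API key"}'
        await send({
            "type": "http.response.start",
            "status": 401,
            "headers": [
                (b"content-type", b"application/json"),
                (b"content-length", str(len(body)).encode()),
            ],
        })
        await send({"type": "http.response.body", "body": body})


async def call(app, path, headers=()):
    """Demo helper: drive an ASGI app once and return the response status."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": list(headers)}
    await app(scope, receive, send)
    return sent[0]["status"]


async def dummy_app(scope, receive, send):
    """Stand-in for the real FastAPI app: always answers 200."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


protected = ApiKeyMiddleware(dummy_app, api_key="change-me")
key_header = [(b"x-api-key", b"change-me")]

print(asyncio.run(call(protected, "/health")))                            # 200
print(asyncio.run(call(protected, "/api/v1/handbook/pdf")))               # 401
print(asyncio.run(call(protected, "/api/v1/handbook/pdf", key_header)))   # 200
```

In the service itself, a class like this could be registered in `app/main.py` with `app.add_middleware(ApiKeyMiddleware, api_key=...)`, reading the key from an environment variable. The PHP callers would then need to send the same header via `CURLOPT_HTTPHEADER`.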