---
title: ISP Handbook Engine
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# ISP Handbook Service: Python Migration
A Python/FastAPI service that generates the ISP (International Scholars Program) Handbook as PDF or HTML. This is a drop-in replacement for the PHP handbook generation pipeline, designed to be called over HTTP from the existing PHP application.
## Architecture
```
python_service/
├── app/
│   ├── main.py                  # FastAPI entry point
│   ├── api/
│   │   └── routes.py            # REST endpoints
│   ├── core/
│   │   ├── config.py            # Environment-based settings
│   │   ├── database.py          # SQLAlchemy engine (MySQL)
│   │   ├── fonts.py             # Century Gothic font management
│   │   └── logging.py           # Logging setup
│   ├── models/                  # SQLAlchemy models (if needed)
│   ├── repositories/
│   │   └── handbook_repo.py     # Direct DB access (fallback)
│   ├── schemas/
│   │   └── handbook.py          # Pydantic request/response models
│   └── services/
│       ├── data_fetcher.py      # Fetch data from external JSON APIs
│       ├── html_builder.py      # Build full handbook HTML
│       ├── pdf_service.py       # HTML -> PDF via WeasyPrint
│       ├── renderers.py         # TOC, sections, university renderers
│       └── utils.py             # Shared helpers (h, money format, etc.)
├── tests/
│   ├── test_api.py
│   └── test_renderers.py
├── fonts/                       # Century Gothic TTF files
├── images/                      # Handbook images (cover, header, etc.)
├── css/                         # Base stylesheet
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md
```
## API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Health check |
| `GET` | `/diagnostics/fonts` | Font file diagnostics |
| `GET` | `/api/v1/sections/global?catalog_id=0` | Fetch normalised global sections |
| `GET` | `/api/v1/sections/universities` | Fetch normalised university sections |
| `GET` | `/api/v1/handbook/pdf?catalog_id=0` | Generate PDF (download) |
| `POST` | `/api/v1/handbook/pdf` | Generate PDF with JSON body |
| `GET` | `/api/v1/handbook/html?catalog_id=0` | Generate HTML preview |
| `POST` | `/api/v1/handbook/render` | Generate PDF or HTML based on `output_format` |
| `GET` | `/docs` | Swagger UI |
| `GET` | `/redoc` | ReDoc UI |
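As a rough illustration of how a client might address these endpoints, the following sketch builds the GET URL for the PDF download and the POST request for `/api/v1/handbook/render` using only the Python standard library. The helper names (`pdf_url`, `render_request`, `download`) and the `BASE_URL` value are assumptions made for this example; the payload fields mirror the ones used in the PHP integration example in this README.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the service runs locally on the default port used in this README.
BASE_URL = "http://localhost:7860"


def pdf_url(catalog_id: int = 0, base_url: str = BASE_URL) -> str:
    """Build the GET URL for the PDF download endpoint."""
    query = urlencode({"catalog_id": catalog_id})
    return f"{base_url}/api/v1/handbook/pdf?{query}"


def render_request(output_format: str = "pdf", catalog_id: int = 0,
                   base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a POST request for /api/v1/handbook/render."""
    payload = {"catalog_id": catalog_id, "output_format": output_format}
    return Request(
        f"{base_url}/api/v1/handbook/render",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def download(target, timeout: float = 120.0) -> bytes:
    """Send a URL string or prepared Request and return the raw response body."""
    with urlopen(target, timeout=timeout) as resp:
        return resp.read()
```

With the service running, `download(pdf_url(1))` would return the PDF bytes and `download(render_request("html"))` the HTML preview. The 120-second timeout matches the one used by the PHP client.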
## Local Development
### Prerequisites
- Python 3.11+
- MySQL database (existing schema, unchanged)
- Century Gothic font files in `fonts/` directory
### Setup
```bash
cd python_service
# Create virtualenv
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/Mac
# Install dependencies
pip install -r requirements.txt
# Copy and configure environment
copy .env.example .env # Windows
# cp .env.example .env # Linux/Mac
# Edit .env with your database credentials and API URLs
```
### Run
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```
Visit http://localhost:7860/docs for the interactive API documentation.
### Run Tests
```bash
pytest tests/ -v
```
## Docker
### Build
```bash
docker build -t isp-handbook-service .
```
### Run
```bash
docker run -d \
  --name handbook-service \
  -p 7860:7860 \
  -e DB_HOST=host.docker.internal \
  -e DB_USER=root \
  -e DB_PASSWORD=secret \
  -e DB_NAME=handbook \
  -e API_BASE_URL=https://finsapdev.qhtestingserver.com \
  isp-handbook-service
```
Or with an env file:
```bash
docker run -d --name handbook-service -p 7860:7860 --env-file .env isp-handbook-service
```
## Hugging Face Spaces Deployment
1. Create a new Space on Hugging Face with **Docker** SDK
2. Upload/push the `python_service/` directory as the Space root
3. Ensure `fonts/`, `images/`, and `css/` directories are included
4. Set environment variables (Secrets) in Space settings:
   - `DB_HOST`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`
   - `API_BASE_URL`
   - `PORT=7860` (default for HF Spaces)
5. The `Dockerfile` is already configured for HF Spaces (port 7860, `0.0.0.0`)
**Important**: Hugging Face Spaces may not allow outbound MySQL connections. If direct DB access is unavailable, rely on the external API approach, which is the default data path: the service fetches data from the PHP JSON APIs over HTTP, not from the database directly.
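To make the role of these variables concrete, here is a minimal sketch of environment-based settings loading. It is not the contents of `app/core/config.py`; the `Settings` dataclass and `load_settings` function are hypothetical names, and treating `API_BASE_URL` as required while leaving the `DB_*` values optional is an assumption based on the API-first data path described above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names mirror the env vars above."""
    api_base_url: str
    port: int = 7860
    db_host: Optional[str] = None
    db_user: Optional[str] = None
    db_password: Optional[str] = None
    db_name: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping (the process environment by default)."""
    api_base_url = env.get("API_BASE_URL", "").strip()
    if not api_base_url:
        # Fail at startup rather than on the first handbook request.
        raise RuntimeError("API_BASE_URL is not set")
    return Settings(
        api_base_url=api_base_url.rstrip("/"),
        port=int(env.get("PORT", "7860")),
        db_host=env.get("DB_HOST"),
        db_user=env.get("DB_USER"),
        db_password=env.get("DB_PASSWORD"),
        db_name=env.get("DB_NAME"),
    )
```

Passing the mapping explicitly, for example `load_settings({"API_BASE_URL": "https://example.test"})`, keeps the loader easy to exercise in tests without touching the real environment.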
## PHP Integration Example
The PHP application can call this Python service over HTTP using cURL:
```php
<?php
/**
 * PHP client for the ISP Handbook Python Service.
 * Replace HANDBOOK_SERVICE_URL with your actual deployment URL.
 */
define('HANDBOOK_SERVICE_URL', 'http://localhost:7860');

/**
 * Check service health.
 */
function handbook_health(): array {
    $url = HANDBOOK_SERVICE_URL . '/health';
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 5,
    ]);
    $body = curl_exec($ch);
    $code = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($code !== 200) {
        return ['ok' => false, 'error' => 'Service unreachable', 'http_code' => $code];
    }
    return json_decode($body, true) ?? ['ok' => false, 'error' => 'Invalid response'];
}

/**
 * Generate and download the handbook PDF.
 */
function handbook_download_pdf(int $catalogId = 0, bool $debug = false): void {
    $params = http_build_query([
        'catalog_id' => $catalogId,
        'debug' => $debug ? 'true' : 'false',
    ]);
    $url = HANDBOOK_SERVICE_URL . '/api/v1/handbook/pdf?' . $params;
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 120,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    $body = curl_exec($ch);
    $code = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    // The content type is null when the request fails, so cast before searching.
    if ($code !== 200 || strpos((string)$contentType, 'application/pdf') === false) {
        http_response_code(502);
        header('Content-Type: text/plain');
        echo "PDF generation failed (HTTP $code)";
        return;
    }
    header('Content-Type: application/pdf');
    header('Content-Disposition: attachment; filename="ISP_Handbook.pdf"');
    header('Content-Length: ' . strlen($body));
    echo $body;
}

/**
 * Fetch global sections via the Python service.
 */
function handbook_get_sections(int $catalogId = 0): array {
    $url = HANDBOOK_SERVICE_URL . '/api/v1/sections/global?catalog_id=' . $catalogId;
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 25,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false on transport errors.
    if ($body === false) {
        return [];
    }
    return json_decode($body, true) ?? [];
}

/**
 * Generate handbook via POST with custom options.
 */
function handbook_generate(array $options = []): string {
    $url = HANDBOOK_SERVICE_URL . '/api/v1/handbook/render';
    $payload = json_encode(array_merge([
        'catalog_id' => 0,
        'include_inactive_programs' => false,
        'debug' => false,
        'output_format' => 'pdf',
    ], $options));
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $payload,
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
        CURLOPT_TIMEOUT => 120,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    // Return an empty string instead of false when the request fails.
    return $body === false ? '' : $body;
}
```
### Usage in PHP
```php
// Health check (the failure response has no 'status' key, so guard the lookup)
$status = handbook_health();
if (($status['status'] ?? null) === 'ok') {
    echo "Service is running\n";
}
// Stream PDF to browser
handbook_download_pdf(catalogId: 1);
// Get sections data
$sections = handbook_get_sections(catalogId: 1);
print_r($sections);
```
## Migration Notes & Assumptions
### What was migrated
| PHP Component | Python Equivalent | Notes |
|---|---|---|
| `common.php` (URL builder, HTTP client) | `data_fetcher.py` | Uses `httpx` instead of cURL |
| `cors.php` | FastAPI CORS middleware | Same origins preserved |
| `helpers.php` (`h()`, `respondJson()`) | Built into FastAPI + `utils.py` | |
| `fetchers.php` (global/uni data fetch) | `data_fetcher.py` | Identical normalisation logic |
| `renderers.php` (TOC, blocks, university) | `renderers.py` | All block types preserved |
| `html_builder.php` (`buildHandbookHtml`) | `html_builder.py` | Same HTML structure |
| `pdf.php` (Dompdf render) | `pdf_service.py` | **WeasyPrint** replaces Dompdf |
| `images.php` (image config) | `pdf_service.py` `_get_images_config()` | |
| `font_diagnostics.php` | `GET /diagnostics/fonts` | |
| `db.php` (mysqli) | `database.py` (SQLAlchemy) | Available but not primary path |
### Key differences
1. **PDF engine**: WeasyPrint replaces Dompdf. Layout may differ slightly in edge cases (table widths, page breaks). Both support `@font-face` with base64 TTF and `@page` rules.
2. **TOC page numbers**: The PHP code uses a 2-pass Dompdf render to inject exact TOC page numbers via named destinations. WeasyPrint doesn't expose named destinations the same way. TOC pages are assigned sequentially in the initial migration. Exact page numbers can be added via a post-processing PDF pass if needed.
3. **No auth**: The PHP code has no authentication. The Python service also has none. Add API key middleware if this service is exposed publicly.
4. **Data source**: The service fetches data from the same two PHP JSON APIs over HTTP (not directly from the database). The `repositories/handbook_repo.py` provides a DB fallback if you want to bypass the PHP APIs entirely.
5. **SSL verification**: Disabled for internal API calls (`verify=False` in httpx), matching the PHP behavior (`CURLOPT_SSL_VERIFYPEER => false`).
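For the API key suggestion in item 3, one possible shape is a small pure-ASGI middleware, sketched below with no dependencies beyond the standard library. The class name `ApiKeyMiddleware`, the `X-API-Key` header, and the exempt path list are assumptions for illustration, not part of the existing service.

```python
import hmac


class ApiKeyMiddleware:
    """Pure ASGI middleware that rejects HTTP requests without the expected key.

    Hypothetical sketch: the header name and exempt paths are assumptions.
    """

    def __init__(self, app, api_key: str, exempt_paths=("/health",)):
        self.app = app
        self.api_key = api_key.encode("utf-8")
        self.exempt_paths = set(exempt_paths)

    async def __call__(self, scope, receive, send):
        # Pass through non-HTTP traffic (e.g. lifespan events) and exempt paths.
        if scope["type"] != "http" or scope["path"] in self.exempt_paths:
            await self.app(scope, receive, send)
            return

        # ASGI delivers headers as (name, value) byte pairs with lowercase names.
        headers = dict(scope.get("headers") or [])
        supplied = headers.get(b"x-api-key", b"")

        # Constant-time comparison avoids leaking key contents through timing.
        if hmac.compare_digest(supplied, self.api_key):
            await self.app(scope, receive, send)
            return

        await send({
            "type": "http.response.start",
            "status": 401,
            "headers": [(b"content-type", b"application/json")],
        })
        await send({
            "type": "http.response.body",
            "body": b'{"detail": "Invalid or missing API key"}',
        })
```

In a FastAPI app this could be registered with `app.add_middleware(ApiKeyMiddleware, api_key=os.environ["HANDBOOK_API_KEY"])`, where `HANDBOOK_API_KEY` is a hypothetical variable name. Note that with the default exemptions only `/health` stays open; `/docs` and `/redoc` would also require the key unless added to `exempt_paths`.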
### Risks
- **Font rendering**: Century Gothic rendering may differ slightly between Dompdf (PHP) and WeasyPrint (Python). Test with actual fonts.
- **Page break behavior**: Dompdf and WeasyPrint handle CSS `page-break-*` properties slightly differently.
- **Image embedding**: Remote campus images are fetched at generation time. Network issues will result in placeholder cells (same as PHP behavior).
- **Memory**: Large handbooks with many university images may require significant memory. The Dockerfile doesn't set memory limits; Hugging Face Spaces has its own limits.
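The placeholder fallback mentioned under image embedding can be sketched roughly as follows. This is not the code in `pdf_service.py` or `renderers.py`; the function names, the placeholder markup, and the default MIME type are all assumptions chosen to illustrate the idea of degrading gracefully when a remote image cannot be fetched.

```python
import base64
from typing import Callable
from urllib.request import urlopen

# Hypothetical placeholder markup; the real service may render something else.
PLACEHOLDER_HTML = '<div class="img-placeholder">Image unavailable</div>'


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download a remote resource with a bounded timeout."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def image_cell(url: str,
               fetch: Callable[[str], bytes] = fetch_bytes,
               mime: str = "image/jpeg") -> str:
    """Return an inline <img> tag, or the placeholder if the fetch fails."""
    try:
        data = fetch(url)
    except (OSError, ValueError):
        # URLError, HTTPError, and socket timeouts are OSError subclasses;
        # urlopen raises ValueError for malformed URLs.
        return PLACEHOLDER_HTML
    if not data:
        return PLACEHOLDER_HTML
    encoded = base64.b64encode(data).decode("ascii")
    return f'<img src="data:{mime};base64,{encoded}" alt="">'
```

Embedding the image as a data URI means the PDF renderer does not need network access for that cell, and injecting `fetch` makes the failure path easy to test without a network.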