---
title: ISP Handbook Engine
emoji: 📘
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# ISP Handbook Service - Python Migration

A Python/FastAPI service that generates the ISP (International Scholars Program) Handbook as PDF or HTML. This is a drop-in replacement for the PHP handbook generation pipeline, designed to be called over HTTP from the existing PHP application.

## Architecture

```
python_service/
├── app/
│   ├── main.py              # FastAPI entry point
│   ├── api/
│   │   └── routes.py        # REST endpoints
│   ├── core/
│   │   ├── config.py        # Environment-based settings
│   │   ├── database.py      # SQLAlchemy engine (MySQL)
│   │   ├── fonts.py         # Century Gothic font management
│   │   └── logging.py       # Logging setup
│   ├── models/              # SQLAlchemy models (if needed)
│   ├── repositories/
│   │   └── handbook_repo.py # Direct DB access (fallback)
│   ├── schemas/
│   │   └── handbook.py      # Pydantic request/response models
│   └── services/
│       ├── data_fetcher.py   # Fetch data from external JSON APIs
│       ├── html_builder.py   # Build full handbook HTML
│       ├── pdf_service.py    # HTML -> PDF via WeasyPrint
│       ├── renderers.py      # TOC, sections, university renderers
│       └── utils.py          # Shared helpers (h, money format, etc.)
├── tests/
│   ├── test_api.py
│   └── test_renderers.py
├── fonts/                    # Century Gothic TTF files
├── images/                   # Handbook images (cover, header, etc.)
├── css/                      # Base stylesheet
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md
```

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Health check |
| `GET` | `/diagnostics/fonts` | Font file diagnostics |
| `GET` | `/api/v1/sections/global?catalog_id=0` | Fetch normalised global sections |
| `GET` | `/api/v1/sections/universities` | Fetch normalised university sections |
| `GET` | `/api/v1/handbook/pdf?catalog_id=0` | Generate PDF (download) |
| `POST` | `/api/v1/handbook/pdf` | Generate PDF with JSON body |
| `GET` | `/api/v1/handbook/html?catalog_id=0` | Generate HTML preview |
| `POST` | `/api/v1/handbook/render` | Generate PDF or HTML based on `output_format` |
| `GET` | `/docs` | Swagger UI |
| `GET` | `/redoc` | ReDoc UI |
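
As a rough illustration of how a client might address the endpoints above, the sketch below builds requests with Python's standard library only. The helper names (`build_get`, `build_render_post`, `fetch`) and the base URL are assumptions made for this example, and the POST body fields are taken from the PHP client later in this README rather than from the service's schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:7860"  # assumption: the local dev server from this README


def build_get(path: str, **params) -> Request:
    """Build a GET request for one of the read-only endpoints."""
    query = f"?{urlencode(params)}" if params else ""
    return Request(f"{BASE_URL}{path}{query}", method="GET")


def build_render_post(output_format: str = "pdf", catalog_id: int = 0) -> Request:
    """Build a POST request for /api/v1/handbook/render with a JSON body."""
    body = json.dumps({"catalog_id": catalog_id, "output_format": output_format})
    return Request(
        f"{BASE_URL}/api/v1/handbook/render",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(request: Request, timeout: float = 120.0) -> bytes:
    """Send the request and return the raw body (requires a running service)."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Only prints the URL; call fetch(...) once the service is up.
    print(build_get("/api/v1/handbook/pdf", catalog_id=0).full_url)
```

The long default timeout mirrors the 120-second cURL timeout used for PDF generation in the PHP example.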

## Local Development

### Prerequisites

- Python 3.11+
- MySQL database (existing schema, unchanged)
- Century Gothic font files in `fonts/` directory

### Setup

```bash
cd python_service

# Create virtualenv
python -m venv .venv
source .venv/bin/activate      # Linux/macOS
# .venv\Scripts\activate       # Windows

# Install dependencies
pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env           # Windows: copy .env.example .env
# Edit .env with your database credentials and API URLs
```

### Run

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```

Visit http://localhost:7860/docs for the interactive API documentation.

### Run Tests

```bash
pytest tests/ -v
```

## Docker

### Build

```bash
docker build -t isp-handbook-service .
```

### Run

```bash
docker run -d \
  --name handbook-service \
  -p 7860:7860 \
  -e DB_HOST=host.docker.internal \
  -e DB_USER=root \
  -e DB_PASSWORD=secret \
  -e DB_NAME=handbook \
  -e API_BASE_URL=https://finsapdev.qhtestingserver.com \
  isp-handbook-service
```

Or with an env file:

```bash
docker run -d --name handbook-service -p 7860:7860 --env-file .env isp-handbook-service
```

## Hugging Face Spaces Deployment

1. Create a new Space on Hugging Face with **Docker** SDK
2. Upload/push the `python_service/` directory as the Space root
3. Ensure `fonts/`, `images/`, and `css/` directories are included
4. Set environment variables (Secrets) in Space settings:
   - `DB_HOST`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`
   - `API_BASE_URL`
   - `PORT=7860` (default for HF Spaces)
5. The `Dockerfile` is already configured for HF Spaces (port 7860, `0.0.0.0`)

**Important**: Hugging Face Spaces may not allow outbound MySQL connections. If direct DB access is blocked, rely on the external API approach, which is the default path: the service fetches data from the PHP JSON APIs over HTTP, not from the database directly.
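
To make the environment variables above concrete, here is a minimal sketch of how environment-based settings could be loaded. It is not the contents of `core/config.py`; the `Settings` class, its field names, and every default value are assumptions for illustration. Only the variable names (`DB_HOST`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`, `API_BASE_URL`, `PORT`) come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_user: str
    db_password: str
    db_name: str
    api_base_url: str
    port: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # Defaults are illustrative assumptions, not the service's real defaults.
        return cls(
            db_host=env.get("DB_HOST", "localhost"),
            db_user=env.get("DB_USER", ""),
            db_password=env.get("DB_PASSWORD", ""),
            db_name=env.get("DB_NAME", ""),
            api_base_url=env.get("API_BASE_URL", "").rstrip("/"),
            port=int(env.get("PORT", "7860")),
        )

    @property
    def has_db_credentials(self) -> bool:
        """True only when every DB field is set, i.e. the DB fallback is usable."""
        return all([self.db_host, self.db_user, self.db_password, self.db_name])
```

A check like `has_db_credentials` would let the service skip the database fallback cleanly on hosts where only `API_BASE_URL` is configured.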

## PHP Integration Example

The PHP application can call this Python service over HTTP using cURL:

```php
<?php
/**
 * PHP client for the ISP Handbook Python Service.
 * Replace HANDBOOK_SERVICE_URL with your actual deployment URL.
 */

define('HANDBOOK_SERVICE_URL', 'http://localhost:7860');

/**
 * Check service health.
 */
function handbook_health(): array {
    $url = HANDBOOK_SERVICE_URL . '/health';
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 5,
    ]);
    $body = curl_exec($ch);
    $code = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code !== 200) {
        return ['ok' => false, 'error' => 'Service unreachable', 'http_code' => $code];
    }
    return json_decode($body, true) ?? ['ok' => false, 'error' => 'Invalid response'];
}

/**
 * Generate and download the handbook PDF.
 */
function handbook_download_pdf(int $catalogId = 0, bool $debug = false): void {
    $params = http_build_query([
        'catalog_id' => $catalogId,
        'debug'      => $debug ? 'true' : 'false',
    ]);
    $url = HANDBOOK_SERVICE_URL . '/api/v1/handbook/pdf?' . $params;

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 120,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    $body = curl_exec($ch);
    $code = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Cast to string: curl_getinfo() returns null when no Content-Type was received.
    $contentType = (string)curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

    if ($code !== 200 || strpos($contentType, 'application/pdf') === false) {
        http_response_code(502);
        header('Content-Type: text/plain');
        echo "PDF generation failed (HTTP $code)";
        return;
    }

    header('Content-Type: application/pdf');
    header('Content-Disposition: attachment; filename="ISP_Handbook.pdf"');
    header('Content-Length: ' . strlen($body));
    echo $body;
}

/**
 * Fetch global sections via the Python service.
 */
function handbook_get_sections(int $catalogId = 0): array {
    $url = HANDBOOK_SERVICE_URL . '/api/v1/sections/global?catalog_id=' . $catalogId;
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 25,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return json_decode($body, true) ?? [];
}

/**
 * Generate handbook via POST with custom options.
 */
function handbook_generate(array $options = []): string {
    $url = HANDBOOK_SERVICE_URL . '/api/v1/handbook/render';
    $payload = json_encode(array_merge([
        'catalog_id'                => 0,
        'include_inactive_programs' => false,
        'debug'                     => false,
        'output_format'             => 'pdf',
    ], $options));

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $payload,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_TIMEOUT        => 120,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

### Usage in PHP

```php
// Health check
$status = handbook_health();
if (($status['status'] ?? null) === 'ok') {
    echo "Service is running\n";
}

// Stream PDF to browser
handbook_download_pdf(catalogId: 1);

// Get sections data
$sections = handbook_get_sections(catalogId: 1);
print_r($sections);
```

## Migration Notes & Assumptions

### What was migrated

| PHP Component | Python Equivalent | Notes |
|---|---|---|
| `common.php` (URL builder, HTTP client) | `data_fetcher.py` | Uses `httpx` instead of cURL |
| `cors.php` | FastAPI CORS middleware | Same origins preserved |
| `helpers.php` (`h()`, `respondJson()`) | Built into FastAPI + `utils.py` | |
| `fetchers.php` (global/uni data fetch) | `data_fetcher.py` | Identical normalisation logic |
| `renderers.php` (TOC, blocks, university) | `renderers.py` | All block types preserved |
| `html_builder.php` (`buildHandbookHtml`) | `html_builder.py` | Same HTML structure |
| `pdf.php` (Dompdf render) | `pdf_service.py` | **WeasyPrint** replaces Dompdf |
| `images.php` (image config) | `pdf_service.py` `_get_images_config()` | |
| `font_diagnostics.php` | `GET /diagnostics/fonts` | |
| `db.php` (mysqli) | `database.py` (SQLAlchemy) | Available but not primary path |
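
As an example of what the shared helpers in `utils.py` might look like, the sketch below shows an `h()` escaping helper and a money formatter. Both are assumptions based on the names in the table, not the actual module: `format_money`, its `symbol` parameter, and the two-decimal format are hypothetical.

```python
import html
from decimal import Decimal


def h(value: object) -> str:
    """Escape a value for safe inclusion in HTML text or attribute values."""
    return html.escape("" if value is None else str(value), quote=True)


def format_money(amount, symbol: str = "$") -> str:
    """Format a number with thousands separators and two decimals."""
    # Going through str() avoids binary float artefacts when building the Decimal.
    return f"{symbol}{Decimal(str(amount)):,.2f}"
```

One visible difference from PHP: `html.escape` encodes a single quote as `&#x27;`, whereas `htmlspecialchars` with `ENT_QUOTES` emits `&#039;`. Both decode to the same character in HTML.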

### Key differences

1. **PDF engine**: WeasyPrint replaces Dompdf. Layout may differ slightly in edge cases (table widths, page breaks). Both support `@font-face` with base64 TTF and `@page` rules.

2. **TOC page numbers**: The PHP code uses a 2-pass Dompdf render to inject exact TOC page numbers via named destinations. WeasyPrint doesn't expose named destinations the same way. TOC pages are assigned sequentially in the initial migration. Exact page numbers can be added via a post-processing PDF pass if needed.

3. **No auth**: The PHP code has no authentication. The Python service also has none. Add API key middleware if this service is exposed publicly.

4. **Data source**: The service fetches data from the same two PHP JSON APIs over HTTP (not directly from the database). The `repositories/handbook_repo.py` provides a DB fallback if you want to bypass the PHP APIs entirely.

5. **SSL verification**: Disabled for internal API calls (`verify=False` in httpx), matching the PHP behavior (`CURLOPT_SSL_VERIFYPEER => false`).
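
For the authentication gap noted in item 3, one possible approach is a small ASGI middleware that checks a shared key. This is a sketch under assumptions, not something the service ships: the class name, the `x-api-key` header, and the choice to leave `/health` open are all hypothetical.

```python
import hmac


class ApiKeyMiddleware:
    """Pure ASGI middleware that rejects HTTP requests lacking the expected key."""

    def __init__(self, app, api_key: str, header_name: str = "x-api-key",
                 open_paths: tuple = ("/health",)):
        self.app = app
        self.api_key = api_key.encode("utf-8")
        self.header_name = header_name.lower().encode("latin-1")
        self.open_paths = open_paths

    async def __call__(self, scope, receive, send):
        # Pass through non-HTTP scopes (lifespan, websocket) and open paths.
        if scope["type"] != "http" or scope.get("path") in self.open_paths:
            await self.app(scope, receive, send)
            return
        # ASGI headers are a list of (name, value) byte pairs with lowercase names.
        supplied = dict(scope.get("headers") or []).get(self.header_name, b"")
        # Constant-time comparison avoids leaking key contents through timing.
        if hmac.compare_digest(supplied, self.api_key):
            await self.app(scope, receive, send)
            return
        body = b'{"detail":"Invalid or missing API key"}'
        await send({
            "type": "http.response.start",
            "status": 401,
            "headers": [(b"content-type", b"application/json"),
                        (b"content-length", str(len(body)).encode("ascii"))],
        })
        await send({"type": "http.response.body", "body": body})
```

With FastAPI this could be registered as `app.add_middleware(ApiKeyMiddleware, api_key=...)`, with the key read from a secret. The PHP client would then need to send the matching header via `CURLOPT_HTTPHEADER`.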

### Risks

- **Font rendering**: Century Gothic rendering may differ slightly between Dompdf (PHP) and WeasyPrint (Python). Test with actual fonts.
- **Page break behavior**: Dompdf and WeasyPrint handle CSS `page-break-*` properties slightly differently.
- **Image embedding**: Remote campus images are fetched at generation time. Network issues will result in placeholder cells (same as PHP behavior).
- **Memory**: Large handbooks with many university images may require significant memory. The Dockerfile doesn't set memory limits; Hugging Face Spaces enforces its own.
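
The image-embedding and memory risks above can be illustrated with a small sketch of a fetch-with-placeholder helper. This is not the service's actual image code: the function name, the CSS class names, the size cap, and the assumption that images are JPEG are all hypothetical.

```python
import base64
from typing import Callable
from urllib.request import urlopen


def _download(url: str, timeout: float) -> bytes:
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def image_cell_html(url: str, timeout: float = 10.0, max_bytes: int = 5_000_000,
                    download: Callable[[str, float], bytes] = _download) -> str:
    """Return a table cell with the image inlined, or a placeholder cell on failure."""
    placeholder = '<td class="campus-image placeholder"></td>'
    try:
        data = download(url, timeout)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError covers malformed URLs.
        return placeholder
    # Skip empty or oversized payloads so one large image cannot dominate memory use.
    if not data or len(data) > max_bytes:
        return placeholder
    encoded = base64.b64encode(data).decode("ascii")
    # Assumption: JPEG content; a fuller version would inspect the response type.
    return f'<td class="campus-image"><img src="data:image/jpeg;base64,{encoded}"></td>'
```

Passing the downloader in as a parameter keeps the fallback path testable without network access.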