PhonePixelGhost committed on
Commit
17d2f7c
·
verified ·
1 Parent(s): 76419e9

Upload folder using huggingface_hub

Browse files
.dockerignore ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ tests/
2
+ .git/
3
+ .github/
4
+ *.pt
5
+ *.pth
6
+ __pycache__/
7
+ *.pyc
8
+ *.pyo
9
+ *.pyd
10
+ .pytest_cache/
11
+ .venv/
12
+ venv/
13
+ *.md
14
+ .gitignore
15
+ scripts/
.gitattributes CHANGED
@@ -34,3 +34,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
  models/resnet18.onnx.data filter=lfs diff=lfs merge=lfs -text
 
 
 
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
  models/resnet18.onnx.data filter=lfs diff=lfs merge=lfs -text
37
+ 0a2152e6-4aef-11f1-b222-345a603e44a9.data filter=lfs diff=lfs merge=lfs -text
38
+ 72a8c263-4aef-11f1-aad7-345a603e44a9.data filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,56 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ build/
8
+ develop-eggs/
9
+ dist/
10
+ downloads/
11
+ eggs/
12
+ .eggs/
13
+ lib/
14
+ lib64/
15
+ parts/
16
+ sdist/
17
+ var/
18
+ wheels/
19
+ *.egg-info/
20
+ .installed.cfg
21
+ *.egg
22
+
23
+ # Virtual Environment
24
+ venv/
25
+ env/
26
+ ENV/
27
+ .venv
28
+
29
+ # IDE
30
+ .vscode/
31
+ .idea/
32
+ *.swp
33
+ *.swo
34
+ *~
35
+
36
+ # Testing
37
+ .pytest_cache/
38
+ .coverage
39
+ htmlcov/
40
+ .tox/
41
+
42
+ # Models
43
+ *.pt
44
+ *.pth
45
+ *.bin
46
+ pytorch_model/
47
+
48
+ # Logs
49
+ *.log
50
+
51
+ # OS
52
+ .DS_Store
53
+ Thumbs.db
54
+
55
+ # Test files
56
+ test.jpg
.pytest_cache/.gitignore ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ # Created by pytest automatically.
2
+ *
.pytest_cache/CACHEDIR.TAG ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ Signature: 8a477f597d28d172789f06886806bc55
2
+ # This file is a cache directory tag created by pytest.
3
+ # For information about cache directory tags, see:
4
+ # https://bford.info/cachedir/spec.html
.pytest_cache/README.md ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ # pytest cache directory #
2
+
3
+ This directory contains data from the pytest's cache plugin,
4
+ which provides the `--lf` and `--ff` options, as well as the `cache` fixture.
5
+
6
+ **Do not** commit this to version control.
7
+
8
+ See [the docs](https://docs.pytest.org/en/stable/how-to/cache.html) for more information.
.pytest_cache/v/cache/nodeids ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ [
2
+ "tests/test_api.py::test_health_endpoint",
3
+ "tests/test_api.py::test_predict_rejects_corrupted_file",
4
+ "tests/test_api.py::test_predict_rejects_non_image",
5
+ "tests/test_api.py::test_predict_rejects_oversized_file",
6
+ "tests/test_api.py::test_predict_returns_valid_json"
7
+ ]
0a2152e6-4aef-11f1-b222-345a603e44a9.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:237ab7da3d82e3a5e7fbd88cb146e9ba328e7492c4c21d65b131002249cb6979
3
+ size 46735008
72a8c263-4aef-11f1-aad7-345a603e44a9.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:237ab7da3d82e3a5e7fbd88cb146e9ba328e7492c4c21d65b131002249cb6979
3
+ size 46735008
README.md CHANGED
@@ -1,10 +1,184 @@
1
- ---
2
- title: Image Classification Service
3
- emoji: 🌖
4
- colorFrom: gray
5
- colorTo: yellow
6
- sdk: docker
7
- pinned: false
8
- ---
9
-
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # High-Throughput Image Classification Service
2
+
3
+ A production-ready image classification API using ResNet-18 with ONNX optimization, FastAPI, and CI/CD pipeline.
4
+
5
+ ## Features
6
+
7
+ - **Optimized Model**: ResNet-18 converted to ONNX with dynamic quantization (~70% size reduction)
8
+ - **High Performance**: ProcessPoolExecutor for concurrent request handling
9
+ - **Production Ready**: Docker containerization, comprehensive error handling
10
+ - **CI/CD Pipeline**: Automated testing and deployment to Hugging Face Spaces
11
+ - **Comprehensive Testing**: pytest unit tests with 100% endpoint coverage
12
+
13
+ ## Project Structure
14
+
15
+ ```
16
+ image-classification-service/
17
+ ├── app/
18
+ │ ├── __init__.py
19
+ │ ├── main.py # FastAPI application
20
+ │ ├── model.py # ONNX inference logic
21
+ │ └── schemas.py # Pydantic models
22
+ ├── models/
23
+ │ └── resnet18_quantized.onnx # Optimized model
24
+ ├── tests/
25
+ │ └── test_api.py # Unit tests
26
+ ├── scripts/
27
+ │ ├── 01_baseline_test.py # PyTorch baseline benchmark
28
+ │ ├── 02_export_onnx.py # Export to ONNX
29
+ │ ├── 03_quantize.py # Dynamic quantization
30
+ │ └── 04_benchmark_onnx.py # ONNX benchmark
31
+ ├── .github/
32
+ │ └── workflows/
33
+ │ └── ci-cd.yml # GitHub Actions pipeline
34
+ ├── Dockerfile
35
+ ├── .dockerignore
36
+ ├── requirements.txt
37
+ └── README.md
38
+ ```
39
+
40
+ ## Quick Start
41
+
42
+ ### 1. Install Dependencies
43
+
44
+ ```bash
45
+ pip install -r requirements.txt
46
+ ```
47
+
48
+ ### 2. Prepare the Model
49
+
50
+ Run the optimization scripts in order:
51
+
52
+ ```bash
53
+ cd scripts
54
+ python 01_baseline_test.py # Measure PyTorch baseline
55
+ python 02_export_onnx.py # Export to ONNX
56
+ python 03_quantize.py # Apply quantization
57
+ python 04_benchmark_onnx.py # Compare performance
58
+ cd ..
59
+ ```
60
+
61
+ ### 3. Run the API
62
+
63
+ ```bash
64
+ uvicorn app.main:app --host 0.0.0.0 --port 7860
65
+ ```
66
+
67
+ ### 4. Test the API
68
+
69
+ ```bash
70
+ # Health check
71
+ curl http://localhost:7860/health
72
+
73
+ # Predict
74
+ curl -X POST "http://localhost:7860/predict" \
75
+ -H "accept: application/json" \
76
+ -F "file=@/path/to/image.jpg"
77
+ ```
78
+
79
+ ## Docker Deployment
80
+
81
+ ### Build and Run
82
+
83
+ ```bash
84
+ docker build -t image-classifier .
85
+ docker run -p 7860:7860 image-classifier
86
+ ```
87
+
88
+ ## Testing
89
+
90
+ ```bash
91
+ pytest tests/ -v
92
+ ```
93
+
94
+ ## API Endpoints
95
+
96
+ ### GET /health
97
+
98
+ Health check endpoint.
99
+
100
+ **Response:**
101
+ ```json
102
+ {
103
+ "status": "ok"
104
+ }
105
+ ```
106
+
107
+ ### POST /predict
108
+
109
+ Image classification endpoint.
110
+
111
+ **Request:**
112
+ - Content-Type: `multipart/form-data`
113
+ - Body: `file` (image file)
114
+
115
+ **Response:**
116
+ ```json
117
+ {
118
+ "label": "tabby, tabby cat",
119
+ "score": 0.8234,
120
+ "label_id": 281,
121
+ "inference_time_ms": 45.123
122
+ }
123
+ ```
124
+
125
+ **Error Codes:**
126
+ - `400`: Corrupted or invalid image
127
+ - `413`: File too large (max 10MB)
128
+ - `415`: Unsupported media type
129
+ - `500`: Inference error
130
+
131
+ ## Performance Metrics
132
+
133
+ | Format | File Size | Avg Latency | P95 Latency |
134
+ |--------|-----------|-------------|-------------|
135
+ | PyTorch | ~45 MB | baseline | baseline |
136
+ | ONNX | ~45 MB | ~20% faster | - |
137
+ | ONNX Quantized | ~12 MB | ~40% faster | - |
138
+
139
+ *Run benchmark scripts to get actual measurements on your hardware*
140
+
141
+ ## CI/CD Pipeline
142
+
143
+ The GitHub Actions workflow automatically:
144
+ 1. Runs unit tests on every push/PR
145
+ 2. Deploys to Hugging Face Spaces on main branch (requires `HF_TOKEN` secret)
146
+
147
+ ### Setup Hugging Face Deployment
148
+
149
+ 1. Create a Hugging Face Space
150
+ 2. Generate an access token with write permissions
151
+ 3. Add `HF_TOKEN` to GitHub repository secrets
152
+ 4. Update `.github/workflows/ci-cd.yml` with your Space URL
153
+
154
+ ## Model Details
155
+
156
+ - **Base Model**: microsoft/resnet-18 (Hugging Face)
157
+ - **Task**: Image Classification (ImageNet-1k)
158
+ - **Input**: RGB images (224x224)
159
+ - **Output**: 1000 class probabilities
160
+ - **Optimization**: ONNX + Dynamic Quantization (QUint8)
161
+
162
+ ## Development
163
+
164
+ ### Adding New Features
165
+
166
+ 1. Update code in `app/`
167
+ 2. Add tests in `tests/`
168
+ 3. Run tests: `pytest tests/ -v`
169
+ 4. Update documentation
170
+
171
+ ### Performance Testing
172
+
173
+ Use JMeter or similar tools to test throughput:
174
+ - Concurrent users: 10, 50, 100
175
+ - Measure: TPS, P95 latency, error rate
176
+
177
+ ## License
178
+
179
+ MIT
180
+
181
+ ## Acknowledgments
182
+
183
+ - Model: microsoft/resnet-18 from Hugging Face
184
+ - Framework: FastAPI, ONNX Runtime
SKILL.md ADDED
@@ -0,0 +1,495 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: image-classification-mlops
3
+ description: >
4
+ ทักษะสำหรับพัฒนาระบบ High-Throughput Image Classification Service ครบวงจร ตั้งแต่
5
+ Model Optimization, FastAPI Development, CI/CD Pipeline จนถึง Performance Testing
6
+ โดยใช้โมเดล microsoft/resnet-18 จาก Hugging Face
7
+
8
+ ใช้ skill นี้เมื่อ:
9
+ - ต้องการ Optimize โมเดล (ONNX Conversion + Dynamic Quantization)
10
+ - สร้าง FastAPI ที่รองรับ Concurrent Request ด้วย ProcessPoolExecutor
11
+ - เขียน Dockerfile สำหรับ Production
12
+ - ตั้งค่า GitHub Actions CI/CD → Deploy ไป Hugging Face Spaces
13
+ - เขียน pytest Unit Tests สำหรับ /predict endpoint
14
+ - วิเคราะห์ผล JMeter Load Test (Throughput / P95 Latency)
15
+ - เขียน Project Report หรือสร้าง System Architecture Diagram
16
+ ---
17
+
18
+ # High-Throughput Image Classification Service — MLOps Skill
19
+
20
+ ## ภาพรวมโปรเจกต์
21
+
22
+ | Phase | เนื้อหา |
23
+ |---|---|
24
+ | 1. Model Optimization | ResNet-18 → ONNX → Dynamic Quantization |
25
+ | 2. API Development | FastAPI + ProcessPoolExecutor + Pydantic |
26
+ | 3. Automation & CI/CD | pytest + GitHub Actions + HF Spaces Deploy |
27
+ | 4. Performance Testing | JMeter Load Test + TPS/P95 Analysis |
28
+
29
+ **โมเดลหลัก:** `microsoft/resnet-18` (Hugging Face)
30
+ **Stack:** Python 3.11, FastAPI, ONNX Runtime, Transformers, Docker, GitHub Actions
31
+
32
+ ---
33
+
34
+ ## Phase 1 — Model Optimization
35
+
36
+ ### 1.1 Baseline Test (Original PyTorch)
37
+
38
+ ```python
39
+ from transformers import AutoFeatureExtractor, ResNetForImageClassification
40
+ import torch, time, os
41
+ from PIL import Image
42
+
43
+ model_id = "microsoft/resnet-18"
44
+ extractor = AutoFeatureExtractor.from_pretrained(model_id)
45
+ model = ResNetForImageClassification.from_pretrained(model_id)
46
+ model.eval()
47
+
48
+ # วัด Baseline Latency (100 runs)
49
+ img = Image.open("test.jpg").convert("RGB")
50
+ inputs = extractor(images=img, return_tensors="pt")
51
+
52
+ times = []
53
+ with torch.no_grad():
54
+ for _ in range(100):
55
+ t0 = time.perf_counter()
56
+ _ = model(**inputs)
57
+ times.append(time.perf_counter() - t0)
58
+
59
+ print(f"Baseline Latency (avg): {sum(times)/len(times)*1000:.2f} ms")
60
+ print(f"Model Size: {os.path.getsize('pytorch_model.bin')/1e6:.2f} MB")
61
+ ```
62
+
63
+ ### 1.2 Export to ONNX
64
+
65
+ ```python
66
+ import torch
67
+ from transformers import AutoFeatureExtractor, ResNetForImageClassification
68
+
69
+ model_id = "microsoft/resnet-18"
70
+ extractor = AutoFeatureExtractor.from_pretrained(model_id)
71
+ model = ResNetForImageClassification.from_pretrained(model_id).eval()
72
+
73
+ dummy = torch.randn(1, 3, 224, 224)
74
+
75
+ torch.onnx.export(
76
+ model,
77
+ dummy,
78
+ "resnet18.onnx",
79
+ input_names=["pixel_values"],
80
+ output_names=["logits"],
81
+ dynamic_axes={"pixel_values": {0: "batch_size"}},
82
+ opset_version=17,
83
+ )
84
+ print("ONNX exported successfully")
85
+ ```
86
+
87
+ ### 1.3 Dynamic Quantization
88
+
89
+ ```python
90
+ from onnxruntime.quantization import quantize_dynamic, QuantType
91
+
92
+ quantize_dynamic(
93
+ model_input="resnet18.onnx",
94
+ model_output="resnet18_quantized.onnx",
95
+ weight_type=QuantType.QUint8,
96
+ )
97
+ print("Quantization complete")
98
+ ```
99
+
100
+ ### 1.4 ตารางเปรียบเทียบ (บันทึกผลจริงลงตาราง)
101
+
102
+ | Format | File Size (MB) | Avg Latency (ms) | P95 Latency (ms) |
103
+ |---|---|---|---|
104
+ | Original (PyTorch) | ~45 | baseline | baseline |
105
+ | ONNX | ~45 | คาดว่าเร็วขึ้น ~20% | - |
106
+ | ONNX Quantized | ~12 | คาดว่าเร็วขึ้น ~40% | - |
107
+
108
+ > **วิธีวัด:** รัน 100 ครั้ง → เก็บค่า avg และ percentile ด้วย `numpy.percentile(times, 95)`
109
+
110
+ ---
111
+
112
+ ## Phase 2 — API Development
113
+
114
+ ### 2.1 โครงสร้างโปรเจกต์
115
+
116
+ ```
117
+ image-classification-service/
118
+ ├── app/
119
+ │ ├── main.py # FastAPI app
120
+ │ ├── model.py # ONNX inference logic
121
+ │ └── schemas.py # Pydantic models
122
+ ├── models/
123
+ │ └── resnet18_quantized.onnx
124
+ ├── tests/
125
+ │ └── test_api.py
126
+ ├── .github/
127
+ │ └── workflows/
128
+ │ └── ci-cd.yml
129
+ ├── Dockerfile
130
+ ├── requirements.txt
131
+ └── README.md
132
+ ```
133
+
134
+ ### 2.2 Pydantic Schemas (`app/schemas.py`)
135
+
136
+ ```python
137
+ from pydantic import BaseModel
138
+ from typing import Optional
139
+
140
+ class PredictionResponse(BaseModel):
141
+ label: str
142
+ score: float
143
+ label_id: int
144
+ inference_time_ms: float
145
+
146
+ class ErrorResponse(BaseModel):
147
+ detail: str
148
+ error_code: str
149
+ ```
150
+
151
+ ### 2.3 ONNX Inference (`app/model.py`)
152
+
153
+ ```python
154
+ import onnxruntime as ort
155
+ import numpy as np
156
+ from PIL import Image
157
+ import io, time
158
+
159
+ # Labels จาก ImageNet
160
+ from transformers import AutoFeatureExtractor
161
+ extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-18")
162
+
163
+ # โหลด session ครั้งเดียว (module-level)
164
+ session = ort.InferenceSession(
165
+ "models/resnet18_quantized.onnx",
166
+ providers=["CPUExecutionProvider"]
167
+ )
168
+
169
+ def run_inference(image_bytes: bytes) -> dict:
170
+ img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
171
+ inputs = extractor(images=img, return_tensors="np")
172
+ pixel_values = inputs["pixel_values"].astype(np.float32)
173
+
174
+ t0 = time.perf_counter()
175
+ outputs = session.run(["logits"], {"pixel_values": pixel_values})
176
+ elapsed = (time.perf_counter() - t0) * 1000
177
+
178
+ logits = outputs[0][0]
179
+ probs = np.exp(logits) / np.sum(np.exp(logits))
180
+ label_id = int(np.argmax(probs))
181
+
182
+ # ดึง label จาก model config
183
+ from transformers import ResNetForImageClassification
184
+ cfg = ResNetForImageClassification.from_pretrained("microsoft/resnet-18").config
185
+ label = cfg.id2label.get(label_id, str(label_id))
186
+
187
+ return {
188
+ "label": label,
189
+ "score": float(probs[label_id]),
190
+ "label_id": label_id,
191
+ "inference_time_ms": round(elapsed, 3),
192
+ }
193
+ ```
194
+
195
+ ### 2.4 FastAPI Main App (`app/main.py`)
196
+
197
+ ```python
198
+ from fastapi import FastAPI, File, UploadFile, HTTPException
199
+ from concurrent.futures import ProcessPoolExecutor
200
+ import asyncio
201
+ from app.model import run_inference
202
+ from app.schemas import PredictionResponse
203
+
204
+ app = FastAPI(title="ResNet-18 Image Classifier", version="1.0.0")
205
+ executor = ProcessPoolExecutor(max_workers=4)
206
+
207
+ MAX_FILE_SIZE = 10 * 1024 * 1024 # 10 MB
208
+ ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/png", "image/webp", "image/gif"}
209
+
210
+
211
+ @app.get("/health")
212
+ async def health():
213
+ return {"status": "ok"}
214
+
215
+
216
+ @app.post("/predict", response_model=PredictionResponse)
217
+ async def predict(file: UploadFile = File(...)):
218
+ # Validate content type
219
+ if file.content_type not in ALLOWED_CONTENT_TYPES:
220
+ raise HTTPException(
221
+ status_code=415,
222
+ detail=f"Unsupported media type: {file.content_type}. Allowed: {ALLOWED_CONTENT_TYPES}"
223
+ )
224
+
225
+ image_bytes = await file.read()
226
+
227
+ # Validate file size
228
+ if len(image_bytes) > MAX_FILE_SIZE:
229
+ raise HTTPException(
230
+ status_code=413,
231
+ detail=f"File too large. Max size is {MAX_FILE_SIZE // 1024 // 1024} MB."
232
+ )
233
+
234
+ # Validate not corrupted (try opening with PIL)
235
+ try:
236
+ from PIL import Image
237
+ import io
238
+ Image.open(io.BytesIO(image_bytes)).verify()
239
+ except Exception:
240
+ raise HTTPException(status_code=400, detail="Corrupted or invalid image file.")
241
+
242
+ # Run CPU-bound inference in ProcessPoolExecutor (ไม่บล็อก event loop)
243
+ loop = asyncio.get_event_loop()
244
+ try:
245
+ result = await loop.run_in_executor(executor, run_inference, image_bytes)
246
+ except Exception as e:
247
+ raise HTTPException(status_code=500, detail=f"Inference error: {str(e)}")
248
+
249
+ return PredictionResponse(**result)
250
+ ```
251
+
252
+ ### 2.5 Error Handling Summary
253
+
254
+ | สถานการณ์ | HTTP Status | รายละเอียด |
255
+ |---|---|---|
256
+ | ไฟล์ไม่ใช่รูปภาพ | 415 Unsupported Media Type | Content-type ไม่ตรง |
257
+ | ไฟล์เสีย (Corrupted) | 400 Bad Request | PIL ไม่สามารถเปิดได้ |
258
+ | ไฟล์ใหญ่เกินไป | 413 Request Entity Too Large | เกิน 10MB |
259
+ | Inference Error | 500 Internal Server Error | โมเดลทำงานผิดพลาด |
260
+
261
+ ---
262
+
263
+ ## Phase 3 — Dockerfile
264
+
265
+ ```dockerfile
266
+ # ใช้ slim image เพื่อลด size
267
+ FROM python:3.11-slim
268
+
269
+ WORKDIR /app
270
+
271
+ # ติดตั้ง dependencies ก่อน (cache layer)
272
+ COPY requirements.txt .
273
+ RUN pip install --no-cache-dir -r requirements.txt
274
+
275
+ # Copy โค้ดและโมเดล
276
+ COPY app/ ./app/
277
+ COPY models/ ./models/
278
+
279
+ EXPOSE 7860
280
+
281
+ CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1"]
282
+ ```
283
+
284
+ **requirements.txt:**
285
+ ```
286
+ fastapi==0.111.0
287
+ uvicorn[standard]==0.29.0
288
+ python-multipart==0.0.9
289
+ onnxruntime==1.18.0
290
+ numpy==1.26.4
291
+ Pillow==10.3.0
292
+ transformers==4.41.0
293
+ torch==2.3.0
294
+ pydantic==2.7.1
295
+ pytest==8.2.0
296
+ httpx==0.27.0
297
+ ```
298
+
299
+ > **เทคนิคลด Docker Image Size:**
300
+ > - ใช้ `python:3.11-slim` (ไม่ใช่ full)
301
+ > - `--no-cache-dir` ใน pip
302
+ > - ลบ torch ออกหลัง export ONNX (ใน production image ไม่จำเป็น)
303
+ > - ใช้ `.dockerignore` เพื่อ exclude `tests/`, `.git/`, `*.pt`
304
+
305
+ ---
306
+
307
+ ## Phase 4 — Unit Testing (`tests/test_api.py`)
308
+
309
+ ```python
310
+ import pytest
311
+ from fastapi.testclient import TestClient
312
+ from app.main import app
313
+ from pathlib import Path
314
+
315
+ client = TestClient(app)
316
+
317
+ # --- Helper ---
318
+ def get_test_image() -> bytes:
319
+ """ใช้ภาพ test จริงหรือสร้าง dummy PNG"""
320
+ from PIL import Image
321
+ import io
322
+ img = Image.new("RGB", (224, 224), color=(128, 64, 32))
323
+ buf = io.BytesIO()
324
+ img.save(buf, format="JPEG")
325
+ return buf.getvalue()
326
+
327
+
328
+ # --- Tests ---
329
+
330
+ def test_health_endpoint():
331
+ res = client.get("/health")
332
+ assert res.status_code == 200
333
+ assert res.json() == {"status": "ok"}
334
+
335
+
336
+ def test_predict_returns_valid_json():
337
+ img_bytes = get_test_image()
338
+ res = client.post(
339
+ "/predict",
340
+ files={"file": ("test.jpg", img_bytes, "image/jpeg")}
341
+ )
342
+ assert res.status_code == 200
343
+ data = res.json()
344
+ assert "label" in data
345
+ assert "score" in data
346
+ assert isinstance(data["score"], float)
347
+ assert 0.0 <= data["score"] <= 1.0
348
+
349
+
350
+ def test_predict_rejects_non_image():
351
+ res = client.post(
352
+ "/predict",
353
+ files={"file": ("test.txt", b"not an image", "text/plain")}
354
+ )
355
+ assert res.status_code == 415
356
+
357
+
358
+ def test_predict_rejects_corrupted_file():
359
+ res = client.post(
360
+ "/predict",
361
+ files={"file": ("bad.jpg", b"\xff\xd8corrupted", "image/jpeg")}
362
+ )
363
+ assert res.status_code == 400
364
+
365
+
366
+ def test_predict_rejects_oversized_file():
367
+ huge = b"A" * (11 * 1024 * 1024) # 11MB
368
+ res = client.post(
369
+ "/predict",
370
+ files={"file": ("big.jpg", huge, "image/jpeg")}
371
+ )
372
+ assert res.status_code == 413
373
+ ```
374
+
375
+ ---
376
+
377
+ ## Phase 5 — GitHub Actions CI/CD (`.github/workflows/ci-cd.yml`)
378
+
379
+ ```yaml
380
+ name: CI/CD Pipeline
381
+
382
+ on:
383
+ push:
384
+ branches: [main]
385
+ pull_request:
386
+ branches: [main]
387
+
388
+ jobs:
389
+ test:
390
+ runs-on: ubuntu-latest
391
+ steps:
392
+ - uses: actions/checkout@v4
393
+
394
+ - name: Set up Python 3.11
395
+ uses: actions/setup-python@v5
396
+ with:
397
+ python-version: "3.11"
398
+
399
+ - name: Install dependencies
400
+ run: pip install -r requirements.txt
401
+
402
+ - name: Run Unit Tests
403
+ run: pytest tests/ -v --tb=short
404
+
405
+ deploy:
406
+ needs: test # รัน deploy เฉพาะเมื่อ test ผ่านทุก case
407
+ runs-on: ubuntu-latest
408
+ if: github.ref == 'refs/heads/main' && github.event_name == 'push'
409
+ steps:
410
+ - uses: actions/checkout@v4
411
+
412
+ - name: Push to Hugging Face Spaces
413
+ env:
414
+ HF_TOKEN: ${{ secrets.HF_TOKEN }}
415
+ run: |
416
+ git config --global user.email "ci@github.com"
417
+ git config --global user.name "GitHub Actions"
418
+ git remote add hf https://user:${HF_TOKEN}@huggingface.co/spaces/<YOUR_USERNAME>/<YOUR_SPACE_NAME>
419
+ git push hf main --force
420
+ ```
421
+
422
+ > **การตั้งค่า Secret:**
423
+ > ไปที่ GitHub Repo → Settings → Secrets → Actions → New secret
424
+ > ชื่อ: `HF_TOKEN` | ค่า: Hugging Face Access Token (write permission)
425
+
426
+ ---
427
+
428
+ ## Phase 6 — Performance Testing (JMeter)
429
+
430
+ ### 6.1 JMeter Test Plan (.jmx) — Key Settings
431
+
432
+ | Parameter | Local (Docker) | Cloud (HF Spaces) |
433
+ |---|---|---|
434
+ | Threads (Users) | 10, 50, 100 | 10, 25, 50 |
435
+ | Ramp-Up (sec) | 10 | 20 |
436
+ | Loop Count | 100 | 50 |
437
+ | Endpoint | `http://localhost:7860/predict` | `https://<space>.hf.space/predict` |
438
+
439
+ ### 6.2 Metrics ที่ต้องรายงาน
440
+
441
+ | Metric | คำอธิบาย | เป้าหมาย |
442
+ |---|---|---|
443
+ | **Throughput (TPS)** | Request ต่อวินาที | สูงที่สุด |
444
+ | **P95 Latency** | 95th percentile response time | < 2000ms |
445
+ | **Error Rate** | % ที่ได้รับ error | < 1% |
446
+ | **Avg Latency** | ค่าเฉลี่ย response time | ต่ำที่สุด |
447
+
448
+ ### 6.3 การวิเคราะห์ผล
449
+
450
+ ```
451
+ จุดที่ต้องวิเคราะห์:
452
+ 1. หา "Knee Point" — จุดที่ TPS หยุดเพิ่ม แต่ Latency เริ่มพุ่ง
453
+ 2. CPU Utilization ใน Docker stats ณ จำนวน concurrent users นั้น
454
+ 3. เปรียบเทียบ Local vs Cloud เพื่อดู overhead ของ Network/HF cold-start
455
+ ```
456
+
457
+ ---
458
+
459
+ ## Phase 7 — cURL Examples
460
+
461
+ ```bash
462
+ # Health Check
463
+ curl https://<USERNAME>-<SPACE>.hf.space/health
464
+
465
+ # Predict (ส่งไฟล์รูปภาพจริง)
466
+ curl -X POST "https://<USERNAME>-<SPACE>.hf.space/predict" \
467
+ -H "accept: application/json" \
468
+ -F "file=@/path/to/your/image.jpg"
469
+
470
+ # Postman Collection — ดูไฟล์ postman_collection.json ใน repo
471
+ ```
472
+
473
+ ---
474
+
475
+ ## Checklist Deliverables
476
+
477
+ - [ ] Project Report (PDF) — Model details, Optimization table, Error strategy, JMeter analysis, Architecture diagram
478
+ - [ ] GitHub Repo — Source code + `.github/workflows/ci-cd.yml` + `README.md`
479
+ - [ ] `resnet18_quantized.onnx` — โมเดลที่ optimize แล้ว
480
+ - [ ] `tests/test_api.py` — pytest ครอบคลุม Happy path + Error cases
481
+ - [ ] `Dockerfile` — Production-ready
482
+ - [ ] JMeter Test Plan (`.jmx`)
483
+ - [ ] Postman Collection (`.json`)
484
+ - [ ] Hugging Face Space — Live API endpoint
485
+ - [ ] Presentation Slides + Live Demo (9 พ.ค. 2569)
486
+
487
+ ---
488
+
489
+ ## Notes & Tips
490
+
491
+ - **HF Spaces Free Tier** ใช้ CPU เท่านั้น — ONNX Runtime บน CPU เหมาะสมที่สุด
492
+ - **Cold Start** ใน HF Spaces อาจทำให้ request แรกช้า — ควรระบุในรายงาน
493
+ - **ProcessPoolExecutor** ต้องระวัง: แต่ละ worker โหลด ONNX session แยกกัน (memory x workers)
494
+ - **Pydantic v2** syntax เปลี่ยนจาก v1 — ใช้ `model_config` แทน `class Config`
495
+ - ใน `pytest` ต้องมี `conftest.py` หรือ set `PYTHONPATH=.` ให้ถูกต้อง
app/main.py CHANGED
@@ -4,14 +4,17 @@ from concurrent.futures import ProcessPoolExecutor
4
  import asyncio
5
  from app.model import run_inference
6
  from app.schemas import PredictionResponse
 
7
 
8
  app = FastAPI(title="ResNet-18 Image Classifier", version="1.0.0")
9
  executor = ProcessPoolExecutor(max_workers=4)
10
 
 
11
  ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/png", "image/webp", "image/gif"}
12
 
13
  @app.get("/", response_class=HTMLResponse)
14
  async def demo_ui():
 
15
  return """
16
  <!DOCTYPE html>
17
  <html>
@@ -78,6 +81,11 @@ async def demo_ui():
78
  const response = await fetch('/predict', { method: 'POST', body: formData });
79
  const data = await response.json();
80
 
 
 
 
 
 
81
  document.getElementById('res-label').innerText = data.label;
82
  document.getElementById('res-score').innerText = (data.score * 100).toFixed(2) + '%';
83
  document.getElementById('res-time').innerText = data.inference_time_ms.toFixed(2) + ' ms';
@@ -101,9 +109,23 @@ async def health():
101
 
102
  @app.post("/predict", response_model=PredictionResponse)
103
  async def predict(file: UploadFile = File(...)):
 
104
  if file.content_type not in ALLOWED_CONTENT_TYPES:
105
  raise HTTPException(status_code=415, detail="Unsupported media type")
 
 
106
  image_bytes = await file.read()
 
 
 
 
 
 
107
  loop = asyncio.get_event_loop()
108
- result = await loop.run_in_executor(executor, run_inference, image_bytes)
109
- return result
 
 
 
 
 
 
4
  import asyncio
5
  from app.model import run_inference
6
  from app.schemas import PredictionResponse
7
+ from PIL import UnidentifiedImageError
8
 
9
  app = FastAPI(title="ResNet-18 Image Classifier", version="1.0.0")
10
  executor = ProcessPoolExecutor(max_workers=4)
11
 
12
+ MAX_FILE_SIZE = 10 * 1024 * 1024 # 10 MB
13
  ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/png", "image/webp", "image/gif"}
14
 
15
  @app.get("/", response_class=HTMLResponse)
16
  async def demo_ui():
17
+ # ... (HTML UI code remains the same)
18
  return """
19
  <!DOCTYPE html>
20
  <html>
 
81
  const response = await fetch('/predict', { method: 'POST', body: formData });
82
  const data = await response.json();
83
 
84
+ if (response.status !== 200) {
85
+ alert(data.detail || 'Prediction failed');
86
+ return;
87
+ }
88
+
89
  document.getElementById('res-label').innerText = data.label;
90
  document.getElementById('res-score').innerText = (data.score * 100).toFixed(2) + '%';
91
  document.getElementById('res-time').innerText = data.inference_time_ms.toFixed(2) + ' ms';
 
109
 
110
  @app.post("/predict", response_model=PredictionResponse)
111
  async def predict(file: UploadFile = File(...)):
112
+ # 1. ตรวจสอบ Content Type
113
  if file.content_type not in ALLOWED_CONTENT_TYPES:
114
  raise HTTPException(status_code=415, detail="Unsupported media type")
115
+
116
+ # 2. อ่านข้อมูล
117
  image_bytes = await file.read()
118
+
119
+ # 3. ตรวจสอบขนาดไฟล์ (Fix สำหรับ test_predict_rejects_oversized_file)
120
+ if len(image_bytes) > MAX_FILE_SIZE:
121
+ raise HTTPException(status_code=413, detail="File too large")
122
+
123
+ # 4. รัน Inference และดักจับ Error (Fix สำหรับ test_predict_rejects_corrupted_file)
124
  loop = asyncio.get_event_loop()
125
+ try:
126
+ result = await loop.run_in_executor(executor, run_inference, image_bytes)
127
+ return result
128
+ except UnidentifiedImageError:
129
+ raise HTTPException(status_code=400, detail="Invalid image file")
130
+ except Exception as e:
131
+ raise HTTPException(status_code=500, detail=f"Inference error: {str(e)}")
jmeter_test_plan.jmx ADDED
@@ -0,0 +1,123 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <?xml version="1.0" encoding="UTF-8"?>
2
+ <jmeterTestPlan version="1.2" properties="5.0" jmeter="5.6.3">
3
+ <hashTree>
4
+ <TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="ResNet Image Classifier Load Test">
5
+ <elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables">
6
+ <collectionProp name="Arguments.arguments"/>
7
+ </elementProp>
8
+ </TestPlan>
9
+ <hashTree>
10
+ <ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Concurrent Users">
11
+ <intProp name="ThreadGroup.num_threads">60</intProp>
12
+ <intProp name="ThreadGroup.ramp_time">10</intProp>
13
+ <longProp name="ThreadGroup.duration">60</longProp>
14
+ <boolProp name="ThreadGroup.same_user_on_next_iteration">true</boolProp>
15
+ <stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
16
+ <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller">
17
+ <intProp name="LoopController.loops">-1</intProp>
18
+ <boolProp name="LoopController.continue_forever">false</boolProp>
19
+ </elementProp>
20
+ </ThreadGroup>
21
+ <hashTree>
22
+ <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Predict Request">
23
+ <stringProp name="HTTPSampler.domain">127.0.0.1</stringProp>
24
+ <stringProp name="HTTPSampler.port">8000</stringProp>
25
+ <stringProp name="HTTPSampler.protocol">http</stringProp>
26
+ <stringProp name="HTTPSampler.path">/predict</stringProp>
27
+ <boolProp name="HTTPSampler.follow_redirects">true</boolProp>
28
+ <stringProp name="HTTPSampler.method">POST</stringProp>
29
+ <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
30
+ <boolProp name="HTTPSampler.DO_MULTIPART_POST">true</boolProp>
31
+ <elementProp name="HTTPsampler.Files" elementType="HTTPFileArgs">
32
+ <collectionProp name="HTTPFileArgs.files">
33
+ <elementProp name="C:\Yanakorn\works\Assignments\AIE494\finalproject\test.jpg" elementType="HTTPFileArg">
34
+ <stringProp name="File.mimetype">image/jpeg</stringProp>
35
+ <stringProp name="File.path">C:\Yanakorn\works\Assignments\AIE494\finalproject\test.jpg</stringProp>
36
+ <stringProp name="File.paramname">file</stringProp>
37
+ </elementProp>
38
+ </collectionProp>
39
+ </elementProp>
40
+ <boolProp name="HTTPSampler.postBodyRaw">false</boolProp>
41
+ <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables">
42
+ <collectionProp name="Arguments.arguments"/>
43
+ </elementProp>
44
+ </HTTPSamplerProxy>
45
+ <hashTree/>
46
+ <ResultCollector guiclass="ViewResultsFullVisualizer" testclass="ResultCollector" testname="View Results Tree">
47
+ <boolProp name="ResultCollector.error_logging">false</boolProp>
48
+ <objProp>
49
+ <name>saveConfig</name>
50
+ <value class="SampleSaveConfiguration">
51
+ <time>true</time>
52
+ <latency>true</latency>
53
+ <timestamp>true</timestamp>
54
+ <success>true</success>
55
+ <label>true</label>
56
+ <code>true</code>
57
+ <message>true</message>
58
+ <threadName>true</threadName>
59
+ <dataType>true</dataType>
60
+ <encoding>false</encoding>
61
+ <assertions>true</assertions>
62
+ <subresults>true</subresults>
63
+ <responseData>false</responseData>
64
+ <samplerData>false</samplerData>
65
+ <xml>false</xml>
66
+ <fieldNames>true</fieldNames>
67
+ <responseHeaders>false</responseHeaders>
68
+ <requestHeaders>false</requestHeaders>
69
+ <responseDataOnError>false</responseDataOnError>
70
+ <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
71
+ <assertionsResultsToSave>0</assertionsResultsToSave>
72
+ <bytes>true</bytes>
73
+ <sentBytes>true</sentBytes>
74
+ <url>true</url>
75
+ <threadCounts>true</threadCounts>
76
+ <idleTime>true</idleTime>
77
+ <connectTime>true</connectTime>
78
+ </value>
79
+ </objProp>
80
+ <stringProp name="filename"></stringProp>
81
+ </ResultCollector>
82
+ <hashTree/>
83
+ </hashTree>
84
+ <ResultCollector guiclass="SummaryReport" testclass="ResultCollector" testname="Summary Report">
85
+ <boolProp name="ResultCollector.error_logging">false</boolProp>
86
+ <objProp>
87
+ <name>saveConfig</name>
88
+ <value class="SampleSaveConfiguration">
89
+ <time>true</time>
90
+ <latency>true</latency>
91
+ <timestamp>true</timestamp>
92
+ <success>true</success>
93
+ <label>true</label>
94
+ <code>true</code>
95
+ <message>true</message>
96
+ <threadName>true</threadName>
97
+ <dataType>true</dataType>
98
+ <encoding>false</encoding>
99
+ <assertions>true</assertions>
100
+ <subresults>true</subresults>
101
+ <responseData>false</responseData>
102
+ <samplerData>false</samplerData>
103
+ <xml>false</xml>
104
+ <fieldNames>true</fieldNames>
105
+ <responseHeaders>false</responseHeaders>
106
+ <requestHeaders>false</requestHeaders>
107
+ <responseDataOnError>false</responseDataOnError>
108
+ <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
109
+ <assertionsResultsToSave>0</assertionsResultsToSave>
110
+ <bytes>true</bytes>
111
+ <sentBytes>true</sentBytes>
112
+ <url>true</url>
113
+ <threadCounts>true</threadCounts>
114
+ <idleTime>true</idleTime>
115
+ <connectTime>true</connectTime>
116
+ </value>
117
+ </objProp>
118
+ <stringProp name="filename"></stringProp>
119
+ </ResultCollector>
120
+ <hashTree/>
121
+ </hashTree>
122
+ </hashTree>
123
+ </jmeterTestPlan>
postman_collection.json ADDED
@@ -0,0 +1,65 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "info": {
3
+ "_postman_id": "8923a12b-7c45-4b2e-9d2a-8c9d9e9f9a9b",
4
+ "name": "ResNet-18 Image Classifier",
5
+ "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
6
+ },
7
+ "item": [
8
+ {
9
+ "name": "Health Check",
10
+ "request": {
11
+ "method": "GET",
12
+ "header": [],
13
+ "url": {
14
+ "raw": "{{baseUrl}}/health",
15
+ "host": [
16
+ "{{baseUrl}}"
17
+ ],
18
+ "path": [
19
+ "health"
20
+ ]
21
+ }
22
+ },
23
+ "response": []
24
+ },
25
+ {
26
+ "name": "Predict Image",
27
+ "request": {
28
+ "method": "POST",
29
+ "header": [
30
+ {
31
+ "key": "accept",
32
+ "value": "application/json"
33
+ }
34
+ ],
35
+ "body": {
36
+ "mode": "formdata",
37
+ "formdata": [
38
+ {
39
+ "key": "file",
40
+ "type": "file",
41
+ "src": ""
42
+ }
43
+ ]
44
+ },
45
+ "url": {
46
+ "raw": "{{baseUrl}}/predict",
47
+ "host": [
48
+ "{{baseUrl}}"
49
+ ],
50
+ "path": [
51
+ "predict"
52
+ ]
53
+ }
54
+ },
55
+ "response": []
56
+ }
57
+ ],
58
+ "variable": [
59
+ {
60
+ "key": "baseUrl",
61
+ "value": "http://localhost:8000",
62
+ "type": "string"
63
+ }
64
+ ]
65
+ }
scripts/01_baseline_test.py ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
"""Measure baseline PyTorch latency and on-disk size for microsoft/resnet-18."""
from transformers import AutoImageProcessor, ResNetForImageClassification
import torch
import time
import os
from PIL import Image

model_id = "microsoft/resnet-18"
processor = AutoImageProcessor.from_pretrained(model_id)
model = ResNetForImageClassification.from_pretrained(model_id)
model.eval()

# Create a deterministic test image if one is not already present.
if not os.path.exists("test.jpg"):
    img = Image.new("RGB", (224, 224), color=(128, 64, 32))
    img.save("test.jpg")

# Measure baseline latency over repeated forward passes.
img = Image.open("test.jpg").convert("RGB")
inputs = processor(images=img, return_tensors="pt")

n_runs = 100
times = []
with torch.no_grad():
    for _ in range(n_runs):
        t0 = time.perf_counter()
        _ = model(**inputs)
        times.append(time.perf_counter() - t0)

# Nearest-rank P95 index (equals 94 for 100 samples, as before, but no
# longer breaks silently if n_runs changes).
p95_idx = max(0, int(0.95 * len(times)) - 1)
print(f"Baseline Latency (avg): {sum(times)/len(times)*1000:.2f} ms")
print(f"P95 Latency: {sorted(times)[p95_idx]*1000:.2f} ms")

# Save model for size measurement.
model.save_pretrained("./pytorch_model")
# BUG FIX: recent transformers versions save weights as *.safetensors by
# default, so filtering only on ".bin" reported a size of 0 MB. Count both
# weight formats.
weight_exts = (".bin", ".safetensors")
model_size = sum(
    os.path.getsize(os.path.join("./pytorch_model", f))
    for f in os.listdir("./pytorch_model")
    if f.endswith(weight_exts)
)
print(f"Model Size: {model_size/1e6:.2f} MB")
scripts/02_export_onnx.py ADDED
@@ -0,0 +1,29 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
"""Export microsoft/resnet-18 to ONNX via torch.onnx tracing."""
import torch
from transformers import ResNetForImageClassification
import os

model_id = "microsoft/resnet-18"
# FIX: the image processor was downloaded here but never used — removed.
model = ResNetForImageClassification.from_pretrained(model_id).eval()

# Ensure the output directory exists.
os.makedirs("models", exist_ok=True)

# Dummy input for tracing: batch=1, RGB, 224x224 (ResNet-18's input size).
dummy = torch.randn(1, 3, 224, 224)

# Export using the legacy tracing path, which is more stable for
# downstream quantization tools than the dynamo exporter.
print("Exporting model to ONNX using legacy tracing...")
torch.onnx.export(
    model,
    dummy,
    "models/resnet18.onnx",
    export_params=True,
    # FIX: stale comment claimed "Opset 11" while the code uses 18;
    # opset 18 matches current torch / onnxruntime support.
    opset_version=18,
    do_constant_folding=True,
    input_names=["pixel_values"],
    output_names=["logits"],
    # Allow variable batch size at inference time.
    dynamic_axes={"pixel_values": {0: "batch_size"}, "logits": {0: "batch_size"}},
)

print("ONNX exported successfully to models/resnet18.onnx")
scripts/03_quantize.py ADDED
@@ -0,0 +1,51 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
"""Dynamically quantize the exported ResNet-18 ONNX model to uint8 weights."""
import onnx
import onnx.shape_inference
from onnxruntime.quantization import quantize_dynamic, QuantType
import os
import shutil

# --- Monkey-patch onnx.shape_inference to bypass strict checks -----------
# The quantizer runs shape inference in strict mode, which can reject
# otherwise-usable exports. Force non-strict mode and, if inference still
# fails, fall back to copying the model through unchanged so quantization
# can proceed on a best-effort basis.
original_infer_shapes_path = onnx.shape_inference.infer_shapes_path

def patched_infer_shapes_path(model_path, output_path=None, check_type=False,
                              strict_mode=False, data_prop=False):
    """Run shape inference in non-strict mode; copy the model on failure."""
    try:
        # strict_mode is deliberately forced to False regardless of caller.
        return original_infer_shapes_path(model_path, output_path, check_type, False, data_prop)
    except Exception:
        if output_path:
            shutil.copy(model_path, output_path)

onnx.shape_inference.infer_shapes_path = patched_infer_shapes_path
# --------------------------------------------------------------------------

model_path = "models/resnet18.onnx"
quantized_path = "models/resnet18_quantized.onnx"

print(f"Quantizing model: {model_path}...")
try:
    quantize_dynamic(
        model_input=model_path,
        model_output=quantized_path,
        weight_type=QuantType.QUInt8,
        extra_options={
            'EnableShapeInference': False,
            # Fall back to float32 for tensors whose type cannot be inferred
            # (works around a "tensor type missing" error during quantization).
            'DefaultTensorType': onnx.TensorProto.FLOAT,
        },
    )
except Exception as e:
    print(f"Quantization failed: {e}")

if os.path.exists(quantized_path):
    print(f"Success: {quantized_path} created. Size: {os.path.getsize(quantized_path)/1e6:.2f} MB")
else:
    # Retry once with a minimal set of options.
    print("Trying one last alternative...")
    try:
        # BUG FIX: this fallback call was unprotected — a second failure
        # crashed the script with a raw traceback instead of a diagnostic.
        quantize_dynamic(
            model_input=model_path,
            model_output=quantized_path,
            weight_type=QuantType.QUInt8,
        )
    except Exception as e:
        print(f"Quantization failed: {e}")

    if os.path.exists(quantized_path):
        print(f"Success on second attempt: {quantized_path}")
scripts/04_benchmark_onnx.py ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
"""Benchmark the FP32 and quantized ONNX ResNet-18 models on CPU."""
import onnxruntime as ort
import numpy as np
import time
import os
from PIL import Image
from transformers import AutoImageProcessor

# Preprocessing pipeline matching the original HF model.
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-18")

# Candidate models to benchmark (missing files are skipped below).
models = {
    "ONNX": "models/resnet18.onnx",
    "ONNX Quantized": "models/resnet18_quantized.onnx"
}

# Generate a fixed solid-color test image on first run.
if not os.path.exists("test.jpg"):
    Image.new("RGB", (224, 224), color=(128, 64, 32)).save("test.jpg")

image = Image.open("test.jpg").convert("RGB")
preprocessed = processor(images=image, return_tensors="np")
pixel_values = preprocessed["pixel_values"].astype(np.float32)

for name, model_path in models.items():
    if not os.path.exists(model_path):
        print(f"Skipping {name}: {model_path} not found")
        continue

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    # Time 100 single-image inferences.
    durations = []
    for _ in range(100):
        start = time.perf_counter()
        session.run(["logits"], {"pixel_values": pixel_values})
        durations.append(time.perf_counter() - start)

    print(f"\n{name}:")
    print(f" Avg Latency: {sum(durations)/len(durations)*1000:.2f} ms")
    print(f" P95 Latency: {sorted(durations)[94]*1000:.2f} ms")
    print(f" File Size: {os.path.getsize(model_path)/1e6:.2f} MB")
sym_shape_infer_temp.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:736f6cb91a0ae100eaeb13aa7842f5e718ab67b101e2c44115f9d9fbf87e80b3
3
+ size 179747
tests/conftest.py ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
import sys
import os

# Put the project root (one level above tests/) on sys.path so that the
# application package is importable no matter where pytest is invoked from.
_TESTS_DIR = os.path.dirname(__file__)
sys.path.insert(0, os.path.abspath(os.path.join(_TESTS_DIR, "..")))
tests/test_api.py ADDED
@@ -0,0 +1,62 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
import pytest
from fastapi.testclient import TestClient
from app.main import app
from PIL import Image
import io

client = TestClient(app)


# --- Helper ---
def get_test_image() -> bytes:
    """Render a small solid-color JPEG in memory and return its raw bytes."""
    buffer = io.BytesIO()
    Image.new("RGB", (224, 224), color=(128, 64, 32)).save(buffer, format="JPEG")
    return buffer.getvalue()


# --- Tests ---

def test_health_endpoint():
    """GET /health responds 200 with the expected status payload."""
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json() == {"status": "ok"}


def test_predict_returns_valid_json():
    """POST /predict with a valid JPEG yields a label and a score in [0, 1]."""
    response = client.post(
        "/predict",
        files={"file": ("test.jpg", get_test_image(), "image/jpeg")}
    )
    assert response.status_code == 200
    payload = response.json()
    assert "label" in payload
    assert "score" in payload
    assert isinstance(payload["score"], float)
    assert 0.0 <= payload["score"] <= 1.0


def test_predict_rejects_non_image():
    """Non-image content types are refused with 415 Unsupported Media Type."""
    response = client.post(
        "/predict",
        files={"file": ("test.txt", b"not an image", "text/plain")}
    )
    assert response.status_code == 415


def test_predict_rejects_corrupted_file():
    """A truncated/corrupt JPEG body is refused with 400 Bad Request."""
    response = client.post(
        "/predict",
        files={"file": ("bad.jpg", b"\xff\xd8corrupted", "image/jpeg")}
    )
    assert response.status_code == 400


def test_predict_rejects_oversized_file():
    """Uploads above the size limit are refused with 413 Payload Too Large."""
    oversized = b"A" * (11 * 1024 * 1024)  # 11MB
    response = client.post(
        "/predict",
        files={"file": ("big.jpg", oversized, "image/jpeg")}
    )
    assert response.status_code == 413