---
library_name: keras
base_model: google/tipsv2-l14
pipeline_tag: image-classification
gated: true
extra_gated_prompt: "Request access for research or deployment evaluation. Please share a short justification for why you need the DermoLens model."
extra_gated_fields:
  Affiliation: text
  Intended use: text
  Research use only: checkbox
tags:
  - dermatology
  - medical-imaging
  - multiple-instance-learning
  - tensorflow
  - pytorch
  - tipsv2
  - binary-classification
  - infectious-screening
license: other
---
# DermoLens TIPSv2 + MIL Infectious Screening Deployment Package

This folder packages the latest Training-C production candidate for Hugging Face or container-based deployment.

The realistic deployment model is a **two-tier pipeline**:

1. Raw dermatology images are passed through `google/tipsv2-l14`.
2. The resulting TIPSv2 CLS embeddings are passed into the DermoLens MIL classifier.
3. The MIL probability is converted to a final class using the production threshold.
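End to end, the two tiers compose in a few lines. The following is a minimal sketch, not the packaged entry point: `encode_case_image` and `build_casebag` are placeholder names for the exact logic specified in the "Encoding Contract From Training-C" and "Exact Casebag Behavior" sections below, and `src/inference.py` remains the canonical implementation.

```python
# Illustrative composition of the two tiers; helper names are placeholders
# for the exact encoding/casebag rules documented later in this README.
embeddings = [encode_case_image(tipsv2_model, p) for p in case_image_paths]  # tier 1: TIPSv2 CLS
casebag = build_casebag(embeddings)                             # zero-padded to (3, 1024)
p_infectious = float(mil_model.predict(casebag[None, ...])[0])  # tier 2: MIL head
prediction = "Infectious" if p_infectious >= 0.35 else "Non Infectious"
```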
## Caution

This package is TIPSv2-only. Do not use Derm Foundation embeddings, Derm Foundation `.npz` archives, or 6144-d feature files.

## Access Requests

This repository is intended to be published as a gated Hugging Face model card.

By default, Hugging Face already collects the requester email and username for gated models. The extra fields above add:

- a short free-text justification
- intended use
- a research-only acknowledgment checkbox

If the repository remains private, the request form will not be visible. To use the request workflow, the model should be published as a public gated repo.

## What Is Included

```text
deploy-hf/
  README.md
  README.production-bundle.md
  Dockerfile
  requirements.txt
  deployment_config.json
  model/
    binary_tipsv2_screening_model.keras
  metadata/
    thresholds.json
    production_config.json
    production_validation_metrics.json
    production_training_history.csv
    production_validation_predictions.npz
    revised_binary_label_summary.json
    best_hyperparameters.json
  figures/
    production_learning_curves.png
  src/
    inference.py
    tipsv2_common_training_reference.py
  scripts/
    download_tipsv2.py
  tipsv2-local-reference/
    configuration and remote-code reference files from the local TIPSv2 checkout
```

## Model Formats

There are two model components:

| Component | Model | Framework / Format | Role |
| :--- | :--- | :--- | :--- |
| Feature extractor | `google/tipsv2-l14` | PyTorch via Hugging Face Transformers remote code, `safetensors` weights | Converts raw images to 1024-d CLS embeddings |
| MIL classifier | `binary_tipsv2_screening_model.keras` | TensorFlow / Keras `.keras` | Converts a `(3, 1024)` casebag to `P(Infectious)` |

The current system is therefore mixed-framework: **PyTorch for TIPSv2** and **TensorFlow/Keras for the MIL head**.
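A minimal loading sketch for both components follows. It assumes `google/tipsv2-l14` loads through `AutoModel` with `trust_remote_code=True` (consistent with the remote-code format above) and that the MIL head loads with Keras 3's `keras.saving.load_model`; adjust paths to your checkout.

```python
# Hedged loading sketch: one PyTorch component, one Keras component.
import keras
from transformers import AutoModel

# Feature extractor: PyTorch, Hugging Face remote code, safetensors weights.
tipsv2_model = AutoModel.from_pretrained("google/tipsv2-l14", trust_remote_code=True)
tipsv2_model.eval()

# MIL classifier: TensorFlow/Keras .keras file shipped in this package.
mil_model = keras.saving.load_model("model/binary_tipsv2_screening_model.keras")
```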
## Comprehensive Benchmarks

The full benchmark ledger is also copied into this package as [`FINAL_BENCHMARKS.md`](FINAL_BENCHMARKS.md). The same content is mirrored below so the Hugging Face model card is self-contained.

### Binary Screening

*Evaluation performed on the full 3,061-case validation set.*
*Operating threshold: `P(Infectious) >= 0.35`.*

| Model Architecture / Format | Model Size | AUC | Accuracy | Precision | Recall (Sensitivity) | F1 Score | Notes |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| **Original Keras (Training-C)** | 1.15 GB+ | 0.755784 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | The original fragmented FP32 pipeline. |
| **PyTorch Unified (FP32)** | 1,865 MB | 0.755784 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | The final production monolith. Mathematically identical to Keras. |
| **PyTorch Unified (FP16)** | 932 MB | 0.755789 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | Halves RAM usage with essentially no accuracy loss. |
| **LiteRT Edge (FP32)** | 1,163 MB | 0.755784 | 0.661875 | 0.544653 | 0.780960 | 0.641745 | Mathematically identical to PyTorch FP32. |
| **LiteRT Edge (INT8 PTQ)** | 297 MB | 0.736973 | 0.669716 | 0.561798 | 0.673968 | 0.612792 | The quantization tradeoff: lower sensitivity. |

### 10-Disease Classification

*Evaluation performed on the preliminary 2,336-case dataset.*
*Representative class-level agreement is shown below; equivalence holds across all 10 classes.*

#### Class 0 (Eczema) - Threshold: 0.4747

| Model Architecture / Format | AUC | Accuracy | Precision | Recall | F1 Score |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **Original Keras (Training-A)** | 0.739529 | 0.656678 | 0.598756 | 0.729167 | 0.657558 |
| **PyTorch Unified (FP32)** | 0.739529 | 0.656678 | 0.598756 | 0.729167 | 0.657558 |
| **PyTorch Unified (FP16)** | 0.739538 | 0.656678 | 0.598756 | 0.729167 | 0.657558 |

#### Class 1 (Allergic Contact Dermatitis) - Threshold: 0.3838

| Model Architecture / Format | AUC | Accuracy | Precision | Recall | F1 Score |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **Original Keras (Training-A)** | 0.739767 | 0.684932 | 0.572334 | 0.620848 | 0.595604 |
| **PyTorch Unified (FP32)** | 0.739767 | 0.684932 | 0.572334 | 0.620848 | 0.595604 |
| **PyTorch Unified (FP16)** | 0.739774 | 0.685360 | 0.572785 | 0.621993 | 0.596376 |

## Technical Conclusions

1. **Mathematical Equivalence:** The manual port of the gated attention pooling and global average pooling layers from Keras to PyTorch is numerically aligned across the supported benchmark runs.
2. **The Power of FP16:** Converting the PyTorch unified engine to FP16 roughly halves the Docker container memory footprint while preserving the clinical ROC-AUC and sensitivity of the FP32 runs.
3. **LiteRT Limitations:** The LiteRT FP32 export is mathematically sound, but FP16 conversion of large vision transformers can fail in the Google AI Edge toolchain. INT8 PTQ succeeds but reduces clinical sensitivity.

**Final Deployment Target:** `unified_engine_fp16_weights.pt` running on CPU via FastAPI.
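The serving shape this implies is small. Below is a hedged FastAPI sketch: `predict_case` is a hypothetical stand-in for the unified-engine pipeline (TIPSv2 encode, casebag padding, MIL head) and is not the packaged `src/inference.py` API.

```python
# Hedged FastAPI sketch for CPU serving; predict_case is a hypothetical
# placeholder to be wired to the real unified-engine inference code.
from fastapi import FastAPI, UploadFile

app = FastAPI()
THRESHOLD = 0.35  # production threshold, see the next section

def predict_case(image_bytes: list[bytes]) -> float:
    """Placeholder: run TIPSv2 encoding + MIL head, return P(Infectious)."""
    raise NotImplementedError

@app.post("/predict")
async def predict(files: list[UploadFile]):
    images = [await f.read() for f in files[:3]]  # at most 3 images per case
    p = predict_case(images)
    return {
        "p_infectious": p,
        "threshold": THRESHOLD,
        "prediction": "Infectious" if p >= THRESHOLD else "Non Infectious",
    }
```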
## Production Decision Rule

The MIL model outputs one probability:

```text
P(Infectious)
```

The production threshold is:

```text
0.35
```

Final classification:

```python
if p_infectious >= 0.35:
    prediction = "Infectious"
else:
    prediction = "Non Infectious"
```

Do not silently use `0.5` for production inference.
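To avoid hardcoding, the threshold can also be read from the packaged metadata. This is a sketch under an assumption: the key name inside `metadata/thresholds.json` (`"binary_screening"` here) is hypothetical; check the actual file before relying on it.

```python
# Hedged sketch: load the production threshold from metadata instead of
# hardcoding it. The "binary_screening" key name is a hypothetical guess.
import json

with open("metadata/thresholds.json") as f:
    thresholds = json.load(f)

threshold = thresholds.get("binary_screening", 0.35)  # fall back to the documented 0.35
prediction = "Infectious" if p_infectious >= threshold else "Non Infectious"
```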
## Input Contract

Input to the full deployment pipeline:

```text
1 to 3 RGB images from the same patient case
```

Input to the MIL classifier after TIPSv2:

```text
casebag.shape == (3, 1024)
```

Rules:

- Each submitted image is converted to RGB.
- Each image is resized to `448 x 448`, matching the Training-C extraction process.
- Each image is passed through `google/tipsv2-l14` using `model.encode_image(pixel_values)`.
- Each row is the final-layer TIPSv2 CLS token: `out.cls_token[0, 0]`.
- Each CLS embedding must be 1024-d.
- Cases with fewer than 3 images are automatically zero-padded to 3 MIL slots.
- Do not mix images from different patient cases.
- Do not flatten this into image-level classification unless explicitly doing a different experiment.

## Exact Casebag Behavior

The MIL model always receives exactly 3 slots:

```text
(3, 1024)
```

If 1 image is submitted:

```text
slot 1 = TIPSv2(image_1)
slot 2 = zeros(1024)
slot 3 = zeros(1024)
```

If 2 images are submitted:

```text
slot 1 = TIPSv2(image_1)
slot 2 = TIPSv2(image_2)
slot 3 = zeros(1024)
```

If 3 images are submitted:

```text
slot 1 = TIPSv2(image_1)
slot 2 = TIPSv2(image_2)
slot 3 = TIPSv2(image_3)
```
Padding is handled automatically by `src/inference.py`, and the submitted image order is preserved. If a case has more than 3 images, do not pass them all blindly; select or split intentionally, because this model was trained with at most 3 images per case. A minimal sketch of the padding logic follows.
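This sketch assumes NumPy inputs; `src/inference.py` remains the canonical version:

```python
import numpy as np

def build_casebag(embeddings: list[np.ndarray]) -> np.ndarray:
    """Pad 1-3 TIPSv2 CLS embeddings to the fixed (3, 1024) MIL input.

    Order is preserved; empty slots stay zero vectors, matching the
    slot tables above.
    """
    if not 1 <= len(embeddings) <= 3:
        raise ValueError("a casebag takes 1 to 3 images from one patient case")
    casebag = np.zeros((3, 1024), dtype=np.float32)
    for i, emb in enumerate(embeddings):
        if emb.shape != (1024,):
            raise ValueError("each TIPSv2 CLS embedding must be 1024-d")
        casebag[i] = emb
    return casebag
```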
## Encoding Contract From Training-C

The deployment encoder must match Training-C:

```python
from PIL import Image
from torchvision.transforms import Resize, ToTensor

image = Image.open(image_path).convert("RGB")
pixel_values = Resize((448, 448))(image)              # PIL resize to 448 x 448
pixel_values = ToTensor()(pixel_values).unsqueeze(0)  # (1, 3, 448, 448) float tensor
out = tipsv2_model.encode_image(pixel_values)
embedding = out.cls_token[0, 0].float().cpu().numpy()  # 1024-d final-layer CLS token
```

This is the same logic used in `APR26/data_extraction/extract_all_cases_tipsv2.py`.

Do not use:

- patch-token averages,
- register tokens,
- normalized text/image similarity vectors,
- Derm Foundation embeddings,
- image-level logits from another model.
## Production Metrics

Training-C production validation metrics at threshold `0.35`:

| Metric | Value |
| :--- | ---: |
| ROC AUC | 0.7194 |
| PR-AUC | 0.5868 |
| Sensitivity / Recall | 0.7697 |
| Specificity | 0.5851 |
| Accuracy | 0.6565 |
| Precision | 0.5394 |
| F1 | 0.6343 |
| Youden J | 0.3548 |

Dataset state:

| Item | Value |
| :--- | ---: |
| Cases | 3,061 |
| Images / embeddings | 6,517 |
| Infectious cases | 1,187 |
| Non-infectious cases | 1,874 |
| Feature dimension | 1024 |
## Running Inference Locally

From this folder:

```bash
pip install -r requirements.txt
python src/inference.py case_image_1.png case_image_2.png
```

The script accepts 1 to 3 images from the same patient case.

If TIPSv2 is already cached locally:

```bash
python src/inference.py case_image_1.png --local-files-only
```

If using a vendored/local TIPSv2 folder:

```bash
python src/inference.py case_image_1.png --tipsv2-model /path/to/google/tipsv2-l14/snapshot
```

Output example:

```json
{
  "prediction": "Infectious",
  "p_infectious": 0.47,
  "threshold": 0.35,
  "image_count": 2,
  "rule": "Infectious if P(Infectious) >= threshold else Non Infectious"
}
```

## Docker Usage

Build:

```bash
docker build -t dermolens-tipsv2-mil .
```

Run:

```bash
docker run --rm -v "$PWD/examples:/data" dermolens-tipsv2-mil /data/case_image_1.png /data/case_image_2.png
```

The default Dockerfile does not bake the 1.8 GB TIPSv2 weights into the image. This keeps the image smaller and lets the runtime download or mount the Hugging Face cache.

For a self-contained container, uncomment this line in the Dockerfile:

```dockerfile
# RUN python scripts/download_tipsv2.py
```

That will pre-cache `google/tipsv2-l14` inside the image.
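Functionally, pre-caching amounts to pulling the TIPSv2 snapshot into the Hugging Face cache at build time. The packaged `scripts/download_tipsv2.py` is the canonical version; a hedged equivalent using `huggingface_hub`:

```python
# Hedged equivalent of the pre-cache step; the packaged script is canonical.
from huggingface_hub import snapshot_download

# Download the full google/tipsv2-l14 snapshot into the HF cache.
snapshot_download(repo_id="google/tipsv2-l14")
```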
## Hugging Face Push Strategy

Recommended setup:

1. Push this `deploy-hf/` folder as the DermoLens model repository.
2. Reference `google/tipsv2-l14` as the upstream feature extractor instead of duplicating the full TIPSv2 weights.
3. Include `src/inference.py` as the canonical end-to-end raw-image inference code.
4. Put the raw image dataset in a separate Hugging Face dataset repository.
5. Keep case-level metadata in the dataset repository so MIL grouping is preserved.

This is better than copying TIPSv2 weights into our repo because:

- TIPSv2 is already a Hugging Face model with its own versioning.
- The real weights are about 1.8 GB.
- Duplicating them creates storage, sync, and licensing ambiguity.
- The deployment container can still be fully self-contained by pre-caching TIPSv2 at Docker build time.
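For step 1, a minimal push sketch with `huggingface_hub`, assuming you are authenticated (`huggingface-cli login`) and the gated model repo already exists:

```python
# Hedged sketch of pushing the deploy-hf/ folder as the model repo.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="deploy-hf",
    repo_id="HawkFranklin-Research/PelliScope",  # target model repo
    repo_type="model",
)
```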
## Should We Convert Models?

Current recommendation: **do not convert yet**.

Reasons:

- TIPSv2 uses PyTorch / Hugging Face remote code.
- The MIL head is small and already saved as TensorFlow/Keras.
- Mixed-framework inference is acceptable inside Docker.
- Conversion adds risk unless we have a specific deployment target that requires ONNX/TFLite/TensorRT.

Future conversion options:

| Option | When useful | Risk |
| :--- | :--- | :--- |
| Convert MIL Keras model to ONNX | If we want one ONNX runtime for the MIL head | Low to moderate |
| Convert TIPSv2 to ONNX | If deploying to a strict ONNX/TensorRT environment | Higher, because custom remote code and image encoder outputs must be validated |
| Retrain/rebuild MIL head in PyTorch | If we want a single PyTorch-only pipeline | Moderate; requires reproducing Training-C weights or retraining |
| Keep mixed PyTorch + TensorFlow | Best current path for Hugging Face/Cloud Run/GCE | Larger dependency footprint |

For Hugging Face, GCloud, Firebase-backed services, or generic Docker deployment, the current mixed-framework package is the pragmatic choice.
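If the low-risk row above is ever pursued, a hedged starting point is `tf2onnx`. Assumptions: the `.keras` file loads as a `tf.keras` model in your environment, and `tf2onnx` supports all of its layers (the gated attention pooling in particular should be validated output-for-output against Keras before use).

```python
# Hedged sketch: export the MIL head to ONNX, then validate against Keras.
import tensorflow as tf
import tf2onnx

mil_model = tf.keras.models.load_model("model/binary_tipsv2_screening_model.keras")
spec = (tf.TensorSpec((None, 3, 1024), tf.float32, name="casebag"),)
onnx_model, _ = tf2onnx.convert.from_keras(
    mil_model, input_signature=spec, output_path="mil_head.onnx"
)
```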
## Deployment Interpretation

This is a research production candidate, not a standalone clinical diagnostic device. It is suitable for controlled research inference, screening-threshold experiments, and deployment engineering validation.