Space: Perth0603/Random-Forest-Model-for-PhishingDetection


Perth0603 committed on Oct 1, 2025

Commit d384c72 (verified) · 1 parent: e23e668

Upload 4 files


Files changed (2)

README.md +44 -41
app.py +25 -12

README.md CHANGED

@@ -1,41 +1,44 @@
----
-title: PhishWatch Proxy
-emoji: 🛡️
-sdk: docker
----
-# Hugging Face Space - Phishing Text Classifier (Docker + FastAPI)
-This Space exposes two endpoints so the Flutter app can call them reliably:
-- `/predict` for text/email/SMS classification via Transformers
-- `/predict-url` for URL classification via your scikit-learn Random Forest model
-## Files
-- Dockerfile - builds a small FastAPI server image
-- app.py - FastAPI app that loads the model and returns `{ label, score }`.
-- requirements.txt - Python dependencies.
-## How to deploy
-1. Create a new Space on Hugging Face (type: Docker).
-2. Upload the contents of this `hf_space/` folder to the Space root (including Dockerfile).
-3. In Space Settings → Variables, add:
-   - MODEL_ID = Perth0603/phishing-email-mobilebert
-   - URL_REPO = Perth0603/Random-Forest-Model-for-PhishingDetection
-   - URL_FILENAME = url_rf_model.joblib  (set to your artifact filename)
-4. Wait for the Space to build and become green. Test:
-   - GET `/` should return `{ status: ok, model: ... }`
-   - POST `/predict` with `{ "inputs": "Win an iPhone! Click here" }`
-   - POST `/predict-url` with `{ "url": "https://example.com/login" }`
-## Flutter app config
-Set the Space URL in your env file so the app targets the Space instead of the Hosted Inference API:
-```
-{"HF_SPACE_URL":"https://<your-space>.hf.space"}
-```
-Run the app:
-```
-flutter run --dart-define-from-file=hf.env.json
-```

+---
+title: PhishWatch Proxy
+emoji: 🛡️
+sdk: docker
+---
+# Hugging Face Space - Phishing Text Classifier (Docker + FastAPI)
+This Space exposes two endpoints so the Flutter app can call them reliably:
+- `/predict` for text/email/SMS classification via Transformers. Returns `{ label, score }` where `score` is the confidence for the predicted label.
+- `/predict-url` for URL classification via your URL model. Returns `{ label, score, phishing_probability, backend, threshold }` where:
+  - `phishing_probability` is always the raw probability of phishing (0..1)
+  - `label` is `PHISH` when `phishing_probability >= threshold`, else `LEGIT`
+  - `score` is the confidence for the predicted label (for `LEGIT`, `score = 1 - phishing_probability`), which lets the app show "Safe Confidence" for legitimate URLs
+## Files
+- Dockerfile - builds a small FastAPI server image
+ - app.py - FastAPI app that loads the model and returns normalized responses as above.
+- requirements.txt - Python dependencies.
+## How to deploy
+1. Create a new Space on Hugging Face (type: Docker).
+2. Upload the contents of this `hf_space/` folder to the Space root (including Dockerfile).
+3. In Space Settings → Variables, add:
+   - MODEL_ID = Perth0603/phishing-email-mobilebert
+   - URL_REPO = Perth0603/Random-Forest-Model-for-PhishingDetection
+   - URL_FILENAME = url_rf_model.joblib  (set to your artifact filename)
+4. Wait for the Space to build and become green. Test:
+   - GET `/` should return `{ status: ok, model: ... }`
+   - POST `/predict` with `{ "inputs": "Win an iPhone! Click here" }`
+   - POST `/predict-url` with `{ "url": "https://example.com/login" }`
+## Flutter app config
+Set the Space URL in your env file so the app targets the Space instead of the Hosted Inference API:
+```
+{"HF_SPACE_URL":"https://<your-space>.hf.space"}
+```
+Run the app:
+```
+flutter run --dart-define-from-file=hf.env.json
+```
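Review note: the `/predict-url` response contract described in the added README lines can be checked from the client side with a few lines of Python. This is a minimal sketch, not part of the commit. The helper name `describe_url_prediction`, the "Phishing Confidence" heading, and the sample values are assumptions made for illustration; only the field names and the "Safe Confidence" wording come from the README.

```python
import math


def describe_url_prediction(resp: dict) -> str:
    """Turn a /predict-url response into a one-line UI string.

    Hypothetical helper: it verifies the contract stated in the README
    before formatting, and raises ValueError if the fields disagree.
    """
    p = float(resp["phishing_probability"])
    threshold = float(resp["threshold"])
    label = resp["label"]
    score = float(resp["score"])

    # README contract: PHISH when phishing_probability >= threshold, else LEGIT.
    expected_label = "PHISH" if p >= threshold else "LEGIT"
    if label != expected_label:
        raise ValueError(f"label {label!r} does not match probability {p}")

    # README contract: score is the confidence for the predicted label.
    expected_score = p if label == "PHISH" else 1.0 - p
    if not math.isclose(score, expected_score, abs_tol=1e-9):
        raise ValueError(f"score {score} does not match probability {p}")

    # "Phishing Confidence" is an assumed heading; the README only names
    # "Safe Confidence" for legitimate URLs.
    heading = "Phishing Confidence" if label == "PHISH" else "Safe Confidence"
    return f"{heading}: {score:.1%}"


# Sample payload with invented values, shaped like the documented response.
sample = {
    "label": "LEGIT",
    "score": 0.97,
    "phishing_probability": 0.03,
    "backend": "xgboost_bst",
    "threshold": 0.5,
}
print(describe_url_prediction(sample))
```

A client that only needs the raw risk can read `phishing_probability` directly and ignore `score`, since `score` changes meaning with the label.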

app.py CHANGED

@@ -138,40 +138,53 @@ def predict_url(payload: PredictUrlPayload):
         row = pd.DataFrame({url_col: [payload.url]})
         feats = _engineer_features(row, url_col, feature_cols)
-        score = None
         label = None
         if isinstance(model_type, str) and model_type == 'xgboost_bst':
             if xgb is None:
                 raise RuntimeError("xgboost is not installed but required for this model bundle.")
             dmat = xgb.DMatrix(feats)
-            proba = float(model.predict(dmat)[0])
-            score = proba
-            label = "PHISH" if score >= 0.5 else "LEGIT"
         elif hasattr(model, "predict_proba"):
             proba = model.predict_proba(feats)[0]
             if len(proba) == 2:
-                score = float(proba[1])
-                label = "PHISH" if score >= 0.5 else "LEGIT"
             else:
                 max_idx = int(np.argmax(proba))
-                score = float(proba[max_idx])
                 label = "PHISH" if max_idx == 1 else "LEGIT"
         else:
             pred = model.predict(feats)[0]
             if isinstance(pred, (int, float, np.integer, np.floating)):
                 label = "PHISH" if int(pred) == 1 else "LEGIT"
-                score = 1.0 if label == "PHISH" else 0.0
             else:
                 up = str(pred).strip().upper()
                 if up in ("PHISH", "PHISHING", "MALICIOUS"):
-                    label, score = "PHISH", 1.0
                 else:
-                    label, score = "LEGIT", 0.0
     except Exception as e:
         return JSONResponse(status_code=500, content={"error": str(e)})
-    return {"label": label, "score": float(score)}

         row = pd.DataFrame({url_col: [payload.url]})
         feats = _engineer_features(row, url_col, feature_cols)
+        # We standardize on producing a phishing probability first, then
+        # derive label and a user-facing confidence for the predicted label.
+        phish_proba: float | None = None
         label = None
         if isinstance(model_type, str) and model_type == 'xgboost_bst':
             if xgb is None:
                 raise RuntimeError("xgboost is not installed but required for this model bundle.")
             dmat = xgb.DMatrix(feats)
+            phish_proba = float(model.predict(dmat)[0])
+            label = "PHISH" if phish_proba >= 0.5 else "LEGIT"
         elif hasattr(model, "predict_proba"):
             proba = model.predict_proba(feats)[0]
             if len(proba) == 2:
+                phish_proba = float(proba[1])
+                label = "PHISH" if phish_proba >= 0.5 else "LEGIT"
             else:
                 max_idx = int(np.argmax(proba))
+                # Best-effort: treat index 1 as PHISH if present
+                phish_proba = float(proba[1]) if len(proba) > 1 else float(proba[max_idx])
                 label = "PHISH" if max_idx == 1 else "LEGIT"
         else:
             pred = model.predict(feats)[0]
             if isinstance(pred, (int, float, np.integer, np.floating)):
                 label = "PHISH" if int(pred) == 1 else "LEGIT"
+                phish_proba = 1.0 if label == "PHISH" else 0.0
             else:
                 up = str(pred).strip().upper()
                 if up in ("PHISH", "PHISHING", "MALICIOUS"):
+                    label, phish_proba = "PHISH", 1.0
                 else:
+                    label, phish_proba = "LEGIT", 0.0
     except Exception as e:
         return JSONResponse(status_code=500, content={"error": str(e)})
+    # Ensure we have a probability value
+    phish_proba = float(phish_proba or 0.0)
+    # Display score should be the confidence of the predicted label, to match the
+    # text endpoint and the app UI which expects Safe Confidence for LEGIT.
+    display_score = phish_proba if label == "PHISH" else (1.0 - phish_proba)
+    return {
+        "label": label,
+        "score": float(display_score),
+        "phishing_probability": phish_proba,
+        "backend": str(model_type),
+        "threshold": 0.5,
+    }
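
Review note: the new tail of `predict_url` derives `score` from `phish_proba` and `label`, but in the multiclass `predict_proba` branch the label still comes from `argmax` while `phish_proba` is `proba[1]`, so the two can disagree with the threshold rule stated in the README. One possible way to keep them consistent is to derive the label from the probability in a single place. The sketch below is an assumption-laden illustration, not the committed code: the function name `normalize_url_prediction` and the clamping step are hypothetical.

```python
def normalize_url_prediction(
    phish_proba: float, backend: str, threshold: float = 0.5
) -> dict:
    """Build the /predict-url response from a phishing probability alone.

    Hypothetical helper: the label is always derived from the threshold,
    so label, score, and phishing_probability cannot contradict each other.
    """
    # Clamp to [0, 1] as a defensive assumption; the commit does not do this.
    p = min(1.0, max(0.0, float(phish_proba)))
    label = "PHISH" if p >= threshold else "LEGIT"
    # Confidence for the predicted label, matching the README description.
    score = p if label == "PHISH" else 1.0 - p
    return {
        "label": label,
        "score": score,
        "phishing_probability": p,
        "backend": backend,
        "threshold": threshold,
    }


print(normalize_url_prediction(0.25, "xgboost_bst"))
```

With this shape, each model branch only has to produce a phishing probability, and hard-label models can still pass 1.0 or 0.0 as the commit does.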