---
language: en
license: mit
tags:
- privacy
- web-tracking
- tracker-detection
- tabular-classification
- browser-fingerprinting
- safetensors
- wasm
datasets:
- olafuraron/tracker-radar-ml
metrics:
- f1
- roc_auc
- precision
- recall
---

# Tracker Classifier

A lightweight feedforward neural network for classifying third-party web
domains as tracking or non-tracking, designed for on-device inference via
WebAssembly.

## Live Preview

[Live preview](https://olafurjohannsson.github.io/tracker-ml/)

## Model Description

- **Architecture**: Feedforward NN (input -> 128 -> 64 -> 2) with ReLU and dropout
- **Size**: 181 KB (safetensors)
- **Input**: 295 behavioral and metadata features from DuckDuckGo Tracker Radar
- **Output**: Binary classification (0 = non-tracking, 1 = tracking)
- **Training data**: 12,932 domains (80% of labeled set)
- **Deployment target**: Kjarni inference engine compiled to WASM with SIMD128

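The reported file size can be sanity-checked from the layer sizes. The sketch below assumes 32-bit float weights and a 295 -> 128 -> 64 -> 2 layout (dropout adds no parameters); the helper name is illustrative and not part of the released code.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# Layer sizes from the model description; 295 is the input feature count.
layer_dims = [295, 128, 64, 2]
total_params = sum(
    linear_params(n_in, n_out)
    for n_in, n_out in zip(layer_dims, layer_dims[1:])
)

# Assumption: each parameter is stored as float32 (4 bytes).
size_bytes = total_params * 4
print(total_params)              # 46274
print(round(size_bytes / 1024))  # 181
```

Under these assumptions the weights alone come to about 181 KiB, which lines up with the stated file size if the small safetensors header is ignored.
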
## Performance (5-fold CV)

| Model | F1 | Precision | Recall | ROC-AUC |
|-------|-----|-----------|--------|---------|
| **This model (Feedforward NN)** | 0.848 +/- 0.017 | 0.804 +/- 0.037 | 0.899 +/- 0.006 | 0.928 +/- 0.008 |
| Random Forest | 0.895 +/- 0.003 | 0.895 +/- 0.006 | 0.895 +/- 0.006 | 0.958 +/- 0.002 |
| XGBoost | 0.893 +/- 0.004 | 0.887 +/- 0.006 | 0.899 +/- 0.004 | 0.959 +/- 0.002 |
| FP Heuristic (score >= 2)* | 0.355 | 0.579 | 0.257 | n/a |

*The fingerprinting heuristic targets browser API fingerprinting specifically,
not general tracking. The comparison demonstrates the gap between single-vector
and multi-vector detection.*

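F1 is the harmonic mean of precision and recall, which gives a quick consistency check on a single-run row such as the heuristic baseline. A small sketch follows; the helper name is ours. For cross-validated rows, an F1 averaged per fold need not equal the F1 computed from the averaged precision and recall, so the check is only approximate there.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Heuristic baseline row: precision 0.579, recall 0.257.
heuristic_f1 = f1_score(0.579, 0.257)
print(f"{heuristic_f1:.3f}")  # 0.356
```

The result agrees with the reported 0.355 to within the rounding of the inputs.
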
## Files

- `tracker_classifier.safetensors`: Model weights (181 KB)
- `config.json`: Architecture config, feature names, scaler parameters
- `scaler.joblib`: Sklearn StandardScaler for feature normalization
- `results.json`: Full evaluation metrics

## Usage

```python
import json

import numpy as np
import torch
from safetensors.torch import load_file

weights = load_file("tracker_classifier.safetensors")
with open("config.json") as f:
    config = json.load(f)


class TrackerClassifier(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        self.layer1 = torch.nn.Linear(input_dim, hidden_dim)
        self.layer2 = torch.nn.Linear(hidden_dim, hidden_dim // 2)
        self.layer3 = torch.nn.Linear(hidden_dim // 2, 2)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        # Dropout is left out here: it is a no-op at inference time.
        x = self.relu(self.layer1(x))
        x = self.relu(self.layer2(x))
        return self.layer3(x)


model = TrackerClassifier(input_dim=config["input_dim"])
model.load_state_dict(weights)
model.eval()

# Classify (standardize features first)
features = np.array([...])  # replace [...] with the 295 feature values
mean = np.array(config["scaler_mean"])
scale = np.array(config["scaler_scale"])
features_scaled = (features - mean) / scale

with torch.no_grad():
    # Convert to float32 and add a batch dimension of size 1.
    x = torch.tensor(features_scaled, dtype=torch.float32).unsqueeze(0)
    logits = model(x)
    prediction = logits.argmax(dim=1).item()
# 0 = non-tracking, 1 = tracking
```

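Taking the `argmax` over two logits amounts to thresholding the tracking probability at 0.5. Since the model trades some precision for recall, a caller may prefer an explicit, stricter threshold. This is a minimal sketch in plain Python; `tracking_probability`, `is_tracker`, and the 0.8 threshold are illustrative and not part of the released code.

```python
import math

def tracking_probability(logits):
    """Softmax probability of class 1 (tracking) from two raw logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    return exps[1] / sum(exps)

def is_tracker(logits, threshold=0.5):
    """Flag a domain only when the tracking probability reaches the threshold."""
    return tracking_probability(logits) >= threshold

example_logits = [0.2, 1.1]  # made-up values, not real model output
print(is_tracker(example_logits))                 # True
print(is_tracker(example_logits, threshold=0.8))  # False
```

Raising the threshold should reduce false positives at the cost of recall; a suitable value would need to be chosen on held-out data.
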
## On-Device Inference

This model is designed for deployment via
[Kjarni](https://github.com/olafurjohannsson/kjarni), an inference engine
compiled to WebAssembly with SIMD128 acceleration. The 181 KB safetensors
file and the three matrix multiplications per inference make it suitable for
real-time in-browser classification with no data leaving the device.

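To give a sense of how little work each classification involves, the forward pass can be written out directly. The sketch below is plain Python for illustration only: it is not Kjarni's API, and the toy weights are made up. Weight matrices follow the PyTorch `Linear` layout (one row per output unit), and the 295 -> 128 -> 64 -> 2 layer sizes are taken from the model description.

```python
def dense(x, weights, bias):
    """One fully connected layer; weights has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def forward(x, layers):
    """Apply dense layers with ReLU between them (none after the last)."""
    for i, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return x

# Multiply-accumulate operations per inference for 295 -> 128 -> 64 -> 2.
dims = [295, 128, 64, 2]
macs = sum(a * b for a, b in zip(dims, dims[1:]))
print(macs)  # 46080

# Toy 2 -> 2 -> 2 network with made-up weights, just to exercise the code.
toy_layers = [
    ([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]),
    ([[1.0, 1.0], [1.0, -1.0]], [0.5, 0.0]),
]
print(forward([2.0, 3.0], toy_layers))  # [2.5, 2.0]
```

Roughly 46 thousand multiply-accumulates per domain is a small workload, which is consistent with the real-time claim above.
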
## Limitations

- Trained on a point-in-time snapshot of Tracker Radar (US region)
- Metadata features (entity ownership) can cause false positives for CDN domains owned by large companies
- Requires periodic retraining as tracking techniques evolve
- Tree-based models (Random Forest, XGBoost) score higher on F1, precision, and ROC-AUC (recall is comparable), but cannot run in WASM

## Links

[Kjarni](https://kjarni.ai)

## Source

Code and methodology: [github.com/olafurjohannsson/tracker-ml](https://github.com/olafurjohannsson/tracker-ml)