---
language: en
license: mit
tags:
- privacy
- web-tracking
- tracker-detection
- tabular-classification
- browser-fingerprinting
- safetensors
- wasm
datasets:
- olafuraron/tracker-radar-ml
metrics:
- f1
- roc_auc
- precision
- recall
---

# Tracker Classifier

A lightweight feedforward neural network for classifying third-party web domains as tracking or non-tracking, designed for on-device inference via WebAssembly.

## Live Preview

[Live preview](https://olafurjohannsson.github.io/tracker-ml/)

## Model Description

- **Architecture**: Feedforward NN (input -> 128 -> 64 -> 2) with ReLU and dropout
- **Size**: 181 KB (safetensors)
- **Input**: 295 behavioral and metadata features from DuckDuckGo Tracker Radar
- **Output**: Binary classification (0 = non-tracking, 1 = tracking)
- **Training data**: 12,932 domains (80% of labeled set)
- **Deployment target**: Kjarni inference engine compiled to WASM with SIMD128

## Performance (5-fold CV)

| Model | F1 | Precision | Recall | ROC-AUC |
|-------|-----|-----------|--------|---------|
| **This model (Feedforward NN)** | 0.848 +/- 0.017 | 0.804 +/- 0.037 | 0.899 +/- 0.006 | 0.928 +/- 0.008 |
| Random Forest | 0.895 +/- 0.003 | 0.895 +/- 0.006 | 0.895 +/- 0.006 | 0.958 +/- 0.002 |
| XGBoost | 0.893 +/- 0.004 | 0.887 +/- 0.006 | 0.899 +/- 0.004 | 0.959 +/- 0.002 |
| FP Heuristic (score >= 2)* | 0.355 | 0.579 | 0.257 | n/a |

*The fingerprinting heuristic targets browser API fingerprinting specifically, not general tracking.
The comparison demonstrates the gap between single-vector and multi-vector detection.*

## Files

- `tracker_classifier.safetensors`: Model weights (181 KB)
- `config.json`: Architecture config, feature names, scaler parameters
- `scaler.joblib`: Sklearn StandardScaler for feature normalization
- `results.json`: Full evaluation metrics

## Usage

```python
import json

import numpy as np
import torch
from safetensors.torch import load_file

# Load the weights and the architecture/scaler configuration.
weights = load_file("tracker_classifier.safetensors")
with open("config.json") as f:
    config = json.load(f)


class TrackerClassifier(torch.nn.Module):
    """Feedforward classifier: input -> 128 -> 64 -> 2 with ReLU.

    Dropout is omitted here: it has no weights and is inactive at inference.
    """

    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        self.layer1 = torch.nn.Linear(input_dim, hidden_dim)
        self.layer2 = torch.nn.Linear(hidden_dim, hidden_dim // 2)
        self.layer3 = torch.nn.Linear(hidden_dim // 2, 2)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.relu(self.layer2(x))
        return self.layer3(x)  # raw logits for the two classes


model = TrackerClassifier(input_dim=config["input_dim"])
model.load_state_dict(weights)
model.eval()

# Classify (standardize features first)
features = np.array([...])  # placeholder: supply the 295 feature values
mean = np.array(config["scaler_mean"])
scale = np.array(config["scaler_scale"])
features_scaled = (features - mean) / scale

with torch.no_grad():
    # Add a batch dimension and cast to float32 to match the weights.
    inputs = torch.tensor(features_scaled, dtype=torch.float32).unsqueeze(0)
    logits = model(inputs)
    prediction = logits.argmax(dim=1).item()  # 0 = non-tracking, 1 = tracking
```

## On-Device Inference

This model is designed for deployment via [Kjarni](https://github.com/olafurjohannsson/kjarni), compiled to WebAssembly with SIMD128 acceleration. The 181 KB safetensors file and three matrix multiplications make it suitable for real-time in-browser classification with no data leaving the device.
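
As a rough sanity check on those figures, the sketch below counts the parameters and multiply-accumulate operations implied by the 295 -> 128 -> 64 -> 2 layout. It assumes the weights are stored as 4-byte float32 values and ignores the safetensors header; the names `param_count` and `mac_count` are illustrative and not part of the repository.

```python
# Layer widths taken from the Model Description: input -> 128 -> 64 -> 2.
LAYER_DIMS = [295, 128, 64, 2]
BYTES_PER_WEIGHT = 4  # assumption: float32 storage


def layer_pairs(dims):
    """Return (fan_in, fan_out) for each fully connected layer."""
    return list(zip(dims, dims[1:]))


def param_count(dims):
    """Weights plus biases across all layers."""
    return sum(d_in * d_out + d_out for d_in, d_out in layer_pairs(dims))


def mac_count(dims):
    """Multiply-accumulates for one forward pass (one matmul per layer)."""
    return sum(d_in * d_out for d_in, d_out in layer_pairs(dims))


n_params = param_count(LAYER_DIMS)
size_kib = n_params * BYTES_PER_WEIGHT / 1024
macs = mac_count(LAYER_DIMS)

print(f"{n_params} parameters, ~{size_kib:.1f} KiB, {macs} MACs per inference")
# → 46274 parameters, ~180.8 KiB, 46080 MACs per inference
```

Under these assumptions the raw weights come to roughly 181 KiB, which is consistent with the stated file size, and a single classification needs on the order of 46,000 multiply-accumulates.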

## Limitations

- Trained on a point-in-time snapshot of Tracker Radar (US region)
- Metadata features (entity ownership) can cause false positives for CDN domains owned by large companies
- Requires periodic retraining as tracking techniques evolve
- Tree-based models (RF, XGBoost) outperform this model on F1, precision, and ROC-AUC in the cross-validation results above, but cannot run in WASM

## Links

[Kjarni](https://kjarni.ai)

## Source

Code and methodology: [github.com/olafurjohannsson/tracker-ml](https://github.com/olafurjohannsson/tracker-ml)