Tracker Classifier

A lightweight feedforward neural network for classifying third-party web domains as tracking or non-tracking, designed for on-device inference via WebAssembly.

Live Preview

Live preview

Model Description

  • Architecture: Feedforward NN (input -> 128 -> 64 -> 2) with ReLU and dropout
  • Size: 181 KB (safetensors)
  • Input: 295 behavioral and metadata features from DuckDuckGo Tracker Radar
  • Output: Binary classification (0 = non-tracking, 1 = tracking)
  • Training data: 12,932 domains (80% of labeled set)
  • Deployment target: Kjarni inference engine compiled to WASM with SIMD128
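
As a rough sanity check on the stated size, the parameter count can be derived from the layer widths above. This is a minimal sketch; it assumes the weights are stored as 32-bit floats, which this card does not state, and it ignores the small safetensors header.

```python
# Layer widths taken from the model description: 295 -> 128 -> 64 -> 2.
LAYER_SIZES = [295, 128, 64, 2]
BYTES_PER_PARAM = 4  # assumption: weights are stored as float32


def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix
        total += fan_out           # bias vector
    return total


n_params = count_parameters(LAYER_SIZES)
size_kib = n_params * BYTES_PER_PARAM / 1024

print(n_params)            # 46274
print(round(size_kib, 1))  # 180.8
```

Under that assumption the raw weights come to about 181 KB, which is consistent with the file size listed above.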

Performance (5-fold CV)

| Model | F1 | Precision | Recall | ROC-AUC |
|---|---|---|---|---|
| This model (Feedforward NN) | 0.848 +/- 0.017 | 0.804 +/- 0.037 | 0.899 +/- 0.006 | 0.928 +/- 0.008 |
| Random Forest | 0.895 +/- 0.003 | 0.895 +/- 0.006 | 0.895 +/- 0.006 | 0.958 +/- 0.002 |
| XGBoost | 0.893 +/- 0.004 | 0.887 +/- 0.006 | 0.899 +/- 0.004 | 0.959 +/- 0.002 |
| FP Heuristic (score >= 2)* | 0.355 | 0.579 | 0.257 | n/a |

* The fingerprinting heuristic targets browser API fingerprinting specifically, not general tracking, so the comparison shows the gap between single-vector and multi-vector detection.
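
To see why the heuristic's F1 is so low despite moderate precision, recall that F1 is the harmonic mean of precision and recall, so the weaker of the two dominates. The helper below is a hypothetical illustration, not part of the evaluation code:

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the fingerprinting heuristic, from the table above.
heuristic_f1 = f1_from_precision_recall(0.579, 0.257)
print(round(heuristic_f1, 3))  # 0.356
```

This agrees with the reported 0.355 up to rounding of the inputs. For the cross-validated rows the same formula only holds approximately, since the table reports per-fold means.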

Files

  • tracker_classifier.safetensors: Model weights (181 KB)
  • config.json: Architecture config, feature names, scaler parameters
  • scaler.joblib: Sklearn StandardScaler for feature normalization
  • results.json: Full evaluation metrics

Usage

import torch
import json
import numpy as np
from safetensors.torch import load_file

weights = load_file("tracker_classifier.safetensors")
with open("config.json") as f:
    config = json.load(f)

class TrackerClassifier(torch.nn.Module):
    # Dropout is omitted here: it is only active during training and has no weights to load.
    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        self.layer1 = torch.nn.Linear(input_dim, hidden_dim)
        self.layer2 = torch.nn.Linear(hidden_dim, hidden_dim // 2)
        self.layer3 = torch.nn.Linear(hidden_dim // 2, 2)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.relu(self.layer2(x))
        return self.layer3(x)

model = TrackerClassifier(input_dim=config["input_dim"])
model.load_state_dict(weights)
model.eval()

# Classify (standardize features first)
features = np.array([...])  # 295 features
mean = np.array(config["scaler_mean"])
scale = np.array(config["scaler_scale"])
features_scaled = (features - mean) / scale

with torch.no_grad():
    logits = model(torch.tensor(features_scaled, dtype=torch.float32).unsqueeze(0))
    prediction = logits.argmax(dim=1).item()
    # 0 = non-tracking, 1 = tracking
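
Taking the argmax of two logits is equivalent to a 0.5 cutoff on the softmax probability of the tracking class. If a different precision/recall balance is wanted, the logits can be turned into a probability and compared against a custom threshold. The following is a minimal sketch in plain Python; it assumes the network was trained with a softmax cross-entropy loss (not stated in this card), and the helper names and the 0.9 threshold are illustrative only.

```python
import math


def tracking_probability(logits):
    """Softmax over [non_tracking_logit, tracking_logit]; returns P(tracking)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    return exps[1] / sum(exps)


def classify(logits, threshold=0.5):
    """Return 1 (tracking) if P(tracking) reaches the threshold, else 0."""
    return 1 if tracking_probability(logits) >= threshold else 0


example_logits = [0.2, 1.3]  # made-up values for illustration
print(classify(example_logits))                 # 1
print(classify(example_logits, threshold=0.9))  # 0
```

With PyTorch, `logits.squeeze(0).tolist()` from the usage code would yield a list in the form these helpers expect.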

On-Device Inference

This model is designed for deployment via Kjarni, compiled to WebAssembly with SIMD128 acceleration. The 181 KB safetensors file and three matrix multiplications make it suitable for real-time in-browser classification with no data leaving the device.
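
The work an inference engine has to do for this model is small: standardize the input, apply three affine layers with ReLU after the first two, and take the argmax. The sketch below spells that out in dependency-free Python on a toy 3 -> 2 -> 2 -> 2 network with made-up weights. It illustrates the computation only and says nothing about how Kjarni implements it.

```python
def linear(x, weight, bias):
    """y = W x + b, with weight given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def forward(features, mean, scale, layers):
    """Standardize, then apply each (weight, bias) layer; ReLU on all but the last."""
    x = [(f - m) / s for f, m, s in zip(features, mean, scale)]
    for i, (weight, bias) in enumerate(layers):
        x = linear(x, weight, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return x


# Toy parameters (hypothetical; the real model is 295 -> 128 -> 64 -> 2).
mean = [1.0, 0.0, 2.0]
scale = [2.0, 1.0, 4.0]
layers = [
    ([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [0.0, -1.0]),
    ([[1.0, 1.0], [-1.0, 2.0]], [0.0, 0.5]),
    ([[1.0, -1.0], [0.0, 1.0]], [0.1, -0.1]),
]

logits = forward([3.0, 1.0, 2.0], mean, scale, layers)
prediction = max(range(len(logits)), key=logits.__getitem__)  # argmax
print(prediction)  # 0 (non-tracking) for these toy numbers
```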

Limitations

  • Trained on a point-in-time snapshot of Tracker Radar (US region)
  • Metadata features (entity ownership) can cause false positives for CDN domains owned by large companies
  • Requires periodic retraining as tracking techniques evolve
  • Tree-based models (RF, XGBoost) score higher than this model on F1 and ROC-AUC, but cannot run in WASM

Links

Kjarni

Source

Code and methodology: github.com/olafurjohannsson/tracker-ml
