---
license: cc-by-nc-sa-4.0
tags:
- web-privacy
- tracker-detection
- feedforward
- safetensors
- webassembly
datasets:
- olafuraron/tracker-radar-ml
---

# Entity Cluster Classifier

Classifies third-party web domains into four behavioral entity types based on metadata and API usage patterns from DuckDuckGo's [Tracker Radar](https://github.com/duckduckgo/tracker-radar) dataset.

## Live Preview

[Live preview](https://olafurjohannsson.github.io/tracker-ml/)

## Labels

| Label | Description |
|---|---|
| `ad_tech` | Advertising, analytics, and tracking companies (Google, Microsoft, Adobe, etc.) |
| `cdn_infra` | CDN and infrastructure providers (Amazon, Akamai, Fastly, etc.) |
| `platform` | Hosting and platform services (Shopify, GitHub, etc.) |
| `ad_management` | Ad blocking and ad management tools |

## Performance

- **Accuracy:** 75.2%
- **Weighted F1:** 0.767
- **Training data:** 4,973 domains from Tracker Radar US region
- **Features:** 164 behavioral features (API usage, cookie behavior, prevalence, resource types)

## Architecture

Feedforward neural network: 164 → 64 → 32 → 4 with ReLU activations and dropout (0.2).

Model size: 50.3 KB. Designed for on-device inference via the [Kjarni](https://github.com/olafurjohannsson/kjarni) WebAssembly runtime with SIMD128 acceleration.

## Usage

Features must be standardized using the provided scaler (mean and scale in `entity_cluster_classifier_scaler.json`) before inference.

## Context

57% of domains in Tracker Radar have no ownership information. This model predicts what type of entity a domain belongs to based purely on behavioral signals; no ownership metadata is used as input.

See [TrackerML](https://github.com/olafurjohannsson/tracker-ml) for the full project.

## Links

- [Kjarni](https://kjarni.ai)

## License

CC-BY-NC-SA 4.0 (derived from DuckDuckGo Tracker Radar).
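
## Example: feature standardization

The Usage section requires features to be standardized with the provided scaler before inference. The following is a minimal sketch of that step in plain Python, not the project's own code. It assumes the scaler JSON holds two lists under the keys `"mean"` and `"scale"` (the key names are a guess based on the description above) and that standardization means `(x - mean) / scale` per feature. The helper names `load_scaler` and `standardize` are hypothetical.

```python
import json

N_FEATURES = 164  # input width stated in the Architecture section


def load_scaler(path="entity_cluster_classifier_scaler.json"):
    """Load per-feature mean and scale lists.

    Assumption: the JSON file has top-level "mean" and "scale" keys.
    """
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return data["mean"], data["scale"]


def standardize(features, mean, scale):
    """Return (x - mean) / scale for each feature."""
    if not (len(features) == len(mean) == len(scale)):
        raise ValueError("features, mean and scale must have the same length")
    return [(x - m) / s for x, m, s in zip(features, mean, scale)]


# Hypothetical usage, assuming `raw` is a list of N_FEATURES floats:
#   mean, scale = load_scaler()
#   model_input = standardize(raw, mean, scale)
```

The length check is there because a feature vector that does not match the scaler would otherwise be silently truncated by `zip`, producing an input of the wrong width for the 164-unit first layer.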