---
license: cc-by-nc-sa-4.0
tags:
- web-privacy
- tracker-detection
- entity-attribution
- feedforward
- safetensors
- webassembly
datasets:
- olafuraron/tracker-radar-ml
---

# Tracking Entity Classifier

Predicts which company owns a third-party tracking domain, based on behavioral patterns from DuckDuckGo's [Tracker Radar](https://github.com/duckduckgo/tracker-radar) dataset. No ownership metadata is used as input; the model learns to identify entities from API usage, cookie behavior, resource types, and prevalence patterns.

## Live Preview

[Live preview](https://olafurjohannsson.github.io/tracker-ml/)

## Labels

13 tracking-related entities:

Adobe Inc., ByteDance Ltd., Comcast Corporation, Conversant LLC, Google LLC, HubSpot Inc., Impact, Leven Labs Inc. DBA Admiral, Microsoft Corporation, Oracle Corporation, Salesforce.com Inc., Yahoo Inc., Yandex LLC

## Performance

- **Accuracy:** 58.5%
- **Weighted F1:** 0.604
- **Training data:** 731 domains from the Tracker Radar US region
- **Features:** 164 behavioral features

Per-entity results are strongest for entities with distinctive behavior: Leven Labs (F1 0.93), Google (F1 0.75), Microsoft (F1 0.65). Predictions are less reliable for smaller entities with few training samples.

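Weighted F1 can exceed plain accuracy, as it does in the numbers above, because it averages per-class F1 scores in proportion to each class's support. The sketch below shows that calculation in plain Python. The labels are toy values invented for illustration, not model output, and `weighted_f1` is a hypothetical helper rather than part of this project.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its share of true labels.
        score += f1 * n / total
    return score


# Toy labels, invented for illustration only.
y_true = ["Google LLC", "Google LLC", "Google LLC", "Yahoo Inc."]
y_pred = ["Google LLC", "Google LLC", "Yahoo Inc.", "Yahoo Inc."]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = weighted_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} weighted_f1={f1:.3f}")
```

In this toy case the majority class has a high F1 and dominates the weighted average, which is the same effect that lets well-represented entities lift the aggregate score.
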
## Architecture

Feedforward neural network: 164 → 128 → 64 → 13 with ReLU activations and dropout (0.2). Model size: 118.5 KB.

Designed for on-device inference via the [Kjarni](https://github.com/olafurjohannsson/kjarni) WebAssembly runtime with SIMD128 acceleration.

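To make the layer shapes concrete, here is a minimal pure-Python sketch of a 164 → 128 → 64 → 13 forward pass at inference time. It is not the Kjarni implementation: the weights are random placeholders, the function names are hypothetical, and dropout is omitted because it is normally active only during training.

```python
import random

LAYER_SIZES = [164, 128, 64, 13]


def init_layers(sizes, seed=0):
    """Build placeholder (weights, biases) pairs; real weights come from the model file."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Dense layers with ReLU on every layer except the last, which returns raw logits."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def count_parameters(sizes):
    """Weights plus biases for each dense layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


layers = init_layers(LAYER_SIZES)
logits = forward(layers, [0.0] * LAYER_SIZES[0])
predicted_index = max(range(len(logits)), key=logits.__getitem__)
n_params = count_parameters(LAYER_SIZES)
print(len(logits), n_params)
```

The parameter count comes to 30,221. Assuming 32-bit floats (an assumption, not stated in this card), that is roughly 118 KiB of weights, in line with the stated model size.
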
## Usage

Features must be standardized with the provided scaler (mean and scale values in `tracking_entity_classifier_scaler.json`) before inference. The model is most meaningful when applied to domains that the [entity cluster classifier](https://huggingface.co/olafuraron/entity-cluster-classifier) has already identified as ad tech.

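The standardization step can be sketched as follows, assuming the usual `(x - mean) / scale` form and assuming the scaler file stores two lists under the keys `"mean"` and `"scale"`. Those key names and both helper functions are assumptions for illustration; check the actual JSON file before relying on them.

```python
import json


def load_scaler(path):
    """Read per-feature mean and scale lists; the key names are assumed, not confirmed."""
    with open(path, encoding="utf-8") as f:
        scaler = json.load(f)
    return scaler["mean"], scaler["scale"]


def standardize(features, mean, scale):
    """Apply (x - mean) / scale element-wise to one feature vector."""
    if not (len(features) == len(mean) == len(scale)):
        raise ValueError("features, mean, and scale must have the same length")
    return [(x - m) / s for x, m, s in zip(features, mean, scale)]


# Toy three-feature example; the real model expects 164 features.
toy_mean = [1.0, 10.0, 0.5]
toy_scale = [2.0, 5.0, 0.25]
standardized = standardize([3.0, 0.0, 0.5], toy_mean, toy_scale)
print(standardized)  # [1.0, -2.0, 0.0]
```

In real use, `load_scaler("tracking_entity_classifier_scaler.json")` would supply the two lists, and the standardized vector would then be passed to the model.
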
## Context

This model demonstrates that tracking companies have identifiable behavioral fingerprints: their domains exhibit characteristic patterns of API usage, cookie behavior, and web presence that distinguish them from other entities. See [TrackerML](https://github.com/olafurjohannsson/tracker-ml) for the full project.

## Links

[Kjarni](https://kjarni.ai)

## License

CC-BY-NC-SA 4.0 (derived from DuckDuckGo Tracker Radar).