---
license: cc-by-nc-sa-4.0
tags:
- web-privacy
- tracker-detection
- feedforward
- safetensors
- webassembly
datasets:
- olafuraron/tracker-radar-ml
---
# Entity Cluster Classifier
Classifies third-party web domains into four behavioral entity types based on metadata and API usage patterns from DuckDuckGo's [Tracker Radar](https://github.com/duckduckgo/tracker-radar) dataset.
## Live Preview
[Live preview](https://olafurjohannsson.github.io/tracker-ml/)
## Labels
| Label | Description |
|---|---|
| `ad_tech` | Advertising, analytics, and tracking companies (Google, Microsoft, Adobe, etc.) |
| `cdn_infra` | CDN and infrastructure providers (Amazon, Akamai, Fastly, etc.) |
| `platform` | Hosting and platform services (Shopify, GitHub, etc.) |
| `ad_management` | Ad blocking and ad management tools |
## Performance
- **Accuracy:** 75.2%
- **Weighted F1:** 0.767
- **Training data:** 4,973 domains from the Tracker Radar US region
- **Features:** 164 behavioral features (API usage, cookie behavior, prevalence, resource types)
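
For readers comparing the two headline metrics, the sketch below shows how accuracy and support-weighted F1 are commonly computed. The label sequences are made-up toy values, not model output, and the evaluation script used for this model may differ in detail.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with each class weighted by its true count."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)

# Toy labels for illustration only (not taken from the evaluation set).
y_true = ["ad_tech", "ad_tech", "ad_tech", "cdn_infra", "platform"]
y_pred = ["ad_tech", "ad_tech", "cdn_infra", "cdn_infra", "platform"]

toy_accuracy = accuracy(y_true, y_pred)
toy_weighted_f1 = weighted_f1(y_true, y_pred)
print(f"accuracy={toy_accuracy:.3f} weighted_f1={toy_weighted_f1:.3f}")
```

Because each class contributes in proportion to its support, weighted F1 can land slightly above or below accuracy when classes are imbalanced.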
## Architecture
Feedforward neural network: 164 → 64 → 32 → 4 with ReLU activations and dropout (0.2). Model size: 50.3 KB.
Designed for on-device inference via the [Kjarni](https://github.com/olafurjohannsson/kjarni) WebAssembly runtime with SIMD128 acceleration.
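
The layer sizes above can be illustrated with a minimal pure-Python sketch of the forward pass. It assumes fully connected layers with biases and uses random placeholder weights. The real weights ship in the safetensors file, and the function names here are hypothetical, not part of Kjarni. Dropout is only active during training, so it does not appear at inference time.

```python
import random

LAYER_SIZES = [164, 64, 32, 4]  # taken from the architecture description

def count_parameters(sizes):
    """Weights plus biases for a stack of fully connected layers (biases assumed)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

def init_random_params(sizes, seed=0):
    """Placeholder parameters for illustration; not the trained weights."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params

def forward(features, params):
    """Apply each layer; ReLU on hidden layers, raw scores on the output layer."""
    h = features
    last = len(params) - 1
    for i, (weights, biases) in enumerate(params):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        if i < last:
            h = [max(v, 0.0) for v in h]
    return h

n_params = count_parameters(LAYER_SIZES)
logits = forward([0.5] * 164, init_random_params(LAYER_SIZES))
print(n_params, len(logits))
```

Under these assumptions the network has 12,772 parameters. At 4 bytes each (assuming 32-bit floats) that is about 51,000 bytes, in the same range as the stated model size.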
## Usage
Standardize all 164 features with the provided scaler before inference: subtract the per-feature mean and divide by the per-feature scale, both stored in `entity_cluster_classifier_scaler.json`.
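
The standardization step can be sketched as follows. The JSON key names `mean` and `scale` are an assumption based on the description above, so check the scaler file for the actual layout. The helper names are hypothetical.

```python
import json

def load_scaler(path):
    """Read per-feature mean and scale lists (key names are assumed)."""
    with open(path) as f:
        data = json.load(f)
    return data["mean"], data["scale"]

def standardize(features, mean, scale):
    """Return (x - mean) / scale for each feature."""
    if not (len(features) == len(mean) == len(scale)):
        raise ValueError("features, mean, and scale must have the same length")
    return [(x - m) / s for x, m, s in zip(features, mean, scale)]

# Toy two-feature example; the real model expects 164 features.
toy_scaled = standardize([3.0, 10.0], mean=[1.0, 10.0], scale=[2.0, 5.0])
print(toy_scaled)  # [1.0, 0.0]
```

With the real file, `mean, scale = load_scaler("entity_cluster_classifier_scaler.json")` followed by `standardize(raw_features, mean, scale)` would produce the input vector for the model, provided the assumed keys match.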
## Context
57% of domains in Tracker Radar have no ownership information. This model predicts the type of entity a domain belongs to from behavioral signals alone; no ownership metadata is used as input. See [TrackerML](https://github.com/olafurjohannsson/tracker-ml) for the full project.
## Links
[Kjarni](https://kjarni.ai)
## License
CC-BY-NC-SA 4.0 (derived from DuckDuckGo Tracker Radar).