Watchly Smart-Match v2 (internal v11)
A fine-tuned 3-class NLI cross-encoder used by Watchly, a macOS watcher app, to decide at runtime whether a user's natural-language watch condition (e.g. "deploy succeeded", "my order shipped", "customer frustrated") is satisfied by the OCR text of a page snapshot.
What it's for
Smart-match runs as a layer-2 semantic gate after Watchly's deterministic rule engine. It only sees conditions the rule-drafter LLM routes to it: abstract / sentiment / state-event phrasing the rule engine can't compile to literal text-contains atoms. Numerical thresholds ("more than 100 errors"), state-change detection ("new email arrived"), and subjective conditions ("weather is nice") are handled elsewhere in Watchly's pipeline.
Architecture
- Base: dleemiller/EttinX-nli-s, a small NLI cross-encoder
- Params: 68M (~261 MB safetensors)
- Latency: ~20 ms per forward pass on Apple Silicon (M-series)
- Output: 3-class NLI head with classes [contradiction, neutral, entailment]. Smart-match uses the entailment column (index 2).
- Inputs: a (condition, visible_text) pair. The page text is chunked into 300-char overlapping windows; entailment is max-pooled across chunks.
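The overlapping-window chunking described above can be sketched as follows. This is a minimal illustration, not Watchly's actual implementation: the function name `chunk_text` and the 50-character overlap are assumptions, since the card states the 300-char window size but not the overlap.

```python
def chunk_text(text: str, window: int = 300, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows sharing `overlap` chars.

    `window=300` follows the model card; `overlap=50` is an assumed value.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if not text:
        return []
    step = window - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + window])
        # Stop once a window has reached the end of the text.
        if start + window >= len(text):
            break
    return chunks


page = "x" * 700
chunks = chunk_text(page)
print([len(c) for c in chunks])
```

Each chunk is then paired with the condition and scored separately, so a match that sits near a window boundary still appears whole in at least one chunk as long as it is shorter than the overlap.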
Training lineage
| Internal version | Description |
|---|---|
| v2 | Initial fine-tune on synthetic scenes corpus (~3000 cases) |
| v3 | + 465-row hard-negative patch (same-surface contrast) |
| v5 | + 240-row CLEAR-only curated round (Claude Haiku judge) |
| v6 | + 240-row topic/identity contrast (per-cluster scenarios) |
| v11 (this release) | + 768-row patch from 3 fresh adversarial holdouts (synonym positives + chrome-shortcut negatives) |
v11 was trained from v6 with 3 epochs at LR 5e-6, batch size 16. Patch shape: 384 contrast cases (3 sets × 128 Sonnet-generated adversarial scenarios) + 384 balanced replay from prior pools.
Evaluation
Production smart-match in the Watchly app combines this cross-encoder with a runtime safety-guard layer:
- Lexical-evidence guard (anchor stems must appear un-negated on page)
- Polarity-contrast rescue (synonym TPs, predicate-stem-gated)
- Future-pattern suppression
- Existing danger-word + numeric-progress guards
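To make the first guard concrete, here is a rough sketch of what a lexical-evidence check could look like. Everything in it is an assumption made for illustration: the function name `lexical_evidence_ok`, the negator list, the three-token lookback, and the prefix-based stem matching are not taken from Watchly's code, which the card does not show.

```python
import re

# Hypothetical negation cues; the real guard's list is not published.
NEGATORS = {"not", "no", "never", "without"}


def _is_negator(token: str) -> bool:
    return token in NEGATORS or token.endswith("n't")


def lexical_evidence_ok(anchor_stems: list[str], page_text: str, lookback: int = 3) -> bool:
    """Return True if every anchor stem occurs at least once without a
    negation cue in the `lookback` tokens before it (assumed heuristic)."""
    tokens = re.findall(r"[a-z0-9']+", page_text.lower())
    for stem in anchor_stems:
        found_unnegated = False
        for i, token in enumerate(tokens):
            if token.startswith(stem):
                context = tokens[max(0, i - lookback):i]
                if not any(_is_negator(t) for t in context):
                    found_unnegated = True
                    break
        if not found_unnegated:
            return False
    return True


print(lexical_evidence_ok(["ship"], "Order #47291 Shipped"))
print(lexical_evidence_ok(["ship"], "Your order has not shipped yet"))
```

A guard of this shape can only veto a match, never create one, which fits the card's stated preference for missed fires over spurious fires.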
Numbers below include those guards.
| Suite | v6 (prior production) | v11 (this release) |
|---|---|---|
| Production smart-match in-scope (75 cases) | 96.00% (0 FP) | 96.00% (0 FP) |
| Codex out-of-distribution (28) | 96.43% | 100.00% |
| Fresh holdout (40) | 92.50% | 92.50% |
| Adversarial big holdout v4 (truly held out, 128) | 69.53% | 74.21% |
| Adversarial big holdout v5 (truly held out, 128) | 75.00% | 75.00% |
| Synthetic v2 (1808) | 90.93% | 90.21% |
Zero false positives on the production smart-match path, the metric Watchly cares most about (no spurious watcher fires).
Usage
from sentence_transformers import CrossEncoder
import numpy as np
model = CrossEncoder("alyssaxuu/watchly-sm-v2", max_length=512)
# Page text is chunked into 300-char windows and entailment is max-pooled
chunks = [
"Order #47291 - Shipped\nThank you for your purchase from Bellroy!\nTracking: UPS 1Z999AA10123456784",
]
condition = "my order shipped"
raw = np.array(model.predict([(condition, c) for c in chunks], apply_softmax=True))
entail = float(np.max(raw[:, 2])) # column 2 = entailment
# Production threshold: entail >= 0.50 -> match (then runs through guard layer)
print(f"score={entail:.3f}")
In Watchly, the entailment score is then refined by the runtime guard layer described above before becoming a fire/no-fire decision.
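The scoring step that precedes the guards (softmax per chunk, max-pool over the entailment column, then the 0.50 threshold) can be written out in plain Python without loading the model. This is a sketch that assumes you already have raw 3-class logits per chunk; the names `softmax` and `is_candidate_match` are illustrative, and the guard layer itself is not modeled.

```python
import math

ENTAIL_INDEX = 2          # [contradiction, neutral, entailment]
ENTAIL_THRESHOLD = 0.50   # production threshold quoted in the usage snippet


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_candidate_match(chunk_logits: list[list[float]],
                       threshold: float = ENTAIL_THRESHOLD) -> tuple[bool, float]:
    """Max-pool entailment probability across chunks and compare to threshold.

    Returns (passes_threshold, pooled_entailment). An empty page scores 0.0.
    """
    entail = max((softmax(row)[ENTAIL_INDEX] for row in chunk_logits), default=0.0)
    return entail >= threshold, entail


# Two chunks: the first leans contradiction, the second leans entailment.
matched, score = is_candidate_match([[2.0, 0.5, -1.0], [-1.0, 0.0, 3.0]])
print(matched, round(score, 3))
```

Because the pooling takes the maximum, a single strongly entailing chunk is enough to pass the threshold, which is why the guard layer is needed to filter confident but wrong chunks.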
Limitations
- Adversarial chrome-shortcut OCR (page is on-topic but state is opposite, e.g. condition "layoffs announced" on a Shopify hiring page): the cross-encoder hits a ~75% ceiling on this distribution at the 68M-param scale. The runtime guard layer catches the worst confidence-locked failures; an ensemble with a deberta-v3-base co-classifier pushes the held-out adversarial accuracy to ~84% if the size/latency budget allows.
- Synonym-only positives (page uses a different vocabulary than the condition's predicate, e.g. "finished" vs "Upload Complete"): the cross-encoder handles many but not all. The polarity-contrast rescue catches a meaningful fraction; the rest are accepted as missed-fires (preferred over spurious fires).
- Numerical / quantitative claims are out of scope by design; they are routed to Watchly's deterministic rule engine.
License
Apache 2.0, matching the EttinX-nli-s base model's license.