File size: 4,834 Bytes
67c3d47
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
552583f
 
 
 
 
 
 
67c3d47
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
552583f
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
---
license: apache-2.0
base_model: ibm-granite/granite-4.0-micro
library_name: peft
pipeline_tag: zero-shot-classification
tags:
- text-classification
- zero-shot-classification
- calibrated
---

# Predicate — a promptable, calibrated criterion classifier (Granite-4.0-micro)

Describe **any** criterion in plain English, pass a piece of content, and get a
**calibrated 0–1 probability** that the content satisfies it. One small probe, any
criterion — **no per-criterion training, no labelled examples per task**.

```
score(content   = "Comparing you to two competitors. About ready to switch.",
      criterion = "the customer is at risk of churning")  →  0.97
```

- **Backbone:** [`ibm-granite/granite-4.0-micro`](https://huggingface.co/ibm-granite/granite-4.0-micro) (Apache-2.0), pulled at runtime.
- **Probe:** a frozen domain LoRA + a small MLP head reading the hidden state at **layer 28** via the **KV-pop** primitive (prefill the content once, pop K criteria against it → cheap multi-criteria).
- **License:** Apache-2.0. The LoRA, MLP head and calibration here are Apache-2.0; the Granite base is Apache-2.0. See `LICENSE` and `NOTICE`.

## What's in this repo
- `lora_granite_L07_armA_r32mlp/` — the PEFT LoRA adapter (rank 32, attn+mlp).
- `base_probe_granite_L07_armA_L28_amortized_kvpop_calibrated.zip` — the MLP head + isotonic calibration + manifest + load-time canary (the probe artifact).

## How to run

**Easiest — the Docker image** (everything bundled; Granite pulls at first run):
```bash
docker run --gpus all -p 8088:8088 ghcr.io/nope-net/predicate-oss:0.1.0
# one content, many criteria, scored in a single content prefill:
curl -X POST localhost:8088/classify_multi_criteria -H 'content-type: application/json' \
  -d '{"content":"My brother has been anxious lately and I worry about him.",
       "criteria":["the speaker themselves is anxious",
                   "someone other than the speaker is anxious"]}'
```

**From source** — the inference code is **Apache-2.0 and ships inside the Docker image**
(`/app/predicate` + `/app/service`). To run the probe yourself, use these weights with that
code: `Scorer.from_path("base_probe_granite_L07_armA_L28_amortized_kvpop_calibrated.zip")`
(point the manifest's `lora_path` at the LoRA in this repo). The artifact's **canary** verifies
the hidden-state frame at load and **refuses to serve on a numerical mismatch** — e.g. a
different GPU architecture (the canary here was generated on an RTX 4000 Ada; pack and serve on
the same GPU family). Built under `torch 2.6.0` / `transformers 5.9.0`.

## What it reads well
- Topic, sentiment, intent, stance
- **Subject / attribution** — whose attribute is this? (self vs third party)
- **Framing** — asserted vs quoted vs hypothetical; first-person voice
- Multi-clause AND / OR, deontic, temporal conditions
- **Cross-language** content (de / es / zh ≈ English)
- **Long documents** (Granite reaches 128K context; the practical limit is GPU memory)
- **Calibrated + reproducible** — same input → same score; safe to threshold, cache, monitor, A/B

## What it is NOT for — please read
Predicate reads **relatedness + instruction-following**, not formal **truth-relation**. It is
**not a judge, a groundedness/hallucination detector, or a pragmatic-comprehension engine**.
For those, a small purpose-built model or a line of code beats it:

| criterion type | use instead |
|---|---|
| scalar / quantifier logic ("not all", XOR, parity) | a rule, or a small NLI model |
| is a claim entailed / contradicted by messy evidence | a groundedness model (e.g. MiniCheck, an NLI encoder) |
| "is this factually true?" (world knowledge) | retrieval + a model that has the facts |
| exact numeric comparison | code |

On **vanilla** zero-shot classification, small encoders (GLiClass, DeBERTa-zero-shot) are
**efficient peers** at 0.4–0.6B params. Predicate's real edge is **expressivity** (arbitrary
inference-time criteria, no retraining), **calibration + reproducibility**, **context length**,
and the **KV-pop cost wedge** (many criteria amortized over one content prefill).

## Honest numbers
Macro AUROC **0.890** across a 9-substrate / 6116-item internal slate (ocular, clean,
held-out, cross-lingual, adversarial, ChaosNLI, compound-"vibe", codependence, implicature).
Strong on ocular/cross-lingual/adversarial; weakest on **implicature** (the truth-relation
wall — a structural limit, not a tuning gap). Encoders win narrow truth-relation tasks;
Predicate wins universality + calibration + cost.

## Non-claims
Predicate is a content classifier — **not predictive, diagnostic, or therapeutic**, and **not a
substitute for human judgment**. Scores are signals for routing, ranking, and human review.

Project page, recipe, and benchmarks: <https://labs.nope.net/predicate>