Instructions to use nopenet/predicate-oss with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use nopenet/predicate-oss with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
| license: apache-2.0 | |
| base_model: ibm-granite/granite-4.0-micro | |
| library_name: peft | |
| pipeline_tag: zero-shot-classification | |
| tags: | |
| - text-classification | |
| - zero-shot-classification | |
| - calibrated | |
| # Predicate — a promptable, calibrated criterion classifier (Granite-4.0-micro) | |
| Describe **any** criterion in plain English, pass a piece of content, and get a | |
| **calibrated 0–1 probability** that the content satisfies it. One small probe, any | |
| criterion — **no per-criterion training, no labelled examples per task**. | |
| ``` | |
| score(content = "Comparing you to two competitors. About ready to switch.", | |
| criterion = "the customer is at risk of churning") → 0.97 | |
| ``` | |
| - **Backbone:** [`ibm-granite/granite-4.0-micro`](https://huggingface.co/ibm-granite/granite-4.0-micro) (Apache-2.0), pulled at runtime. | |
| - **Probe:** a frozen domain LoRA + a small MLP head reading the hidden state at **layer 28** via the **KV-pop** primitive (prefill the content once, pop K criteria against it → cheap multi-criteria). | |
| - **License:** Apache-2.0. The LoRA, MLP head and calibration here are Apache-2.0; the Granite base is Apache-2.0. See `LICENSE` and `NOTICE`. | |
| ## What's in this repo | |
| - `lora_granite_L07_armA_r32mlp/` — the PEFT LoRA adapter (rank 32, attn+mlp). | |
| - `base_probe_granite_L07_armA_L28_amortized_kvpop_calibrated.zip` — the MLP head + isotonic calibration + manifest + load-time canary (the probe artifact). | |
| ## How to run | |
| **Easiest — the Docker image** (everything bundled; Granite pulls at first run): | |
| ```bash | |
| docker run --gpus all -p 8088:8088 ghcr.io/nope-net/predicate-oss:0.1.0 | |
| # one content, many criteria, scored in a single content prefill: | |
| curl -X POST localhost:8088/classify_multi_criteria -H 'content-type: application/json' \ | |
| -d '{"content":"My brother has been anxious lately and I worry about him.", | |
| "criteria":["the speaker themselves is anxious", | |
| "someone other than the speaker is anxious"]}' | |
| ``` | |
| **From source** — the inference code is **Apache-2.0 and ships inside the Docker image** | |
| (`/app/predicate` + `/app/service`). To run the probe yourself, use these weights with that | |
| code: `Scorer.from_path("base_probe_granite_L07_armA_L28_amortized_kvpop_calibrated.zip")` | |
| (point the manifest's `lora_path` at the LoRA in this repo). The artifact's **canary** verifies | |
| the hidden-state frame at load and **refuses to serve on a numerical mismatch** — e.g. a | |
| different GPU architecture (the canary here was generated on an RTX 4000 Ada; pack and serve on | |
| the same GPU family). Built under `torch 2.6.0` / `transformers 5.9.0`. | |
| ## What it reads well | |
| - Topic, sentiment, intent, stance | |
| - **Subject / attribution** — whose attribute is this? (self vs third party) | |
| - **Framing** — asserted vs quoted vs hypothetical; first-person voice | |
| - Multi-clause AND / OR, deontic, temporal conditions | |
| - **Cross-language** content (de / es / zh ≈ English) | |
| - **Long documents** (Granite reaches 128K context; the practical limit is GPU memory) | |
| - **Calibrated + reproducible** — same input → same score; safe to threshold, cache, monitor, A/B | |
| ## What it is NOT for — please read | |
| Predicate reads **relatedness + instruction-following**, not formal **truth-relation**. It is | |
| **not a judge, a groundedness/hallucination detector, or a pragmatic-comprehension engine**. | |
| For those, a small purpose-built model or a line of code beats it: | |
| | criterion type | use instead | | |
| |---|---| | |
| | scalar / quantifier logic ("not all", XOR, parity) | a rule, or a small NLI model | | |
| | is a claim entailed / contradicted by messy evidence | a groundedness model (e.g. MiniCheck, an NLI encoder) | | |
| | "is this factually true?" (world knowledge) | retrieval + a model that has the facts | | |
| | exact numeric comparison | code | | |
| On **vanilla** zero-shot classification, small encoders (GLiClass, DeBERTa-zero-shot) are | |
| **efficient peers** at 0.4–0.6B params. Predicate's real edge is **expressivity** (arbitrary | |
| inference-time criteria, no retraining), **calibration + reproducibility**, **context length**, | |
| and the **KV-pop cost wedge** (many criteria amortized over one content prefill). | |
| ## Honest numbers | |
| Macro AUROC **0.890** across a 9-substrate / 6116-item internal slate (ocular, clean, | |
| held-out, cross-lingual, adversarial, ChaosNLI, compound-"vibe", codependence, implicature). | |
| Strong on ocular/cross-lingual/adversarial; weakest on **implicature** (the truth-relation | |
| wall — a structural limit, not a tuning gap). Encoders win narrow truth-relation tasks; | |
| Predicate wins universality + calibration + cost. | |
| ## Non-claims | |
| Predicate is a content classifier — **not predictive, diagnostic, or therapeutic**, and **not a | |
| substitute for human judgment**. Scores are signals for routing, ranking, and human review. | |
| Project page, recipe, and benchmarks: <https://labs.nope.net/predicate> | |