
distilbert-trade-decision-classifier-v1

DistilBERT fine-tuned with LoRA (r=32) to classify user replies to trading-agent proposals into one of six decision intents. It sits between a regex fast-path and a confirmation prompt, the two bookends of a reply-routing pipeline.

How it works

Trading agents that DM proposals ("Approve / decline / hold / size N / trim N?") get free-form text replies back. This model converts the reply into one of six discrete intents so the agent can route it deterministically.

The model is invoked AFTER a fast-path regex tries the canonical phrases first ("approve", "decline", "size 10"). The regex handles routine replies; the model handles everything the regex doesn't match.

Reply text in
   ↓
Canonical-phrase regex      ← catches structured replies cheaply
   ↓ (no match)
THIS MODEL                  ← classifies into 6 intent labels
   ↓
Decision rule:
   • confidence ≥ 0.85 AND label ≠ UNCLEAR → commit
   • else                                  → confirmation prompt to the user
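
The flow above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the `CANONICAL` patterns, the `fast_path` and `route` names, and the choice to commit directly on a regex match are all assumptions. The classifier is passed in as a callable so the routing logic can be exercised without loading the model.

```python
import re
from typing import Callable, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.85  # threshold from the decision rule above

# Hypothetical canonical-phrase patterns. The card names "approve",
# "decline", and "size 10" as examples but does not publish its regex.
CANONICAL = [
    (re.compile(r"^\s*approve\s*$", re.IGNORECASE), "APPROVE"),
    (re.compile(r"^\s*decline\s*$", re.IGNORECASE), "DECLINE"),
    (re.compile(r"^\s*size\s+\d+\s*$", re.IGNORECASE), "COUNTER_SIZE"),
]


def fast_path(reply: str) -> Optional[str]:
    """Return a label if the raw reply is a canonical phrase, else None."""
    for pattern, label in CANONICAL:
        if pattern.match(reply):
            return label
    return None


def route(
    reply: str,
    tagged_input: str,
    classify: Callable[[str], Tuple[str, float]],
) -> Tuple[str, str]:
    """Return ("commit", label) or ("confirm", best_guess_label)."""
    label = fast_path(reply)
    if label is not None:
        # Assumption: a canonical match skips the model entirely.
        return ("commit", label)
    label, score = classify(tagged_input)
    if score >= CONFIDENCE_THRESHOLD and label != "UNCLEAR":
        return ("commit", label)
    return ("confirm", label)
```

With the transformers pipeline, `classify` could wrap the first result of the pipeline call and return its `label` and `score` fields. A "commit" here only means the intent is accepted; downstream safety checks still decide whether an order is placed.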

Labels (6)

| Label | What it covers |
| --- | --- |
| APPROVE | Execute the proposal as stated. "approve", "yes", "let's go", "send it" |
| DECLINE | Kill the proposal. "no", "pass", "kill it", "hard pass" |
| HOLD | Active deferral: the user is engaged but not deciding yet. "hold off", "checking", "let me think", "leaning approve" |
| COUNTER_SIZE | Execute but at a different share count. "size 10", "dump half", "trim 50" |
| COUNTER_PRICE | Execute but at a different limit price. "at $49", "limit 50", "trim at $48" |
| UNCLEAR | Cannot safely commit. Multi-intent, ambiguous, off-topic, or sarcastic. Falls through to confirmation prompt. |

UNCLEAR is a trained refusal label, not a fallback. The model is expected to emit it on multi-intent, ambiguous, or off-topic inputs. Treat it as the model saying "I don't know, ask the human."

Inputs

A single string with structural context tags prepended:

[dm|group][reply_to:N|no_reply_to][in_flight:K] <reply text>
  • [dm] vs [group]: chat surface (DM vs group chat)
  • [reply_to:N] vs [no_reply_to]: whether the user quote-replied to a specific proposal
  • [in_flight:K]: number of proposals currently awaiting decision

Example inputs:

[dm][reply_to:200][in_flight:1] approve
[dm][no_reply_to][in_flight:1] dump half
[dm][reply_to:200][in_flight:2] trim at $49

The tags carry context the model can't infer from the text alone: "yes" with 1 proposal in flight is APPROVE; "yes" with 3 in flight and no quote-reply is structurally ambiguous and trained as UNCLEAR.
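
Building the prefix by hand is error-prone, so a small helper may be useful. The sketch below is an assumption about how a caller might assemble the string; `format_input` is a hypothetical name, and `reply_to` is treated as an opaque integer id because the card does not say what N refers to.

```python
from typing import Optional


def format_input(
    reply: str,
    surface: str,
    reply_to: Optional[int],
    in_flight: int,
) -> str:
    """Prepend the structural tags in the order shown in the card."""
    if surface not in ("dm", "group"):
        raise ValueError("surface must be 'dm' or 'group'")
    if in_flight < 0:
        raise ValueError("in_flight must be non-negative")
    # No quote-reply is encoded as a distinct tag, not as an empty value.
    reply_tag = "[no_reply_to]" if reply_to is None else f"[reply_to:{reply_to}]"
    return f"[{surface}]{reply_tag}[in_flight:{in_flight}] {reply}"


print(format_input("approve", "dm", 200, 1))
# [dm][reply_to:200][in_flight:1] approve
print(format_input("dump half", "dm", None, 1))
# [dm][no_reply_to][in_flight:1] dump half
```

Keeping tag construction in one function helps the serving path produce exactly the strings the model saw in training.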

Usage

Python (transformers)

from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="DoDataThings/distilbert-trade-decision-classifier-v1",
)
result = clf("[dm][reply_to:200][in_flight:1] dump half")
print(result)
# [{'label': 'COUNTER_SIZE', 'score': 0.991}]

Python (onnxruntime, CPU)

import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("DoDataThings/distilbert-trade-decision-classifier-v1")
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

text = "[dm][no_reply_to][in_flight:1] hold off"
enc = tok(text, truncation=True, max_length=64, return_tensors="np")
logits = sess.run(
    None,
    {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]},
)[0][0]
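# Numerically stable softmax: subtract the max logit before exponentiating
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()
# Label order here is assumed to match the model's id2label mapping
labels = ["APPROVE", "DECLINE", "HOLD", "COUNTER_SIZE", "COUNTER_PRICE", "UNCLEAR"]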
print(labels[int(probs.argmax())], float(probs.max()))
# HOLD 0.943

Deployment shape

The model is not safe to use standalone. Pair with:

  • A confidence threshold (we recommend 0.85)
  • Deterministic safety rails (position size, available cash, mode gate)
  • A confirmation prompt for low-confidence cases

The model picks intent; the system decides whether to act. It does not have final authority over orders.
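
A deterministic rail layer might look like the sketch below. Everything in it is hypothetical: the `Order` and `Account` shapes, the `max_shares` cap, and the reading of "mode gate" as a live/paper switch are illustrative assumptions, not the author's system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Order:
    side: str           # "buy" or "sell"
    shares: int
    limit_price: float


@dataclass
class Account:
    cash: float
    position_shares: int
    live_mode: bool     # hypothetical mode gate: False blocks all orders


def passes_rails(order: Order, account: Account, max_shares: int = 100) -> Tuple[bool, str]:
    """Run deterministic checks after the intent has been accepted."""
    if not account.live_mode:
        return (False, "mode gate closed")
    if order.shares <= 0 or order.shares > max_shares:
        return (False, "position size out of bounds")
    if order.side == "buy" and order.shares * order.limit_price > account.cash:
        return (False, "insufficient cash")
    if order.side == "sell" and order.shares > account.position_shares:
        return (False, "not enough shares held")
    return (True, "ok")
```

The point of the split is that none of these checks depend on the classifier's output quality: a confident but wrong label still cannot exceed the limits encoded here.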

Design decisions

Narrow-waist split. The model classifies INTENT only, not proposal context. By design, upstream code disambiguates which proposal the reply targets (via quote-reply or single-default rule), and the model only sees the locked-in case. This makes the model independent of ticker / setup / portfolio specifics: its job is interpreting "what did the user mean," not "which one."

UNCLEAR as a trained refusal class. A classifier forced to pick one of the five action labels {APPROVE, DECLINE, HOLD, COUNTER_SIZE, COUNTER_PRICE} on ambiguous input is dangerous. The sixth label is the model's escape valve: it is trained on multi-intent, ambiguous, off-topic, and sarcastic inputs so it can refuse rather than guess. Combined with the 0.85 confidence threshold, this limits the blast radius of misclassification: an unsafe input is expected to yield either UNCLEAR (refusal) or a non-UNCLEAR label with low confidence, which falls through to the confirmation prompt.

Structural prefix as text, not special tokens. The [dm][reply_to:N][in_flight:K] tags are concatenated into the input string and tokenized as regular subword pieces. This works with off-the-shelf DistilBERT: no special-token registration, no tokenizer config drift between train and serve. The model learns the bracket conventions during fine-tuning.

Six labels including COUNTER_PRICE. Earlier versions used five labels. The sixth (COUNTER_PRICE) was added because "trim at $49 instead of $48" is a fundamentally different action from "size 10": the downstream extraction differs (price vs share count). Conflating them would force the consumer to disambiguate post-classification, defeating the purpose of the intent label.
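
To make the price-vs-count distinction concrete, here is a rough sketch of what the two downstream extractors could look like. The patterns and function names are hypothetical and cover only the example phrasings quoted in this card; relative amounts such as "dump half" would need separate handling.

```python
import re
from typing import Optional

# Hypothetical patterns, written against the card's example replies only.
_SHARES = re.compile(r"\b(?:size|trim)\s+(\d+)\b", re.IGNORECASE)
_PRICE = re.compile(r"\$\s*(\d+(?:\.\d+)?)|\blimit\s+(\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_share_count(reply: str) -> Optional[int]:
    """Used after COUNTER_SIZE: pull an absolute share count, if present."""
    match = _SHARES.search(reply)
    return int(match.group(1)) if match else None


def extract_limit_price(reply: str) -> Optional[float]:
    """Used after COUNTER_PRICE: pull a limit price, if present."""
    match = _PRICE.search(reply)
    if match is None:
        return None
    # Exactly one of the two alternatives captured a number.
    return float(match.group(1) or match.group(2))


print(extract_share_count("trim 50"))      # 50
print(extract_limit_price("trim at $48"))  # 48.0
```

Note that "trim 50" and "trim at $48" share a verb but need different extractors, which is the ambiguity the separate COUNTER_PRICE label removes for the consumer.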

Evaluation

Held-out eval set: 175 hand-curated adversarial examples, 25 to 34 per class, verified to have zero leakage against the training set.

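| Label | Precision | Recall | F1 | Count |
| --- | --- | --- | --- | --- |
| APPROVE | 0.967 | 0.967 | 0.967 | 30 |
| DECLINE | 1.000 | 0.933 | 0.966 | 30 |
| HOLD | 0.970 | 0.941 | 0.955 | 34 |
| COUNTER_SIZE | 0.968 | 1.000 | 0.984 | 30 |
| COUNTER_PRICE | 1.000 | 1.000 | 1.000 | 25 |
| UNCLEAR | 0.821 | 0.885 | 0.852 | 26 |
| macro avg | | | 0.954 | 175 |
| accuracy | | | 0.954 | |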
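
The summary figures can be re-derived from precision, recall, and count alone. The sketch below recomputes per-class F1, macro F1, and accuracy. It assumes each class's number of correct predictions is recall times count rounded to the nearest integer, which is an inference from the rounded table values and not something the card states.

```python
# (precision, recall, count) per label, copied from the evaluation table
rows = {
    "APPROVE": (0.967, 0.967, 30),
    "DECLINE": (1.000, 0.933, 30),
    "HOLD": (0.970, 0.941, 34),
    "COUNTER_SIZE": (0.968, 1.000, 30),
    "COUNTER_PRICE": (1.000, 1.000, 25),
    "UNCLEAR": (0.821, 0.885, 26),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_scores = {label: f1(p, r) for label, (p, r, _) in rows.items()}
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Assumption: correct predictions per class = round(recall * count)
correct = sum(round(r * n) for _, r, n in rows.values())
total = sum(n for _, _, n in rows.values())
accuracy = correct / total

print(round(macro_f1, 3), correct, total, round(accuracy, 3))
# 0.954 167 175 0.954
```

Under that rounding assumption, 167 of 175 examples are classified correctly, which matches the reported accuracy.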

Honest assessment. Zero high-confidence misclassifications on eval (no row labeled wrong at confidence ≥ 0.85). DECLINE and COUNTER_PRICE both hit perfect precision (1.000). UNCLEAR is the weakest class at F1 0.85, and the HOLD/UNCLEAR boundary on multi-intent inputs ("approve but only half") is genuinely fuzzy: these cases can be reasonably labeled either way. The 0.85 confidence threshold is calibrated so that weak cases fall to confirmation rather than committing a wrong label.
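
The "zero high-confidence misclassifications" check is easy to reproduce on any labeled set. The sketch below is one way to do it; the function names are hypothetical and the sample rows are invented for illustration, not taken from the eval set.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, float]  # (gold_label, predicted_label, confidence)


def high_confidence_errors(rows: Iterable[Row], threshold: float = 0.85) -> List[Row]:
    """Rows that would be committed (score >= threshold) with the wrong label."""
    return [
        (gold, pred, score)
        for gold, pred, score in rows
        if pred != gold and pred != "UNCLEAR" and score >= threshold
    ]


def confirmation_rate(rows: Iterable[Row], threshold: float = 0.85) -> float:
    """Fraction of rows that would fall through to the confirmation prompt."""
    rows = list(rows)
    confirmed = sum(1 for _, pred, score in rows if pred == "UNCLEAR" or score < threshold)
    return confirmed / len(rows) if rows else 0.0


# Invented rows for illustration only
sample = [
    ("APPROVE", "APPROVE", 0.99),
    ("HOLD", "UNCLEAR", 0.91),    # wrong label, but UNCLEAR never commits
    ("DECLINE", "HOLD", 0.62),    # wrong label, below threshold
    ("COUNTER_SIZE", "COUNTER_SIZE", 0.97),
]
print(high_confidence_errors(sample))  # []
print(confirmation_rate(sample))       # 0.5
```

A wrong prediction of UNCLEAR is excluded from the error list here because the decision rule never commits on it; that exclusion is a design choice of this sketch.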

Training

| Knob | Value |
| --- | --- |
| Base model | distilbert-base-uncased |
| Adapter | LoRA r=32 on attention projections (q_lin, v_lin) |
| Sequence length | 64 |
| Batch size | 32 |
| Learning rate | 5e-5, cosine schedule, 10% warmup |
| Epochs | 3, early-stop on eval macro-F1 |
| Class weighting | inverse-frequency (functionally uniform; data is balanced within 2%) |
| Hardware | Single RTX 4090 |
| Wall time | ~9 seconds |
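
For readers unfamiliar with the learning-rate entry, the sketch below shows one common reading of "5e-5, cosine schedule, 10% warmup". The linear warmup shape and the decay to zero are assumptions; the card does not specify either, and `lr_at` is a hypothetical helper, not the training code.

```python
import math


def lr_at(
    step: int,
    total_steps: int,
    base_lr: float = 5e-5,
    warmup_frac: float = 0.10,
) -> float:
    """Learning rate at a given optimizer step."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Assumed linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 5, 10, 55, 100):
    print(step, lr_at(step, total_steps=100))
```

With 100 total steps, the rate peaks at step 10 and is back near zero at step 100.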

Limitations

  1. Classifies INTENT only, not proposal context. The model never sees the actual proposal being responded to, so upstream proposal-disambiguation must run before this model is invoked.
  2. COUNTER_SIZE emits intent only; share count extraction is a separate downstream step (regex).
  3. COUNTER_PRICE emits intent only; price extraction is a separate downstream step.
  4. Trained on author-curated and synthetically-augmented data. Real-world reply variety may exceed training surface forms; expect ~5% of replies to fall to confirmation-prompt fallback.
  5. UNCLEAR has the lowest F1 (0.85). The boundary with HOLD (active deferral vs no-position) is fuzzy on multi-intent inputs.
  6. English-only. No localization in v1.

Dataset

Training and evaluation data: DoDataThings/trade-decision-classifier-v1-dataset

License

Apache 2.0.
