FATE-ClinicalTrials-Outcome-256

A directional signal on whether a trial looks like past trials that worked, or ones that didn't.

FATE (Failure-And-Trial Embeddings) is the outcome-similarity model in the OntologerMed suite. It embeds any clinical trial into a 256-dimensional vector shaped by historical success and failure patterns, trained on 18,132 completed ClinicalTrials.gov trials with verified outcomes.

When a new Phase 3 trial is posted, there are no results. But the trial's design, population, endpoints, and sponsor carry patterns that correlate with historical outcomes. FATE captures those patterns and places the new trial in context: does it look more like historical successes or historical failures?

Also known as OntologerMed TrialPulse in the Commercializer.ai platform.


Model Overview

Property | Value
Model name | FATE-ClinicalTrials-Outcome-256
Product name | OntologerMed TrialPulse
Base model | NeuML/pubmedbert-base-embeddings (PubMedBERT)
Output dimension | 256
Training method | Contrastive learning with triplet loss
Training data | 18,132 completed trials with verified outcome labels
Positive class | 10,609 trials (58.5%): confirmed primary endpoint success
Negative class | 7,523 trials (41.5%): completed, negative primary endpoint
License | Apache 2.0
HuggingFace | Ontologer/FATE-ClinicalTrials-Outcome-256

How It Works

FATE is not a predictor. It is a neighborhood signal.

Given a new or ongoing trial, FATE embeds it into the same 256-dimensional space as 18,132 labeled historical trials. The nearest neighbors' outcome distribution tells you whether this trial's design patterns look more like historical successes or failures.

  • Nearby vectors = similar outcome patterns, not similar diseases or drugs
  • Two trials from completely different indications can be neighbors if their design profiles (phase, endpoints, enrollment, sponsor type) resemble each other
  • A trial that lands in a neighborhood of 80% historical successes is exhibiting success-pattern characteristics: not a guarantee, but a directional signal

This is pattern matching against known outcomes. Not causal prediction. Not a probability estimate. Report it as what it is: a similarity signal.
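Written as a formula, the neighborhood signal can be stated as follows. This is a sketch in notation introduced here, not taken from the card: $q$ is the L2-normalised FATE vector of the query trial, $v_i$ are the vectors of the labeled historical trials, $y_i \in \{0, 1\}$ their success labels, and $k$ the number of neighbors (the examples in this card use $k = 20$).

```latex
s_i(q) = q^\top v_i, \qquad \lVert q \rVert = \lVert v_i \rVert = 1 \quad \text{(so } s_i \text{ is cosine similarity)}

N_k(q) = \text{indices of the } k \text{ largest values of } s_i(q)

\hat{r}_k(q) = \frac{1}{k} \sum_{i \in N_k(q)} y_i
```

For Example 1 in this card, 17 of the 20 neighbors are successes, so $\hat{r}_{20}(q) = 17/20 = 0.85$. The quantity is a descriptive fraction of neighbors, not a calibrated probability.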


Architecture

Trial text → PubMedBERT (384-dim) → Dense projection (384 → 256) → L2 normalize → vector
  • Base: PubMedBERT, pre-trained on 30M+ PubMed abstracts; strong biomedical language understanding
  • Projection: Single dense layer reducing 384 β†’ 256 dimensions
  • Normalisation: L2, enabling cosine similarity via dot product
  • Training signal: Triplet loss. The anchor and positive share the same outcome label (both success or both failure); the negative is sampled from the opposite label
  • Triplets: 10,000 generated from the 18,132 labeled trial set
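The projection and normalisation stages of the pipeline above can be sketched in NumPy to show why L2 normalisation lets a plain dot product stand in for cosine similarity. This is an illustration only: `W` and `b` are random placeholders, not the trained weights; the 384-dim input width is taken from the architecture diagram; and the dense layer is assumed to be purely linear, since the card does not state an activation function.

```python
import numpy as np

IN_DIM, OUT_DIM = 384, 256  # widths as stated in the architecture diagram

rng = np.random.default_rng(0)
# Placeholder parameters for the dense projection (random, NOT the trained weights).
W = rng.normal(scale=IN_DIM ** -0.5, size=(IN_DIM, OUT_DIM)).astype(np.float32)
b = np.zeros(OUT_DIM, dtype=np.float32)


def project(base_embeddings: np.ndarray) -> np.ndarray:
    """Dense projection (n, 384) -> (n, 256); assumed linear, no activation."""
    return base_embeddings @ W + b


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero rows)."""
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Reference cosine similarity for two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Stand-in base-model embeddings for three trials.
base = rng.normal(size=(3, IN_DIM)).astype(np.float32)
projected = project(base)
vecs = l2_normalize(projected)

print(vecs.shape)                    # (3, 256)
print(np.linalg.norm(vecs, axis=1))  # each entry is ~1.0
# After normalisation, a plain dot product reproduces the cosine of the raw projections.
print(np.isclose(vecs[0] @ vecs[1], cosine(projected[0], projected[1]), atol=1e-5))  # True
```

Normalising once at encoding time means every later similarity lookup is a single matrix product, which is what makes brute-force neighbor search over tens of thousands of trials cheap.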

Training Details

Parameter | Value
Epochs | 10
Batch size | 32
Learning rate | 2e-5
Loss margin | 0.5
Hardware | NVIDIA DGX Spark (Blackwell GB10)
Triplet grouping | success_label (success / failure)
Label source | ClinicalTrials.gov: overall_status = COMPLETED, success_label IS NOT NULL
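The triplet construction and the margin loss described above can be sketched as follows. This sketch rests on two assumptions: triplets are drawn uniformly at random (the card does not say how the 10,000 triplets were sampled or whether hard negatives were mined), and distance is cosine distance, 1 minus cosine similarity, on L2-normalised vectors (the card gives the 0.5 margin but not the distance metric). `sample_triplets` and `triplet_loss` are hypothetical names.

```python
import numpy as np


def sample_triplets(labels: np.ndarray, n_triplets: int, seed: int = 0) -> np.ndarray:
    """Return an (n_triplets, 3) array of (anchor, positive, negative) row indices.

    Anchor and positive share a success_label; the negative has the opposite label.
    Uniform random sampling is an assumption; each class needs at least two trials.
    """
    rng = np.random.default_rng(seed)
    by_label = {lab: np.flatnonzero(labels == lab) for lab in (0, 1)}
    triplets = np.empty((n_triplets, 3), dtype=np.int64)
    for t in range(n_triplets):
        anchor = rng.integers(len(labels))
        lab = int(labels[anchor])
        positive = anchor
        while positive == anchor:  # the positive must be a different trial
            positive = rng.choice(by_label[lab])
        negative = rng.choice(by_label[1 - lab])
        triplets[t] = (anchor, positive, negative)
    return triplets


def triplet_loss(a, p, n, margin: float = 0.5) -> float:
    """Mean hinge loss max(d(a, p) - d(a, n) + margin, 0) using cosine distance.

    a, p, n are (m, d) arrays of L2-normalised embeddings.
    """
    d_ap = 1.0 - np.sum(a * p, axis=1)  # anchor-positive cosine distance
    d_an = 1.0 - np.sum(a * n, axis=1)  # anchor-negative cosine distance
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))


labels = np.array([1, 1, 1, 0, 0, 1, 0, 1])  # toy success labels (1 = success)
trips = sample_triplets(labels, n_triplets=5)

# Toy 2-D unit vectors: positive identical to the anchor, negative orthogonal to it.
a = np.array([[1.0, 0.0]])
p = np.array([[1.0, 0.0]])
n = np.array([[0.0, 1.0]])
print(triplet_loss(a, p, n))  # 0.0, because 0 - 1 + 0.5 < 0
```

The loss is zero once every negative sits at least `margin` farther from its anchor than the positive does, so training pulls same-outcome trials together and pushes opposite-outcome trials apart only until that gap is reached.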

Business Use & Applications

FATE is purpose-built for workflows where teams need a fast, scalable directional signal on trial risk before committing to expensive deep-dives.

Pharmaceutical & Biotech R&D

  • Portfolio risk triage: score every trial in your pipeline against historical outcome patterns at intake, not just at readout

    • Flag trials whose design profile resembles high-failure-rate historical cohorts for early intervention
    • Prioritise which internal trials warrant additional mechanistic validation based on outcome-similarity risk signals
    • Surface design features (endpoint type, phase, population definition) that correlate with your portfolio's historical failures
  • Competitive landscape screening: assess the historical risk profile of competitor trials before they report

    • Embed every competitor Phase 2/3 trial and compare neighborhood success rates
    • Identify which competitor programmes sit in high-risk historical patterns, informing your own BD and partnership decisions
    • Monitor shifting risk signals as competitors amend trial designs or endpoints
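Scoring a whole pipeline at intake amounts to running the neighborhood lookup for many trials at once and flagging those whose neighborhoods fall below a chosen success rate. A minimal NumPy sketch, assuming you already hold L2-normalised FATE vectors for your pipeline trials and for a labeled reference index; `triage` is a hypothetical name, k = 20 follows the examples in this card, and the 0.5 flag threshold is an arbitrary illustration rather than a recommended cut-off.

```python
import numpy as np


def triage(pipeline_vecs, index_vecs, index_labels, k=20, flag_below=0.5):
    """Neighborhood success rate for each pipeline trial, plus a risk flag.

    pipeline_vecs: (m, d) L2-normalised query vectors
    index_vecs:    (n, d) L2-normalised labeled vectors
    index_labels:  (n,) array of 0/1 success labels
    """
    sims = pipeline_vecs @ index_vecs.T  # (m, n) cosine similarities
    # Indices of the k most similar labeled trials per row (unordered within the top k).
    top_k = np.argpartition(-sims, kth=k - 1, axis=1)[:, :k]
    rates = index_labels[top_k].mean(axis=1)  # (m,) neighborhood success rates
    return rates, rates < flag_below


def unit(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Toy index: one success-heavy cluster and one failure-heavy cluster.
rng = np.random.default_rng(1)
centers = unit(rng.normal(size=(2, 256)))
index_vecs = unit(np.repeat(centers, 50, axis=0) + 0.02 * rng.normal(size=(100, 256)))
index_labels = np.array([1] * 45 + [0] * 5 + [0] * 45 + [1] * 5)

# Two "pipeline" trials, one near each cluster.
pipeline = unit(centers + 0.02 * rng.normal(size=(2, 256)))
rates, flags = triage(pipeline, index_vecs, index_labels)
print(rates, flags)  # first trial lands in the success-heavy cluster, second gets flagged
```

`np.argpartition` avoids a full sort per query, which matters when a portfolio or competitor set is scored against the full labeled index; re-running the same call after a competitor amends its design is one way to monitor shifting signals.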

Investment & Due Diligence

  • Pre-investment screening: apply outcome-pattern triage across a company's entire pipeline in minutes

    • Score each pipeline asset against the 18K labeled trial history before deep diligence
    • Identify programmes sitting in historically weak outcome neighborhoods as a first-pass risk flag
    • Compare the overall risk profile of a portfolio company against historical base rates by phase and indication
  • M&A and licensing diligence: add a data-driven risk dimension to asset evaluation

    • Triage dozens of pipeline assets across an acquisition target simultaneously
    • Identify assets with outcome-pattern profiles inconsistent with management's stated confidence levels
    • Provide a reproducible, auditable risk signal alongside analyst qualitative assessment

Clinical Research Organisations (CROs)

  • Bid/no-bid risk assessment: evaluate the historical risk pattern of a prospective client trial before committing resources
    • Embed the protocol summary and assess its outcome-pattern neighborhood
    • Flag trials in historically difficult outcome neighborhoods for additional feasibility scrutiny
    • Benchmark the client's proposed design against historical trials with similar patterns

Regulatory Affairs

  • Safety pattern clustering: group trials by outcome similarity to identify systematic design patterns in failed programmes
    • Cluster all trials in a compound class by FATE vector; separate successful from failed cohorts
    • Identify structural design features (endpoints, populations, arms) associated with failure in a given indication
    • Use historical failure-pattern analysis to inform protocol design decisions
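Clustering a compound class by FATE vector and reading off each cluster's success rate could look like the sketch below. The card does not name a clustering algorithm; this uses a plain spherical k-means (cosine assignment, re-normalised centroids, farthest-point seeding) written in NumPy, and scikit-learn's `KMeans` on normalised vectors would be a reasonable alternative. `spherical_kmeans` and `cluster_success_rates` are hypothetical names, and the toy vectors stand in for real FATE embeddings.

```python
import numpy as np


def spherical_kmeans(vecs, n_clusters, n_iter=50, seed=0):
    """k-means on unit vectors: assign by cosine similarity, re-normalise centroids."""
    rng = np.random.default_rng(seed)
    # Farthest-point seeding: start from a random vector, then repeatedly add the
    # vector least similar to every centroid chosen so far.
    chosen = [int(rng.integers(len(vecs)))]
    for _ in range(1, n_clusters):
        best_sim = np.max(vecs @ vecs[chosen].T, axis=1)
        chosen.append(int(np.argmin(best_sim)))
    centroids = vecs[chosen].copy()
    for _ in range(n_iter):
        assign = np.argmax(vecs @ centroids.T, axis=1)  # nearest centroid by cosine
        for c in range(n_clusters):
            members = vecs[assign == c]
            if len(members):  # keep the old centroid if a cluster empties
                mean = members.mean(axis=0)
                centroids[c] = mean / np.linalg.norm(mean)
    return assign


def cluster_success_rates(assign, labels, n_clusters):
    """Share of successful trials (label 1) per cluster; NaN for an empty cluster."""
    return np.array([labels[assign == c].mean() if np.any(assign == c) else np.nan
                     for c in range(n_clusters)])


def unit(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Toy compound class: two design-pattern groups with different outcome mixes.
rng = np.random.default_rng(2)
centers = unit(rng.normal(size=(2, 256)))
vecs = unit(np.repeat(centers, 30, axis=0) + 0.02 * rng.normal(size=(60, 256)))
labels = np.array([1] * 27 + [0] * 3 + [1] * 6 + [0] * 24)

assign = spherical_kmeans(vecs, n_clusters=2)
rates = cluster_success_rates(assign, labels, n_clusters=2)
print(rates)  # one cluster at 0.9, the other at 0.2 (order depends on seeding)
```

The low-success cluster is the cohort whose shared design features (endpoints, populations, arms) are worth inspecting by hand.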

Example Queries

FATE is used via embedding + nearest-neighbor lookup, not question answering. Below are illustrative examples of how to interpret the output.


Example 1: New Phase 3 Trial (Positive Neighborhood)

Input trial (new, no results yet):

Phase 3, double-blind RCT of oral JAK inhibitor vs placebo in 480 adults with moderate-to-severe RA who failed ≥1 bDMARD. Primary endpoint: ACR20 at Week 12. Sponsor: mid-cap pharma. Prior Phase 2 showed 58% ACR20 vs 22% placebo.

FATE embedding → 20 nearest historical neighbors:

Neighbor | Outcome
Phase 3 RCT of tofacitinib in RA (bDMARD-failure) | ✓ Success
Phase 3 RCT of baricitinib in RA (MTX-IR) | ✓ Success
Phase 3 RCT of upadacitinib in RA (bDMARD-failure) | ✓ Success
Phase 3 RCT of filgotinib in RA (bDMARD-failure) | ✓ Success
Phase 3 RCT of abrocitinib in atopic dermatitis | ✓ Success
Phase 3 RCT of ruxolitinib in myelofibrosis | ✗ Failure
...

Neighborhood success rate: 17/20 = 85%

Signal: This trial's design pattern sits in a historically high-success neighborhood. Consistent with the well-validated JAK inhibitor class in bDMARD-failure RA.


Example 2: New Phase 2 Trial (Mixed Neighborhood)

Input trial:

Phase 2, open-label, single-arm study of novel gene therapy in 40 patients with ultra-rare metabolic disorder. Primary endpoint: biomarker normalisation at Week 24. No approved comparator exists. First-in-class.

Neighborhood success rate: 9/20 = 45%

Signal: Mixed neighborhood. First-in-class rare disease gene therapies have historically split between landmark successes and endpoint failures at Phase 2. Warrants deeper mechanistic review before Phase 3 design.
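The two neighborhood rates above are easier to read next to the labeled set's overall base rate of 58.5% (10,609 / 18,132), and with a rough sense of how much a 20-neighbor fraction can move. The sketch below computes each rate's ratio to that base rate and a 95% Wilson score interval. Treating the 20 neighbors as independent draws is a simplifying assumption (neighbors of one query are correlated), so the interval is only a width guide, not a calibrated probability; `wilson_interval` is a hypothetical helper name.

```python
import math

BASE_RATE = 10_609 / 18_132  # share of successes in the labeled training set (~0.585)


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


for name, hits in [("Example 1", 17), ("Example 2", 9)]:
    rate = hits / 20
    lo, hi = wilson_interval(hits, 20)
    print(f"{name}: rate {rate:.0%}, {rate / BASE_RATE:.2f}x base rate, "
          f"95% interval {lo:.2f}-{hi:.2f}")
```

Under these assumptions, Example 1's interval (roughly 0.64 to 0.95) sits entirely above the 58.5% base rate, while Example 2's (roughly 0.26 to 0.66) straddles it, which matches the reading of a high-success versus a mixed neighborhood.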


Usage

from sentence_transformers import SentenceTransformer
import numpy as np

# Load model
model = SentenceTransformer("Ontologer/FATE-ClinicalTrials-Outcome-256")

# Your stored labeled-trial index (18K vectors, each tagged with success_label).
# File names are placeholders: labeled_vectors is an (N, 256) float32 array of
# L2-normalised FATE vectors; labels is an (N,) array of 0/1 success labels
# in the same row order.
labeled_vectors = np.load("labeled_vectors.npy")
labels = np.load("labels.npy")

# Embed a new trial (title | summary | interventions | primary outcome)
trial_text = """
Phase 3 RCT of novel JAK inhibitor vs placebo in 480 adults with RA who
failed ≥1 bDMARD. Primary endpoint: ACR20 at Week 12.
"""
query_vector = model.encode([trial_text], normalize_embeddings=True)  # shape: (1, 256)

# Cosine similarity via dot product (query and index vectors are unit-length)
similarities = query_vector @ labeled_vectors.T  # shape: (1, N)
top_k_indices = np.argsort(similarities[0])[::-1][:20]

# Neighborhood signal: share of the 20 nearest labeled trials that succeeded
success_rate = labels[top_k_indices].mean()
print(f"Neighborhood success rate: {success_rate:.1%}")

Index size: 256 dims × 4 bytes (float32) × 18,132 labeled trials ≈ 18.6 MB (≈ 17.7 MiB). If embedding all 563K trials: ≈ 577 MB (≈ 550 MiB).


Part of the OntologerMed Suite

Model | Role
OntologerMed-ClinicalTrials-Instruct | Generative LM: reasoning, extraction, and summarisation over trial text
FATE-ClinicalTrials-Outcome-256 (TrialPulse) | Outcome-shaped embedding: similarity by historical success/failure pattern
MOAt-ClinicalTrials-MoA-256 (TargetLens) | Mechanism-of-action embedding: similarity by biological pathway
PACT-ClinicalTrials-Pop-256 (PathFinder) | Population embedding: similarity by patient demographics and disease
ORACLE-ClinicalTrials-SuccessProb-v1 | Classifier: probability estimate combining all three embeddings

ORACLE combines FATE + MOAt + PACT into a single probability-of-success estimate. FATE provides the outcome-pattern dimension of that combined signal.
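The card does not describe how ORACLE combines the three embeddings. Purely as an illustration of what turning several embeddings into one probability can mean mechanically, the sketch below concatenates three 256-dim vectors per trial into a 768-dim feature row and fits a logistic-regression classifier by gradient descent in NumPy. Everything here (`combine`, `fit_logistic`, the learning rate, the toy data) is hypothetical and is not ORACLE's actual method.

```python
import numpy as np


def combine(fate, moat, pact):
    """Concatenate three (n, 256) embedding matrices into (n, 768) feature rows."""
    return np.concatenate([fate, moat, pact], axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Logistic regression fitted by batch gradient descent on the mean log-loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        err = sigmoid(X @ w + b) - y  # gradient of the log-loss w.r.t. the logits
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def unit(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Toy data: only the FATE block carries outcome information; MOAt/PACT blocks are noise.
rng = np.random.default_rng(3)
centers = unit(rng.normal(size=(2, 256)))
fate = unit(np.repeat(centers, 100, axis=0) + 0.02 * rng.normal(size=(200, 256)))
moat = unit(rng.normal(size=(200, 256)))
pact = unit(rng.normal(size=(200, 256)))
y = np.array([1.0] * 100 + [0.0] * 100)

X = combine(fate, moat, pact)
w, b = fit_logistic(X, y)
probs = sigmoid(X @ w + b)
accuracy = np.mean((probs > 0.5) == (y == 1))
print(f"training accuracy: {accuracy:.2f}")
```

Training accuracy on synthetic data says nothing about real predictive value; it only confirms the mechanics of mapping combined embeddings to a probability.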


Limitations

  • Similarity signal, not prediction: Neighborhood success rates are descriptive, not causal. A trial in a historically successful neighborhood can still fail.
  • Label selection bias: The 18,132 labeled trials are a subset of the 308K+ completed ClinicalTrials.gov trials. Formal outcome documentation is more common in sponsored, commercial programmes, so those programmes are overrepresented in the labeled set.
  • Historical patterns may not generalise: First-in-class mechanisms and novel indications have few historical neighbors. Neighborhood signals in sparse regions are less reliable.
  • English only: Trained on English-language trial records. Non-English records produce degraded embeddings.
  • Not medical or investment advice: This model does not predict the outcome of any specific trial. Do not use as a sole input in clinical, regulatory, or financial decision-making.
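The sparse-region caveat can be made operational by reporting how close the neighbors actually are alongside the success rate. A minimal sketch, assuming L2-normalised vectors as in the usage example; `neighborhood_report` and the `min_mean_sim` threshold of 0.5 are hypothetical and would need calibrating on your own index, since the card publishes no reference similarity values.

```python
import numpy as np


def neighborhood_report(query, index_vecs, index_labels, k=20, min_mean_sim=0.5):
    """Success rate of the k nearest labeled trials, plus a sparse-neighborhood flag.

    query: (d,) L2-normalised vector; index_vecs: (n, d) L2-normalised; labels are 0/1.
    """
    sims = index_vecs @ query
    top_k = np.argsort(sims)[::-1][:k]
    mean_sim = float(sims[top_k].mean())
    return {
        "success_rate": float(index_labels[top_k].mean()),
        "mean_neighbor_similarity": mean_sim,
        "sparse": mean_sim < min_mean_sim,  # distant neighbors: treat the rate with caution
    }


def unit(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Toy index concentrated around one design pattern.
rng = np.random.default_rng(4)
center = unit(rng.normal(size=256))
index_vecs = unit(center + 0.02 * rng.normal(size=(100, 256)))
index_labels = rng.integers(0, 2, size=100)

dense_q = neighborhood_report(unit(center + 0.02 * rng.normal(size=256)), index_vecs, index_labels)
sparse_q = neighborhood_report(unit(rng.normal(size=256)), index_vecs, index_labels)
print(dense_q["sparse"], sparse_q["sparse"])  # False True
```

A first-in-class trial whose nearest neighbors are all far away still receives a success rate, so the similarity figure is what signals that the rate rests on weak matches.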

Citation

@misc{fate-clinicaltrials-2026,
  title        = {FATE-ClinicalTrials-Outcome-256: Outcome-Similarity Embeddings for Clinical Trial Intelligence},
  author       = {Mishra, Sid},
  year         = {2026},
  note         = {Contrastive triplet embedding model trained on 18,132 completed ClinicalTrials.gov trials with verified outcome labels.},
  howpublished = {\url{https://huggingface.co/Ontologer/FATE-ClinicalTrials-Outcome-256}}
}

About the Author

Sid Mishra, Founder, Ontologer · Convixion AI

Sid is the founder of several AI-native and AI-powered startups and initiatives, based in Singapore. He founded Ontologer as the dedicated AI research arm of Convixion AI, with a focus on building domain-specific language models from the ground up, including data pipelines, training infrastructure, evaluation frameworks, and production deployment.

Ontologer builds novel LLM and embedding models purpose-built for use within Convixion AI's Commercializer.ai platform. FATE is part of the OntologerMed suite, a family of models for clinical trial intelligence. Ontologer performs every step of model development in-house: dataset curation, training infrastructure, evaluation, and production deployment.

Collaboration & Custom Work

Sid is open to collaborating on:

  • Custom domain-adapted embedding models: contrastive/triplet training on proprietary datasets for specialised retrieval tasks
  • End-to-end LLM and embedding pipelines: from data curation to training to production deployment
  • Evaluation framework design: task-specific benchmarks and retrieval evaluation pipelines
  • RAG + embedding system design: pairing domain-adapted models with retrieval systems for production use
  • Custom model architecture consulting: base model selection, training strategy, hardware planning