CYP2C9 Variant Function Classifier (research artifact, honest negative)

Experimental classifier for CYP2C9 variant functional classification (no_function / decreased_function / normal_function), built by Anukriti AI.

Read this first. This repository is published as a transparent negative result. The v2 model fixes the circularity of v1 but still fails the held-out clinical test (1/6 correct). It is not a clinical predictor and must not be used for dosing decisions. It is shared so that the finding (single-assay MAVE labels do not generalize to CPIC clinical phenotype for CYP2C9) is reproducible.

Versions

  • v1: MAVE-threshold scaffold. 8,050 training rows. 5-fold CV accuracy 0.996 (XGB), but circular: the click_score / vamp_score features, from which the labels were thresholded, account for ~77% of feature importance. Leave-anchors-out: 4/4 CPIC anchors misclassified without 500× upweighting. A MAVE-threshold reproducer, not a clinical predictor.
  • v2: non-circular. click_score / vamp_score removed; AlphaMissense (genomic-coordinate-corrected) + CADD added. Trained on the 2,514-row SNV-reachable subset. 5-fold CV AUC ~0.88 (XGB 0.886), a believable score rather than a hollow one. Held-out clinical test: 1/6 = 17% (only *11 predicted correctly).
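The circularity described for v1 can be illustrated with a toy sketch. It is not the v1 pipeline: the scores are synthetic, the cut-offs `LOW` and `HIGH` are hypothetical (this card does not state the thresholds used), and `fit_cutoffs` is a stand-in for any learner that is allowed to see the score the labels came from.

```python
import random

# Hypothetical cut-offs; the thresholds used in v1 are not stated in this card.
LOW, HIGH = 0.3, 0.7


def label_from_score(score, low=LOW, high=HIGH):
    """Assign a function class by thresholding a single assay-like score."""
    if score < low:
        return "no_function"
    if score < high:
        return "decreased_function"
    return "normal_function"


def fit_cutoffs(scores, labels):
    """Recover the two cut-offs from (score, label) pairs alone."""
    by_class = {}
    for s, y in zip(scores, labels):
        by_class.setdefault(y, []).append(s)
    low = (max(by_class["no_function"]) + min(by_class["decreased_function"])) / 2
    high = (max(by_class["decreased_function"]) + min(by_class["normal_function"])) / 2
    return low, high


rng = random.Random(0)
scores = [rng.random() for _ in range(1000)]  # synthetic scores, not MAVE data
labels = [label_from_score(s) for s in scores]

# Simple train/test split; the "model" only ever sees the score feature.
train_x, test_x = scores[:800], scores[800:]
train_y, test_y = labels[:800], labels[800:]

low_hat, high_hat = fit_cutoffs(train_x, train_y)
pred = [label_from_score(s, low_hat, high_hat) for s in test_x]
accuracy = sum(p == y for p, y in zip(pred, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.3f}")
```

Because the label is a deterministic function of the feature, a near-perfect held-out score here says nothing about clinical phenotype. It only shows that the thresholds were relearned.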

The finding

Removing the circular features fixed the inflated CV score, but the model still fails clinically because it was trained on MAVE-threshold labels, and MAVE assay function ≠ CPIC clinical function for CYP2C9. The bottleneck is the label definition, not feature quality or model architecture.
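A held-out comparison of this kind can be scored with a few lines of code. The sketch below is illustrative only: `score_held_out` is a hypothetical helper, and the allele names and labels are placeholders, not the six held-out alleles or their CPIC assignments.

```python
def score_held_out(predicted, clinical):
    """Return (n_correct, n_total, misses) over alleles present in both mappings."""
    shared = sorted(set(predicted) & set(clinical))
    misses = [a for a in shared if predicted[a] != clinical[a]]
    return len(shared) - len(misses), len(shared), misses


# Placeholder data for illustration; not real predictions or CPIC labels.
predicted = {
    "allele_A": "normal_function",
    "allele_B": "normal_function",
    "allele_C": "decreased_function",
    "allele_D": "no_function",
}
clinical = {
    "allele_A": "decreased_function",
    "allele_B": "no_function",
    "allele_C": "decreased_function",
    "allele_D": "decreased_function",
}

n_correct, n_total, misses = score_held_out(predicted, clinical)
print(f"{n_correct}/{n_total} correct; misclassified: {misses}")
```

Reporting the misclassified alleles alongside the fraction makes it easier to see whether the errors cluster in one class, which is the kind of pattern a label-definition mismatch would be expected to produce.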

AlphaMissense is discriminative where available (monotonic class separation: normal 0.21 → decreased 0.44 → no_function 0.65 mean), but covers only 31.3% of this codon-saturation MAVE library because 67.5% of variants require multi-nucleotide AA changes that AlphaMissense cannot score by design. Coverage is the blocker, not feature quality.
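The coverage gap follows from the genetic code: from any one codon, a single-nucleotide change can reach only a subset of the other amino acids. The sketch below checks reachability under the standard genetic code. The function names are hypothetical, the example codons are generic rather than taken from the CYP2C9 reference sequence, and this is not the filter used to build the 2,514-row subset.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snv_reachable_aas(codon):
    """Residues (or '*' for stop) reachable from `codon` by one nucleotide change."""
    ref_aa = CODON_TABLE[codon]
    reachable = set()
    for i, base in enumerate(codon):
        for alt in BASES:
            if alt == base:
                continue
            aa = CODON_TABLE[codon[:i] + alt + codon[i + 1:]]
            if aa != ref_aa:  # skip synonymous changes
                reachable.add(aa)
    return reachable


def needs_multi_nucleotide_change(codon, target_aa):
    """True if `target_aa` differs from the reference and no single SNV reaches it."""
    return target_aa != CODON_TABLE[codon] and target_aa not in snv_reachable_aas(codon)


print(sorted(snv_reachable_aas("ATG")))
print(needs_multi_nucleotide_change("ATG", "W"))  # Met -> Trp
print(needs_multi_nucleotide_change("CGT", "C"))  # Arg -> Cys
```

A codon-saturation library includes every substitution at every position, so many of its variants fall outside the SNV-reachable set for their reference codon. Those are the variants the text describes as lacking an AlphaMissense score.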

Ground truth / sources

MaveDB (Click-seq + VAMP-seq CYP2C9 libraries), CPIC CYP2C9 allele-function table, PharmVar, Ensembl VEP / AlphaMissense, CADD.

Citation

Part of the Anukriti AI platform validation effort. Project-level preprint: https://doi.org/10.5281/zenodo.20727790 (This DOI covers the broader Anukriti validation study, not a CYP2C9-specific artifact.)
