LOCO Protein Models for reelGene

This directory contains pre-trained Leave-One-Chromosome-Out (LOCO) protein models used by the reelGene pipeline to evaluate plant gene models. The JSON files were developed in the Buckler Lab and were downloaded from the lab’s Bitbucket repository:

This model bundle is linked from the GeneCAD project to support protein-level screening of predicted gene models:


Overview

reelGene is a gene model evaluation framework that learns conserved protein-sequence “grammar” across related species to distinguish functional genes from likely misannotations. These LOCO models provide the protein language–model parameters required for reelGene scoring.


What is LOCO?

LOCO (Leave-One-Chromosome-Out) is a cross-validation strategy in which each model is trained on proteins from every chromosome except one and is then used to score proteins on the held-out chromosome. Because no protein is scored by a model that saw its chromosome during training, this reduces information leakage and overfitting to chromosome-specific features.
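The split described above can be sketched as follows. This is a minimal illustration of the LOCO partitioning idea only, not the reelGene training code; the function names and the input format (a dict mapping protein ID to chromosome name) are hypothetical.

```python
from typing import Dict, List, Tuple


def loco_folds(chromosomes: List[str]) -> Dict[str, List[str]]:
    """For each held-out chromosome, list the chromosomes its model trains on."""
    return {held_out: [c for c in chromosomes if c != held_out] for held_out in chromosomes}


def partition_proteins(proteins: Dict[str, str], held_out: str) -> Tuple[List[str], List[str]]:
    """Split protein IDs into (train, score) sets for one LOCO fold.

    `proteins` maps a protein ID to its chromosome (hypothetical input format).
    Proteins on `held_out` are only scored, never used for training.
    """
    train = sorted(p for p, c in proteins.items() if c != held_out)
    score = sorted(p for p, c in proteins.items() if c == held_out)
    return train, score


# Usage: ten chromosomes named as in the Contents listing give ten folds,
# each trained on the other nine chromosomes.
# folds = loco_folds([f"chr{i:02d}" for i in range(1, 11)])
```

Across all folds, every protein lands in exactly one scoring set, which is what makes the resulting scores out-of-sample.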


Relationship to GeneCAD

GeneCAD is a sequence-only plant genome annotation system that predicts complete gene models directly from DNA, without requiring species-matched RNA-seq, proteomics, or homology inputs. It combines:

  • conservation-aware representations from the PlantCAD2 DNA foundation model,
  • a lightweight ModernBERT head for single-nucleotide labeling,
  • chromosome-scale CRF decoding enforcing gene-structure constraints, and
  • a protein language–model screen to suppress repeat-driven ORFs.

These LOCO protein models are linked from GeneCAD so users can apply reelGene-style protein plausibility scoring to GeneCAD predictions.
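One way to picture this screening step is sketched below. It assumes protein plausibility scores are already available from reelGene through some per-chromosome scoring callable; the `PredictedGene` type, the `scorers` mapping, and the threshold are all hypothetical placeholders. The only logic taken from this README is the LOCO rule that a gene is scored by the model whose held-out chromosome is the gene's own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for reelGene scoring: protein sequence -> plausibility score.
Scorer = Callable[[str], float]


@dataclass
class PredictedGene:
    gene_id: str
    chrom: str    # chromosome the prediction lies on, e.g. "chr03"
    protein: str  # translated CDS of the predicted gene model


def screen_predictions(
    genes: Iterable[PredictedGene],
    scorers: Dict[str, Scorer],
    threshold: float,
) -> List[str]:
    """Return IDs of genes whose protein scores at or above `threshold`
    under the LOCO model that held out the gene's own chromosome."""
    kept: List[str] = []
    for gene in genes:
        if gene.chrom not in scorers:
            raise KeyError(f"no LOCO model held out for {gene.chrom!r}")
        # Route to the model trained without this chromosome (out-of-sample score).
        if scorers[gene.chrom](gene.protein) >= threshold:
            kept.append(gene.gene_id)
    return kept
```

The threshold here is an assumed free parameter; in practice it would need to be chosen on data where functional genes and misannotations are known.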


Contents

This directory typically contains one JSON file per held-out chromosome, for example:

LOCO_models/
├── chr01.json
├── chr02.json
├── ...
└── chr10.json