
FRACTAL 3B - A Protein Structure Predictor

FRACTAL (Framework for Representation-guided Atomic ConsTruction & ALignment) is a constraint-based protein structure prediction system that leverages pre-trained protein language model representations for geometric inference. The system employs a two-stage architecture: neural constraint prediction followed by physics-guided deterministic folding.

Model Description

FRACTAL uses the 3B-parameter ESM-2 model (esm2_t36_3B_UR50D) as a frozen feature extractor, with specialized prediction heads trained to infer geometric constraints from sequence-derived embeddings. This approach decouples representation learning from structural assembly, enabling efficient training and interpretable intermediate outputs.

Architecture

The model architecture consists of:

  • Encoder: ESM-2 3B (36 layers, 2560-dimensional embeddings) - parameters frozen
  • Pairwise Constraint Head: Predicts inter-residue distance distributions and binary contact maps through outer product attention and convolutional processing
  • Torsion Prediction Head: Multi-layer perceptron for backbone dihedral angle prediction (phi, psi, omega)
  • Confidence Estimation: Per-residue quality score prediction (pLDDT-style metric)
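One common way to turn per-residue embeddings into pairwise features, as the Pairwise Constraint Head does, is an outer product of two learned projections, similar in spirit to AlphaFold2's outer-product mean. The sketch below assumes that design. The helper names and toy dimensions are hypothetical, and the attention and convolution stages that would follow are omitted; this is not FRACTAL's actual head.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(x: Vector, weight: Matrix) -> Vector:
    """Linear map; `weight` has shape [out_dim][in_dim]."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def outer_product_pair_features(
    embeddings: List[Vector], w_left: Matrix, w_right: Matrix
) -> List[List[Vector]]:
    """Return an [L][L][c_left * c_right] nested list whose (i, j) entry is the
    flattened outer product of the projected embeddings of residues i and j."""
    left = [project(e, w_left) for e in embeddings]
    right = [project(e, w_right) for e in embeddings]
    return [[[a_k * b_m for a_k in a for b_m in b] for b in right] for a in left]


# Toy example: 3 residues with 4-d embeddings projected to 2 channels each.
# (In FRACTAL the inputs would be 2560-d ESM-2 embeddings.)
emb = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
w = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # keep the first two dimensions
pair = outer_product_pair_features(emb, w, w)
```

Because every residue pair gets its own feature vector, the result has L × L entries, which is why downstream 2D convolutions can treat it like an image.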

The constraint predictor outputs:

  • Distance probability distributions over 64 bins spanning 0-20 Angstroms
  • Binary contact predictions at 8 Angstrom threshold
  • Backbone torsion angles with circular loss formulation
  • Per-residue confidence scores ranging from 0-100
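The sketch below shows how distance-bin targets and contact labels could be derived from coordinates. It assumes 64 equal-width bins (0.3125 Å each), that distances at or beyond 20 Å are clamped into the last bin, and a strict `< 8 Å` contact cutoff. Which atom type supplies the coordinates is not stated, and these choices and helper names are assumptions rather than FRACTAL's documented behavior.

```python
import math

NUM_BINS = 64
MAX_DIST = 20.0                  # Å, upper end of the binned range
BIN_WIDTH = MAX_DIST / NUM_BINS  # 0.3125 Å per bin
CONTACT_THRESHOLD = 8.0          # Å


def distance_to_bin(d: float) -> int:
    """Index of the equal-width bin containing distance d (Å).
    Distances at or beyond 20 Å are clamped into the last bin (assumption)."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return min(int(d / BIN_WIDTH), NUM_BINS - 1)


def is_contact(d: float) -> bool:
    """Binary contact label with a strict 8 Å cutoff (assumption: '<', not '<=')."""
    return d < CONTACT_THRESHOLD


def pairwise_distances(coords):
    """L x L matrix of Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]
```

For example, a 5.0 Å distance falls in bin 16, and two atoms 7.9 Å apart count as a contact.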

Training Procedure

Dataset: Curated subset of approximately 1000 high-resolution crystal structures from the Protein Data Bank (resolution < 2.0 Å, R-free < 0.25).

Training Configuration:

  • Optimizer: AdamW (learning rate: 5e-5, weight decay: 0.01)
  • Batch size: 1 sequence per device with 16 gradient accumulation steps (effective batch size: 16)
  • Hardware: 2x NVIDIA Tesla T4 GPUs (16GB VRAM each)
  • Training duration: Approximately 3-4 hours
  • Mixed precision training (fp16) enabled
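With one sequence per micro-batch and 16 accumulation steps, each optimizer update averages gradients over 16 sequences. The pure-Python sketch below shows that loop with an AdamW update (decoupled weight decay, as in Loshchilov & Hutter) using the listed learning rate and weight decay. The betas and epsilon are PyTorch's defaults and are assumed here; mixed precision and multi-GPU synchronization are omitted, and the helper names are hypothetical.

```python
import math


def adamw_step(params, grads, state, lr=5e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on a flat list of scalar parameters (in place)."""
    state["t"] += 1
    t = state["t"]
    beta1, beta2 = betas
    for i, g in enumerate(grads):
        params[i] *= 1.0 - lr * weight_decay  # decoupled weight decay
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        m_hat = state["m"][i] / (1.0 - beta1 ** t)  # bias correction
        v_hat = state["v"][i] / (1.0 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def train_with_accumulation(params, grad_fn, samples, accum_steps=16, **adamw_kwargs):
    """Batch size 1 per step; update once every `accum_steps` samples.
    Returns the number of optimizer updates. A trailing partial group is dropped."""
    state = {"t": 0, "m": [0.0] * len(params), "v": [0.0] * len(params)}
    accum = [0.0] * len(params)
    for k, sample in enumerate(samples, start=1):
        grads = grad_fn(params, sample)  # gradient for a single sequence
        accum = [a + g / accum_steps for a, g in zip(accum, grads)]
        if k % accum_steps == 0:
            adamw_step(params, accum, state, **adamw_kwargs)
            accum = [0.0] * len(params)
    return state["t"]


# Toy objective (p - x)^2 with gradient 2 (p - x): 32 samples -> 2 updates.
params = [1.0]
steps = train_with_accumulation(params, lambda p, x: [2.0 * (p[0] - x)], [0.0] * 32)
```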

Loss Function: Weighted multi-task objective combining:

  • Distance prediction: Binned cross-entropy loss with Gaussian distance matrix targets
  • Contact prediction: Binary cross-entropy with distance-based labeling
  • Torsion angles: Mean squared error with circular boundary conditions
  • Confidence: Mean squared error against structure-derived pLDDT scores
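Torsion angles wrap around at ±180°, so a plain squared difference would heavily penalize a prediction of 179° against a target of -179°, even though the two are only 2° apart. The sketch below shows one circular formulation, wrapping each difference into [-π, π) before squaring, together with a weighted task sum. The task weights are hypothetical because they are not listed here, and the helper names are illustrative.

```python
import math
from typing import Dict, Sequence


def wrap_angle(delta: float) -> float:
    """Map an angle difference in radians into [-pi, pi)."""
    return (delta + math.pi) % (2.0 * math.pi) - math.pi


def circular_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error on wrapped angle differences (radians)."""
    diffs = [wrap_angle(p - t) for p, t in zip(pred, target)]
    return sum(d * d for d in diffs) / len(diffs)


def multitask_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses (weights are hypothetical)."""
    return sum(weights[name] * value for name, value in losses.items())


# 179 deg vs -179 deg: penalized as a 2 deg error, not 358 deg.
torsion_loss = circular_mse([math.radians(179.0)], [math.radians(-179.0)])
```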

Two-Stage Pipeline

  1. Constraint Prediction (this model): Neural network predicts geometric constraints from amino acid sequence
  2. Deterministic Folding (separate module): Gradient-based optimization converts constraints into 3D atomic coordinates by minimizing constraint violation energy

This separation enables rapid experimentation with folding algorithms without retraining the neural components.
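The folding stage can be illustrated with a toy version of constraint-violation minimization. The energy sums squared deviations between current and target pairwise distances, and plain gradient descent moves the coordinates downhill. This sketch assumes a squared-violation energy and hand-derived gradients; the real `fold_from_constraints` module may use a different energy and optimizer. The defaults mirror the `num_steps=1000` and `lr=0.01` values used in the Python API example, and the helper names are hypothetical.

```python
import math
import random


def random_init(n, seed=0):
    """Random starting coordinates in a 2 Å cube (deterministic for a fixed seed)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]


def constraint_energy(coords, targets):
    """Sum over constrained pairs (i, j) of (|x_i - x_j| - d_ij)^2."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2
               for (i, j), d in targets.items())


def fold(coords, targets, num_steps=1000, lr=0.01):
    """Plain gradient descent on constraint_energy; returns new coordinates."""
    x = [list(p) for p in coords]  # copy so the starting point is untouched
    for _ in range(num_steps):
        grads = [[0.0, 0.0, 0.0] for _ in x]
        for (i, j), d in targets.items():
            diff = [a - b for a, b in zip(x[i], x[j])]
            r = math.sqrt(sum(c * c for c in diff)) + 1e-9  # avoid division by zero
            scale = 2.0 * (r - d) / r  # dE/dx_i = 2 (r - d) (x_i - x_j) / r
            for k in range(3):
                grads[i][k] += scale * diff[k]
                grads[j][k] -= scale * diff[k]  # equal and opposite for x_j
        for p, g in zip(x, grads):
            for k in range(3):
                p[k] -= lr * g[k]
    return x


# Toy target: a regular tetrahedron with every pair 3.8 Å apart.
targets = {(i, j): 3.8 for i in range(4) for j in range(i + 1, 4)}
start = random_init(4)
final = fold(start, targets)
```

Swapping the energy function or optimizer here does not touch the constraint predictor, which is the practical benefit of the two-stage split.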

Installation

# Install via pip
pip install fractalml

# Alternative: Local installation
git clone https://github.com/Aayan-Mishra/FractalGPT.git
cd FractalGPT
pip install -e .

System Requirements:

  • Python 3.8 or higher
  • PyTorch 2.0 or higher
  • CUDA 11.8 or higher (for GPU acceleration)
  • Minimum 8GB GPU memory recommended
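A quick way to check an environment against these requirements before installing is sketched below. It uses only the standard library, plus `torch` if it is already installed; the function name and the returned keys are hypothetical.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8), min_gpu_gb=8.0):
    """Report whether this machine meets the listed requirements."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_version"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = bool(torch.cuda.is_available())
        if report["cuda_available"]:
            total = torch.cuda.get_device_properties(0).total_memory  # bytes
            report["gpu_memory_gb"] = total / 1024 ** 3
            report["gpu_memory_ok"] = report["gpu_memory_gb"] >= min_gpu_gb
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```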

Usage

WebUI

fractal webui

Python API

from fractal.models import ConstraintPredictor
from fractal.geometry.folding import fold_from_constraints

# Initialize pre-trained model
model = ConstraintPredictor.from_pretrained(
    "Fractal-Labs/FRACTAL-1-3B",
    device="cuda"  # Use "cpu" for CPU-only inference
)

# Example: Predict structure for a protein sequence
sequence = "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL"

# Stage 1: Predict geometric constraints
predictions = model.predict_from_sequence(sequence, device="cuda")

# Stage 2: Fold to 3D coordinates
structure = fold_from_constraints(
    predictions,
    num_steps=1000,        # Optimization iterations
    lr=0.01,               # Learning rate for gradient descent
    device="cuda"
)

# Export structure
structure.to_pdb("predicted_structure.pdb")
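`to_pdb` writes a PDB file, a fixed-column text format. As an illustration of that format, the sketch below writes a C-alpha-only trace and stores per-residue confidence in the B-factor column, following the convention AlphaFold uses for pLDDT. Whether FRACTAL's exporter writes full backbone atoms or uses the B-factor column this way is an assumption, and `ca_trace_to_pdb` is a hypothetical helper.

```python
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def ca_trace_to_pdb(sequence, ca_coords, plddt=None, chain="A"):
    """Format a C-alpha trace as fixed-column PDB ATOM records.
    Per-residue confidence (0-100), if given, goes in the B-factor column."""
    element = "C"
    lines = []
    for i, (aa, (x, y, z)) in enumerate(zip(sequence, ca_coords), start=1):
        b = plddt[i - 1] if plddt is not None else 0.0
        lines.append(
            # cols 1-6 record, 7-11 serial, 13-16 atom name, 18-20 residue,
            # 22 chain, 23-26 residue number, 31-54 x/y/z, 55-60 occupancy,
            # 61-66 B-factor, 77-78 element
            f"ATOM  {i:>5}  CA  {THREE_LETTER[aa]:>3} {chain}{i:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{b:6.2f}          {element:>2}"
        )
    lines.append("END")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a `.pdb` file produces a trace that standard viewers can open and, because of the B-factor convention, color by confidence.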

Command-Line Interface

# End-to-end prediction with visualization
fractal fold input.fasta --checkpoint Fractal-Labs/FRACTAL-1-3B --viz

# Outputs:
# - input.pdb: 3D structure in PDB format
# - input.html: Interactive 3D viewer (Plotly-based)
# - input.png: Static structural rendering
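The CLI reads a FASTA file and names its outputs after the input file. A minimal sketch of that input handling follows. How the real CLI treats multi-record FASTA files is not documented here, so `read_fasta` and `output_paths` are hypothetical helpers.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}; the id is the first
    whitespace-delimited token after '>'. Duplicate ids overwrite earlier ones."""
    records = {}
    record_id, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                records[record_id] = "".join(chunks)
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            chunks = []
        elif record_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if record_id is not None:
        records[record_id] = "".join(chunks)
    return records


def output_paths(fasta_path):
    """Output files named after the input stem, matching the outputs listed above."""
    path = Path(fasta_path)
    return {ext: path.with_suffix("." + ext) for ext in ("pdb", "html", "png")}
```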

Constraint-Only Prediction

For applications requiring only geometric constraints without 3D folding:

from fractal.models import ConstraintPredictor

model = ConstraintPredictor.from_pretrained("Fractal-Labs/FRACTAL-1-3B")
# `sequence` is a one-letter amino acid string, e.g. the one from the previous example
predictions = model.predict_from_sequence(sequence)

# Access individual constraint predictions (L = sequence length)
distance_logits = predictions.distance_logits  # Shape: [L, L, 64], unnormalized scores over distance bins
contact_logits = predictions.contact_logits    # Shape: [L, L]
backbone_angles = predictions.torsion_angles   # Shape: [L, 3]
quality_scores = predictions.confidence        # Shape: [L], range 0-100
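The distance and contact outputs are logits, so a downstream step usually converts them to probabilities. Assuming 64 equal-width bins over 0-20 Å and a sigmoid-activated contact head, the sketch below computes an expected distance from one residue pair's bin logits and a contact probability from one contact logit. If the outputs are PyTorch tensors, you would first call `.tolist()` on the relevant slice. The bin layout, the activation, and the helper names are assumptions.

```python
import math
from typing import List, Sequence

NUM_BINS = 64
BIN_WIDTH = 20.0 / NUM_BINS
BIN_CENTERS = [(k + 0.5) * BIN_WIDTH for k in range(NUM_BINS)]  # Å


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_distance(bin_logits: Sequence[float]) -> float:
    """Mean of the predicted distance distribution for one residue pair (Å)."""
    probs = softmax(bin_logits)
    return sum(p * c for p, c in zip(probs, BIN_CENTERS))


def contact_probability(logit: float) -> float:
    """Numerically stable logistic sigmoid."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)
```

A uniform distribution over the bins gives an expected distance of 10 Å, the midpoint of the range; a confident prediction concentrates the mass near one bin center.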

Performance Characteristics

Computational Requirements

  • Constraint prediction: 2-5 seconds for typical proteins (150-300 residues) on T4 GPU
  • Deterministic folding: 30-60 seconds for medium-sized proteins (150-300 residues)
  • Memory usage: Approximately 6-8GB GPU memory for sequences up to 512 residues
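A back-of-the-envelope estimate suggests why the frozen encoder accounts for most of this footprint and why pairwise tensors become the concern for long sequences. The sketch assumes the nominal 3 billion parameters stored in fp16 (2 bytes each) and a single 64-bin pairwise tensor in fp32. Real usage also includes activations and framework overhead, and the helper names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def weight_bytes(num_params: int, bytes_per_value: int = 2) -> int:
    """Memory for model weights (fp16 = 2 bytes per parameter)."""
    return num_params * bytes_per_value


def pair_tensor_bytes(length: int, channels: int = 64, bytes_per_value: int = 4) -> int:
    """Memory for one L x L x C pairwise tensor (fp32 = 4 bytes per value)."""
    return length * length * channels * bytes_per_value


if __name__ == "__main__":
    print(f"Encoder weights (3B params, fp16): {weight_bytes(3_000_000_000) / GIB:.1f} GiB")
    for length in (256, 512, 1024):
        print(f"L={length}: one 64-bin tensor = {pair_tensor_bytes(length) / MIB:.0f} MiB")
```

Under these assumptions the weights alone come to about 5.6 GiB, while one 64-bin fp32 tensor is 64 MiB at 512 residues and 256 MiB at 1024 residues. Every L × L tensor grows quadratically with sequence length.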

Limitations

  • Sequence length: Maximum 1024 residues due to ESM-2 positional encoding constraints
  • Training data coverage: Limited to approximately 1000 structures; generalization to novel folds or rare protein families may be reduced
  • Prediction accuracy: This model is designed for research and educational purposes. It does not achieve state-of-the-art accuracy comparable to AlphaFold2/3 or RoseTTAFold on standardized benchmarks
  • Multimer prediction: Current implementation supports monomeric structures only
  • Post-translational modifications: Not explicitly modeled
  • Co-factor binding: Metal ions and small molecule co-factors are not predicted
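A pre-flight check along these lines can catch over-length or malformed inputs before they reach the model. The sketch assumes the 1024-residue limit stated above and accepts only the 20 canonical amino acids. ESM-2's vocabulary also contains ambiguity codes such as X, so rejecting them is a conservative choice, not a documented requirement. `validate_sequence` is a hypothetical helper.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 1024  # residues, per the limitation above


def validate_sequence(seq: str, max_length: int = MAX_LENGTH) -> str:
    """Normalize a sequence (strip whitespace, uppercase) and reject inputs
    the model is not expected to handle. Returns the cleaned sequence."""
    seq = "".join(seq.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - CANONICAL)
    if bad:
        raise ValueError(f"non-canonical residues: {''.join(bad)}")
    if len(seq) > max_length:
        raise ValueError(f"sequence has {len(seq)} residues; maximum is {max_length}")
    return seq
```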

Known Issues

  • Deterministic folding may converge to local minima for proteins with complex topologies
  • Confidence scores are less well calibrated than AlphaFold2 pLDDT scores
  • Performance on disordered regions and long loops is limited

Model Card

Model Details

  • Developed by: Aayan Mishra (Fractal Labs)
  • Model type: Constraint-based protein structure predictor
  • Language: Protein amino acid sequences
  • License: FOCL
  • Base model: ESM-2 3B (facebook/esm2_t36_3B_UR50D)

Intended Use

Primary intended uses:

  • Educational demonstrations of protein structure prediction concepts
  • Research prototyping for constraint-based folding algorithms
  • Generating initial structural hypotheses for further refinement
  • Teaching protein bioinformatics and structural biology

Out-of-scope uses:

  • Clinical or diagnostic applications
  • Drug discovery without extensive validation
  • High-stakes production deployments requiring state-of-the-art accuracy

Ethical Considerations

Protein structure prediction models may be used to design novel proteins with potentially harmful applications. Users should follow established biosafety guidelines and ethical frameworks when working with predicted structures, particularly for:

  • Toxin or venom protein engineering
  • Pathogen-related research
  • Dual-use biological technologies

Citation

If you use FRACTAL in your research, please cite both FRACTAL and the ESM-2 foundation model:

@software{fractal2025,
  title={FRACTAL: Framework for Representation-guided Atomic Construction and Alignment},
  author={Mishra, Aayan},
  year={2025},
  url={https://github.com/Aayan-Mishra/FractalGPT}
}
@article{lin2023evolutionary,
  title={Evolutionary-scale prediction of atomic-level protein structure with a language model},
  author={Lin, Zeming and Akin, Halil and Rao, Roshan and Hie, Brian and Zhu, Zhongkai and Lu, Wenting and Smetanin, Nikita and Verkuil, Robert and Kabeli, Ori and Shmueli, Yair and Fazel-Zarandi, Maryam and Sercu, Tom and Candido, Sal and Rives, Alexander},
  journal={Science},
  volume={379},
  number={6637},
  pages={1123--1130},
  year={2023},
  publisher={American Association for the Advancement of Science}
}

Changelog

Version 1.0 (Current)

  • Initial release with ESM-2 3B backbone
  • Support for sequences up to 1024 residues
  • Multi-task constraint prediction heads
  • Deterministic gradient-based folding

Acknowledgments

This work builds upon the ESM-2 protein language model developed by Meta AI Research. We thank the Protein Data Bank for providing high-quality structural data and Kaggle for computational resources.
