FRACTAL 3B - A Protein Structure Predictor

FRACTAL (Framework for Representation-guided Atomic ConsTruction & ALignment) is a constraint-based protein structure prediction system that leverages pre-trained protein language model representations for geometric inference. The system employs a two-stage architecture: neural constraint prediction followed by physics-guided deterministic folding.

Model Description

FRACTAL utilizes the ESM-2 3B parameter model (esm2_t36_3B_UR50D) as a frozen feature extractor, with specialized prediction heads trained to infer geometric constraints from sequence-derived embeddings. This approach decouples representation learning from structural assembly, enabling efficient training and interpretable intermediate outputs.

Architecture

The model architecture consists of:

Encoder: ESM-2 3B (36 layers, 2560-dimensional embeddings) - parameters frozen
Pairwise Constraint Head: Predicts inter-residue distance distributions and binary contact maps through outer product attention and convolutional processing
Torsion Prediction Head: Multi-layer perceptron for backbone dihedral angle prediction (phi, psi, omega)
Confidence Estimation: Per-residue quality score prediction (pLDDT-style metric)

The constraint predictor outputs:

Distance probability distributions over 64 bins spanning 0-20 Angstroms
Binary contact predictions at 8 Angstrom threshold
Backbone torsion angles with circular loss formulation
Per-residue confidence scores ranging from 0-100
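
As a rough illustration of how the 64-bin distance output can be read, the sketch below converts the logits for one residue pair into an expected distance. It assumes equal-width bins over 0-20 Angstroms, represented by their centers, and a softmax over raw logits. The helper names are hypothetical, and the released model's actual binning convention may differ.

```python
import math

NUM_BINS = 64
MAX_DIST = 20.0  # Angstroms, per the output description above


def bin_centers(num_bins=NUM_BINS, max_dist=MAX_DIST):
    """Centers of equal-width bins over [0, max_dist] (assumed layout)."""
    width = max_dist / num_bins
    return [(i + 0.5) * width for i in range(num_bins)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_distance(logits):
    """Probability-weighted mean distance for one residue pair."""
    probs = softmax(logits)
    centers = bin_centers(len(logits))
    return sum(p * c for p, c in zip(probs, centers))


# Toy logits that strongly favor bin 25 (center just under 8 Angstroms)
logits = [0.0] * NUM_BINS
logits[25] = 10.0
print(round(expected_distance(logits), 2))
```

An expected distance is only one possible summary; the most probable bin or the full distribution may be more informative for multi-modal predictions.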

Training Procedure

Dataset: Curated subset of approximately 1000 high-resolution crystal structures from the Protein Data Bank (resolution < 2.0Å, R-free < 0.25).
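
The selection criteria above can be expressed as a simple filter over structure metadata. This is a minimal sketch: the `StructureEntry` record, its field names, and the sample entries are hypothetical placeholders, not the actual curation script or training set.

```python
from dataclasses import dataclass


@dataclass
class StructureEntry:
    pdb_id: str        # placeholder identifiers, not real training entries
    resolution: float  # Angstroms
    r_free: float


def passes_quality_filter(entry, max_resolution=2.0, max_r_free=0.25):
    """Keep entries with resolution < 2.0 Angstroms and R-free < 0.25."""
    return entry.resolution < max_resolution and entry.r_free < max_r_free


entries = [
    StructureEntry("XXX1", 1.6, 0.21),  # passes both thresholds
    StructureEntry("XXX2", 2.4, 0.22),  # resolution too coarse
    StructureEntry("XXX3", 1.8, 0.27),  # R-free too high
]
selected = [e.pdb_id for e in entries if passes_quality_filter(e)]
print(selected)
```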

Training Configuration:

Optimizer: AdamW (learning rate: 5e-5, weight decay: 0.01)
Batch size: 1 sequence per device with 16 gradient accumulation steps (effective batch size: 16)
Hardware: 2x NVIDIA Tesla T4 GPUs (16GB VRAM each)
Training duration: Approximately 3-4 hours
Mixed precision training (fp16) enabled

Loss Function: Weighted multi-task objective combining:

Distance prediction: Binned cross-entropy loss with Gaussian distance matrix targets
Contact prediction: Binary cross-entropy with distance-based labeling
Torsion angles: Mean squared error with circular boundary conditions
Confidence: Mean squared error against structure-derived pLDDT scores
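
Two of these terms can be sketched in a few lines. The code below shows one plausible reading of "Gaussian distance matrix targets" (soft labels over the distance bins, centered on the true distance) and of the circular torsion loss (squared error on the angle difference wrapped into [-pi, pi]). The function names, the `sigma` value, and the clamping of out-of-range distances are assumptions made for illustration; the actual training code may differ.

```python
import math


def gaussian_bin_targets(true_dist, num_bins=64, max_dist=20.0, sigma=0.5):
    """Soft target distribution over distance bins (assumed formulation)."""
    # Assumption: distances beyond the binned range are clamped to its edge.
    clamped = min(max(true_dist, 0.0), max_dist)
    width = max_dist / num_bins
    centers = [(i + 0.5) * width for i in range(num_bins)]
    weights = [math.exp(-((c - clamped) ** 2) / (2 * sigma ** 2)) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]


def soft_cross_entropy(logits, targets):
    """Cross-entropy between soft targets and softmax(logits)."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, targets))


def circular_mse(pred_angles, true_angles):
    """Mean squared error on angle differences wrapped into [-pi, pi]."""
    total = 0.0
    for p, t in zip(pred_angles, true_angles):
        diff = math.atan2(math.sin(p - t), math.cos(p - t))
        total += diff ** 2
    return total / len(pred_angles)


targets = gaussian_bin_targets(8.0)
print(soft_cross_entropy([0.0] * 64, targets))  # uniform prediction
# Angles just either side of the +/-pi boundary are close, not 2*pi apart
print(circular_mse([math.pi - 0.05], [-math.pi + 0.05]))
```

The wrapping step matters because torsion angles near +pi and -pi describe nearly the same conformation; a plain squared error would penalize them heavily.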

Two-Stage Pipeline

Constraint Prediction (this model): Neural network predicts geometric constraints from amino acid sequence
Deterministic Folding (separate module): Gradient-based optimization converts constraints into 3D atomic coordinates by minimizing constraint violation energy

This separation enables rapid experimentation with folding algorithms without retraining the neural components.
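
To make the second stage concrete, here is a toy version of gradient-based folding: plain gradient descent on a squared distance-violation energy for four points. It is a sketch under simplifying assumptions (exact target distances instead of predicted distributions, no torsion or steric terms), and the `fold` and `violation_energy` functions are hypothetical stand-ins, not the `fold_from_constraints` implementation. The `num_steps` and `lr` arguments mirror the names used in the Python API example.

```python
import math


def pair_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def violation_energy(coords, targets):
    """Sum of squared differences between current and target distances."""
    n = len(coords)
    return sum(
        (pair_distance(coords[i], coords[j]) - targets[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def fold(coords, targets, num_steps=1000, lr=0.01, eps=1e-8):
    """Deterministic gradient descent on the violation energy."""
    coords = [list(p) for p in coords]  # copy so the input is not modified
    n = len(coords)
    for _ in range(num_steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = pair_distance(coords[i], coords[j])
                # d/dx_i of (d - t)^2 is 2 * (d - t) * (x_i - x_j) / d
                scale = 2.0 * (d - targets[i][j]) / (d + eps)
                for k in range(3):
                    g = scale * (coords[i][k] - coords[j][k])
                    grads[i][k] += g
                    grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords


# Target distances taken from a 3.8 Angstrom square; fixed, non-random start
reference = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 0.0]]
targets = [[pair_distance(a, b) for b in reference] for a in reference]
start = [[0.0, 0.0, 0.0], [1.0, 0.1, 0.0], [2.0, -0.1, 0.2], [3.0, 0.2, -0.1]]
folded = fold(start, targets)
print(violation_energy(start, targets), violation_energy(folded, targets))
```

Because the starting coordinates are fixed, repeated runs give the same result. Distances alone cannot distinguish a structure from its mirror image, which is one reason torsion constraints are useful in practice.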

Installation

# Install via pip
pip install fractalml

# Alternative: Local installation
git clone https://github.com/Aayan-Mishra/FractalGPT.git
cd FractalGPT
pip install -e .

System Requirements:

Python 3.8 or higher
PyTorch 2.0 or higher
CUDA 11.8 or higher (for GPU acceleration)
At least 8GB of GPU memory recommended

Usage

WebUI

fractal webui

Python API

from fractal.models import ConstraintPredictor
from fractal.geometry.folding import fold_from_constraints

# Initialize pre-trained model
model = ConstraintPredictor.from_pretrained(
    "Fractal-Labs/FRACTAL-1-3B",
    device="cuda"  # Use "cpu" for CPU-only inference
)

# Example: Predict structure for a protein sequence
sequence = "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL"

# Stage 1: Predict geometric constraints
predictions = model.predict_from_sequence(sequence, device="cuda")

# Stage 2: Fold to 3D coordinates
structure = fold_from_constraints(
    predictions,
    num_steps=1000,        # Optimization iterations
    lr=0.01,               # Learning rate for gradient descent
    device="cuda"
)

# Export structure
structure.to_pdb("predicted_structure.pdb")

Command-Line Interface

# End-to-end prediction with visualization
fractal fold input.fasta --checkpoint Fractal-Labs/FRACTAL-1-3B --viz

# Outputs:
# - input.pdb: 3D structure in PDB format
# - input.html: Interactive 3D viewer (Plotly-based)
# - input.png: Static structural rendering
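
Before running the CLI it can help to check what a FASTA file actually contains. The parser below is an independent minimal sketch, not the parser used by `fractal fold`; how the CLI handles multi-record files or unusual headers is not documented here.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them in order
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = ">example_protein\nMNIFEMLRID\nEGLRLKIYKD\n"
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```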

Constraint-Only Prediction

For applications requiring only geometric constraints without 3D folding:

from fractal.models import ConstraintPredictor

model = ConstraintPredictor.from_pretrained("Fractal-Labs/FRACTAL-1-3B")
# `sequence` is an amino acid string, as in the Python API example above
predictions = model.predict_from_sequence(sequence)

# Access individual constraint predictions
distance_distribution = predictions.distance_logits  # Shape: [L, L, 64]
contact_map = predictions.contact_logits             # Shape: [L, L]
backbone_angles = predictions.torsion_angles         # Shape: [L, 3]
quality_scores = predictions.confidence              # Shape: [L]
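
The contact logits can be turned into a ranked list of likely contacts. This sketch assumes the logits are raw scores that a sigmoid maps to probabilities, and it operates on plain nested lists; if `contact_logits` is a PyTorch tensor, `contact_logits.tolist()` would give that form. The `confident_contacts` helper, the 0.5 threshold, and the minimum sequence separation of 6 are illustrative choices, not part of the FRACTAL API.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def confident_contacts(contact_logits, min_separation=6, threshold=0.5):
    """List (i, j, probability) for likely contacts, most confident first."""
    n = len(contact_logits)
    pairs = []
    for i in range(n):
        # Skip near-diagonal pairs, which are trivially close in space
        for j in range(i + min_separation, n):
            prob = sigmoid(contact_logits[i][j])
            if prob >= threshold:
                pairs.append((i, j, prob))
    return sorted(pairs, key=lambda item: item[2], reverse=True)


# Toy 8-residue example with mostly negative logits
toy = [[-5.0] * 8 for _ in range(8)]
toy[0][7] = 3.0
toy[1][7] = 1.0
toy[2][3] = 4.0  # high score, but too close in sequence to be reported
contacts = confident_contacts(toy)
for i, j, prob in contacts:
    print(i, j, round(prob, 3))
```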

Performance Characteristics

Computational Requirements

Constraint prediction: 2-5 seconds for typical proteins (150-300 residues) on T4 GPU
Deterministic folding: 30-60 seconds for medium-sized proteins (150-300 residues)
Memory usage: Approximately 6-8GB GPU memory for sequences up to 512 residues
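
A back-of-envelope estimate gives a feel for where the memory goes. The arithmetic below assumes roughly 3 billion encoder parameters stored in fp16 and distance logits stored in fp32; both are assumptions rather than measured values, and activation memory is ignored entirely.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def pairwise_output_bytes(length, num_bins=64, bytes_per_value=4):
    """Size of an [L, L, num_bins] array of distance logits."""
    return length * length * num_bins * bytes_per_value


def weights_bytes(num_params=3_000_000_000, bytes_per_param=2):
    """Approximate encoder weight storage (assumed fp16, ~3e9 parameters)."""
    return num_params * bytes_per_param


for length in (150, 300, 512):
    size = pairwise_output_bytes(length) / MIB
    print(length, "residues:", round(size, 1), "MiB of distance logits")
print("encoder weights:", round(weights_bytes() / GIB, 2), "GiB")
```

Under these assumptions the encoder weights account for most of the footprint, while the pairwise outputs grow quadratically with sequence length.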

Limitations

Sequence length: Maximum 1024 residues, the context length inherited from the ESM-2 encoder
Training data coverage: Limited to approximately 1000 structures; generalisation to novel folds or rare protein families may be reduced
Prediction accuracy: This model is designed for research and educational purposes. It does not achieve state-of-the-art accuracy comparable to AlphaFold2/3 or RoseTTAFold on standardized benchmarks
Multimer prediction: Current implementation supports monomeric structures only
Post-translational modifications: Not explicitly modeled
Co-factor binding: Metal ions and small molecule co-factors are not predicted
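
Some of these limits can be checked before a sequence is submitted. The helper below is a hypothetical pre-flight check, not part of the FRACTAL package: it flags sequences longer than 1024 residues and characters outside the 20 standard amino acids. Whether the model accepts ambiguity codes such as X is not stated here, so they are reported rather than silently removed.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 1024  # limit stated in the Limitations list


def check_sequence(sequence, max_length=MAX_LENGTH):
    """Return (cleaned_sequence, list_of_problems)."""
    # Drop whitespace and normalize case before checking
    seq = "".join(sequence.split()).upper()
    problems = []
    if not seq:
        problems.append("sequence is empty")
    if len(seq) > max_length:
        problems.append(f"length {len(seq)} exceeds {max_length} residues")
    unknown = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if unknown:
        problems.append("non-standard characters: " + ", ".join(unknown))
    return seq, problems


cleaned, problems = check_sequence("mnif emlr")
print(cleaned, problems)
print(check_sequence("MXZ")[1])
```

A chain separator such as ":" would also be flagged, which matches the monomer-only restriction above.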

Known Issues

Deterministic folding may converge to local minima for proteins with complex topologies
Confidence scores are less calibrated than AlphaFold2 pLDDT scores
Performance on disordered regions and long loops is limited

Model Card

Model Details

Developed by: Aayan Mishra (Fractal Labs)
Model type: Constraint-based protein structure predictor
Language: Protein amino acid sequences
License: FOCL
Base model: ESM-2 3B (facebook/esm2_t36_3B_UR50D)

Intended Use

Primary intended uses:

Educational demonstrations of protein structure prediction concepts
Research prototyping for constraint-based folding algorithms
Generating initial structural hypotheses for further refinement
Teaching protein bioinformatics and structural biology

Out-of-scope uses:

Clinical or diagnostic applications
Drug discovery without extensive validation
High-stakes production deployments requiring state-of-the-art accuracy

Ethical Considerations

Protein structure prediction models may be used to design novel proteins with potentially harmful applications. Users should follow established biosafety guidelines and ethical frameworks when working with predicted structures, particularly for:

Toxin or venom protein engineering
Pathogen-related research
Dual-use biological technologies

Citation

If you use FRACTAL in your research, please cite both the ESM-2 foundation model and the FRACTAL model:

@software{fractal2025,
  title = {FRACTAL: Framework for Representation-guided Atomic Construction and Alignment},
  author = {Mishra, Aayan},
  year = {2025},
  url = {https://github.com/Aayan-Mishra/FractalGPT}
}

@article{lin2023evolutionary,
  title={Evolutionary-scale prediction of atomic-level protein structure with a language model},
  author={Lin, Zeming and Akin, Halil and Rao, Roshan and Hie, Brian and Zhu, Zhongkai and Lu, Wenting and Smetanin, Nikita and Verkuil, Robert and Kabeli, Ori and Shmueli, Yair and Fazel-Zarandi, Maryam and Sercu, Tom and Candido, Sal and Rives, Alexander},
  journal={Science},
  volume={379},
  number={6637},
  pages={1123--1130},
  year={2023},
  publisher={American Association for the Advancement of Science}
}

Additional Resources

GitHub Repository: https://github.com/Aayan-Mishra/FractalGPT
Issue Tracker: https://github.com/Aayan-Mishra/FractalGPT/issues
ESM-2 Documentation: https://github.com/facebookresearch/esm

Changelog

Version 1.0 (Current)

Initial release with ESM-2 3B backbone
Support for sequences up to 1024 residues
Multi-task constraint prediction heads
Deterministic gradient-based folding

Acknowledgments

This work builds upon the ESM-2 protein language model developed by Meta AI Research. We thank the Protein Data Bank for providing high-quality structural data and Kaggle for computational resources.
