---
language:
- en
tags:
- biology
- genomics
- codon-optimization
- p-adic-math
- hyperbolic-geometry
- ddg-prediction
license: other
metrics:
- spearmanr
---

# Ternary Codon Encoder: P-adic Hyperbolic Embeddings

The Ternary Codon Encoder is a neural embedding model that maps the 64 genetic codons into a 16-dimensional hyperbolic space. It is the first model to explicitly use **3-adic valuation** as a mathematical prior to organize the genetic code's hierarchical structure.

## Model Description

- **Architecture:** MLP-based encoder (12-dim one-hot input $\rightarrow$ 16-dim hyperbolic output).
- **Mathematical Foundation:** Leverages 3-adic mathematics to represent the discrete hierarchy of the codon table.
- **Latent Space:** Poincaré ball where radial distance encodes 3-adic valuation (conservation/variability).

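To make the two mathematical ingredients above concrete, here is a minimal sketch of the 3-adic valuation of an integer and of distances in the unit Poincaré ball. It uses only the standard definitions (curvature $-1$ is assumed). The function names are illustrative and are not part of the released package, and the card does not specify how codons are assigned the integers whose valuation is taken.

```python
import math
from typing import Sequence


def three_adic_valuation(n: int) -> float:
    """Return v_3(n): the largest k such that 3**k divides n (infinity for n == 0)."""
    if n == 0:
        return math.inf
    n = abs(n)
    k = 0
    while n % 3 == 0:
        n //= 3
        k += 1
    return k


def _squared_norm(x: Sequence[float]) -> float:
    return sum(c * c for c in x)


def poincare_radius(x: Sequence[float]) -> float:
    """Hyperbolic distance from the origin to x in the unit Poincare ball."""
    norm = math.sqrt(_squared_norm(x))
    if norm >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    return 2.0 * math.atanh(norm)


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Hyperbolic distance between two points of the unit Poincare ball."""
    if _squared_norm(u) >= 1.0 or _squared_norm(v) >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - _squared_norm(u)) * (1.0 - _squared_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


# Illustrative values only: 18 = 2 * 3**2, so its 3-adic valuation is 2.
print(three_adic_valuation(18))
print(poincare_radius([0.5, 0.0]))
```

Hyperbolic radius grows without bound as a point approaches the boundary of the ball, which is what lets a radial coordinate separate levels of a hierarchy more sharply than a Euclidean norm would.
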
## Key Discoveries

- **Physics Dimension:** Latent dimension 13 correlates strongly ($\rho = -0.70$) with molecular mass, volume, and force constants ($k$).
- **Linear Stability Manifold:** Provides high-quality feature vectors for sequence-only protein stability ($\Delta\Delta G$) prediction.
- **Synonymous Cohesion:** Synonymous codons cluster together in hyperbolic space while maintaining clear boundaries between amino acid groups.

## Performance

- **DDG Spearman $\rho$:** 0.614 (sequence-only benchmarking on diverse datasets).
- **Improvement:** +105% over baseline p-adic embedding models.

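For readers unfamiliar with the metric, Spearman's $\rho$ is the Pearson correlation of the ranks of two variables. The sketch below computes it in plain Python with average ranks for ties. The numbers in the example are made up for illustration and are not results from this model. In practice `scipy.stats.spearmanr` computes the same statistic.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted vs. measured stability changes (made-up numbers).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [2.0, 1.0, 4.0, 3.0, 5.0]
print(f"Spearman rho: {spearman_rho(predicted, measured):.3f}")
```

Because only ranks enter the computation, the metric rewards getting the ordering of mutations right even when predicted values are on a different scale from the measured ones.
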
## Usage

```python
import torch
from trainable_codon_encoder import TrainableCodonEncoder

# Load model
encoder = TrainableCodonEncoder(latent_dim=16, hidden_dim=64)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
encoder.load_state_dict(checkpoint["model_state_dict"])
encoder.eval()

# Get embedding for a codon (e.g., ATG index 14)
codon_idx = torch.tensor([14])
with torch.no_grad():
    # Inference only: no gradients are tracked inside this block
    z_hyp = encoder(codon_idx)

print(f"Hyperbolic Embedding: {z_hyp}")
```

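The usage example passes ATG as index 14. One indexing scheme consistent with that value is a base-4 encoding with nucleotides ordered alphabetically (A, C, G, T), sketched below together with a 12-dimensional one-hot layout (three positions times four nucleotides). Both the ordering and the one-hot layout are assumptions inferred from this card, and the helper names are hypothetical. Check them against the package source before relying on them.

```python
from typing import List

# Assumed ordering: alphabetical, chosen because it maps ATG to index 14.
NUCLEOTIDES = "ACGT"


def _normalize(codon: str) -> str:
    """Upper-case the codon, map RNA 'U' to 'T', and validate it."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or any(base not in NUCLEOTIDES for base in codon):
        raise ValueError(f"not a valid codon: {codon!r}")
    return codon


def codon_to_index(codon: str) -> int:
    """Read the codon as a three-digit base-4 number (0..63)."""
    index = 0
    for base in _normalize(codon):
        index = index * 4 + NUCLEOTIDES.index(base)
    return index


def codon_to_one_hot(codon: str) -> List[int]:
    """Return a 12-dim vector: one 4-way one-hot block per codon position."""
    vector = [0] * 12
    for position, base in enumerate(_normalize(codon)):
        vector[position * 4 + NUCLEOTIDES.index(base)] = 1
    return vector


print(codon_to_index("ATG"))
print(codon_to_one_hot("ATG"))
```

Under this assumed scheme, a sequence can be split into codons and mapped to a tensor of indices before being passed to the encoder as in the usage example.
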
## Citation

```bibtex
@software{ternary_codon_2026,
  author = {AI Whisperers},
  title = {Ternary Codon Encoder: P-adic Hyperbolic Embeddings},
  year = {2026},
  url = {https://huggingface.co/ai-whisperers/ternary-codon-encoder}
}
```