
 layer6_emb = out_all.hidden_states[6]
 ```
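### CDS-aware embedding (mRNA sequences)
For mRNA sequences that come with a CDS track, `batch_encode_with_cds` converts T to U, keeps only the coding region, splits it into chunks that break on codon boundaries, and encodes the chunks in a single call. Besides the encoded batch, it returns `chunk_counts`, the number of chunks produced for each input sequence, which is needed to map chunk-level outputs back to their sequences.
```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("Taykhoom/mRNA-FM", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/mRNA-FM", trust_remote_code=True)
model.eval()
# Binary CDS track: 1 at the first nucleotide of each codon in the CDS, 0 elsewhere
sequences = ["ATGCTAGCTAGCTAGCTATGCTAGCTAGCTAGCT"]  # 34 nt
cds = [np.array([0]*5 + [1, 0, 0]*9 + [0]*2)]  # toy track: 5 nt before the CDS, 9 codons, 2 nt after
enc, chunk_counts = tokenizer.batch_encode_with_cds(
    sequences, cds, return_tensors="pt", padding=True, add_special_tokens=True
)
with torch.no_grad():
    out = model(**enc)
# chunk_counts[i] = number of chunks produced for sequences[i]
hidden = out.last_hidden_state  # (total_chunks, seq_len, 1280)
# Mean-pool non-special tokens for each sequence. Assumes enc holds "input_ids" and
# "attention_mask", as a standard Hugging Face tokenizer call does.
special_ids = torch.tensor(tokenizer.all_special_ids)
mask = enc["attention_mask"].bool() & ~torch.isin(enc["input_ids"], special_ids)
mask = mask.unsqueeze(-1).to(hidden.dtype)  # (total_chunks, seq_len, 1)
chunk_sums = (hidden * mask).sum(dim=1)     # (total_chunks, 1280)
chunk_lens = mask.sum(dim=1)                # (total_chunks, 1)
# Regroup chunks by sequence; take a token-weighted mean over each sequence's chunks
sizes = [int(c) for c in chunk_counts]
seq_embs = torch.stack([
    s.sum(dim=0) / n.sum(dim=0).clamp(min=1)
    for s, n in zip(chunk_sums.split(sizes), chunk_lens.split(sizes))
])  # (len(sequences), 1280)
```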
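The preprocessing that `batch_encode_with_cds` performs before tokenization (T to U conversion, CDS extraction, codon-boundary chunking) can be pictured with a small standalone sketch. This is not the tokenizer's own code: `extract_codons`, `chunk_codons` and `max_codons` are hypothetical names, and the chunk size the tokenizer actually uses is not documented here, so `max_codons=4` is only a toy value. The sketch reuses the example sequence and CDS track from the embedding example.

```python
import numpy as np


def extract_codons(seq: str, cds_track: np.ndarray) -> list[str]:
    """Return the CDS codons of `seq` as RNA (T -> U).

    `cds_track` marks the first nucleotide of each codon with 1, everything else with 0.
    """
    if len(seq) != len(cds_track):
        raise ValueError("sequence and CDS track must be the same length")
    rna = seq.replace("T", "U")
    starts = np.flatnonzero(cds_track == 1)
    # Skip a codon start that would run past the end of the sequence
    return [rna[i:i + 3] for i in starts if i + 3 <= len(rna)]


def chunk_codons(codons: list[str], max_codons: int) -> list[str]:
    """Group codons into chunks of at most `max_codons` (hypothetical parameter), never splitting a codon."""
    return ["".join(codons[i:i + max_codons]) for i in range(0, len(codons), max_codons)]


# Same toy input as the embedding example
seq = "ATGCTAGCTAGCTAGCTATGCTAGCTAGCTAGCT"
cds = np.array([0] * 5 + [1, 0, 0] * 9 + [0] * 2)
codons = extract_codons(seq, cds)
chunks = chunk_codons(codons, max_codons=4)  # hypothetical chunk size
print(codons)  # ['AGC', 'UAG', 'CUA', 'GCU', 'AUG', 'CUA', 'GCU', 'AGC', 'UAG']
print(chunks)  # ['AGCUAGCUAGCU', 'AUGCUAGCUAGC', 'UAG']
```

On the 34 nt example this yields 9 codons and, with `max_codons=4`, three chunks. Every chunk length is a multiple of 3, so no codon straddles a chunk boundary.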
 ### MLM logits
 ```python