seungwonwonwon/CNNED-Protein


	---
	library_name: "pytorch"
	tags:
	- protein
	- biosequence
	- cnn
	- embedding
	license: apache-2.0
	---

	# CNNED_Protein

	CNN-based embedding model for protein and other biological sequences, designed to be trained with a triplet or contrastive loss.

	## Model Summary
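	- Input: one-hot encoded sequence of shape `(B, A, L)` (batch size, alphabet size, sequence length)
	- Encoder: stacked 1D convolution + average-pooling blocks
	- Output: L2-normalized embedding of shape `(B, D)` produced by a projection head, where `D` is `target_size`
	- Training: designed for triplet/contrastive loss over (anchor, positive, negative) samples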
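
To make the `(B, A, L)` input layout concrete, here is a minimal sketch of one-hot encoding a batch of sequences. The alphabet below (26 uppercase letters plus one catch-all index, 27 symbols in total) and the helper `one_hot_batch` are assumptions for illustration; this card does not state the symbol-to-index mapping the model was trained with.

```python
import torch

# Hypothetical alphabet: 26 uppercase letters plus one catch-all index (27 symbols).
# The mapping the model was actually trained with is not stated in this card.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
UNKNOWN_INDEX = len(ALPHABET)       # 26
ALPHABET_SIZE = len(ALPHABET) + 1   # 27


def one_hot_batch(sequences, length):
    """Encode strings as a float tensor of shape (B, A, L), zero-padded to `length`."""
    x = torch.zeros(len(sequences), ALPHABET_SIZE, length)
    for b, seq in enumerate(sequences):
        # Sequences longer than `length` are truncated.
        for pos, char in enumerate(seq.upper()[:length]):
            idx = ALPHABET.find(char)
            x[b, idx if idx >= 0 else UNKNOWN_INDEX, pos] = 1.0
    return x


x = one_hot_batch(["MKTAYIAKQR", "MKV"], length=16)
print(x.shape)  # torch.Size([2, 27, 16])
```

Padded positions are left as all-zero columns here; whether the trained model expects that or a dedicated padding symbol is not specified.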

	### Config
	- `alphabet_size`: 27
	- `target_size`: 128
	- `channel`: 256
	- `depth`: 3
	- `kernel_size`: 7
	- `l2norm`: True
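
As a rough picture of how these config values could fit together, the sketch below builds a small encoder with the same hyperparameter names. It is an illustrative stand-in, not the repository's `CNNED_Protein` implementation: the class name `SketchEncoder`, the ReLU activations, the pooling size of 2, and the global average before the projection head are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchEncoder(nn.Module):
    """Illustrative stand-in; NOT the repository's CNNED_Protein implementation."""

    def __init__(self, alphabet_size=27, target_size=128, channel=256,
                 depth=3, kernel_size=7, l2norm=True):
        super().__init__()
        layers, in_ch = [], alphabet_size
        for _ in range(depth):
            layers += [
                nn.Conv1d(in_ch, channel, kernel_size, padding=kernel_size // 2),
                nn.ReLU(),        # assumed activation
                nn.AvgPool1d(2),  # assumed pooling size; halves the sequence length
            ]
            in_ch = channel
        self.conv = nn.Sequential(*layers)
        self.proj = nn.Linear(channel, target_size)  # projection head
        self.l2norm = l2norm

    def forward(self, x):
        h = self.conv(x)     # (B, channel, L / 2**depth)
        h = h.mean(dim=-1)   # average over positions -> (B, channel)
        z = self.proj(h)     # (B, target_size)
        return F.normalize(z, p=2, dim=-1) if self.l2norm else z


torch.manual_seed(0)
idx = torch.randint(0, 27, (2, 512))
x = F.one_hot(idx, 27).permute(0, 2, 1).float()  # (B, A, L) one-hot input

enc = SketchEncoder().eval()
with torch.no_grad():
    z = enc(x)
print(z.shape)  # torch.Size([2, 128])
```

With `l2norm` enabled, every output row has unit L2 norm, so the dot product of two embeddings equals their cosine similarity.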

	## Usage

	```python
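	import json

	import torch
	from safetensors.torch import load_file

	from model import CNNED_Protein  # requires model.py on the import path

	# Load config
	with open("config.json", "r") as f:
	    cfg = json.load(f)
	model = CNNED_Protein(**cfg)

	# Load weights: prefer safetensors, fall back to the PyTorch checkpoint
	try:
	    sd = load_file("model.safetensors")
	except Exception:
	    sd = torch.load("model.pt", map_location="cpu")
	model.load_state_dict(sd, strict=True)
	model.eval()

	# Dummy inference
	# x: (B, A, L) one-hot tensor built from random symbol indices
	idx = torch.randint(0, cfg["alphabet_size"], (2, 512))
	x = torch.nn.functional.one_hot(idx, cfg["alphabet_size"]).permute(0, 2, 1).float()
	with torch.no_grad():
	    y, z = model.encode(x)
	print(y.shape)  # (2, target_size)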
	```
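
Since the model is described as designed for triplet/contrastive training, a single training step might look like the sketch below. It uses a small stand-in encoder so that it runs without the repository files; the stand-in architecture, the margin of 0.2, the learning rate, and the random one-hot batches are assumptions, not settings taken from this card.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in encoder so the sketch runs on its own; swap in the loaded model in practice.
encoder = nn.Sequential(
    nn.Conv1d(27, 32, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(32, 128),
)


def embed(x):
    """Return L2-normalized embeddings of shape (B, 128)."""
    return F.normalize(encoder(x), p=2, dim=-1)


def random_one_hot(batch, alphabet_size, length):
    """Random (B, A, L) one-hot batch standing in for real encoded sequences."""
    idx = torch.randint(0, alphabet_size, (batch, length))
    return F.one_hot(idx, alphabet_size).permute(0, 2, 1).float()


anchor, positive, negative = (random_one_hot(4, 27, 64) for _ in range(3))

optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
# Pull anchor/positive together and push anchor/negative apart by at least the margin.
loss = F.triplet_margin_loss(embed(anchor), embed(positive), embed(negative), margin=0.2)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```

In real training the positive would be a sequence related to the anchor and the negative an unrelated one; how those triplets were mined for this model is not described here.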