BERTose IAR Resolver
This repository contains the contrastively refined BERTose checkpoint used for iterative ambiguity resolution (IAR) over ambiguous WURCS BPE tokens.
Quick Start
The recommended user path is the companion notebook. To fetch the checkpoint and the ambiguity map directly, use huggingface_hub:
from huggingface_hub import hf_hub_download

# Download the resolver checkpoint; the call returns the local cached path.
checkpoint = hf_hub_download(
    repo_id="supanthadey1/bertose-iar-resolver",
    filename="checkpoints/bertose_iar_resolver.pt",
)

# Download the ambiguous-token map used by the resolver.
ambiguity_map = hf_hub_download(
    repo_id="supanthadey1/bertose-iar-resolver",
    filename="vocab/bpe_ambiguity_tokens.json",
)
The repository is public, so no Hugging Face token is required to download the BERTose IAR checkpoint.
Files
- checkpoints/bertose_iar_resolver.pt: BERTose IAR checkpoint.
- vocab/bpe_vocabulary.json: WURCS BPE vocabulary.
- vocab/bpe_ambiguity_tokens.json: ambiguous-token map used by the resolver.
- src/bertose_model.py: BERTose model definition.
- src/bertose_layers.py: Transformer layers used by BERTose.
- src/wurcs_bpe_tokenizer.py: WURCS BPE tokenizer.
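The internal layout of the two JSON files under vocab/ is not described here, so it may be worth inspecting them before relying on a particular structure. The following is a minimal sketch: describe_json is a hypothetical helper, not part of this repository, and it assumes nothing about the files beyond their being valid JSON.

```python
import json
from pathlib import Path


def describe_json(path):
    """Return (top-level type name, number of entries) for a JSON file.

    Hypothetical helper for illustration; not part of this repository.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Scalars have no length, so report them as a single entry.
    size = len(data) if isinstance(data, (dict, list)) else 1
    return type(data).__name__, size
```

Calling describe_json on a path returned by hf_hub_download for one of the vocab/ files would report whether that file is a JSON object or array and how many entries it holds.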
Input
Provide either a single WURCS glycan string or a CSV batch with sample_id,wurcs columns. The resolver is intended for glycans that already contain uncertainty markers in WURCS form.
Free-text ambiguous glycan names are not parsed directly. Convert the name or IUPAC-condensed notation to WURCS first. If the structure is ambiguous, preserve that ambiguity in the WURCS string with WURCS-style uncertainty markers before running BERTose IAR.
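A batch file in the sample_id,wurcs layout described above could be assembled along these lines. This is a sketch under stated assumptions: write_batch_csv is a hypothetical helper, the WURCS string is an illustrative placeholder rather than a validated structure, and the x and ? characters in it only stand in for uncertainty markers. The prefix check is a cheap sanity test, not WURCS validation.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "wurcs")


def write_batch_csv(rows, handle):
    """Write (sample_id, wurcs) pairs to an open text handle as CSV.

    Hypothetical helper for illustration; not part of this repository.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(REQUIRED_COLUMNS)
    for sample_id, wurcs in rows:
        # Cheap sanity check only: WURCS strings begin with "WURCS=".
        if not wurcs.startswith("WURCS="):
            raise ValueError(f"{sample_id}: not a WURCS string: {wurcs!r}")
        writer.writerow([sample_id, wurcs])


# Illustrative placeholder string, not a validated structure.
rows = [
    ("sample_1", "WURCS=2.0/2,2,1/[a2122h-1x_1-5][a2112h-1b_1-5]/1-2/a?-b1"),
]
buffer = io.StringIO()
write_batch_csv(rows, buffer)
print(buffer.getvalue())
```

WURCS strings contain commas, so the sketch uses the csv module rather than joining fields by hand; the writer quotes the wurcs field and a CSV reader recovers it intact.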
Output
Token-level ambiguity-resolution predictions with confidence scores. The companion notebook writes both summary and detail CSVs for batch runs.
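Because each prediction carries a confidence score, a downstream step might separate high-confidence token updates from those that need manual review. The sketch below assumes a record layout that is not documented here: every field name in TokenPrediction is hypothetical, and the 0.9 threshold is an arbitrary illustration, not a recommended value. Check the header of the notebook's detail CSV for the real column names.

```python
from dataclasses import dataclass


@dataclass
class TokenPrediction:
    # All field names are hypothetical; they are not taken from the notebook.
    sample_id: str
    position: int
    predicted_token: str
    confidence: float


def split_by_confidence(predictions, threshold=0.9):
    """Partition predictions into (accepted, needs_review) at the threshold."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    needs_review = [p for p in predictions if p.confidence < threshold]
    return accepted, needs_review


# Placeholder records for illustration only.
example = [
    TokenPrediction("sample_1", 3, "tok_a", 0.97),
    TokenPrediction("sample_1", 7, "tok_b", 0.55),
]
accepted, needs_review = split_by_confidence(example)
```

With these placeholder records, the first prediction lands in accepted and the second in needs_review.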
Scope
The resolver provides model-backed token updates and confidence values for ambiguous positions. It does not claim to reconstruct a final canonical WURCS string by itself, and it does not perform IUPAC-condensed/name-to-WURCS conversion.
License metadata is currently other; update it when the final release license and citation text are chosen.