---
library_name: pytorch
license: other
tags:
- glycans
- wurcs
- bertose
- ambiguity-resolution
- contrastive-learning
- pytorch
---

# BERTose IAR Resolver

This repository contains the contrastively refined BERTose checkpoint used for iterative ambiguity resolution (IAR) over ambiguous WURCS BPE tokens.

## Quick Start

The recommended user path is the companion notebook. To fetch the checkpoint and the ambiguity map directly, download them with `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download

# Model weights for the resolver.
checkpoint = hf_hub_download(
    repo_id="supanthadey1/bertose-iar-resolver",
    filename="checkpoints/bertose_iar_resolver.pt",
)
# Map of ambiguous BPE tokens used by the resolver.
ambiguity_map = hf_hub_download(
    repo_id="supanthadey1/bertose-iar-resolver",
    filename="vocab/bpe_ambiguity_tokens.json",
)
```

The repository is public, so no Hugging Face token is required to download the BERTose IAR checkpoint.

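Both `hf_hub_download` calls return local file paths. The schema of `bpe_ambiguity_tokens.json` is not documented in this card, so the sketch below assumes nothing about its contents and only reports the top-level JSON type and entry count. `describe_json` is a hypothetical helper for illustration, not part of the repository.

```python
import json
from pathlib import Path


def describe_json(path):
    """Return (top-level type name, number of entries) for a JSON file."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Only containers have a meaningful length; scalars report None.
    size = len(data) if isinstance(data, (dict, list)) else None
    return type(data).__name__, size


if __name__ == "__main__":
    import sys

    # Usage: python describe_json.py <path returned by hf_hub_download>
    for json_path in sys.argv[1:]:
        kind, size = describe_json(json_path)
        print(f"{json_path}: top-level {kind}, entries: {size}")
```

Passing the `ambiguity_map` path from the download snippet to `describe_json` is a quick way to confirm the file arrived intact before wiring it into the resolver.
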
## Files

- `checkpoints/bertose_iar_resolver.pt` - BERTose IAR checkpoint.
- `vocab/bpe_vocabulary.json` - WURCS BPE vocabulary.
- `vocab/bpe_ambiguity_tokens.json` - ambiguous-token map used by the resolver.
- `src/bertose_model.py` - BERTose model definition.
- `src/bertose_layers.py` - Transformer layers used by BERTose.
- `src/wurcs_bpe_tokenizer.py` - WURCS BPE tokenizer.

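To work outside the notebook, all six files listed above can be fetched in one pass. This is a minimal sketch: `REPO_FILES` and `download_all` are hypothetical names introduced here, and the only library call used is `hf_hub_download`, the same one shown in Quick Start.

```python
REPO_ID = "supanthadey1/bertose-iar-resolver"

# Repository paths exactly as listed in the Files section.
REPO_FILES = [
    "checkpoints/bertose_iar_resolver.pt",
    "vocab/bpe_vocabulary.json",
    "vocab/bpe_ambiguity_tokens.json",
    "src/bertose_model.py",
    "src/bertose_layers.py",
    "src/wurcs_bpe_tokenizer.py",
]


def download_all(repo_id=REPO_ID, filenames=REPO_FILES):
    """Download each listed file and return {repository path: local path}."""
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(repo_id=repo_id, filename=name)
        for name in filenames
    }
```

Calling `download_all()` requires network access on first use. The returned paths point into the local Hugging Face cache.
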
## Input

Provide either a single WURCS glycan string or a batch CSV with the columns `sample_id,wurcs`. The resolver is intended for glycans whose WURCS strings already contain uncertainty markers.

Free-text ambiguous glycan names are not parsed directly. Convert the name or IUPAC-condensed notation to WURCS first. If the structure is ambiguous, preserve that ambiguity in the WURCS string using WURCS-style uncertainty markers before running BERTose IAR.

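WURCS strings contain commas (the count header that follows the version, for example), so the `wurcs` field must be quoted in a batch CSV. The sketch below writes and reads such a file with Python's `csv` module, which applies the quoting automatically. `write_batch` and `read_batch` are hypothetical helpers. The strict header check and the `WURCS=` prefix check are assumptions made for illustration, not the notebook's own validation.

```python
import csv

WURCS_PREFIX = "WURCS="
EXPECTED_HEADER = ["sample_id", "wurcs"]


def write_batch(path, rows):
    """Write (sample_id, wurcs) pairs to a CSV with the expected header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # csv.writer quotes any field containing a comma by default.
        writer = csv.writer(handle)
        writer.writerow(EXPECTED_HEADER)
        writer.writerows(rows)


def read_batch(path):
    """Read a batch CSV back, rejecting rows that do not look like WURCS."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != EXPECTED_HEADER:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        rows = []
        for row in reader:
            # Assumed sanity check: every WURCS string starts with "WURCS=".
            if not row["wurcs"].startswith(WURCS_PREFIX):
                raise ValueError(f"{row['sample_id']}: not a WURCS string")
            rows.append((row["sample_id"], row["wurcs"]))
    return rows
```

Writing the file by hand without quoting would split each WURCS string across several columns, so generating it programmatically is the safer route.
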
## Output

The resolver returns token-level ambiguity-resolution predictions with confidence scores. For batch runs, the companion notebook writes both a summary CSV and a detail CSV.

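Because each prediction carries a confidence score, a common follow-up is to separate confident token updates from those that need manual review. The sketch below illustrates that step only. The `TokenPrediction` fields are hypothetical and do not reflect the notebook's actual CSV columns, and the `0.9` threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrediction:
    # Hypothetical record layout; the real detail CSV columns may differ.
    sample_id: str
    position: int
    ambiguous_token: str
    predicted_token: str
    confidence: float


def split_by_confidence(predictions, threshold=0.9):
    """Partition predictions into (accepted, needs_review) by confidence."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    needs_review = [p for p in predictions if p.confidence < threshold]
    return accepted, needs_review
```

A suitable threshold depends on the downstream use and should be chosen from the confidence distribution of the actual runs.
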
## Scope

The resolver provides model-backed token updates and confidence values for ambiguous positions. It does not claim to reconstruct a final canonical WURCS string by itself, and it does not perform IUPAC-condensed/name-to-WURCS conversion.

License metadata is currently `other`; update it when the final release license and citation text are chosen.