gbyuvd/FastChemTokenizer

gbyuvd committed on Sep 19, 2025

Commit d378b8a (verified) · 1 parent: f5b18a8

Update README.md

Files changed (1)

README.md +3 -3

@@ -49,10 +49,10 @@ Trained on ~2.7M valid SMILES built and curated from ChemBL34 (Zdrazil _et al._
 ## 🛠️ Implementation
-- **Algorithm**: Trie-based longest-prefix-match (no regex, no BPE)
+- **Algorithm**: Trie-based longest-prefix-match
 - **Caching**: `@lru_cache` for repeated string encoding
 - **HF Compatible**: Implements `__call__`, `encode_plus`, `batch_encode_plus`, `save_pretrained`, `from_pretrained`
-- **Memory Efficient**: No token set — pure trie traversal
+- **Memory Efficient**: Trie traversal and cache
 ```python
 from FastChemTokenizer import FastChemTokenizer
@@ -175,4 +175,4 @@ Apache 2.0
   pages = {D654-D659},
   doi = {10.1093/nar/gkac1008}
 }
-```
+```
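
The Implementation bullets touched by this diff describe trie-based longest-prefix matching combined with `@lru_cache`. The following is a minimal sketch of that general technique, not the FastChemTokenizer source: `TrieNode`, `TrieTokenizerSketch`, `unk_id`, and the toy vocabulary are hypothetical names chosen for illustration, and the real tokenizer's vocabulary handling and unknown-token behavior may differ.

```python
from functools import lru_cache


class TrieNode:
    """One trie node: child edges keyed by character, plus an optional token id."""
    __slots__ = ("children", "token_id")

    def __init__(self):
        self.children = {}
        self.token_id = None  # set only where a vocabulary token ends


class TrieTokenizerSketch:
    """Hypothetical greedy longest-match tokenizer (illustration only)."""

    def __init__(self, vocab, unk_id=0):
        self.root = TrieNode()
        self.unk_id = unk_id  # assumed fallback id for unmatched characters
        for token, token_id in vocab.items():
            node = self.root
            for ch in token:
                node = node.children.setdefault(ch, TrieNode())
            node.token_id = token_id
        # Per-instance cache so repeated strings skip the trie walk.
        self.encode = lru_cache(maxsize=1024)(self._encode)

    def _encode(self, text):
        ids = []
        i = 0
        while i < len(text):
            node = self.root
            best_id, best_end = None, i
            j = i
            # Walk as deep as the trie allows, remembering the last full token.
            while j < len(text) and text[j] in node.children:
                node = node.children[text[j]]
                j += 1
                if node.token_id is not None:
                    best_id, best_end = node.token_id, j
            if best_id is None:
                ids.append(self.unk_id)  # no token starts here: emit unk, skip one char
                i += 1
            else:
                ids.append(best_id)
                i = best_end
        return tuple(ids)  # immutable, so cached results cannot be mutated


toy_vocab = {"C": 1, "Cl": 2, "O": 3, "c1ccccc1": 4}  # toy SMILES fragments
tok = TrieTokenizerSketch(toy_vocab)
print(tok.encode("CClO"))  # (1, 2, 3): "Cl" wins over "C" at position 1
```

Returning a tuple rather than a list is a design choice of this sketch: `lru_cache` hands back the same object on every hit, so an immutable result prevents one caller from altering what later callers receive.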