How to use samrawal/medical-sentence-tokenizer with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="samrawal/medical-sentence-tokenizer")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("samrawal/medical-sentence-tokenizer")
model = AutoModelForTokenClassification.from_pretrained("samrawal/medical-sentence-tokenizer")
```
clinitokenizer is a sentence tokenizer that splits unstructured clinical text (such as Electronic Medical Records) into individual sentences.
To use this model, see the clinitokenizer repository.
General English sentence tokenizers often fail to correctly parse the abbreviations, jargon, and other conventions common in medical records (see "Motivating Examples" section below). clinitokenizer is trained specifically on medical record data and can perform better in these situations. Conversely, for non-domain-specific use, a more general sentence tokenizer may yield better results.
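To make the failure mode concrete, here is a minimal sketch of a rule-based splitter that breaks on sentence-ending punctuation followed by whitespace. It is not clinitokenizer and not any particular general-purpose tokenizer; the `naive_split` helper and the sample note are hypothetical, written only for this illustration.

```python
import re
from typing import List


def naive_split(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace.

    This rule has no knowledge of abbreviations, so every
    abbreviation that ends in a period triggers a split.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Hypothetical clinical note (two real sentences), invented for this example.
note = "Pt. is a 45 y.o. male c/o chest pain. Started on aspirin 81 mg p.o. daily."

for sentence in naive_split(note):
    print(repr(sentence))
```

The note contains two sentences, but the rule splits it into five fragments because "Pt.", "y.o.", and "p.o." each end in a period followed by a space. General-purpose tokenizers handle common English abbreviations better than this sketch, yet clinical shorthand of this kind is where a model trained on medical records is meant to help.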
The model was trained on multiple datasets provided by i2b2 (now n2c2). Please visit the n2c2 site to request access to the datasets.