Robert Gale committed 9368dd9 (1 parent: 5b2649b)

README.md (changed):
Limitations exist regarding accents and dialect, which in turn would affect the scenarios in which a system based on our model could (and should) be used.
It should also be noted that these models and any derived technology are not meant to be tools to diagnose medical conditions, a task best left to qualified clinicians.

## Usage

### Downloading BORT

```python
from transformers import AutoTokenizer, BartForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("rcgale/bort-test")
model = BartForConditionalGeneration.from_pretrained("rcgale/bort-test")
```

The above uses the default variant, `bort-pr-sp-noisy`. Each variant from the paper can be retrieved by specifying the `variant` argument, like so:

```python
BartForConditionalGeneration.from_pretrained("rcgale/bort-test", variant="bort-sp")
```

The following variants are available, pre-trained on the specified proportion of each task:

| Code               | Pronunciation | Spelling | Noise |
|--------------------|---------------|----------|-------|
| `bort-pr`          | 10%           | —        | —     |
| `bort-sp`          | —             | 10%      | —     |
| `bort-pr-sp`       | 10%           | 10%      | —     |
| `bort-pr-noisy`    | 10%           | —        | 5%    |
| `bort-sp-noisy`    | —             | 10%      | 5%    |
| `bort-pr-sp-noisy` | 10%           | 10%      | 5%    |
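The variant codes in the table follow a regular pattern: `bort`, plus `-pr` and/or `-sp` for the pre-training tasks, plus `-noisy` when noise was added. As a convenience, a small helper can assemble a code from task flags; `bort_variant` is a hypothetical name used here for illustration, not part of the released package:

```python
def bort_variant(pronunciation: bool = False, spelling: bool = False, noisy: bool = False) -> str:
    """Assemble a BORT variant code from the task flags listed in the table above."""
    parts = ["bort"]
    if pronunciation:
        parts.append("pr")
    if spelling:
        parts.append("sp")
    if noisy:
        parts.append("noisy")
    return "-".join(parts)

# e.g. the fully combined (default) variant:
# BartForConditionalGeneration.from_pretrained("rcgale/bort-test",
#                                              variant=bort_variant(True, True, True))
```

At least one task flag should be set; the six combinations in the table are the ones with released weights.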

## Basic usage

```python
in_texts = [
    "Due to its coastal location, lɔŋ ·aɪlən·d winter temperatures are milder than most of the state.",
    "Due to its coastal location, lɔŋ ·b·iʧ winter temperatures are milder than most of the state.",
    "Due to its coastal location, Long ·b·iʧ winter temperatures are milder than most of the state.",
    "Due to its coastal location, lɔŋfɝd winter temperatures are milder than most of the state.",
]

inputs = tokenizer(in_texts, return_tensors="pt", padding=True)
summary_ids = model.generate(inputs["input_ids"], num_beams=2, min_length=0, max_length=2048)
decoded = tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)

for in_text, out_text in zip(in_texts, decoded):
    print(f"In: \t{in_text}")
    print(f"Out: \t{out_text}")
    print()
```
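The IPA spans in the example inputs are interleaved with `·` delimiter characters. If you want to display or compare the IPA strings without those delimiters, a simple strip suffices; `strip_markers` is a hypothetical convenience for illustration, not part of the model's API:

```python
def strip_markers(text: str) -> str:
    """Remove the `·` delimiter characters used in the IPA example inputs."""
    return text.replace("·", "")

print(strip_markers("lɔŋ ·aɪlən·d"))  # prints: lɔŋ aɪlənd
```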

## Wikipedia Dataset Used in Pre-Training

The BPE-tokenized version of the dataset, including metadata used in word transforms.