Robert Gale commited on
Commit
85097f0
1 Parent(s): 9368dd9
Files changed (1) hide show
  1. README.md +25 -2
README.md CHANGED
@@ -45,12 +45,35 @@ The following variants are available, pre-trained on the specified proportion of
45
 
46
  ## Basic usage
47
 
 
 
 
 
 
 
 
48
  ```python
 
 
 
 
 
49
  in_texts = [
 
 
 
 
 
 
 
 
50
  "Due to its coastal location, l蓴艐 路a瑟l蓹n路d winter temperatures are milder than most of the state.",
 
 
 
51
  "Due to its coastal location, l蓴艐 路b路i失 winter temperatures are milder than most of the state.",
52
- "Due to its coastal location, Long 路b路i失 winter temperatures are milder than most of the state.",
53
- "Due to its coastal location, l蓴艐f蓾d winter temperatures are milder than most of the state.",
54
  ]
55
 
56
  inputs = tokenizer(in_texts, return_tensors="pt", padding=True)
 
45
 
46
  ## Basic usage
47
 
48
+ BORT was intended to be fine-tuned to a specific task, but for a basic demonstration of what distinguishes it from
49
+ other LLMs, please consider the following example. The pre-trained model has no issue translating "Long /a瑟l蓹nd/" to
50
+ "Long Island", or "Long /bi失/" to "Long Beach". The next two texts demonstrate the effect of context. While
51
+ "l蓴艐 路a瑟l蓹n路d" still translates to "Long Island", "l蓴艐 路b路i失" bumps up against a homonym, and the model produces
52
+ "long beech". (Note: the bullet character `路` is used to prevent the BPE tokenizer
53
+ from combining phonemes.)
54
+
55
  ```python
56
+ from transformers import AutoTokenizer, BartForConditionalGeneration
57
+
58
+ tokenizer = AutoTokenizer.from_pretrained("rcgale/bort-test")
59
+ model = BartForConditionalGeneration.from_pretrained("rcgale/bort-test")
60
+
61
  in_texts = [
62
+ "Due to its coastal location, Long 路a瑟l蓹n路d winter temperatures are milder than most of the state.",
63
+ # Output:
64
+ # Due to its coastal location, Long Island winter temperatures are milder than most of the state.",
65
+
66
+ "Due to its coastal location, Long 路b路i失 winter temperatures are milder than most of the state.",
67
+ # Output:
68
+ # Due to its coastal location, Long Beach winter temperatures are milder than most of the state.",
69
+
70
  "Due to its coastal location, l蓴艐 路a瑟l蓹n路d winter temperatures are milder than most of the state.",
71
+ # Output:
72
+ # Due to its coastal location, Long Island winter temperatures are milder than most of the state.",
73
+
74
  "Due to its coastal location, l蓴艐 路b路i失 winter temperatures are milder than most of the state.",
75
+ # Output:
76
+ # Due to its coastal location, long beech winter temperatures are milder than most of the state.",
77
  ]
78
 
79
  inputs = tokenizer(in_texts, return_tensors="pt", padding=True)