Update README.md
README.md
CHANGED
@@ -9,4 +9,68 @@ pipeline_tag: fill-mask
widget:
- text: летнее легкое
library_name: transformers
---

A fill-mask model that predicts words missing from search queries. Given a query containing a `[MASK]` token, the model uses the surrounding query context to rank candidate words for the masked position.

```python
# Don't forget to install the tokenizer dependencies:
# pip install protobuf sentencepiece

from transformers import pipeline

# device='cuda' assumes a GPU is available; pass device='cpu' otherwise.
unmasker = pipeline('fill-mask', model='fkrasnov2/COLD2', device='cuda')

# Query: "electronics charger [MASK] USB"
unmasker("электроника зарядка [MASK] USB")

# Output:
[{'score': 0.3712620437145233,
  'token': 1131,
  'token_str': 'автомобильная',
  'sequence': 'электроника зарядка автомобильная usb'},
 {'score': 0.12239563465118408,
  'token': 7436,
  'token_str': 'быстрая',
  'sequence': 'электроника зарядка быстрая usb'},
 {'score': 0.046715956181287766,
  'token': 5819,
  'token_str': 'проводная',
  'sequence': 'электроника зарядка проводная usb'},
 {'score': 0.031308457255363464,
  'token': 635,
  'token_str': 'универсальная',
  'sequence': 'электроника зарядка универсальная usb'},
 {'score': 0.02941182069480419,
  'token': 2371,
  'token_str': 'адаптер',
  'sequence': 'электроника зарядка адаптер usb'}]
```
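
The pipeline returns a list of dictionaries, each with a `score` and a `token_str`, so downstream code can keep only the candidates it trusts. The sketch below shows one way to do that. The helper `confident_suggestions` and the `min_score` threshold of 0.1 are hypothetical and are not part of the model or of `transformers`; the sample data is copied from the output above so the snippet runs without downloading the model.

```python
def confident_suggestions(predictions, min_score=0.1):
    """Return (token_str, score) pairs whose score is at least min_score.

    `predictions` is a list of dicts in the shape returned by a
    fill-mask pipeline. The 0.1 default is an arbitrary illustration.
    """
    return [
        (p["token_str"], p["score"])
        for p in predictions
        if p["score"] >= min_score
    ]


# First three predictions copied from the example output above.
sample = [
    {"score": 0.3712620437145233, "token": 1131,
     "token_str": "автомобильная",
     "sequence": "электроника зарядка автомобильная usb"},
    {"score": 0.12239563465118408, "token": 7436,
     "token_str": "быстрая",
     "sequence": "электроника зарядка быстрая usb"},
    {"score": 0.046715956181287766, "token": 5819,
     "token_str": "проводная",
     "sequence": "электроника зарядка проводная usb"},
]

# Only the first two candidates pass the 0.1 threshold.
for token_str, score in confident_suggestions(sample):
    print(f"{token_str}\t{score:.3f}")
```

With a loaded pipeline, the same helper could be applied as `confident_suggestions(unmasker(query))`. The fill-mask pipeline call also accepts a `top_k` argument that controls how many candidates are returned.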
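
Tokenization can be improved by adding related prompts and by attaching a preposition to the word it belongs to with an underscore, as in `для_праздника` ("for a celebration") in the query below.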

```python
# Query: "clothing women's [MASK] for_celebration"
unmasker("одежда женское [MASK] для_праздника")

# Output:
[{'score': 0.9355553984642029,
  'token': 503,
  'token_str': 'платье',
  'sequence': 'одежда женское платье для_праздника'},
 {'score': 0.011321154423058033,
  'token': 615,
  'token_str': 'кольцо',
  'sequence': 'одежда женское кольцо для_праздника'},
 {'score': 0.008672593161463737,
  'token': 993,
  'token_str': 'украшение',
  'sequence': 'одежда женское украшение для_праздника'},
 {'score': 0.0038903721142560244,
  'token': 27100,
  'token_str': 'пончо',
  'sequence': 'одежда женское пончо для_праздника'},
 {'score': 0.003703165566548705,
  'token': 453,
  'token_str': 'белье',
  'sequence': 'одежда женское белье для_праздника'}]
```
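
One way to prepare queries in this form is to join each preposition to the following word before calling the model. The sketch below is an assumption about preprocessing, not the author's documented pipeline: `join_prepositions` and the `PREPOSITIONS` set are hypothetical names, and the set is a small illustrative sample rather than the list used to train the model.

```python
# Hypothetical, illustrative subset of Russian prepositions.
PREPOSITIONS = {"для", "без", "на", "в", "с", "из", "от", "под"}


def join_prepositions(query, prepositions=PREPOSITIONS):
    """Attach each preposition to the next word with an underscore."""
    words = query.split()
    result = []
    i = 0
    while i < len(words):
        word = words[i]
        has_next = i + 1 < len(words)
        # Leave the mask token alone: never glue a preposition onto [MASK].
        if word.lower() in prepositions and has_next and words[i + 1] != "[MASK]":
            result.append(f"{word}_{words[i + 1]}")
            i += 2  # the following word was consumed
        else:
            result.append(word)
            i += 1
    return " ".join(result)


print(join_prepositions("одежда женское [MASK] для праздника"))
# одежда женское [MASK] для_праздника
```

The result can then be passed to `unmasker` as in the example above. Whether a given joined form helps depends on how the training queries were written, so it is worth comparing scores with and without the underscore.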