Commit 96251f2 · 1 parent: 83b6e21
Update README.md

README.md CHANGED
@@ -43,19 +43,60 @@ Sahrawat, Dhruva, Debanjan Mahata, Haimin Zhang, Mayank Kulkarni, Agniv Sharma,
* This keyphrase generation model is domain-specific and performs best on abstracts of scientific papers. Using it in other domains is not recommended, but you are free to test it out.
* Only works for English documents.
* For a custom model, please consult the training notebook for more information (link incoming).
* The output can sometimes be nonsensical.

### ❓ How to use
```python
# Model parameters
from transformers import (
    Text2TextGenerationPipeline,
    BartForConditionalGeneration,
    AutoTokenizer,
)
import numpy as np


class KeyphraseGenerationPipeline(Text2TextGenerationPipeline):
    def __init__(self, model, keyphrase_sep_token=";", *args, **kwargs):
        # Load the BART model and its tokenizer from the given model name or path.
        super().__init__(
            model=BartForConditionalGeneration.from_pretrained(model),
            tokenizer=AutoTokenizer.from_pretrained(model),
            *args,
            **kwargs
        )
        self.keyphrase_sep_token = keyphrase_sep_token

    def postprocess(self, model_outputs):
        results = super().postprocess(
            model_outputs=model_outputs
        )
        # Split the generated text on the separator, strip whitespace, and
        # deduplicate; np.unique also sorts the keyphrases alphabetically.
        return np.unique([result.strip() for result in results[0].get("generated_text").split(self.keyphrase_sep_token)])
```
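
The `postprocess` step above splits the generated string on the separator token and deduplicates it with `np.unique`, which also returns the keyphrases in sorted order. If the order in which the model generated the keyphrases matters, an order-preserving variant might look like the sketch below. The helper name `split_keyphrases` is hypothetical (it is not part of this model card or of `transformers`), and the sketch assumes the raw output is a single string of `;`-separated phrases.

```python
def split_keyphrases(generated_text, sep=";"):
    """Split a generated string into unique keyphrases, keeping first-seen order.

    Hypothetical helper for illustration; not part of transformers.
    """
    phrases = (part.strip() for part in generated_text.split(sep))
    # dict.fromkeys drops duplicates while preserving insertion order;
    # empty strings (e.g. from a trailing separator) are filtered out.
    return list(dict.fromkeys(p for p in phrases if p))


raw = "keyphrase extraction;text analysis; keyphrase extraction;"
print(split_keyphrases(raw))  # ['keyphrase extraction', 'text analysis']
```

Returning a plain list instead of a NumPy array is a design choice of this sketch; the pipeline above returns an array.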

```python
model_name = "DeDeckerThomas/keyphrase-generation-keybart-inspec"
generator = KeyphraseGenerationPipeline(model=model_name)
```

```python
text = """
Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
Since this is a time-consuming process, Artificial Intelligence is used to automate it.
Currently, classical machine learning methods, that use statistics and linguistics, are widely used for the extraction process.
The fact that these methods have been widely used in the community has the advantage that there are many easy-to-use libraries.
Now with the recent innovations in deep learning methods (such as recurrent neural networks and transformers, GANS, …), keyphrase extraction can be improved.
These new methods also focus on the semantics and context of a document, which is quite an improvement.
""".replace(
    "\n", ""
)

keyphrases = generator(text)

print(keyphrases)
```

```
# Output
['artificial intelligence' 'classical machine learning methods'
'keyphrase extraction' 'lingu' 'statistics' 'text analysis']
```
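
Since the output can contain fragments such as `'lingu'` above, one possible sanity check is to separate keyphrases that occur verbatim in the input from those that do not. The sketch below is only an illustration: `partition_keyphrases` is a hypothetical helper, not part of this model card or of `transformers`. A generation model can legitimately produce keyphrases that are absent from the text, so the "absent" list is a set of candidates to review, not a list of errors.

```python
import re


def partition_keyphrases(text, keyphrases):
    """Split keyphrases into those found verbatim in `text` and those not found.

    Hypothetical helper for illustration; matching is case-insensitive and
    anchored on word boundaries, so "lingu" does not match "linguistics".
    """
    present, absent = [], []
    for phrase in keyphrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            present.append(phrase)
        else:
            absent.append(phrase)
    return present, absent


sample = "Classical machine learning methods use statistics and linguistics."
present, absent = partition_keyphrases(
    sample, ["classical machine learning methods", "lingu", "statistics"]
)
print(present)  # ['classical machine learning methods', 'statistics']
print(absent)   # ['lingu']
```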
## 📚 Training Dataset