# Model documentation & parameters

**Algorithm version**: The model version to use. Note that *any* Hugging Face model can be wrapped into a `KeyBERT` model.

**Text**: The input text from which keywords are extracted.

**Minimum keyphrase ngram**: Lower bound for phrase size. Each keyword will have at least this many words.

**Maximum keyphrase ngram**: Upper bound for phrase size. Each keyword will have at most this many words.

**Stop words**: Stop words to remove from the document. If none are provided, no stop-word removal is performed.

**Use MaxSum**: Enables Max Sum Similarity to diversify the results. The `MaxSum candidates` words/phrases most similar to the document are selected first. Then, among all combinations of `Number of keywords` of these candidates, the combination whose members are least similar to each other (by cosine similarity) is returned.

**MaxSum candidates**: Number of candidates considered when `Use MaxSum` is enabled.

**Use Max. marginal relevance**: Enables Maximal Marginal Relevance (MMR) to diversify the generated keywords/keyphrases. MMR is also based on cosine similarity.

**Diversity**: Degree of diversity of the results (between 0 and 1) when `Use Max. marginal relevance` is enabled.

**Number of keywords**: How many keywords to generate (at most 50).
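The minimal pure-Python sketch below illustrates how the `Minimum/Maximum keyphrase ngram`, `Stop words` and `Number of keywords` parameters fit together. It generates candidate n-grams, then ranks them by cosine similarity to the document embedding. It is an illustration under stated assumptions, not GT4SD's or KeyBERT's code:

- The tokenizer is a simple regex. KeyBERT relies on scikit-learn's `CountVectorizer` for this step.
- `candidate_ngrams` and `top_keywords` are hypothetical names.
- The toy embedding table is a hypothetical stand-in for a BERT encoder.

```python
import math
import re


def candidate_ngrams(text, min_n=1, max_n=1, stop_words=None):
    """Return unique n-grams with min_n <= n <= max_n words, in first-seen order.

    Stop words are dropped before n-grams are formed, so an n-gram may
    span a position where a stop word was removed.
    """
    stop = {w.lower() for w in (stop_words or [])}
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in stop]
    phrases = {}  # dict keeps insertion order and removes duplicates
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrases.setdefault(" ".join(tokens[i:i + n]), None)
    return list(phrases)


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_keywords(doc_emb, candidates, embed, top_n=5):
    """Rank candidates by cosine similarity to the document embedding."""
    ranked = sorted(candidates, key=lambda c: cosine(doc_emb, embed(c)), reverse=True)
    return ranked[:top_n]


# Hypothetical usage: a toy 2-D embedding table stands in for a BERT encoder.
print(candidate_ngrams("Graph neural networks for molecules", 1, 2, stop_words=["for"]))
toy = {"graph": (1.0, 0.0), "neural": (0.5, 0.5), "molecules": (0.0, 1.0)}
print(top_keywords((1.0, 0.0), list(toy), toy.get, top_n=2))
```

In this sketch `max_n` bounds the phrase length from above, matching `Maximum keyphrase ngram`. Every candidate is scored independently, so near-duplicate phrases can crowd the top of the list; that redundancy is what the diversification options address.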
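The Max Sum Similarity procedure described under `Use MaxSum` can be sketched as follows. The sketch assumes the document and candidate embeddings have already been computed. KeyBERT itself works on NumPy arrays; plain tuples keep this example dependency-free. The names map onto the parameters as follows:

- `max_sum_selection` is a hypothetical function name.
- `nr_candidates` plays the role of `MaxSum candidates`.
- `top_n` plays the role of `Number of keywords`.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def max_sum_selection(doc_emb, word_embs, words, top_n, nr_candidates):
    """Pick top_n words from the nr_candidates most document-similar ones,
    minimising the summed pairwise similarity within the chosen set."""
    if not top_n <= nr_candidates <= len(words):
        raise ValueError("need top_n <= nr_candidates <= len(words)")
    # Step 1: keep the nr_candidates words most similar to the document.
    ranked = sorted(range(len(words)),
                    key=lambda i: cosine(doc_emb, word_embs[i]), reverse=True)
    candidates = ranked[:nr_candidates]
    # Step 2: score every top_n-sized combination; lower total similarity wins.
    best, best_score = None, math.inf
    for combo in itertools.combinations(candidates, top_n):
        score = sum(cosine(word_embs[i], word_embs[j])
                    for i, j in itertools.combinations(combo, 2))
        if score < best_score:
            best, best_score = combo, score
    return [words[i] for i in best]


# Hypothetical toy data: "a" and "b" are near-duplicates, "d" is off-topic.
words = ["a", "b", "c", "d"]
embs = [(1.0, 0.1), (1.0, 0.12), (0.8, 0.6), (0.0, 1.0)]
print(max_sum_selection((1.0, 0.0), embs, words, top_n=2, nr_candidates=3))
```

The sketch scores every `top_n`-sized combination, so its cost grows combinatorially with `nr_candidates`. That is one reason to keep that value modest.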
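A greedy sketch of Maximal Marginal Relevance, as used by `Use Max. marginal relevance` and `Diversity`, follows. It assumes precomputed embeddings and the common formulation `score = (1 - diversity) * relevance - diversity * redundancy`. Here relevance is similarity to the document, and redundancy is the highest similarity to any keyword already picked. `mmr_selection` is a hypothetical helper, not the library's API.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mmr_selection(doc_emb, word_embs, words, top_n, diversity):
    """Greedy Maximal Marginal Relevance with diversity in [0, 1]."""
    if top_n <= 0 or not words:
        return []
    doc_sim = [cosine(doc_emb, e) for e in word_embs]
    # Start with the single word most similar to the document.
    first = max(range(len(words)), key=lambda i: doc_sim[i])
    selected = [first]
    remaining = [i for i in range(len(words)) if i != first]
    while remaining and len(selected) < top_n:
        scores = {}
        for i in remaining:
            relevance = doc_sim[i]
            redundancy = max(cosine(word_embs[i], word_embs[j]) for j in selected)
            # Trade off relevance to the document against redundancy with earlier picks.
            scores[i] = (1 - diversity) * relevance - diversity * redundancy
        best = max(remaining, key=scores.__getitem__)
        selected.append(best)
        remaining.remove(best)
    return [words[i] for i in selected]


# Hypothetical toy data: "a" and "b" are near-duplicates, "d" is off-topic.
words = ["a", "b", "c", "d"]
embs = [(1.0, 0.1), (1.0, 0.12), (0.8, 0.6), (0.0, 1.0)]
print(mmr_selection((1.0, 0.0), embs, words, top_n=2, diversity=0.0))
print(mmr_selection((1.0, 0.0), embs, words, top_n=2, diversity=0.7))
```

With `diversity=0.0` the selection reduces to plain similarity ranking. Larger values increasingly favour keywords that are unlike those already chosen, even at some cost in relevance to the document.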
# Model card -- KeywordBERT

**Model Details**: KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.

**Developers**: Maarten Grootendorst.

**Distributors**: Original developer's code from [https://github.com/MaartenGr/KeyBERT](https://github.com/MaartenGr/KeyBERT).

**Model date**: 2020.

**Model type**: Different BERT and SciBERT models, trained on [CIRCA data](https://circa.res.ibm.com/index.html).

**Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**: N.A.

**Paper or other resource for more information**: The [KeyBERT GitHub repo](https://github.com/MaartenGr/KeyBERT).

**License**: MIT

**Where to send questions or comments about the model**: Open an issue on the [GT4SD repository](https://github.com/GT4SD/gt4sd-core).

**Intended Use. Use cases that were envisioned during development**: N.A.

**Primary intended uses/users**: N.A.

**Out-of-scope use cases**: Production-level inference.

**Metrics**: N.A.

**Datasets**: N.A.

**Ethical Considerations**: Unclear; please consult the original authors with any questions.

**Caveats and Recommendations**: Unclear; please consult the original authors with any questions.

Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596).
## Citation

```bib
@misc{grootendorst2020keybert,
  author    = {Maarten Grootendorst},
  title     = {KeyBERT: Minimal keyword extraction with BERT.},
  year      = 2020,
  publisher = {Zenodo},
  version   = {v0.3.0},
  doi       = {10.5281/zenodo.4461265},
  url       = {https://doi.org/10.5281/zenodo.4461265}
}
```