	# Model documentation & parameters

	Algorithm version: The model version to use. Note that any Hugging Face (HF) model can be wrapped into a `KeyBERT` model.

	Text: The input text to "understand", i.e., the document from which keywords are extracted.

	Minimum keyphrase ngram: Lower bound for phrase size. Each keyword will have at least this many words.

	Maximum keyphrase ngram: Upper bound for phrase size. Each keyword will have at most this many words.

	Stop words: Stop words to remove from the document. If not provided, no stop words are removed.
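To make the interplay of the n-gram bounds and the stop word list concrete, here is a minimal pure-Python sketch of candidate phrase generation. It is a stand-in under stated assumptions, not KeyBERT's implementation (KeyBERT delegates this step to scikit-learn's `CountVectorizer`); the function name `candidate_phrases`, the regex tokenizer and the example sentence are hypothetical.

```python
import re


def candidate_phrases(text, min_n=1, max_n=2, stop_words=None):
    """Return unique candidate phrases with min_n <= number of words <= max_n.

    Assumption: stop words are dropped before n-grams are formed, so a
    phrase may join words that were separated by a stop word in the text.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if stop_words:
        stop = set(stop_words)
        tokens = [t for t in tokens if t not in stop]

    phrases = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase not in phrases:  # keep first occurrence only
                phrases.append(phrase)
    return phrases


candidates = candidate_phrases(
    "Supervised learning is the machine learning task",
    min_n=1,
    max_n=2,
    stop_words=["is", "the"],
)
print(candidates)
```

Every returned phrase has between `min_n` and `max_n` words, and none of them contains a stop word.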

	Use MaxSum: Whether to use Max Sum Similarity to diversify the generated keywords. We take the `2 x MaxSum candidates` words/phrases most similar to the document, then consider all `top_n` combinations of these candidates and extract the combination whose members are least similar to each other by cosine similarity.

	MaxSum candidates: Number of candidates considered when `Use MaxSum` is enabled.
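The Max Sum procedure described above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions rather than the library's code: the embeddings are hand-made 2-D toy vectors, and `max_sum_select`, `pool_size` and the example phrases are hypothetical names. The size of the candidate pool is left as a free parameter here.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def max_sum_select(doc_vec, cand_vecs, words, top_n, pool_size):
    """Pick the top_n candidates that are least similar to each other,
    restricted to the pool_size candidates most similar to the document."""
    if top_n > min(pool_size, len(words)):
        raise ValueError("top_n must not exceed the candidate pool size")

    # Step 1: keep only the candidates closest to the document.
    ranked = sorted(range(len(words)),
                    key=lambda i: cosine(doc_vec, cand_vecs[i]),
                    reverse=True)
    pool = ranked[:pool_size]

    # Step 2: among all top_n-sized combinations, minimise the summed
    # pairwise similarity.
    best, best_score = None, float("inf")
    for combo in combinations(pool, top_n):
        score = sum(cosine(cand_vecs[i], cand_vecs[j])
                    for i, j in combinations(combo, 2))
        if score < best_score:
            best, best_score = combo, score
    return [words[i] for i in best]


# Toy data (assumed, for illustration only).
doc = [1.0, 1.0]
words = ["machine learning", "learning machine", "training data",
         "labeled examples", "cooking recipe"]
vecs = [[1.0, 1.0], [1.0, 0.9], [1.0, 0.2], [0.2, 1.0], [-1.0, 0.0]]

small_pool = max_sum_select(doc, vecs, words, top_n=2, pool_size=4)
large_pool = max_sum_select(doc, vecs, words, top_n=2, pool_size=5)
print(small_pool, large_pool)
```

With a pool of four, the off-topic phrase never enters the search, so the two mutually dissimilar but still relevant phrases win. Enlarging the pool to five lets the off-topic phrase in, which shows the trade-off: a larger pool increases diversity at the cost of relevance. The exhaustive search over combinations also grows quickly with the pool size.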

	Use Max. marginal relevance: Whether to use Maximal Marginal Relevance (MMR), which is also based on cosine similarity, to diversify the keywords/keyphrases.

	Diversity: How diverse the results should be when `Use Max. marginal relevance` is enabled. Higher values yield more diverse keywords.
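As a rough illustration of how the diversity value trades relevance against redundancy, the following sketch implements a common greedy form of MMR in pure Python. It is a sketch under assumptions, not the code behind this app: the 2-D embeddings are toy values, and `mmr_select` and the example phrases are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def mmr_select(doc_vec, cand_vecs, words, top_n, diversity):
    """Greedy MMR: start with the candidate closest to the document, then
    repeatedly add the candidate with the best balance of relevance to the
    document and dissimilarity to what is already selected."""
    doc_sim = [cosine(doc_vec, v) for v in cand_vecs]
    selected = [max(range(len(words)), key=lambda i: doc_sim[i])]
    remaining = [i for i in range(len(words)) if i != selected[0]]

    while remaining and len(selected) < top_n:
        def score(i):
            # Redundancy = similarity to the closest already-selected phrase.
            redundancy = max(cosine(cand_vecs[i], cand_vecs[j])
                             for j in selected)
            return (1 - diversity) * doc_sim[i] - diversity * redundancy

        nxt = max(remaining, key=score)
        selected.append(nxt)
        remaining.remove(nxt)
    return [words[i] for i in selected]


# Toy data (assumed, for illustration only).
doc = [1.0, 1.0]
words = ["machine learning", "learning machine", "training data"]
vecs = [[1.0, 1.0], [1.0, 0.9], [1.0, 0.2]]

low_diversity = mmr_select(doc, vecs, words, top_n=2, diversity=0.0)
high_diversity = mmr_select(doc, vecs, words, top_n=2, diversity=0.7)
print(low_diversity, high_diversity)
```

With a diversity of 0.0 the two near-duplicate phrases are returned, because only relevance to the document counts. With 0.7 the second pick switches to the less redundant phrase.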

	Number of keywords: How many keywords to generate (at most 50).


	# Model card -- KeywordBERT

	Model Details: KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.

	Developers: Maarten Grootendorst.

	Distributors: Original developer's code from [https://github.com/MaartenGr/KeyBERT](https://github.com/MaartenGr/KeyBERT).

	Model date: 2020.

	Model type: Different BERT and SciBERT models, trained on [CIRCA data](https://circa.res.ibm.com/index.html).

	Information about training algorithms, parameters, fairness constraints or other applied approaches, and features:
	N.A.

	Paper or other resource for more information:
	The [KeyBERT GitHub repo](https://github.com/MaartenGr/KeyBERT).

	License: MIT

	Where to send questions or comments about the model: Open an issue on the [GT4SD repository](https://github.com/GT4SD/gt4sd-core).

	Intended Use. Use cases that were envisioned during development: N.A.

	Primary intended uses/users: N.A.

	Out-of-scope use cases: Production-level inference.

	Metrics: N.A.

	Datasets: N.A.

	Ethical Considerations: Unclear; please consult the original authors in case of questions.

	Caveats and Recommendations: Unclear; please consult the original authors in case of questions.

	Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596).

	## Citation
	```bib
	@misc{grootendorst2020keybert,
	author = {Maarten Grootendorst},
	title = {KeyBERT: Minimal keyword extraction with BERT.},
	year = 2020,
	publisher = {Zenodo},
	version = {v0.3.0},
	doi = {10.5281/zenodo.4461265},
	url = {https://doi.org/10.5281/zenodo.4461265}
	}
	```
