	---
	tags:
	- bertopic
	library_name: bertopic
	pipeline_tag: text-classification
	---

	# transformers_issues_topics

	This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
	BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

	## Usage

	To use this model, please install BERTopic:

	```
	pip install -U bertopic
	```

	You can use the model as follows:

	```python
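	from bertopic import BERTopic

	# Download the model from the Hugging Face Hub and load it.
	topic_model = BERTopic.load("sweetapplee/transformers_issues_topics")

	# Returns a pandas DataFrame with one row per topic.
	topic_model.get_topic_info()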
	```
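
Once loaded, the model can also assign topics to documents it has not seen. The following is a minimal sketch, not part of the original card: `assign_topics`, `demo`, and the example issue title are hypothetical names chosen for illustration, and it assumes the embedding model referenced by the saved model is available locally or can be downloaded.

```python
from typing import List, Tuple


def assign_topics(topic_model, docs: List[str]) -> List[Tuple[str, int, str]]:
    """Return (document, topic ID, topic label) for each document.

    `topic_model` is expected to be a fitted BERTopic instance.
    """
    # transform() returns the predicted topic IDs and, optionally, probabilities.
    topics, _ = topic_model.transform(docs)
    results = []
    for doc, topic in zip(docs, topics):
        # topic_labels_ maps topic IDs to labels such as "12_cuda_cuda0_memory_ram".
        label = topic_model.topic_labels_.get(int(topic), "unknown")
        results.append((doc, int(topic), label))
    return results


def demo() -> None:
    """Load the published model and label one made-up issue title."""
    from bertopic import BERTopic  # requires `pip install -U bertopic`

    model = BERTopic.load("sweetapplee/transformers_issues_topics")
    # Hypothetical issue title, used only for illustration.
    for row in assign_topics(model, ["CUDA out of memory when fine-tuning on two GPUs"]):
        print(row)
```

Calling `demo()` downloads the model, so it is left uncalled here. Documents that do not fit any topic are typically assigned the outlier topic `-1`.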

	## Topic overview

	* Number of topics: 30 (including the outlier topic, ID -1)
	* Number of training documents: 9000

	<details>
	<summary>Click here for an overview of all topics.</summary>

	| Topic ID | Topic Keywords | Topic Frequency | Label |
	|----------|----------------|-----------------|-------|
	| -1 | bert - tokenizer - tokenizers - pretrained - pytorch | 16 | -1_bert_tokenizer_tokenizers_pretrained |
	| 0 | tokenization - tokenizer - tokenizers - token - tokens | 2184 | 0_tokenization_tokenizer_tokenizers_token |
	| 1 | tf - tpu - t5 - tftrainer - onnx | 1761 | 1_tf_tpu_t5_tftrainer |
	| 2 | modelcard - modelcards - card - model - cards | 939 | 2_modelcard_modelcards_card_model |
	| 3 | importerror - attributeerror - valueerror - typeerror - runmlmpy | 486 | 3_importerror_attributeerror_valueerror_typeerror |
	| 4 | doc - docstring - docstrings - docs - document | 449 | 4_doc_docstring_docstrings_docs |
	| 5 | albertforpretraining - xlnet - albertbasev2 - albertformaskedlm - xlnetlmheadmodel | 400 | 5_albertforpretraining_xlnet_albertbasev2_albertformaskedlm |
	| 6 | gpt2 - gpt2tokenizer - gpt2xl - gpt - gpt2tokenizerfast | 348 | 6_gpt2_gpt2tokenizer_gpt2xl_gpt |
	| 7 | readmemd - readmetxt - readme - file - camembertbasereadmemd | 273 | 7_readmemd_readmetxt_readme_file |
	| 8 | s2s - s2sdistill - s2t - s2strainer - exampless2s | 260 | 8_s2s_s2sdistill_s2t_s2strainer |
	| 9 | longformer - longformers - longformerformultiplechoice - longformertokenizerfast - globalattentionmask | 216 | 9_longformer_longformers_longformerformultiplechoice_longformertokenizerfast |
	| 10 | transformerscli - transformers - transformer - importerror - transformerxl | 194 | 10_transformerscli_transformers_transformer_importerror |
	| 11 | tests - testing - slow - test - faster | 187 | 11_tests_testing_slow_test |
	| 12 | cuda - cuda0 - memory - ram - gpus | 159 | 12_cuda_cuda0_memory_ram |
	| 13 | pipeline - pipelines - ner - nerpipeline - featureextractionpipeline | 145 | 13_pipeline_pipelines_ner_nerpipeline |
	| 14 | questionansweringpipeline - longformerforquestionanswering - answering - questionanswering - distilbertforquestionanswering | 144 | 14_questionansweringpipeline_longformerforquestionanswering_answering_questionanswering |
	| 15 | trainertrain - trainer - loggingstrategy - logging - training | 139 | 15_trainertrain_trainer_loggingstrategy_logging |
	| 16 | benchmark - benchmarks - accuracy - precision - comparison | 139 | 16_benchmark_benchmarks_accuracy_precision |
	| 17 | labelsmoothednllloss - label - labelsmoothingfactor - labels - labelsmoothing | 75 | 17_labelsmoothednllloss_label_labelsmoothingfactor_labels |
	| 18 | huggingfacemaster - huggingfacetokenizers297 - huggingface - huggingfaces - huggingfacetransformers | 74 | 18_huggingfacemaster_huggingfacetokenizers297_huggingface_huggingfaces |
	| 19 | generationbeamsearchpy - generatebeamsearch - beamsearch - nonbeamsearch - beam | 73 | 19_generationbeamsearchpy_generatebeamsearch_beamsearch_nonbeamsearch |
	| 20 | wav2vec2 - wav2vec - wav2vec20 - wav2vec2forctc - wav2vec2xlrswav2vec2 | 59 | 20_wav2vec2_wav2vec_wav2vec20_wav2vec2forctc |
	| 21 | flax - flaxelectraformaskedlm - flaxelectraforpretraining - flaxjax - flaxelectramodel | 52 | 21_flax_flaxelectraformaskedlm_flaxelectraforpretraining_flaxjax |
	| 22 | notebook - notebooks - notebookprogresscallback - community - colab | 51 | 22_notebook_notebooks_notebookprogresscallback_community |
	| 23 | wandbproject - wandb - wandbcallback - wandbdisabled - wandbdisabledtrue | 40 | 23_wandbproject_wandb_wandbcallback_wandbdisabled |
	| 24 | cachedir - cache - cachedpath - caching - cached | 34 | 24_cachedir_cache_cachedpath_caching |
	| 25 | closed - add - bort - added - deleted | 33 | 25_closed_add_bort_added |
	| 26 | electra - electrapretrainedmodel - electraformaskedlm - electralarge - electraformultiplechoice | 26 | 26_electra_electrapretrainedmodel_electraformaskedlm_electralarge |
	| 27 | layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf | 26 | 27_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased |
	| 28 | isort - blackisortflake8 - github - repo - version | 18 | 28_isort_blackisortflake8_github_repo |

	</details>
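
As a quick sanity check on the overview, the per-topic frequencies can be summed and turned into shares of the training set. This is a small illustrative sketch using only the numbers from the table; the names `TOPIC_FREQUENCIES` and `topic_shares` are hypothetical and not part of BERTopic.

```python
# Topic frequencies copied from the overview table, keyed by topic ID.
TOPIC_FREQUENCIES = {
    -1: 16, 0: 2184, 1: 1761, 2: 939, 3: 486, 4: 449, 5: 400, 6: 348,
    7: 273, 8: 260, 9: 216, 10: 194, 11: 187, 12: 159, 13: 145, 14: 144,
    15: 139, 16: 139, 17: 75, 18: 74, 19: 73, 20: 59, 21: 52, 22: 51,
    23: 40, 24: 34, 25: 33, 26: 26, 27: 26, 28: 18,
}


def topic_shares(frequencies):
    """Map each topic ID to its fraction of all documents."""
    total = sum(frequencies.values())
    return {topic: count / total for topic, count in frequencies.items()}


total_documents = sum(TOPIC_FREQUENCIES.values())
shares = topic_shares(TOPIC_FREQUENCIES)

# Share of documents in the three largest topics (IDs 0, 1 and 2).
top_three_share = shares[0] + shares[1] + shares[2]

print(f"documents: {total_documents}")
print(f"outlier share: {shares[-1]:.2%}")
print(f"top three topics: {top_three_share:.1%}")
```

The frequencies add up to 9000, matching the number of training documents. Only 16 documents fall into the outlier topic, while the three largest topics together cover roughly 54% of the corpus.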

	## Training hyperparameters

	* calculate_probabilities: False
	* language: english
	* low_memory: False
	* min_topic_size: 10
	* n_gram_range: (1, 1)
	* nr_topics: 30
	* seed_topic_list: None
	* top_n_words: 10
	* verbose: True
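
The settings above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below shows one way to configure a fresh model with them. It is an assumption-laden illustration rather than the original training script: the card does not list the embedding model, the UMAP or HDBSCAN settings, or the training corpus, so library defaults would apply and refitting is not expected to reproduce exactly the same topics. `HYPERPARAMETERS` and `build_topic_model` are hypothetical names.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 30,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model():
    """Create an unfitted BERTopic model with the settings listed in this card."""
    # Imported lazily so the configuration can be inspected without bertopic installed.
    from bertopic import BERTopic

    # Sub-models (embeddings, UMAP, HDBSCAN) are left at their defaults here,
    # which is an assumption: the card does not document them.
    return BERTopic(**HYPERPARAMETERS)
```

A model built this way would then be fitted with `fit_transform(docs)`, where `docs` is a list of strings you supply yourself.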

	## Framework versions

	* Numpy: 1.23.5
	* HDBSCAN: 0.8.33
	* UMAP: 0.5.4
	* Pandas: 1.5.3
	* Scikit-Learn: 1.2.2
	* Sentence-transformers: 2.2.2
	* Transformers: 4.34.1
	* Numba: 0.56.4
	* Plotly: 5.15.0
	* Python: 3.10.12
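
Loading a saved model tends to be most reliable when the local environment is close to the one listed above. The following sketch compares installed package versions against those numbers using only the standard library. `PINNED` and `compare_versions` are hypothetical names; the keys are the PyPI distribution names (for example, UMAP is distributed as `umap-learn`).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by PyPI distribution name.
PINNED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.4",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.34.1",
    "numba": "0.56.4",
    "plotly": "5.15.0",
}


def compare_versions(pinned):
    """Return {package: (expected, installed)}; installed is None if missing."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


for name, (expected, installed) in compare_versions(PINNED).items():
    status = "ok" if installed == expected else "differs"
    print(f"{name}: expected {expected}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one recorded in this card.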