	---
	tags:
	- bertopic
	library_name: bertopic
	pipeline_tag: text-classification
	---

	# topic_model_general_normal_april8

	This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
	BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

	## Usage

	To use this model, please install BERTopic:

	```
	pip install -U bertopic
	```

	You can use the model as follows:

	```python
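	from bertopic import BERTopic

	# Download the model from the Hugging Face Hub and load it
	topic_model = BERTopic.load("Thang203/topic_model_general_normal_april8")

	# Returns a pandas DataFrame with one row per topic
	topic_model.get_topic_info()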
	```
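
To assign topics to new documents, BERTopic provides `transform`, and a fitted model exposes its default labels through the `topic_labels_` attribute. The helper below is a minimal sketch of combining the two. `label_documents` is a hypothetical name chosen for illustration and is not part of BERTopic.

```python
def label_documents(topic_model, docs):
    """Pair each document with its predicted topic ID and default label.

    `topic_model` is expected to be a loaded BERTopic model. Only its
    `transform` method and `topic_labels_` mapping are used here.
    """
    # transform returns (topics, probabilities); probabilities are not needed
    topics, _probs = topic_model.transform(docs)
    labels = topic_model.topic_labels_
    return [
        # int() normalizes NumPy integers so they match the dict keys
        (doc, int(topic), labels.get(int(topic), "unknown"))
        for doc, topic in zip(docs, topics)
    ]
```

With the model loaded as in the snippet above, `label_documents(topic_model, docs)` should return one `(document, topic_id, label)` tuple per input string. Documents that BERTopic treats as outliers are assigned topic `-1`.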

	## Topic overview

	* Number of topics: 80 (79 topics plus the outlier topic `-1`)
	* Number of training documents: 6795

	<details>
	<summary>Click here for an overview of all topics.</summary>

	| Topic ID | Topic Keywords | Topic Frequency | Label |
	|----------|----------------|-----------------|-------|
	| -1 | models - language - llms - language models - chatgpt | 11 | -1_models_language_llms_language models |
	| 0 | translation - language - models - data - generation | 2010 | 0_translation_language_models_data |
	| 1 | visual - multimodal - image - images - video | 510 | 1_visual_multimodal_image_images |
	| 2 | reasoning - math - cot - mathematical - problems | 432 | 2_reasoning_math_cot_mathematical |
	| 3 | attacks - attack - adversarial - safety - jailbreak | 340 | 3_attacks_attack_adversarial_safety |
	| 4 | medical - clinical - biomedical - health - healthcare | 318 | 4_medical_clinical_biomedical_health |
	| 5 | code - code generation - generation - programming - software | 303 | 5_code_code generation_generation_programming |
	| 6 | students - education - ai - chatgpt - student | 153 | 6_students_education_ai_chatgpt |
	| 7 | robot - planning - robots - navigation - robotic | 110 | 7_robot_planning_robots_navigation |
	| 8 | dialogue - taskoriented - dialog - dialogue systems - systems | 107 | 8_dialogue_taskoriented_dialog_dialogue systems |
	| 9 | knowledge - question - answering - question answering - kgs | 97 | 9_knowledge_question_answering_question answering |
	| 10 | financial - sentiment - stock - market - investment | 78 | 10_financial_sentiment_stock_market |
	| 11 | bias - gender - biases - gender bias - fairness | 78 | 11_bias_gender_biases_gender bias |
	| 12 | emotion - emotional - empathetic - mental health - affective | 77 | 12_emotion_emotional_empathetic_mental health |
	| 13 | privacy - private - federated - data - attack | 76 | 13_privacy_private_federated_data |
	| 14 | text - detection - texts - aigenerated - machinegenerated | 75 | 14_text_detection_texts_aigenerated |
	| 15 | radiology - medical - reports - image - radiology reports | 75 | 15_radiology_medical_reports_image |
	| 16 | training - parallelism - gpu - memory - hardware | 71 | 16_training_parallelism_gpu_memory |
	| 17 | summarization - summaries - abstractive - summary - text summarization | 70 | 17_summarization_summaries_abstractive_summary |
	| 18 | game - games - agents - social - llm agents | 69 | 18_game_games_agents_social |
	| 19 | quantization - quantized - weights - memory - compression | 66 | 19_quantization_quantized_weights_memory |
	| 20 | sql - texttosql - table - database - tabular | 62 | 20_sql_texttosql_table_database |
	| 21 | retrieval - ranking - rag - reranking - retrievalaugmented | 61 | 21_retrieval_ranking_rag_reranking |
	| 22 | lora - attention - lowrank - finetuning - memory | 59 | 22_lora_attention_lowrank_finetuning |
	| 23 | legal - patent - claim - court - law | 58 | 23_legal_patent_claim_court |
	| 24 | alignment - preference - reward - rlhf - preferences | 58 | 24_alignment_preference_reward_rlhf |
	| 25 | recommendation - recommender - recommendations - recommender systems - user | 56 | 25_recommendation_recommender_recommendations_recommender systems |
	| 26 | transformer - transformers - attention - layers - layer | 55 | 26_transformer_transformers_attention_layers |
	| 27 | tom - cognitive - analogical - analogies - human | 52 | 27_tom_cognitive_analogical_analogies |
	| 28 | vulnerability - vulnerabilities - code - security - smart | 48 | 28_vulnerability_vulnerabilities_code_security |
	| 29 | materials - chemistry - materials science - chemical - molecular | 48 | 29_materials_chemistry_materials science_chemical |
	| 30 | agent - agents - rl - environments - language agents | 47 | 30_agent_agents_rl_environments |
	| 31 | repair - bugs - bug - program repair - apr | 43 | 31_repair_bugs_bug_program repair |
	| 32 | graph - graphs - graph reasoning - graph neural - graph data | 43 | 32_graph_graphs_graph reasoning_graph neural |
	| 33 | speech - asr - audio - speech recognition - recognition | 42 | 33_speech_asr_audio_speech recognition |
	| 34 | ai - ethical - regulation - risks - risk | 41 | 34_ai_ethical_regulation_risks |
	| 35 | personality - traits - personality traits - personas - personalities | 41 | 35_personality_traits_personality traits_personas |
	| 36 | context - context window - window - length - long | 36 | 36_context_context window_window_length |
	| 37 | chatgpt - research - writing - ai - academic | 34 | 37_chatgpt_research_writing_ai |
	| 38 | incontext - demonstrations - icl - incontext learning - learning | 33 | 38_incontext_demonstrations_icl_incontext learning |
	| 39 | sentiment - sentiment analysis - analysis - aspectbased - polarity | 32 | 39_sentiment_sentiment analysis_analysis_aspectbased |
	| 40 | cultural - opinions - political - survey - values | 30 | 40_cultural_opinions_political_survey |
	| 41 | tool - tools - apis - api - llms | 29 | 41_tool_tools_apis_api |
	| 42 | hallucinations - hallucination - hallucination detection - detection - llms | 29 | 42_hallucinations_hallucination_hallucination detection_detection |
	| 43 | creative - ideas - ai - creativity - storytelling | 28 | 43_creative_ideas_ai_creativity |
	| 44 | music - musical - audio - lyrics - song | 28 | 44_music_musical_audio_lyrics |
	| 45 | scaling - scaling laws - laws - training - model | 27 | 45_scaling_scaling laws_laws_training |
	| 46 | physics - students - chatgpt - education - responses | 26 | 46_physics_students_chatgpt_education |
	| 47 | correction - grammatical - gec - error - error correction | 26 | 47_correction_grammatical_gec_error |
	| 48 | test - unit - tests - test generation - test cases | 23 | 48_test_unit_tests_test generation |
	| 49 | pruning - sparsity - structured pruning - structured - weights | 23 | 49_pruning_sparsity_structured pruning_structured |
	| 50 | commonsense - commonsense knowledge - knowledge - commonsense question answering - commonsense question | 21 | 50_commonsense_commonsense knowledge_knowledge_commonsense question answering |
	| 51 | distillation - teacher - student - kd - knowledge distillation | 20 | 51_distillation_teacher_student_kd |
	| 52 | visualization - visualizations - data visualization - natural - natural language | 20 | 52_visualization_visualizations_data visualization_natural |
	| 53 | hallucination - hallucinations - lvlms - mllms - visual | 20 | 53_hallucination_hallucinations_lvlms_mllms |
	| 54 | adversarial - vlms - attacks - attack - adversarial examples | 20 | 54_adversarial_vlms_attacks_attack |
	| 55 | verilog - design - hardware - hardware design - rtl | 18 | 55_verilog_design_hardware_hardware design |
	| 56 | spatial - geospatial - geographic - location - populations | 18 | 56_spatial_geospatial_geographic_location |
	| 57 | intent - intent detection - slot - detection - slot filling | 18 | 57_intent_intent detection_slot_detection |
	| 58 | prompts - prompt - performance - negated - pseudocode | 18 | 58_prompts_prompt_performance_negated |
	| 59 | brain - fmri - neural - activity - eeg | 17 | 59_brain_fmri_neural_activity |
	| 60 | watermarking - copyright - protection - text - model | 16 | 60_watermarking_copyright_protection_text |
	| 61 | public - social - media - early - ai | 16 | 61_public_social_media_early |
	| 62 | ai - productivity - chatbots - chatgpt - economy | 15 | 62_ai_productivity_chatbots_chatgpt |
	| 63 | poetry - poems - poetry generation - lyrics - poem | 15 | 63_poetry_poems_poetry generation_lyrics |
	| 64 | geoscience - astronomy - scientific - astronomical - galactica | 15 | 64_geoscience_astronomy_scientific_astronomical |
	| 65 | editing - knowledge editing - knowledge - model editing - editing methods | 14 | 65_editing_knowledge editing_knowledge_model editing |
	| 66 | argument - arguments - argumentation - fallacy - fallacies | 14 | 66_argument_arguments_argumentation_fallacy |
	| 67 | mobile - wireless - devices - aigc - network | 14 | 67_mobile_wireless_devices_aigc |
	| 68 | design - bid - 3d - designs - generative | 14 | 68_design_bid_3d_designs |
	| 69 | simplification - text simplification - text - sentence - readability | 14 | 69_simplification_text simplification_text_sentence |
	| 70 | urban - traffic - transportation - foundation models - foundation | 13 | 70_urban_traffic_transportation_foundation models |
	| 71 | log - anomaly - root - anomaly detection - cloud | 13 | 71_log_anomaly_root_anomaly detection |
	| 72 | forgetting - catastrophic forgetting - catastrophic - continual - finetuning | 13 | 72_forgetting_catastrophic forgetting_catastrophic_continual |
	| 73 | scientific - papers - review - gpt4 - feedback | 13 | 73_scientific_papers_review_gpt4 |
	| 74 | causal - causality - causal discovery - causal inference - causal reasoning | 13 | 74_causal_causality_causal discovery_causal inference |
	| 75 | product - ecommerce - attribute - extraction - product descriptions | 13 | 75_product_ecommerce_attribute_extraction |
	| 76 | optimizers - adam - deep - training - networks | 12 | 76_optimizers_adam_deep_training |
	| 77 | chinese - questions - subjects - school - ceval | 12 | 77_chinese_questions_subjects_school |
	| 78 | speculative - decoding - draft - speculative decoding - draft model | 12 | 78_speculative_decoding_draft_speculative decoding |

	</details>
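
In the table, each label is the topic ID followed by the first four keywords, joined with underscores, and the Topic Frequency column sums to the 6795 training documents. The sketch below shows one way to read those two columns. `parse_label` and `share_of_corpus` are hypothetical helper names, and the parsing assumes that keywords never contain an underscore, which holds for every row above.

```python
TRAINING_DOCS = 6795  # "Number of training documents" from the card


def parse_label(label):
    """Split a label into (topic_id, [keywords]).

    Keywords may contain spaces (e.g. "code generation"), so the split
    is on underscores only.
    """
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords


def share_of_corpus(frequency, total=TRAINING_DOCS):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


topic_id, keywords = parse_label("5_code_code generation_generation_programming")
print(topic_id, keywords)
print(f"Topic 0 covers {share_of_corpus(2010):.1%} of the training documents")
```

Topic 0 is by far the largest cluster, and only 11 documents fall into the outlier topic `-1`.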

	## Training hyperparameters

	* calculate_probabilities: False
	* language: english
	* low_memory: False
	* min_topic_size: 10
	* n_gram_range: (1, 1)
	* nr_topics: auto
	* seed_topic_list: None
	* top_n_words: 10
	* verbose: True
	* zeroshot_min_similarity: 0.7
	* zeroshot_topic_list: None
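
The values above correspond to keyword arguments of the `BERTopic` constructor. The following sketch collects them in a dictionary and passes them through, assuming a BERTopic release that accepts the zero-shot arguments. The card does not state the embedding model or the training corpus, so this creates a fresh, unfitted model with the same settings and does not reproduce the published one. `build_untrained_model` is a hypothetical helper name.

```python
# Values copied from the hyperparameter list on this card.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": "auto",
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_untrained_model():
    """Create a new, unfitted BERTopic instance with the listed settings."""
    # Imported lazily so the dictionary can be inspected without bertopic installed
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_untrained_model().fit_transform(docs)` on your own list of documents would then train a model with these settings.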

	## Framework versions

	* Numpy: 1.25.2
	* HDBSCAN: 0.8.33
	* UMAP: 0.5.6
	* Pandas: 2.0.3
	* Scikit-Learn: 1.2.2
	* Sentence-transformers: 2.6.1
	* Transformers: 4.38.2
	* Numba: 0.58.1
	* Plotly: 5.15.0
	* Python: 3.10.12
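
Loading a saved model in an environment with different library versions can sometimes cause problems, so it may help to compare your installed packages against the list above. This is a minimal sketch using the standard library's `importlib.metadata`. The dictionary keys are the PyPI distribution names for the listed libraries, and `compare_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names mapped to the versions listed on this card.
EXPECTED_VERSIONS = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def compare_versions(expected=EXPECTED_VERSIONS, lookup=version):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # package is not installed
        report[name] = (wanted, found)
    return report
```

Iterating over `compare_versions().items()` and printing the entries where the two versions differ gives a quick overview of how far the local environment is from the one used for training.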

