---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# topic_model_general_normal_april8

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible, modular topic modeling framework for generating easily interpretable topics from large datasets.

## Usage 

To use this model, please install BERTopic:

```bash
pip install -U bertopic
```

You can use the model as follows:

```python
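from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("Thang203/topic_model_general_normal_april8")

# pandas DataFrame with one row per topic
topic_model.get_topic_info()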
```

## Topic overview

* Number of topics: 80 (topics 0 to 78 plus the outlier topic -1)
* Number of training documents: 6795
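These two counts can be cross-checked against the topic table: its rows cover topic IDs -1 to 78, and the Topic Frequency values, which judging by their sum count documents per topic, add up to the number of training documents. A minimal sketch, with the frequencies copied by hand from the table:

```python
# Topic Frequency column of the table, in order of topic ID (-1, 0, 1, ..., 78).
FREQUENCIES = [
    11, 2010, 510, 432, 340, 318, 303, 153, 110, 107,
    97, 78, 78, 77, 76, 75, 75, 71, 70, 69,
    66, 62, 61, 59, 58, 58, 56, 55, 52, 48,
    48, 47, 43, 43, 42, 41, 41, 36, 34, 33,
    32, 30, 29, 29, 28, 28, 27, 26, 26, 23,
    23, 21, 20, 20, 20, 20, 18, 18, 18, 18,
    17, 16, 16, 15, 15, 15, 14, 14, 14, 14,
    14, 13, 13, 13, 13, 13, 13, 12, 12, 12,
]
# Topic -1 is BERTopic's outlier topic; 0..78 are the regular topics.
TOPIC_IDS = list(range(-1, 79))

assert len(TOPIC_IDS) == len(FREQUENCIES) == 80  # "Number of topics: 80"

total_docs = sum(FREQUENCIES)
print(total_docs)  # 6795, matching "Number of training documents"

# Share of all documents held by the outlier topic and by the largest topic (0)
for topic_id in (-1, 0):
    share = FREQUENCIES[TOPIC_IDS.index(topic_id)] / total_docs
    print(f"topic {topic_id}: {share:.2%}")
```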

<details>
  <summary>Click here for an overview of all topics.</summary>
  
  | Topic ID | Topic Keywords | Topic Frequency (documents) | Label |
|----------|----------------|-----------------|-------| 
| -1 | models - language - llms - language models - chatgpt | 11 | -1_models_language_llms_language models | 
| 0 | translation - language - models - data - generation | 2010 | 0_translation_language_models_data | 
| 1 | visual - multimodal - image - images - video | 510 | 1_visual_multimodal_image_images | 
| 2 | reasoning - math - cot - mathematical - problems | 432 | 2_reasoning_math_cot_mathematical | 
| 3 | attacks - attack - adversarial - safety - jailbreak | 340 | 3_attacks_attack_adversarial_safety | 
| 4 | medical - clinical - biomedical - health - healthcare | 318 | 4_medical_clinical_biomedical_health | 
| 5 | code - code generation - generation - programming - software | 303 | 5_code_code generation_generation_programming | 
| 6 | students - education - ai - chatgpt - student | 153 | 6_students_education_ai_chatgpt | 
| 7 | robot - planning - robots - navigation - robotic | 110 | 7_robot_planning_robots_navigation | 
| 8 | dialogue - taskoriented - dialog - dialogue systems - systems | 107 | 8_dialogue_taskoriented_dialog_dialogue systems | 
| 9 | knowledge - question - answering - question answering - kgs | 97 | 9_knowledge_question_answering_question answering | 
| 10 | financial - sentiment - stock - market - investment | 78 | 10_financial_sentiment_stock_market | 
| 11 | bias - gender - biases - gender bias - fairness | 78 | 11_bias_gender_biases_gender bias | 
| 12 | emotion - emotional - empathetic - mental health - affective | 77 | 12_emotion_emotional_empathetic_mental health | 
| 13 | privacy - private - federated - data - attack | 76 | 13_privacy_private_federated_data | 
| 14 | text - detection - texts - aigenerated - machinegenerated | 75 | 14_text_detection_texts_aigenerated | 
| 15 | radiology - medical - reports - image - radiology reports | 75 | 15_radiology_medical_reports_image | 
| 16 | training - parallelism - gpu - memory - hardware | 71 | 16_training_parallelism_gpu_memory | 
| 17 | summarization - summaries - abstractive - summary - text summarization | 70 | 17_summarization_summaries_abstractive_summary | 
| 18 | game - games - agents - social - llm agents | 69 | 18_game_games_agents_social | 
| 19 | quantization - quantized - weights - memory - compression | 66 | 19_quantization_quantized_weights_memory | 
| 20 | sql - texttosql - table - database - tabular | 62 | 20_sql_texttosql_table_database | 
| 21 | retrieval - ranking - rag - reranking - retrievalaugmented | 61 | 21_retrieval_ranking_rag_reranking | 
| 22 | lora - attention - lowrank - finetuning - memory | 59 | 22_lora_attention_lowrank_finetuning | 
| 23 | legal - patent - claim - court - law | 58 | 23_legal_patent_claim_court | 
| 24 | alignment - preference - reward - rlhf - preferences | 58 | 24_alignment_preference_reward_rlhf | 
| 25 | recommendation - recommender - recommendations - recommender systems - user | 56 | 25_recommendation_recommender_recommendations_recommender systems | 
| 26 | transformer - transformers - attention - layers - layer | 55 | 26_transformer_transformers_attention_layers | 
| 27 | tom - cognitive - analogical - analogies - human | 52 | 27_tom_cognitive_analogical_analogies | 
| 28 | vulnerability - vulnerabilities - code - security - smart | 48 | 28_vulnerability_vulnerabilities_code_security | 
| 29 | materials - chemistry - materials science - chemical - molecular | 48 | 29_materials_chemistry_materials science_chemical | 
| 30 | agent - agents - rl - environments - language agents | 47 | 30_agent_agents_rl_environments | 
| 31 | repair - bugs - bug - program repair - apr | 43 | 31_repair_bugs_bug_program repair | 
| 32 | graph - graphs - graph reasoning - graph neural - graph data | 43 | 32_graph_graphs_graph reasoning_graph neural | 
| 33 | speech - asr - audio - speech recognition - recognition | 42 | 33_speech_asr_audio_speech recognition | 
| 34 | ai - ethical - regulation - risks - risk | 41 | 34_ai_ethical_regulation_risks | 
| 35 | personality - traits - personality traits - personas - personalities | 41 | 35_personality_traits_personality traits_personas | 
| 36 | context - context window - window - length - long | 36 | 36_context_context window_window_length | 
| 37 | chatgpt - research - writing - ai - academic | 34 | 37_chatgpt_research_writing_ai | 
| 38 | incontext - demonstrations - icl - incontext learning - learning | 33 | 38_incontext_demonstrations_icl_incontext learning | 
| 39 | sentiment - sentiment analysis - analysis - aspectbased - polarity | 32 | 39_sentiment_sentiment analysis_analysis_aspectbased | 
| 40 | cultural - opinions - political - survey - values | 30 | 40_cultural_opinions_political_survey | 
| 41 | tool - tools - apis - api - llms | 29 | 41_tool_tools_apis_api | 
| 42 | hallucinations - hallucination - hallucination detection - detection - llms | 29 | 42_hallucinations_hallucination_hallucination detection_detection | 
| 43 | creative - ideas - ai - creativity - storytelling | 28 | 43_creative_ideas_ai_creativity | 
| 44 | music - musical - audio - lyrics - song | 28 | 44_music_musical_audio_lyrics | 
| 45 | scaling - scaling laws - laws - training - model | 27 | 45_scaling_scaling laws_laws_training | 
| 46 | physics - students - chatgpt - education - responses | 26 | 46_physics_students_chatgpt_education | 
| 47 | correction - grammatical - gec - error - error correction | 26 | 47_correction_grammatical_gec_error | 
| 48 | test - unit - tests - test generation - test cases | 23 | 48_test_unit_tests_test generation | 
| 49 | pruning - sparsity - structured pruning - structured - weights | 23 | 49_pruning_sparsity_structured pruning_structured | 
| 50 | commonsense - commonsense knowledge - knowledge - commonsense question answering - commonsense question | 21 | 50_commonsense_commonsense knowledge_knowledge_commonsense question answering | 
| 51 | distillation - teacher - student - kd - knowledge distillation | 20 | 51_distillation_teacher_student_kd | 
| 52 | visualization - visualizations - data visualization - natural - natural language | 20 | 52_visualization_visualizations_data visualization_natural | 
| 53 | hallucination - hallucinations - lvlms - mllms - visual | 20 | 53_hallucination_hallucinations_lvlms_mllms | 
| 54 | adversarial - vlms - attacks - attack - adversarial examples | 20 | 54_adversarial_vlms_attacks_attack | 
| 55 | verilog - design - hardware - hardware design - rtl | 18 | 55_verilog_design_hardware_hardware design | 
| 56 | spatial - geospatial - geographic - location - populations | 18 | 56_spatial_geospatial_geographic_location | 
| 57 | intent - intent detection - slot - detection - slot filling | 18 | 57_intent_intent detection_slot_detection | 
| 58 | prompts - prompt - performance - negated - pseudocode | 18 | 58_prompts_prompt_performance_negated | 
| 59 | brain - fmri - neural - activity - eeg | 17 | 59_brain_fmri_neural_activity | 
| 60 | watermarking - copyright - protection - text - model | 16 | 60_watermarking_copyright_protection_text | 
| 61 | public - social - media - early - ai | 16 | 61_public_social_media_early | 
| 62 | ai - productivity - chatbots - chatgpt - economy | 15 | 62_ai_productivity_chatbots_chatgpt | 
| 63 | poetry - poems - poetry generation - lyrics - poem | 15 | 63_poetry_poems_poetry generation_lyrics | 
| 64 | geoscience - astronomy - scientific - astronomical - galactica | 15 | 64_geoscience_astronomy_scientific_astronomical | 
| 65 | editing - knowledge editing - knowledge - model editing - editing methods | 14 | 65_editing_knowledge editing_knowledge_model editing | 
| 66 | argument - arguments - argumentation - fallacy - fallacies | 14 | 66_argument_arguments_argumentation_fallacy | 
| 67 | mobile - wireless - devices - aigc - network | 14 | 67_mobile_wireless_devices_aigc | 
| 68 | design - bid - 3d - designs - generative | 14 | 68_design_bid_3d_designs | 
| 69 | simplification - text simplification - text - sentence - readability | 14 | 69_simplification_text simplification_text_sentence | 
| 70 | urban - traffic - transportation - foundation models - foundation | 13 | 70_urban_traffic_transportation_foundation models | 
| 71 | log - anomaly - root - anomaly detection - cloud | 13 | 71_log_anomaly_root_anomaly detection | 
| 72 | forgetting - catastrophic forgetting - catastrophic - continual - finetuning | 13 | 72_forgetting_catastrophic forgetting_catastrophic_continual | 
| 73 | scientific - papers - review - gpt4 - feedback | 13 | 73_scientific_papers_review_gpt4 | 
| 74 | causal - causality - causal discovery - causal inference - causal reasoning | 13 | 74_causal_causality_causal discovery_causal inference | 
| 75 | product - ecommerce - attribute - extraction - product descriptions | 13 | 75_product_ecommerce_attribute_extraction | 
| 76 | optimizers - adam - deep - training - networks | 12 | 76_optimizers_adam_deep_training | 
| 77 | chinese - questions - subjects - school - ceval | 12 | 77_chinese_questions_subjects_school | 
| 78 | speculative - decoding - draft - speculative decoding - draft model | 12 | 78_speculative_decoding_draft_speculative decoding |
  
</details>
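In the table above, each Label appears to be the topic ID followed by the first four topic keywords, joined by underscores (multi-word keywords keep their internal spaces). The sketch below reproduces a few labels from their keyword strings; the helper `make_label` is hypothetical and only mirrors the pattern visible in the table, not necessarily BERTopic's internal code.

```python
def make_label(topic_id: int, keywords: str, n_words: int = 4) -> str:
    """Build '<id>_<kw1>_..._<kwN>' from a ' - '-separated keyword string."""
    words = [w.strip() for w in keywords.split(" - ")]
    return f"{topic_id}_" + "_".join(words[:n_words])


# (Topic ID, Topic Keywords, Label) rows copied from the table.
ROWS = [
    (-1, "models - language - llms - language models - chatgpt",
     "-1_models_language_llms_language models"),
    (5, "code - code generation - generation - programming - software",
     "5_code_code generation_generation_programming"),
    (78, "speculative - decoding - draft - speculative decoding - draft model",
     "78_speculative_decoding_draft_speculative decoding"),
]

for topic_id, keywords, label in ROWS:
    # Each rebuilt label should match the Label column exactly
    assert make_label(topic_id, keywords) == label
    print(label)
```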

## Training hyperparameters

* calculate_probabilities: False
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: auto
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None
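These names correspond to keyword arguments of the `BERTopic` constructor. As a minimal sketch, the list can be parsed into a Python dict suitable for `BERTopic(**kwargs)`; the helper `parse_hyperparameters` is hypothetical. Note that the card does not record the embedding, UMAP, HDBSCAN, or vectorizer settings, so refitting with only these values is not guaranteed to reproduce this model.

```python
import ast

# The hyperparameter list from this card, verbatim.
CARD_TEXT = """\
* calculate_probabilities: False
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: auto
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None
"""


def parse_hyperparameters(text: str) -> dict:
    """Turn '* name: value' bullet lines into a dict of Python values."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("* "):
            continue
        name, _, raw = line[2:].partition(": ")
        try:
            # Literals such as False, None, 10, 0.7 and (1, 1)
            params[name] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Bare words such as 'english' and 'auto' stay as strings
            params[name] = raw
    return params


kwargs = parse_hyperparameters(CARD_TEXT)
print(kwargs["n_gram_range"], kwargs["nr_topics"])  # (1, 1) auto
```

With BERTopic installed, `BERTopic(**kwargs)` would then create an unfitted model with the same top-level settings.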

## Framework versions

* Numpy: 1.25.2
* HDBSCAN: 0.8.33
* UMAP: 0.5.6
* Pandas: 2.0.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.6.1
* Transformers: 4.38.2
* Numba: 0.58.1
* Plotly: 5.15.0
* Python: 3.10.12
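A saved BERTopic model may fail to load, or may behave differently, under package versions that differ from those listed above, so it can help to compare the local environment against this list. A minimal sketch using the standard library's `importlib.metadata`; the mapping from the names above to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption of this sketch, and the helpers `installed_versions` and `compare_versions` are hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}
EXPECTED_PYTHON = "3.10.12"


def installed_versions(packages):
    """Look up installed versions; None means the package is not installed."""
    found = {}
    for package in packages:
        try:
            found[package] = version(package)
        except PackageNotFoundError:
            found[package] = None
    return found


def compare_versions(expected, installed):
    """Return {package: (expected, installed)} for every package that differs."""
    return {
        package: (wanted, installed.get(package))
        for package, wanted in expected.items()
        if installed.get(package) != wanted
    }


python_now = ".".join(str(part) for part in sys.version_info[:3])
print(f"python: expected {EXPECTED_PYTHON}, found {python_now}")
for package, (wanted, found) in compare_versions(EXPECTED, installed_versions(EXPECTED)).items():
    print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

The exact-match comparison is deliberately strict; a differing patch version is often harmless, so treat the output as a list of places to look rather than a verdict.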