---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---
# topic_model_general_normal_april8
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
## Usage
To use this model, please install BERTopic:
```
pip install -U bertopic
```
You can use the model as follows:
```python
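from bertopic import BERTopic

# Load the trained model from the Hugging Face Hub
topic_model = BERTopic.load("Thang203/topic_model_general_normal_april8")
# Summary table: one row per topic with its size and top keywords
topic_model.get_topic_info()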
```
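Once the model is loaded, it can also assign topics to new documents. The helper below is a minimal sketch rather than part of this model card's tested workflow: `top_keywords_for_docs` is a hypothetical name, and it relies only on BERTopic's `transform` and `get_topic` methods. It assumes the embedding model saved with the card can be loaded in your environment.

```python
from typing import Any, List, Sequence, Tuple


def top_keywords_for_docs(
    topic_model: Any, docs: Sequence[str], n_words: int = 5
) -> List[Tuple[str, int, List[str]]]:
    """Assign each document to a topic and list that topic's top keywords.

    `topic_model` is expected to behave like a fitted BERTopic instance.
    """
    # transform() returns (topic ids, probabilities); only the ids are used here.
    topics, _ = topic_model.transform(list(docs))
    results = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() returns (word, score) pairs, or a falsy value for unknown ids.
        words = topic_model.get_topic(int(topic_id)) or []
        results.append((doc, int(topic_id), [word for word, _ in words[:n_words]]))
    return results
```

With the model loaded as shown above, `top_keywords_for_docs(topic_model, ["Quantizing LLM weights to 4 bits"])` would return one tuple per input document. A topic ID of -1 means the document was treated as an outlier.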
## Topic overview
* Number of topics: 80 (topics 0 to 78, plus the outlier topic -1)
* Number of training documents: 6795
<details>
<summary>Click here for an overview of all topics.</summary>
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | models - language - llms - language models - chatgpt | 11 | -1_models_language_llms_language models |
| 0 | translation - language - models - data - generation | 2010 | 0_translation_language_models_data |
| 1 | visual - multimodal - image - images - video | 510 | 1_visual_multimodal_image_images |
| 2 | reasoning - math - cot - mathematical - problems | 432 | 2_reasoning_math_cot_mathematical |
| 3 | attacks - attack - adversarial - safety - jailbreak | 340 | 3_attacks_attack_adversarial_safety |
| 4 | medical - clinical - biomedical - health - healthcare | 318 | 4_medical_clinical_biomedical_health |
| 5 | code - code generation - generation - programming - software | 303 | 5_code_code generation_generation_programming |
| 6 | students - education - ai - chatgpt - student | 153 | 6_students_education_ai_chatgpt |
| 7 | robot - planning - robots - navigation - robotic | 110 | 7_robot_planning_robots_navigation |
| 8 | dialogue - taskoriented - dialog - dialogue systems - systems | 107 | 8_dialogue_taskoriented_dialog_dialogue systems |
| 9 | knowledge - question - answering - question answering - kgs | 97 | 9_knowledge_question_answering_question answering |
| 10 | financial - sentiment - stock - market - investment | 78 | 10_financial_sentiment_stock_market |
| 11 | bias - gender - biases - gender bias - fairness | 78 | 11_bias_gender_biases_gender bias |
| 12 | emotion - emotional - empathetic - mental health - affective | 77 | 12_emotion_emotional_empathetic_mental health |
| 13 | privacy - private - federated - data - attack | 76 | 13_privacy_private_federated_data |
| 14 | text - detection - texts - aigenerated - machinegenerated | 75 | 14_text_detection_texts_aigenerated |
| 15 | radiology - medical - reports - image - radiology reports | 75 | 15_radiology_medical_reports_image |
| 16 | training - parallelism - gpu - memory - hardware | 71 | 16_training_parallelism_gpu_memory |
| 17 | summarization - summaries - abstractive - summary - text summarization | 70 | 17_summarization_summaries_abstractive_summary |
| 18 | game - games - agents - social - llm agents | 69 | 18_game_games_agents_social |
| 19 | quantization - quantized - weights - memory - compression | 66 | 19_quantization_quantized_weights_memory |
| 20 | sql - texttosql - table - database - tabular | 62 | 20_sql_texttosql_table_database |
| 21 | retrieval - ranking - rag - reranking - retrievalaugmented | 61 | 21_retrieval_ranking_rag_reranking |
| 22 | lora - attention - lowrank - finetuning - memory | 59 | 22_lora_attention_lowrank_finetuning |
| 23 | legal - patent - claim - court - law | 58 | 23_legal_patent_claim_court |
| 24 | alignment - preference - reward - rlhf - preferences | 58 | 24_alignment_preference_reward_rlhf |
| 25 | recommendation - recommender - recommendations - recommender systems - user | 56 | 25_recommendation_recommender_recommendations_recommender systems |
| 26 | transformer - transformers - attention - layers - layer | 55 | 26_transformer_transformers_attention_layers |
| 27 | tom - cognitive - analogical - analogies - human | 52 | 27_tom_cognitive_analogical_analogies |
| 28 | vulnerability - vulnerabilities - code - security - smart | 48 | 28_vulnerability_vulnerabilities_code_security |
| 29 | materials - chemistry - materials science - chemical - molecular | 48 | 29_materials_chemistry_materials science_chemical |
| 30 | agent - agents - rl - environments - language agents | 47 | 30_agent_agents_rl_environments |
| 31 | repair - bugs - bug - program repair - apr | 43 | 31_repair_bugs_bug_program repair |
| 32 | graph - graphs - graph reasoning - graph neural - graph data | 43 | 32_graph_graphs_graph reasoning_graph neural |
| 33 | speech - asr - audio - speech recognition - recognition | 42 | 33_speech_asr_audio_speech recognition |
| 34 | ai - ethical - regulation - risks - risk | 41 | 34_ai_ethical_regulation_risks |
| 35 | personality - traits - personality traits - personas - personalities | 41 | 35_personality_traits_personality traits_personas |
| 36 | context - context window - window - length - long | 36 | 36_context_context window_window_length |
| 37 | chatgpt - research - writing - ai - academic | 34 | 37_chatgpt_research_writing_ai |
| 38 | incontext - demonstrations - icl - incontext learning - learning | 33 | 38_incontext_demonstrations_icl_incontext learning |
| 39 | sentiment - sentiment analysis - analysis - aspectbased - polarity | 32 | 39_sentiment_sentiment analysis_analysis_aspectbased |
| 40 | cultural - opinions - political - survey - values | 30 | 40_cultural_opinions_political_survey |
| 41 | tool - tools - apis - api - llms | 29 | 41_tool_tools_apis_api |
| 42 | hallucinations - hallucination - hallucination detection - detection - llms | 29 | 42_hallucinations_hallucination_hallucination detection_detection |
| 43 | creative - ideas - ai - creativity - storytelling | 28 | 43_creative_ideas_ai_creativity |
| 44 | music - musical - audio - lyrics - song | 28 | 44_music_musical_audio_lyrics |
| 45 | scaling - scaling laws - laws - training - model | 27 | 45_scaling_scaling laws_laws_training |
| 46 | physics - students - chatgpt - education - responses | 26 | 46_physics_students_chatgpt_education |
| 47 | correction - grammatical - gec - error - error correction | 26 | 47_correction_grammatical_gec_error |
| 48 | test - unit - tests - test generation - test cases | 23 | 48_test_unit_tests_test generation |
| 49 | pruning - sparsity - structured pruning - structured - weights | 23 | 49_pruning_sparsity_structured pruning_structured |
| 50 | commonsense - commonsense knowledge - knowledge - commonsense question answering - commonsense question | 21 | 50_commonsense_commonsense knowledge_knowledge_commonsense question answering |
| 51 | distillation - teacher - student - kd - knowledge distillation | 20 | 51_distillation_teacher_student_kd |
| 52 | visualization - visualizations - data visualization - natural - natural language | 20 | 52_visualization_visualizations_data visualization_natural |
| 53 | hallucination - hallucinations - lvlms - mllms - visual | 20 | 53_hallucination_hallucinations_lvlms_mllms |
| 54 | adversarial - vlms - attacks - attack - adversarial examples | 20 | 54_adversarial_vlms_attacks_attack |
| 55 | verilog - design - hardware - hardware design - rtl | 18 | 55_verilog_design_hardware_hardware design |
| 56 | spatial - geospatial - geographic - location - populations | 18 | 56_spatial_geospatial_geographic_location |
| 57 | intent - intent detection - slot - detection - slot filling | 18 | 57_intent_intent detection_slot_detection |
| 58 | prompts - prompt - performance - negated - pseudocode | 18 | 58_prompts_prompt_performance_negated |
| 59 | brain - fmri - neural - activity - eeg | 17 | 59_brain_fmri_neural_activity |
| 60 | watermarking - copyright - protection - text - model | 16 | 60_watermarking_copyright_protection_text |
| 61 | public - social - media - early - ai | 16 | 61_public_social_media_early |
| 62 | ai - productivity - chatbots - chatgpt - economy | 15 | 62_ai_productivity_chatbots_chatgpt |
| 63 | poetry - poems - poetry generation - lyrics - poem | 15 | 63_poetry_poems_poetry generation_lyrics |
| 64 | geoscience - astronomy - scientific - astronomical - galactica | 15 | 64_geoscience_astronomy_scientific_astronomical |
| 65 | editing - knowledge editing - knowledge - model editing - editing methods | 14 | 65_editing_knowledge editing_knowledge_model editing |
| 66 | argument - arguments - argumentation - fallacy - fallacies | 14 | 66_argument_arguments_argumentation_fallacy |
| 67 | mobile - wireless - devices - aigc - network | 14 | 67_mobile_wireless_devices_aigc |
| 68 | design - bid - 3d - designs - generative | 14 | 68_design_bid_3d_designs |
| 69 | simplification - text simplification - text - sentence - readability | 14 | 69_simplification_text simplification_text_sentence |
| 70 | urban - traffic - transportation - foundation models - foundation | 13 | 70_urban_traffic_transportation_foundation models |
| 71 | log - anomaly - root - anomaly detection - cloud | 13 | 71_log_anomaly_root_anomaly detection |
| 72 | forgetting - catastrophic forgetting - catastrophic - continual - finetuning | 13 | 72_forgetting_catastrophic forgetting_catastrophic_continual |
| 73 | scientific - papers - review - gpt4 - feedback | 13 | 73_scientific_papers_review_gpt4 |
| 74 | causal - causality - causal discovery - causal inference - causal reasoning | 13 | 74_causal_causality_causal discovery_causal inference |
| 75 | product - ecommerce - attribute - extraction - product descriptions | 13 | 75_product_ecommerce_attribute_extraction |
| 76 | optimizers - adam - deep - training - networks | 12 | 76_optimizers_adam_deep_training |
| 77 | chinese - questions - subjects - school - ceval | 12 | 77_chinese_questions_subjects_school |
| 78 | speculative - decoding - draft - speculative decoding - draft model | 12 | 78_speculative_decoding_draft_speculative decoding |
</details>
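The `Label` column in the table joins the topic ID and the first four keywords with underscores. As a small sketch (the function name `parse_label` is hypothetical, and it assumes keywords never contain an underscore, which holds for the labels listed here), a label can be split back into its parts like this:

```python
from typing import List, Tuple


def parse_label(label: str) -> Tuple[int, List[str]]:
    """Split a label such as '2_reasoning_math_cot_mathematical' into ID and keywords."""
    # The first underscore-separated field is the topic ID; the rest are keywords.
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords


print(parse_label("5_code_code generation_generation_programming"))
print(parse_label("-1_models_language_llms_language models"))
```

Multi-word keywords such as `code generation` stay intact because they are separated by spaces, not underscores.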
## Training hyperparameters
* calculate_probabilities: False
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: auto
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None
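The values above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how they might be passed when setting up a comparable, untrained model. It is an illustration only: the card does not list the embedding model or the UMAP and HDBSCAN settings, so library defaults would apply and the topics above would not be reproduced exactly. The names `HYPERPARAMS` and `build_untrained_model` are hypothetical.

```python
# Values copied from the "Training hyperparameters" list in this card.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": "auto",
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_untrained_model():
    """Create a BERTopic instance with the settings listed in this card."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Calling `build_untrained_model().fit_transform(docs)` on your own list of documents would then train a new model with these settings.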
## Framework versions
* Numpy: 1.25.2
* HDBSCAN: 0.8.33
* UMAP: 0.5.6
* Pandas: 2.0.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.6.1
* Transformers: 4.38.2
* Numba: 0.58.1
* Plotly: 5.15.0
* Python: 3.10.12
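If loading the model fails or behaves unexpectedly, a version mismatch is one possible cause. The following sketch compares the installed packages against the versions listed above using only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are an assumption on my part, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions copied from the "Framework versions" list; keys are assumed PyPI names.
CARD_VERSIONS = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def version_mismatches(
    expected: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in version_mismatches(CARD_VERSIONS).items():
    print(f"{name}: card lists {wanted}, installed {installed}")
```

A difference does not necessarily mean the model will fail to load; the output is only a starting point for debugging.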