---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# BERTopic_ML-ArXiv-Abstracts

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model. 
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets. 

## Usage 

To use this model, please install BERTopic:

```bash
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("b-verma/BERTopic_ML-ArXiv-Abstracts")

topic_model.get_topic_info()
```
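A fitted BERTopic model can also assign topics to new documents. `transform` embeds the documents and returns a tuple of topic predictions and probabilities, and `get_topic_info()` maps each topic ID to a name. The sketch below pairs each new abstract with a readable label. It assumes, without confirmation from this card, that the labels in the topic table are stored as custom labels, which `get_topic_info()` exposes in a `CustomName` column. If that column is missing, the sketch falls back to BERTopic's default `Name` column. The helper `label_abstracts` and the example abstract are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def label_abstracts(topic_model, abstracts: Sequence[str]) -> List[Tuple[int, str]]:
    """Hypothetical helper: return a (topic_id, label) pair for each abstract.

    `topic_model` is any object exposing BERTopic's `transform` and
    `get_topic_info` methods, e.g. a model returned by `BERTopic.load`.
    """
    # transform returns (topic predictions, probabilities); only predictions are used here.
    topics, _probs = topic_model.transform(list(abstracts))
    info = topic_model.get_topic_info()
    # Prefer custom labels if present (assumption), else BERTopic's default topic names.
    label_column = "CustomName" if "CustomName" in info else "Name"
    labels: Dict[int, str] = {
        int(topic_id): label for topic_id, label in zip(info["Topic"], info[label_column])
    }
    return [(int(t), labels.get(int(t), "unknown")) for t in topics]


# Example usage (requires `pip install -U bertopic` and network access):
# from bertopic import BERTopic
# topic_model = BERTopic.load("b-verma/BERTopic_ML-ArXiv-Abstracts")
# print(label_abstracts(topic_model, ["We propose a graph neural network for node classification."]))
```

Depending on how the model was saved, loading it from the Hub may also download the underlying embedding model, which `transform` needs to embed new text.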

## Topic overview

* Number of topics: 120 (including the outlier topic `-1`)
* Number of training documents: 117592

<details>
  <summary>Click here for an overview of all topics.</summary>
  
  | Topic ID | Topic Keywords | Topic Frequency | Label | 
|----------|----------------|-----------------|-------| 
| -1 | data - models - model - learning - based | 153 | Machine Learning and Deep Learning | 
| 0 | policy - reinforcement - reinforcement learning - rl - agent | 32094 | Reinforcement Learning and Control | 
| 1 | graph - node - graphs - nodes - gnns | 10423 | Graph Embedding and Representation Learning | 
| 2 | speech - audio - speaker - music - asr | 3598 | Speech Technology | 
| 3 | 3d - object - video - segmentation - point | 3527 | 3D Object Understanding | 
| 4 | equations - differential - physics - differential equations - pdes | 2706 | Discovering and Solving Partial Differential Equations | 
| 5 | adversarial - attacks - adversarial examples - robustness - attack | 2287 | Adversarial Robustness | 
| 6 | networks - relu - neural - neural networks - activation | 2210 | Deep Learning Activation Functions | 
| 7 | segmentation - medical - images - image - tumor | 2207 | Medical Image Segmentation | 
| 8 | gradient - stochastic - sgd - convergence - convex | 1835 | Convergence Analysis of Non-Convex Optimization Algorithms | 
| 9 | federated - fl - federated learning - clients - privacy | 1717 | Federated Learning and Privacy | 
| 10 | channel - wireless - radio - network - communication | 1698 | Channel Allocation and Estimation in Wireless Communications | 
| 11 | privacy - private - differential privacy - dp - differential | 1449 | Privacy-Preserving Machine Learning | 
| 12 | clinical - patient - patients - medical - health | 1353 | Clinical Patient Representation Learning | 
| 13 | gans - gan - generative - generative adversarial - generator | 1298 | Generative Adversarial Networks (GANs) | 
| 14 | bandit - regret - arm - bandits - arms | 1269 | Armed Bandit Problems | 
| 15 | financial - stock - market - trading - price | 1246 | Financial Time Series Analysis | 
| 16 | recommendation - user - item - recommender - items | 1193 | Recommendation Systems | 
| 17 | power - energy - electricity - load - forecasting | 1144 | Power and Energy Forecasting | 
| 18 | causal - treatment - observational - effect - causal inference | 1070 | Causal Inference and Learning | 
| 19 | explanations - explanation - counterfactual - interpretability - interpretable | 1048 | Explanation Methods for Machine Learning Models | 
| 20 | driving - autonomous - vehicle - vehicles - driver | 1007 | Autonomous Driving | 
| 21 | malware - detection - iot - security - attacks | 959 | Cybersecurity Threats in IoT Networks | 
| 22 | quantum - classical - circuit - circuits - quantum machine | 947 | Quantum Machine Learning | 
| 23 | fairness - fair - bias - discrimination - protected | 938 | Fair Machine Learning | 
| 24 | hardware - memory - gpu - dnn - accelerators | 924 | Edge AI Hardware for Efficient DNN Inference | 
| 25 | clustering - means - clusters - cluster - algorithm | 921 | Clustering Algorithms | 
| 26 | crop - images - satellite - remote sensing - hyperspectral | 899 | Remote Sensing and Deep Learning | 
| 27 | time series - series - time - forecasting - series forecasting | 847 | Time Series Analysis and Forecasting | 
| 28 | pruning - compression - sparsity - sparse - network | 836 | Neural Network Pruning | 
| 29 | distributed - communication - sgd - decentralized - gradient | 822 | Distributed Optimization Methods | 
| 30 | label - labels - multi label - noisy - noisy labels | 812 | Multi-Label Learning | 
| 31 | meta - meta learning - shot - task - shot learning | 776 | Few-Shot Learning and Meta-Learning | 
| 32 | traffic - temporal - travel - spatial - road | 770 | Traffic Forecasting and Prediction | 
| 33 | anomaly - anomaly detection - detection - anomalies - outlier | 750 | Anomaly Detection | 
| 34 | uncertainty - calibration - bayesian - bayesian neural - bayesian neural networks | 735 | Uncertainty Estimation in Deep Learning | 
| 35 | variational - inference - posterior - mcmc - carlo | 724 | Inference and Approximation | 
| 36 | domain - domain adaptation - adaptation - source - target | 717 | Unsupervised Domain Adaptation | 
| 37 | continual - continual learning - forgetting - catastrophic forgetting - catastrophic | 679 | Continual Learning and Forgetting | 
| 38 | vae - latent - variational - vaes - generative | 678 | Disentangled Representation Learning | 
| 39 | visual - image - vqa - modal - captioning | 627 | Multimodal Vision and Language Understanding | 
| 40 | code - program - software - programs - source code | 621 | Software Engineering | 
| 41 | brain - fmri - functional - disease - ad | 615 | Brain Connectivity and Disease Diagnosis | 
| 42 | spiking - snns - spike - neurons - spiking neural | 603 | Spiking Neural Networks (SNNs) | 
| 43 | activity - activity recognition - har - gait - sensor | 600 | Human Activity Recognition (HAR) | 
| 44 | dictionary - sparse - signal - dictionary learning - recovery | 595 | Sparse Signal Processing | 
| 45 | news - social - media - fake - fake news | 580 | Fake News Detection | 
| 46 | automl - ml - machine learning - machine - research | 542 | Automated Machine Learning (AutoML) | 
| 47 | class - imbalanced - classifiers - minority - classification | 500 | Class Imbalance in Classification | 
| 48 | gravitational - galaxy - solar - simulations - mass | 491 | Gravitational Wave Detection and Analysis | 
| 49 | molecular - molecules - chemical - drug - molecule | 481 | Molecular Design and Discovery | 
| 50 | recurrent - rnns - rnn - recurrent neural - lstm | 479 | Recurrent Neural Networks (RNNs) | 
| 51 | bo - bayesian optimization - optimization - bayesian - function | 476 | Global Optimization with Bayesian Methods | 
| 52 | logic - reasoning - symbolic - logical - relational | 473 | Integrating Reasoning and Learning | 
| 53 | climate - weather - precipitation - water - forecasting | 472 | Climate and Weather Prediction | 
| 54 | gp - gaussian - gaussian process - gaussian processes - processes | 470 | Scalable Gaussian Process Inference for Large Datasets | 
| 55 | regret - online - online learning - convex - bounds | 456 | Online Learning and Regret Bounds | 
| 56 | language - bert - fine - language models - fine tuning | 455 | Fine-tuning Language Models | 
| 57 | nas - search - architecture search - architecture - neural architecture | 453 | Neural Architecture Search (NAS) | 
| 58 | eeg - bci - brain - eeg signals - signals | 453 | Emotion and Brain Signals Analysis | 
| 59 | dialogue - dialog - conversational - responses - conversation | 433 | Conversational AI Models | 
| 60 | emotion - emotion recognition - facial - recognition - emotions | 417 | Emotion Recognition | 
| 61 | knowledge - knowledge graph - knowledge graphs - kg - entities | 409 | Embedding Knowledge Graphs | 
| 62 | active learning - active - al - learning - labeling | 388 | Active Learning | 
| 63 | quantization - precision - bit - quantized - floating | 378 | Quantization for Deep Neural Networks | 
| 64 | materials - molecular - chemical - atomic - material | 356 | Materials Discovery and Property Prediction using Machine Learning | 
| 65 | bounds - pac - bound - generalization - pac bayes | 352 | Generalization Bounds | 
| 66 | fault - maintenance - industrial - manufacturing - monitoring | 329 | Fault Detection and Diagnosis in Industrial Settings | 
| 67 | translation - machine translation - nmt - neural machine translation - neural machine | 329 | Machine Translation | 
| 68 | tensor - tensors - rank - decomposition - tensor completion | 328 | Tensor Completion and Rank Decomposition | 
| 69 | topic - topics - topic models - lda - topic modeling | 325 | Topic Modeling | 
| 70 | covid - covid 19 - 19 - chest - ct | 312 | Computer-Aided Diagnosis of COVID-19 | 
| 71 | teacher - distillation - student - knowledge distillation - knowledge | 310 | Knowledge Transfer and Distillation | 
| 72 | students - student - course - courses - educational | 310 | Education Technology | 
| 73 | combinatorial - problems - combinatorial optimization - problem - solvers | 310 | Combinatorial Optimization | 
| 74 | trees - tree - forest - decision - decision trees | 304 | Interpretable Machine Learning Models | 
| 75 | contrastive - contrastive learning - self supervised - supervised - self | 303 | Contrastive Learning for Representation Learning | 
| 76 | face - face recognition - facial - deepfake - recognition | 298 | Face Recognition and Bias | 
| 77 | lasso - regression - sparse - screening - sparsity | 298 | High-Dimensional Sparse Regression | 
| 78 | kernel - kernels - random - regression - ridge | 296 | Kernel Methods and Regression | 
| 79 | seismic - inversion - reservoir - oil - velocity | 295 | Seismic Inverse Modeling | 
| 80 | backdoor - poisoning - attacks - attack - backdoor attacks | 292 | Backdoor Attacks | 
| 81 | manifold - manifold learning - dimensional - manifolds - dimensionality | 288 | Manifold Learning and Dimensionality Reduction | 
| 82 | ecg - heart - electrocardiogram - cardiac - signals | 288 | Cardiac Signal Processing and Classification | 
| 83 | attention - vision - vit - transformers - transformer | 286 | Computer Vision Transformers | 
| 84 | word - embeddings - word embeddings - words - embedding | 283 | "Word Embeddings and Their Applications in Natural Language Processing" | 
| 85 | question - qa - questions - answering - answer | 279 | Question Answering | 
| 86 | denoising - image - noise - restoration - image denoising | 276 | Image Denoising | 
| 87 | ctr - product - commerce - click - user | 274 | Advertising and Predictive Modeling | 
| 88 | graphical - graphical models - belief propagation - belief - ising | 266 | Inference and Learning in Graphical Models | 
| 89 | transport - ot - optimal transport - wasserstein - optimal | 264 | Optimal Transport and Related Methods | 
| 90 | matrix - rank - matrix completion - completion - low rank | 259 | Low Rank Matrix Completion | 
| 91 | covid - covid 19 - 19 - pandemic - spread | 255 | COVID-19 Forecasting and Prediction | 
| 92 | svm - support vector - support - svms - vector | 255 | Machine Learning - SVM | 
| 93 | physics - particle - detector - high energy - energy | 244 | High Energy Particle Physics | 
| 94 | feature selection - feature - selection - features - feature selection methods | 240 | Feature Selection for High-Dimensional Data | 
| 95 | ranking - items - rank - pairwise - comparisons | 232 | Ranking and Learning from Noisy Comparisons | 
| 96 | hyperparameter - hpo - hyperparameters - hyperparameter optimization - optimization | 229 | Hyperparameter Optimization for Deep Learning Models | 
| 97 | pricing - revenue - price - auctions - regret | 227 | Dynamic Pricing and Demand Learning | 
| 98 | ood - ood detection - distribution - distribution ood - detection | 225 | Out-of-Distribution Detection in Deep Learning | 
| 99 | bayesian - bayesian networks - bayesian network - structure - structure learning | 224 | Bayesian Network Structure Learning | 
| 100 | pca - principal - principal component - component analysis - principal component analysis | 224 | Principal Component Analysis (PCA) | 
| 101 | protein - proteins - sequence - sequences - structure | 214 | Protein Representation and Prediction | 
| 102 | hashing - hash - codes - retrieval - search | 209 | Large-Scale Image Retrieval and Hashing | 
| 103 | submodular - submodular functions - functions - maximization - approximation | 207 | Submodular Function Minimization | 
| 104 | mixture - em - mixtures - em algorithm - mixture models | 206 | Mixture Models and EM Algorithm | 
| 105 | metric learning - metric - distance - distance metric - similarity | 206 | Metric Learning for Machine Learning | 
| 106 | equivariant - equivariance - group - symmetry - spherical | 200 | Equivariant Deep Learning | 
| 107 | nmf - nonnegative - factorization - matrix - matrix factorization | 199 | NMF (Nonnegative Matrix Factorization) | 
| 108 | compression - video - coding - distortion - rate distortion | 198 | Neural Compression | 
| 109 | mri - reconstruction - pet - imaging - mr | 198 | Magnetic Resonance Imaging Reconstruction | 
| 110 | oct - retinal - dr - diabetic - images | 197 | Retinal Imaging and Disease Diagnosis | 
| 111 | entity - relation - relation extraction - entities - extraction | 192 | Relation Extraction | 
| 112 | handwritten - text - characters - character - recognition | 178 | Handwritten Character Recognition | 
| 113 | augmentation - data augmentation - mixup - data - augmentations | 178 | Data Augmentation for Improving Deep Learning Performance | 
| 114 | crowdsourcing - workers - crowd - worker - crowdsourced | 168 | Crowdsourcing Labeling and Annotation | 
| 115 | summarization - summaries - summary - abstractive - text | 165 | Automatic Summarization | 
| 116 | circuit - design - circuits - chip - synthesis | 161 | Circuit Design Optimization | 
| 117 | view - multi view - views - multi - clustering | 160 | Multi-View Clustering | 
| 118 | cancer - gene - genes - disease - expression | 158 | Cancer Gene Expression Analysis |
  
</details>
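The "Topic Keywords" column appears to list the first five words of each topic's representation, joined by " - ". In BERTopic, topic `-1` is the outlier topic: it collects documents that HDBSCAN did not assign to any cluster, so the 120 topics above amount to 119 clusters plus that outlier bucket. The sketch below rebuilds the keyword strings from `get_topic`, which returns a topic's words as `(word, score)` pairs, and skips the outlier topic by default. The helper `keyword_summary` is hypothetical and not part of BERTopic.

```python
from typing import Dict, Iterable


def keyword_summary(
    topic_model,
    topic_ids: Iterable[int],
    n_words: int = 5,
    include_outliers: bool = False,
) -> Dict[int, str]:
    """Hypothetical helper: build 'word - word - ...' strings like the Topic Keywords column."""
    summary: Dict[int, str] = {}
    for topic_id in topic_ids:
        if topic_id == -1 and not include_outliers:
            continue  # -1 is BERTopic's outlier topic, not a real cluster
        words = topic_model.get_topic(topic_id)  # list of (word, score) pairs
        if not words:  # BERTopic returns False for a topic ID it does not know
            continue
        summary[topic_id] = " - ".join(word for word, _score in words[:n_words])
    return summary


# Example usage with a loaded model:
# keyword_summary(topic_model, range(-1, 119))[1]  # keywords for topic 1
```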

## Training hyperparameters

* calculate_probabilities: True
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None
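To refit a model with the same settings, these hyperparameters can be passed back to the `BERTopic` constructor, whose keyword arguments use the same names. Two hedged caveats apply. First, BERTopic's constructor sets `language` to `None` whenever an explicit `embedding_model` is given, so the `None` above likely means a custom embedding model was used, but the card does not say which one. Second, `n_gram_range` is only used to build BERTopic's default `CountVectorizer`; the multi-word keywords in the topic table (e.g. "reinforcement learning") suggest that a custom `vectorizer_model` was supplied as well. The helper `build_topic_model` below is hypothetical and leaves both components to the caller.

```python
from typing import Any, Dict, Optional

# Hyperparameters copied from the "Training hyperparameters" list of this card.
TRAINING_HYPERPARAMETERS: Dict[str, Any] = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model(embedding_model: Any, vectorizer_model: Optional[Any] = None):
    """Hypothetical helper: create an unfitted BERTopic with this card's settings.

    The embedding model and any custom vectorizer used for the original
    training run are not documented, so the caller must choose them.
    """
    # Imported lazily so the settings above can be inspected without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(
        embedding_model=embedding_model,
        vectorizer_model=vectorizer_model,
        **TRAINING_HYPERPARAMETERS,
    )


# Example (the embedding model name is a placeholder, not necessarily the one used for training):
# topic_model = build_topic_model("sentence-transformers/all-MiniLM-L6-v2")
# topics, probs = topic_model.fit_transform(abstracts)
```

With `calculate_probabilities=True`, BERTopic computes a probability for every topic per document instead of only for the assigned topic. BERTopic's documentation notes that this can slow down topic extraction on corpora of more than about 100,000 documents, which covers the 117,592 documents used here.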

## Framework versions

* Numpy: 2.1.3
* HDBSCAN: 0.8.40
* UMAP: 0.5.7
* Pandas: 2.2.3
* Scikit-Learn: 1.6.1
* Sentence-transformers: 3.4.1
* Transformers: 4.49.0
* Numba: 0.61.0
* Plotly: 6.0.1
* Python: 3.10.16
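Before loading the model, it can help to compare the local environment against the versions listed above, since BERTopic's documentation warns that some save formats (pickle in particular) expect matching dependency versions. The sketch below uses only the standard library's `importlib.metadata`. The pip distribution names (for example `umap-learn` for UMAP) and the helpers `find_version_mismatches` and `python_matches` are assumptions of this sketch.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Mapping, Optional

# Versions from the "Framework versions" list, keyed by assumed pip distribution name.
EXPECTED_VERSIONS: Dict[str, str] = {
    "numpy": "2.1.3",
    "hdbscan": "0.8.40",
    "umap-learn": "0.5.7",
    "pandas": "2.2.3",
    "scikit-learn": "1.6.1",
    "sentence-transformers": "3.4.1",
    "transformers": "4.49.0",
    "numba": "0.61.0",
    "plotly": "6.0.1",
}
EXPECTED_PYTHON = "3.10.16"


def find_version_mismatches(
    expected: Mapping[str, str] = EXPECTED_VERSIONS,
    lookup: Callable[[str], str] = version,
) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for each package that differs."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = lookup(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


def python_matches(expected: str = EXPECTED_PYTHON) -> bool:
    """Check whether the running interpreter matches the listed Python version."""
    return ".".join(str(part) for part in sys.version_info[:3]) == expected


# Example usage:
# print(find_version_mismatches(), python_matches())
```

An empty mismatch dictionary means every listed package matches exactly; minor differences are often harmless, so treat the output as a hint rather than a hard requirement.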