---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---
# BERTopic_ML-ArXiv-Abstracts
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
## Usage
To use this model, please install BERTopic:
```bash
pip install -U bertopic
```
You can use the model as follows:
```python
from bertopic import BERTopic
topic_model = BERTopic.load("b-verma/BERTopic_ML-ArXiv-Abstracts")
topic_model.get_topic_info()
```
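To assign topics to new abstracts, a loaded model's `transform` method can be applied to a list of strings. The helper below is a minimal sketch rather than part of the original card: the names `assign_topics` and `demo` and the sample abstract are illustrative assumptions. Depending on how the model was serialized, `BERTopic.load` may also need an explicit `embedding_model` argument before `transform` works.

```python
def assign_topics(topic_model, docs):
    """Return (document, topic_id) pairs for a list of raw text documents.

    `topic_model` is expected to be a loaded BERTopic model; any object whose
    `transform(docs)` returns a (topics, probabilities) pair will work.
    """
    topics, _probabilities = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def demo():
    """Download the model from the Hub and label one made-up abstract."""
    # Imported lazily so that `assign_topics` is usable without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load("b-verma/BERTopic_ML-ArXiv-Abstracts")
    abstracts = [
        "We propose a policy gradient method for multi-agent reinforcement learning."
    ]
    for doc, topic_id in assign_topics(topic_model, abstracts):
        # A topic id of -1 means the document was treated as an outlier.
        print(topic_id, doc)
```

Calling `demo()` requires network access to download the model. The returned ids can be looked up in the topic overview table.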
## Topic overview
* Number of topics: 120 (topics 0 to 118 plus the outlier topic -1)
* Number of training documents: 117592
<details>
<summary>Click here for an overview of all topics.</summary>
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | data - models - model - learning - based | 153 | Machine Learning and Deep Learning |
| 0 | policy - reinforcement - reinforcement learning - rl - agent | 32094 | Reinforcement Learning and Control |
| 1 | graph - node - graphs - nodes - gnns | 10423 | Graph Embedding and Representation Learning |
| 2 | speech - audio - speaker - music - asr | 3598 | Speech Technology |
| 3 | 3d - object - video - segmentation - point | 3527 | 3D Object Understanding |
| 4 | equations - differential - physics - differential equations - pdes | 2706 | Discovering and Solving Partial Differential Equations |
| 5 | adversarial - attacks - adversarial examples - robustness - attack | 2287 | Adversarial Robustness |
| 6 | networks - relu - neural - neural networks - activation | 2210 | Deep Learning Activation Functions |
| 7 | segmentation - medical - images - image - tumor | 2207 | Medical Image Segmentation |
| 8 | gradient - stochastic - sgd - convergence - convex | 1835 | Convergence Analysis of Non-Convex Optimization Algorithms |
| 9 | federated - fl - federated learning - clients - privacy | 1717 | Federated Learning and Privacy |
| 10 | channel - wireless - radio - network - communication | 1698 | Channel Allocation and Estimation in Wireless Communications |
| 11 | privacy - private - differential privacy - dp - differential | 1449 | Privacy-Preserving Machine Learning |
| 12 | clinical - patient - patients - medical - health | 1353 | Clinical Patient Representation Learning |
| 13 | gans - gan - generative - generative adversarial - generator | 1298 | Generative Adversarial Networks (GANs) |
| 14 | bandit - regret - arm - bandits - arms | 1269 | Armed Bandit Problems |
| 15 | financial - stock - market - trading - price | 1246 | Financial Time Series Analysis |
| 16 | recommendation - user - item - recommender - items | 1193 | Recommendation Systems |
| 17 | power - energy - electricity - load - forecasting | 1144 | Power and Energy Forecasting |
| 18 | causal - treatment - observational - effect - causal inference | 1070 | Causal Inference and Learning |
| 19 | explanations - explanation - counterfactual - interpretability - interpretable | 1048 | Explanation Methods for Machine Learning Models |
| 20 | driving - autonomous - vehicle - vehicles - driver | 1007 | Autonomous Driving |
| 21 | malware - detection - iot - security - attacks | 959 | Cybersecurity Threats in IoT Networks |
| 22 | quantum - classical - circuit - circuits - quantum machine | 947 | Quantum Machine Learning |
| 23 | fairness - fair - bias - discrimination - protected | 938 | Fair Machine Learning |
| 24 | hardware - memory - gpu - dnn - accelerators | 924 | Edge AI Hardware for Efficient DNN Inference |
| 25 | clustering - means - clusters - cluster - algorithm | 921 | Clustering Algorithms |
| 26 | crop - images - satellite - remote sensing - hyperspectral | 899 | Remote Sensing and Deep Learning |
| 27 | time series - series - time - forecasting - series forecasting | 847 | Time Series Analysis and Forecasting |
| 28 | pruning - compression - sparsity - sparse - network | 836 | Neural Network Pruning |
| 29 | distributed - communication - sgd - decentralized - gradient | 822 | Distributed Optimization Methods |
| 30 | label - labels - multi label - noisy - noisy labels | 812 | Multi-Label Learning |
| 31 | meta - meta learning - shot - task - shot learning | 776 | Few-Shot Learning and Meta-Learning |
| 32 | traffic - temporal - travel - spatial - road | 770 | Traffic Forecasting and Prediction |
| 33 | anomaly - anomaly detection - detection - anomalies - outlier | 750 | Anomaly Detection |
| 34 | uncertainty - calibration - bayesian - bayesian neural - bayesian neural networks | 735 | Uncertainty Estimation in Deep Learning |
| 35 | variational - inference - posterior - mcmc - carlo | 724 | Inference and Approximation |
| 36 | domain - domain adaptation - adaptation - source - target | 717 | Unsupervised Domain Adaptation |
| 37 | continual - continual learning - forgetting - catastrophic forgetting - catastrophic | 679 | Continual Learning and Forgetting |
| 38 | vae - latent - variational - vaes - generative | 678 | Disentangled Representation Learning |
| 39 | visual - image - vqa - modal - captioning | 627 | Multimodal Vision and Language Understanding |
| 40 | code - program - software - programs - source code | 621 | Software Engineering |
| 41 | brain - fmri - functional - disease - ad | 615 | Brain Connectivity and Disease Diagnosis |
| 42 | spiking - snns - spike - neurons - spiking neural | 603 | Spiking Neural Networks (SNNs) |
| 43 | activity - activity recognition - har - gait - sensor | 600 | Human Activity Recognition (HAR) |
| 44 | dictionary - sparse - signal - dictionary learning - recovery | 595 | Sparse Signal Processing |
| 45 | news - social - media - fake - fake news | 580 | Fake News Detection |
| 46 | automl - ml - machine learning - machine - research | 542 | Automated Machine Learning (AutoML) |
| 47 | class - imbalanced - classifiers - minority - classification | 500 | Class Imbalance in Classification |
| 48 | gravitational - galaxy - solar - simulations - mass | 491 | Gravitational Wave Detection and Analysis |
| 49 | molecular - molecules - chemical - drug - molecule | 481 | Molecular Design and Discovery |
| 50 | recurrent - rnns - rnn - recurrent neural - lstm | 479 | Recurrent Neural Networks (RNNs) |
| 51 | bo - bayesian optimization - optimization - bayesian - function | 476 | Global Optimization with Bayesian Methods |
| 52 | logic - reasoning - symbolic - logical - relational | 473 | Integrating Reasoning and Learning |
| 53 | climate - weather - precipitation - water - forecasting | 472 | Climate and Weather Prediction |
| 54 | gp - gaussian - gaussian process - gaussian processes - processes | 470 | Scalable Gaussian Process Inference for Large Datasets |
| 55 | regret - online - online learning - convex - bounds | 456 | Online Learning and Regret Bounds |
| 56 | language - bert - fine - language models - fine tuning | 455 | Fine-tuning Language Models |
| 57 | nas - search - architecture search - architecture - neural architecture | 453 | Neural Architecture Search (NAS) |
| 58 | eeg - bci - brain - eeg signals - signals | 453 | Emotion and Brain Signals Analysis |
| 59 | dialogue - dialog - conversational - responses - conversation | 433 | Conversational AI Models |
| 60 | emotion - emotion recognition - facial - recognition - emotions | 417 | Emotion Recognition |
| 61 | knowledge - knowledge graph - knowledge graphs - kg - entities | 409 | Embedding Knowledge Graphs |
| 62 | active learning - active - al - learning - labeling | 388 | Active Learning |
| 63 | quantization - precision - bit - quantized - floating | 378 | Quantization for Deep Neural Networks |
| 64 | materials - molecular - chemical - atomic - material | 356 | Materials Discovery and Property Prediction using Machine Learning |
| 65 | bounds - pac - bound - generalization - pac bayes | 352 | Generalization Bounds |
| 66 | fault - maintenance - industrial - manufacturing - monitoring | 329 | Fault Detection and Diagnosis in Industrial Settings |
| 67 | translation - machine translation - nmt - neural machine translation - neural machine | 329 | Machine Translation |
| 68 | tensor - tensors - rank - decomposition - tensor completion | 328 | Tensor Completion and Rank Decomposition |
| 69 | topic - topics - topic models - lda - topic modeling | 325 | Topic Modeling |
| 70 | covid - covid 19 - 19 - chest - ct | 312 | Computer-Aided Diagnosis of COVID-19 |
| 71 | teacher - distillation - student - knowledge distillation - knowledge | 310 | Knowledge Transfer and Distillation |
| 72 | students - student - course - courses - educational | 310 | Education Technology |
| 73 | combinatorial - problems - combinatorial optimization - problem - solvers | 310 | Combinatorial Optimization |
| 74 | trees - tree - forest - decision - decision trees | 304 | Interpretable Machine Learning Models |
| 75 | contrastive - contrastive learning - self supervised - supervised - self | 303 | Contrastive Learning for Representation Learning |
| 76 | face - face recognition - facial - deepfake - recognition | 298 | Face Recognition and Bias |
| 77 | lasso - regression - sparse - screening - sparsity | 298 | High-Dimensional Sparse Regression |
| 78 | kernel - kernels - random - regression - ridge | 296 | Kernel Methods and Regression |
| 79 | seismic - inversion - reservoir - oil - velocity | 295 | Seismic Inverse Modeling |
| 80 | backdoor - poisoning - attacks - attack - backdoor attacks | 292 | Backdoor Attacks |
| 81 | manifold - manifold learning - dimensional - manifolds - dimensionality | 288 | Manifold Learning and Dimensionality Reduction |
| 82 | ecg - heart - electrocardiogram - cardiac - signals | 288 | Cardiac Signal Processing and Classification |
| 83 | attention - vision - vit - transformers - transformer | 286 | Computer Vision Transformers |
| 84 | word - embeddings - word embeddings - words - embedding | 283 | Word Embeddings and Their Applications in Natural Language Processing |
| 85 | question - qa - questions - answering - answer | 279 | Question Answering |
| 86 | denoising - image - noise - restoration - image denoising | 276 | Image Denoising |
| 87 | ctr - product - commerce - click - user | 274 | Advertising and Predictive Modeling |
| 88 | graphical - graphical models - belief propagation - belief - ising | 266 | Inference and Learning in Graphical Models |
| 89 | transport - ot - optimal transport - wasserstein - optimal | 264 | Optimal Transport and Related Methods |
| 90 | matrix - rank - matrix completion - completion - low rank | 259 | Low Rank Matrix Completion |
| 91 | covid - covid 19 - 19 - pandemic - spread | 255 | COVID-19 Forecasting and Prediction |
| 92 | svm - support vector - support - svms - vector | 255 | Machine Learning - SVM |
| 93 | physics - particle - detector - high energy - energy | 244 | High Energy Particle Physics |
| 94 | feature selection - feature - selection - features - feature selection methods | 240 | Feature Selection for High-Dimensional Data |
| 95 | ranking - items - rank - pairwise - comparisons | 232 | Ranking and Learning from Noisy Comparisons |
| 96 | hyperparameter - hpo - hyperparameters - hyperparameter optimization - optimization | 229 | Hyperparameter Optimization for Deep Learning Models |
| 97 | pricing - revenue - price - auctions - regret | 227 | Dynamic Pricing and Demand Learning |
| 98 | ood - ood detection - distribution - distribution ood - detection | 225 | Out-of-Distribution Detection in Deep Learning |
| 99 | bayesian - bayesian networks - bayesian network - structure - structure learning | 224 | Bayesian Network Structure Learning |
| 100 | pca - principal - principal component - component analysis - principal component analysis | 224 | Principal Component Analysis (PCA) |
| 101 | protein - proteins - sequence - sequences - structure | 214 | Protein Representation and Prediction |
| 102 | hashing - hash - codes - retrieval - search | 209 | Large-Scale Image Retrieval and Hashing |
| 103 | submodular - submodular functions - functions - maximization - approximation | 207 | Submodular Function Minimization |
| 104 | mixture - em - mixtures - em algorithm - mixture models | 206 | Mixture Models and EM Algorithm |
| 105 | metric learning - metric - distance - distance metric - similarity | 206 | Metric Learning for Machine Learning |
| 106 | equivariant - equivariance - group - symmetry - spherical | 200 | Equivariant Deep Learning |
| 107 | nmf - nonnegative - factorization - matrix - matrix factorization | 199 | NMF (Nonnegative Matrix Factorization) |
| 108 | compression - video - coding - distortion - rate distortion | 198 | Neural Compression |
| 109 | mri - reconstruction - pet - imaging - mr | 198 | Magnetic Resonance Imaging Reconstruction |
| 110 | oct - retinal - dr - diabetic - images | 197 | Retinal Imaging and Disease Diagnosis |
| 111 | entity - relation - relation extraction - entities - extraction | 192 | Relation Extraction |
| 112 | handwritten - text - characters - character - recognition | 178 | Handwritten Character Recognition |
| 113 | augmentation - data augmentation - mixup - data - augmentations | 178 | Data Augmentation for Improving Deep Learning Performance |
| 114 | crowdsourcing - workers - crowd - worker - crowdsourced | 168 | Crowdsourcing Labeling and Annotation |
| 115 | summarization - summaries - summary - abstractive - text | 165 | Automatic Summarization |
| 116 | circuit - design - circuits - chip - synthesis | 161 | Circuit Design Optimization |
| 117 | view - multi view - views - multi - clustering | 160 | Multi-View Clustering |
| 118 | cancer - gene - genes - disease - expression | 158 | Cancer Gene Expression Analysis |
</details>
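The table can also be explored programmatically. The sketch below ranks topics by frequency while skipping the outlier topic `-1`. It works on plain dictionaries; the function name `largest_topics` and the `"Label"` key are assumptions made for illustration, and the sample rows are copied from the table above.

```python
def largest_topics(rows, k=5, include_outliers=False):
    """Return the k most frequent topics from a list of row dicts.

    Each row needs a "Topic" id and a "Count"; topic -1 collects outliers.
    """
    kept = [row for row in rows if include_outliers or row["Topic"] != -1]
    return sorted(kept, key=lambda row: row["Count"], reverse=True)[:k]


# A few rows copied from the overview table.
sample_rows = [
    {"Topic": -1, "Count": 153, "Label": "Machine Learning and Deep Learning"},
    {"Topic": 0, "Count": 32094, "Label": "Reinforcement Learning and Control"},
    {"Topic": 1, "Count": 10423, "Label": "Graph Embedding and Representation Learning"},
    {"Topic": 2, "Count": 3598, "Label": "Speech Technology"},
]

for row in largest_topics(sample_rows, k=2):
    print(row["Topic"], row["Count"], row["Label"])
```

With the model loaded, `topic_model.get_topic_info().to_dict("records")` should give rows with `Topic` and `Count` keys that can be passed to the same function. The name of the column holding the labels may differ from the `"Label"` key used here.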
## Training hyperparameters
* calculate_probabilities: True
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None
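These values correspond to keyword arguments of the `BERTopic` constructor, so an unfitted model with the same settings can be sketched as follows. This is an illustration under assumptions, not the author's training script: the card does not name the embedding model or the UMAP, HDBSCAN, and vectorizer settings, so `embedding_model` is left to the caller and the remaining sub-models fall back to library defaults. In recent BERTopic versions `language` is reset to `None` when an embedding model is passed explicitly, which would be consistent with the value listed.

```python
# Values copied from the hyperparameter list above.
TRAINING_HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model(embedding_model):
    """Create an unfitted BERTopic instance with the listed hyperparameters.

    `embedding_model` is an assumption: the card does not say which one was used.
    """
    # Imported lazily; requires `pip install -U bertopic`.
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **TRAINING_HYPERPARAMETERS)
```

Fitting such a model on the same abstracts would not necessarily reproduce the topics above, since UMAP and HDBSCAN are sensitive to their own settings and random seeds.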
## Framework versions
* Numpy: 2.1.3
* HDBSCAN: 0.8.40
* UMAP: 0.5.7
* Pandas: 2.2.3
* Scikit-Learn: 1.6.1
* Sentence-transformers: 3.4.1
* Transformers: 4.49.0
* Numba: 0.61.0
* Plotly: 6.0.1
* Python: 3.10.16
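Loading a serialized model tends to be most reliable when the installed packages are close to these versions. The following sketch compares the local environment against the list using only the standard library. The mapping from display names to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption of this example.

```python
from importlib import metadata

# Versions listed above, keyed by PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "2.1.3",
    "hdbscan": "0.8.40",
    "umap-learn": "0.5.7",
    "pandas": "2.2.3",
    "scikit-learn": "1.6.1",
    "sentence-transformers": "3.4.1",
    "transformers": "4.49.0",
    "numba": "0.61.0",
    "plotly": "6.0.1",
}


def compare_versions(expected):
    """Map each distribution to (expected, installed); installed is None if absent."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED_VERSIONS).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A "differs" line is not necessarily a problem; it only flags where the environment departs from the one used for training.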