---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# bertopic_openai_emb_model

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("MaximSIMO/bertopic_openai_emb_model")

topic_model.get_topic_info()
```
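
Beyond inspecting the topic table, the loaded model can assign topics to unseen documents. The following is a minimal sketch, not the author's documented workflow. The helper name `assign_topics` is hypothetical, and it assumes you can supply embeddings (or an embedding backend) matching the model used for training. The repository name suggests OpenAI embeddings, but this card does not confirm which embedding model was used.

```python
from typing import Sequence


def assign_topics(docs: Sequence[str], embeddings=None, embedding_model=None):
    """Hypothetical helper: map new documents to the topics of the saved model.

    embeddings: optional precomputed document embeddings (one row per document),
        assumed to come from the same embedding model used for training.
    embedding_model: optional embedding backend forwarded to BERTopic.load.
    """
    # Imported lazily so that defining the helper does not require bertopic.
    from bertopic import BERTopic

    topic_model = BERTopic.load(
        "MaximSIMO/bertopic_openai_emb_model",
        embedding_model=embedding_model,
    )
    # transform returns one topic ID per document; -1 marks outliers.
    topics, probabilities = topic_model.transform(list(docs), embeddings=embeddings)
    return topics, probabilities
```

Passing precomputed embeddings avoids relying on whatever embedding configuration was saved with the model, at the cost of having to compute them yourself with a compatible model.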

## Topic overview

* Number of topics: 3
* Number of training documents: 100

<details>
  <summary>Click here for an overview of all topics.</summary>

  | Topic ID | Topic Keywords | Topic Frequency | Label |
  |----------|----------------|-----------------|-------|
  | -1 | Evening TV Programming | 13 | -1_Evening TV Programming |
  | 0 | Elettrodotti e ambiente | 15 | 0_Elettrodotti e ambiente |
  | 1 | Political Tensions | 72 | 1_Political Tensions |

</details>
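
The frequencies in the table add up to the 100 training documents. In BERTopic, topic `-1` collects outliers, meaning documents that were not assigned to any cluster. As a small worked example (the names `topic_counts` and `topic_shares` are illustrative and not part of the model), the share of documents per topic can be computed directly from the table:

```python
# Topic frequencies copied from the topic overview table.
topic_counts = {-1: 13, 0: 15, 1: 72}

total_documents = sum(topic_counts.values())

# Fraction of the training documents that falls into each topic.
topic_shares = {
    topic: count / total_documents for topic, count in topic_counts.items()
}

for topic, share in topic_shares.items():
    print(f"Topic {topic}: {share:.0%}")
```

This prints 13%, 15%, and 72% for topics -1, 0, and 1 respectively, so nearly three quarters of the corpus falls into a single topic.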

## Training hyperparameters

* calculate_probabilities: False
* language: multilingual
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None
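
To set up a comparable configuration, the listed values can be passed to the `BERTopic` constructor. This is a sketch under assumptions: the card does not list the embedding, UMAP, HDBSCAN, or representation models, so the hypothetical helper `build_topic_model` leaves them at BERTopic's defaults unless an embedding model is passed in. The result will therefore not necessarily match the published model.

```python
# Hyperparameters copied from the training hyperparameters list.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "multilingual",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model(embedding_model=None):
    """Hypothetical helper: create an unfitted BERTopic with the listed settings."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **HYPERPARAMETERS)
```

Calling `build_topic_model()` returns an unfitted model; it still has to be trained with `fit_transform` on your own documents.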

## Framework versions

* Numpy: 2.2.6
* HDBSCAN: 0.8.41
* UMAP: 0.5.11
* Pandas: 2.3.3
* Scikit-Learn: 1.7.2
* Sentence-transformers: 5.2.2
* Transformers: 5.1.0
* Numba: 0.63.1
* Plotly: 6.5.2
* Python: 3.10.19
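
Version differences are a common source of loading problems, so it can help to compare your environment against the list. The sketch below uses only the Python standard library. The mapping from the names on this card to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption, and `find_version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the framework versions list, keyed by the
# assumed PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "2.2.6",
    "hdbscan": "0.8.41",
    "umap-learn": "0.5.11",
    "pandas": "2.3.3",
    "scikit-learn": "1.7.2",
    "sentence-transformers": "5.2.2",
    "transformers": "5.1.0",
    "numba": "0.63.1",
    "plotly": "6.5.2",
}


def find_version_mismatches(expected):
    """Return {package: (expected, installed)} for every package that differs.

    The installed value is None when the package is not installed.
    """
    mismatches = {}
    for package, expected_version in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected_version:
            mismatches[package] = (expected_version, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_version_mismatches(EXPECTED_VERSIONS).items():
        print(f"{package}: card lists {wanted}, installed {found or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the check only points out where your environment differs from the one used for training.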