D0men1c0/ISSR_Visual_Model

BERTopic

D0men1c0 committed on Aug 26, 2024

Commit 7221d15 (verified) · 1 Parent(s): ff0a63f

Update README.md

Files changed (1)

README.md +106 -73

@@ -1,73 +1,106 @@
----
-tags:
-- bertopic
-library_name: bertopic
----
-# ISSR_Visual_Model
-This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
-BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
-## Usage
-To use this model, please install BERTopic:
-```
-pip install -U bertopic
-```
-You can use the model as follows:
-```python
-from bertopic import BERTopic
-topic_model = BERTopic.load("D0men1c0/ISSR_Visual_Model")
-topic_model.get_topic_info()
-```
-## Topic overview
-* Number of topics: 5
-* Number of training documents: 3727
-<details>
-  <summary>Click here for an overview of all topics.</summary>
-  | Topic ID | Topic Keywords | Topic Frequency | Label |
-|----------|----------------|-----------------|-------|
-| -1 | drug - people - gun -  -  | 93 | -1_drug_people_gun_ |
-| 0 | gun - people - drug -  -  | 134 | 0_gun_people_drug_ |
-| 1 | drug - gun -  -  -  | 2701 | 1_drug_gun__ |
-| 2 | people - gun -  -  -  | 429 | 2_people_gun__ |
-| 3 | people - gun - drug -  -  | 370 | 3_people_gun_drug_ |
-</details>
-## Training hyperparameters
-* calculate_probabilities: False
-* language: None
-* low_memory: False
-* min_topic_size: 50
-* n_gram_range: (1, 3)
-* nr_topics: None
-* seed_topic_list: None
-* top_n_words: 5
-* verbose: True
-* zeroshot_min_similarity: 0.7
-* zeroshot_topic_list: None
-## Framework versions
-* Numpy: 1.26.4
-* HDBSCAN: 0.8.36
-* UMAP: 0.5.6
-* Pandas: 2.2.2
-* Scikit-Learn: 1.4.1.post1
-* Sentence-transformers: 3.0.1
-* Transformers: 4.39.3
-* Numba: 0.60.0
-* Plotly: 5.22.0
-* Python: 3.12.4

+---
+tags:
+- bertopic
+library_name: bertopic
+---
+# ISSR_Visual_Model
+This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+## Usage
+To use this model, please install BERTopic:
+```
+pip install -U bertopic
+```
+You can use the model as follows:
+```python
+from bertopic import BERTopic
+topic_model = BERTopic.load("D0men1c0/ISSR_Visual_Model")
+topic_model.get_topic_info()
+```
+You can make predictions as follows:
+```python
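+import math
+import matplotlib.pyplot as plt
+import pandas as pd
+from PIL import Image
+val_labels = [...]  # list of captions
+val_images = [...]  # list of image paths, aligned with val_labels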
+topic, _ = topic_model.transform(val_labels, images=val_images)
+all_topic_info = [topic_model.get_topic_info(t) for t in topic]
+all_prediction_info = pd.concat(all_topic_info, ignore_index=True)
+# Visualize predictions:
+sample_images = 100
+n_images = min(sample_images, len(val_images))
+n_cols = 4
+n_rows = math.ceil(n_images / n_cols)
+fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, n_rows * 3))
+axes = axes.flatten()
+for i, (path, (_, row)) in enumerate(zip(val_images[:n_images], all_prediction_info.iterrows())):
+    ax = axes[i]
+    ax.imshow(Image.open(path))
+    ax.axis('off')
+    ax.set_title(f"Topic {row['Topic']}: {row['KeyBERTInspired'][0]}")
+# Hide unused axes
+for j in range(n_images, len(axes)):
+    axes[j].axis('off')
+plt.tight_layout()
+plt.show()
+```
+## Topic overview
+* Number of topics: 5
+* Number of training documents: 3727
+<details>
+  <summary>Click here for an overview of all topics.</summary>
+  | Topic ID | Topic Keywords | Topic Frequency | Label |
+|----------|----------------|-----------------|-------|
+| -1 | drug - people - gun -  -  | 93 | -1_drug_people_gun_ |
+| 0 | gun - people - drug -  -  | 134 | 0_gun_people_drug_ |
+| 1 | drug - gun -  -  -  | 2701 | 1_drug_gun__ |
+| 2 | people - gun -  -  -  | 429 | 2_people_gun__ |
+| 3 | people - gun - drug -  -  | 370 | 3_people_gun_drug_ |
+</details>
+## Training hyperparameters
+* calculate_probabilities: False
+* language: None
+* low_memory: False
+* min_topic_size: 50
+* n_gram_range: (1, 3)
+* nr_topics: None
+* seed_topic_list: None
+* top_n_words: 5
+* verbose: True
+* zeroshot_min_similarity: 0.7
+* zeroshot_topic_list: None
+## Framework versions
+* Numpy: 1.26.4
+* HDBSCAN: 0.8.36
+* UMAP: 0.5.6
+* Pandas: 2.2.2
+* Scikit-Learn: 1.4.1.post1
+* Sentence-transformers: 3.0.1
+* Transformers: 4.39.3
+* Numba: 0.60.0
+* Plotly: 5.22.0
+* Python: 3.12.4
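
The visualization snippet added in this commit sizes its subplot grid from the number of available images (`min(sample_images, len(val_images))`, then `math.ceil(n_images / n_cols)`). The sketch below isolates that arithmetic so it can be checked without loading the model or any images. The helper name `grid_shape` is hypothetical and not part of the README or of BERTopic; the defaults simply mirror the values used in the snippet.

```python
import math


def grid_shape(n_available, sample_images=100, n_cols=4):
    """Return (n_images, n_rows) for the subplot grid.

    Hypothetical helper: it reproduces the arithmetic of the README
    snippet and is not a BERTopic or matplotlib function.
    """
    # Never plot more images than were actually provided.
    n_images = min(sample_images, n_available)
    # Round up so a partially filled last row still gets its own row.
    n_rows = math.ceil(n_images / n_cols)
    return n_images, n_rows


if __name__ == "__main__":
    for count in (0, 10, 250):
        print(count, grid_shape(count))
```

An empty image list gives zero rows, so a caller would probably want to check `n_images > 0` before creating the figure.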