D0men1c0 committed on
Commit
7221d15
·
verified ·
1 Parent(s): ff0a63f

Update README.md

  1. README.md +106 -73
README.md CHANGED
@@ -1,73 +1,106 @@
-
- ---
- tags:
- - bertopic
- library_name: bertopic
- ---
-
- # ISSR_Visual_Model
-
- This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
- BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
-
- ## Usage
-
- To use this model, please install BERTopic:
-
- ```
- pip install -U bertopic
- ```
-
- You can use the model as follows:
-
- ```python
- from bertopic import BERTopic
- topic_model = BERTopic.load("D0men1c0/ISSR_Visual_Model")
-
- topic_model.get_topic_info()
- ```
-
- ## Topic overview
-
- * Number of topics: 5
- * Number of training documents: 3727
-
- <details>
- <summary>Click here for an overview of all topics.</summary>
-
- | Topic ID | Topic Keywords | Topic Frequency | Label |
- |----------|----------------|-----------------|-------|
- | -1 | drug - people - gun - - | 93 | -1_drug_people_gun_ |
- | 0 | gun - people - drug - - | 134 | 0_gun_people_drug_ |
- | 1 | drug - gun - - - | 2701 | 1_drug_gun__ |
- | 2 | people - gun - - - | 429 | 2_people_gun__ |
- | 3 | people - gun - drug - - | 370 | 3_people_gun_drug_ |
-
- </details>
-
- ## Training hyperparameters
-
- * calculate_probabilities: False
- * language: None
- * low_memory: False
- * min_topic_size: 50
- * n_gram_range: (1, 3)
- * nr_topics: None
- * seed_topic_list: None
- * top_n_words: 5
- * verbose: True
- * zeroshot_min_similarity: 0.7
- * zeroshot_topic_list: None
-
- ## Framework versions
-
- * Numpy: 1.26.4
- * HDBSCAN: 0.8.36
- * UMAP: 0.5.6
- * Pandas: 2.2.2
- * Scikit-Learn: 1.4.1.post1
- * Sentence-transformers: 3.0.1
- * Transformers: 4.39.3
- * Numba: 0.60.0
- * Plotly: 5.22.0
- * Python: 3.12.4
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ ---
+
+ # ISSR_Visual_Model
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("D0men1c0/ISSR_Visual_Model")
+
+ topic_model.get_topic_info()
+ ```
+
+ You can make predictions as follows:
+ ```python
+
+ val_labels = [...] # list of captions
+ val_images = [...] # list of image paths
+
+ topic, _ = topic_model.transform(val_labels, images=val_images)
+ all_topic_info = [topic_model.get_topic_info(t) for t in topic]
+ all_prediction_info = pd.concat(all_topic_info, ignore_index=True)
+
+ # Visualize predictions:
+ sample_images = 100
+ n_images = min(sample_images, len(val_images))
+ n_cols = 4
+ n_rows = math.ceil(n_images / n_cols)
+
+ fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, n_rows * 3))
+ axes = axes.flatten()
+
+ for i, (path, (_, row)) in enumerate(zip(val_images[:n_images], all_prediction_info.iterrows())):
+     ax = axes[i]
+     ax.imshow(Image.open(path))
+     ax.axis('off')
+     ax.set_title(f"Topic {row['Topic']}: {row['KeyBERTInspired'][0]}")
+
+ # Hide unused axes
+ for j in range(n_images, len(axes)):
+     axes[j].axis('off')
+
+ plt.tight_layout()
+ plt.show()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 5
+ * Number of training documents: 3727
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | drug - people - gun - - | 93 | -1_drug_people_gun_ |
+ | 0 | gun - people - drug - - | 134 | 0_gun_people_drug_ |
+ | 1 | drug - gun - - - | 2701 | 1_drug_gun__ |
+ | 2 | people - gun - - - | 429 | 2_people_gun__ |
+ | 3 | people - gun - drug - - | 370 | 3_people_gun_drug_ |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: False
+ * language: None
+ * low_memory: False
+ * min_topic_size: 50
+ * n_gram_range: (1, 3)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 5
+ * verbose: True
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.36
+ * UMAP: 0.5.6
+ * Pandas: 2.2.2
+ * Scikit-Learn: 1.4.1.post1
+ * Sentence-transformers: 3.0.1
+ * Transformers: 4.39.3
+ * Numba: 0.60.0
+ * Plotly: 5.22.0
+ * Python: 3.12.4
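
Note on the prediction snippet added in this commit: as written, it relies on `math`, `pandas` (as `pd`), `matplotlib.pyplot` (as `plt`), and `PIL.Image` having been imported elsewhere, and it assumes at least one image is available. The following is a minimal, self-contained sketch of just the plotting step under those assumptions. `grid_shape` and `plot_predictions` are hypothetical helper names used for illustration; they are not part of BERTopic or of this repository, and the titles are assumed to be plain strings prepared by the caller.

```python
import math


def grid_shape(n_images, n_cols=4):
    """Return (n_rows, n_cols) for a grid that holds n_images cells."""
    return math.ceil(n_images / n_cols), n_cols


def plot_predictions(image_paths, titles, max_images=100, n_cols=4):
    """Plot up to max_images images in a grid, one title per image.

    Returns the matplotlib Figure, or None when there is nothing to plot.
    """
    n_images = min(max_images, len(image_paths), len(titles))
    if n_images == 0:
        return None

    # Imported here so grid_shape can be used without plotting dependencies.
    import matplotlib.pyplot as plt
    from PIL import Image

    n_rows, n_cols = grid_shape(n_images, n_cols)
    # squeeze=False keeps `axes` two-dimensional even for a single row.
    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(15, n_rows * 3), squeeze=False
    )
    axes = axes.flatten()

    for ax, path, title in zip(axes, image_paths[:n_images], titles[:n_images]):
        with Image.open(path) as img:
            ax.imshow(img)
        ax.set_title(title)

    # Turn off ticks and frames on every cell, including unused ones.
    for ax in axes:
        ax.axis("off")

    fig.tight_layout()
    return fig
```

With the variables from the README snippet, the titles could be built as `[f"Topic {row['Topic']}: {row['KeyBERTInspired'][0]}" for _, row in all_prediction_info.iterrows()]` and passed together with `val_images`; calling `plt.show()` on the result displays the grid.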