The topics that you create can be hierarchically reduced. In order to understand the potential hierarchical
structure of the topics, we can use `scipy.cluster.hierarchy` to create clusters and visualize how
they relate to one another. This might help to select an appropriate `nr_topics` when reducing the number
of topics that you have created. To visualize this hierarchy, run the following:
```python
topic_model.visualize_hierarchy()
```
!!! note
Do note that this is not the actual procedure of `.reduce_topics()` when `nr_topics` is set to
auto since HDBSCAN is used to automatically extract topics. The visualization above closely resembles
the actual procedure of `.reduce_topics()` when any number of `nr_topics` is selected.
### **Hierarchical labels**
Although visualizing this hierarchy gives us information about the structure, it would be helpful to see what happens
to the topic representations when merging topics. To do so, we first need to calculate the representations of the
hierarchical topics:
First, we train a basic BERTopic model:
```python
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))["data"]
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)
hierarchical_topics = topic_model.hierarchical_topics(docs)
```
To visualize these results, we simply need to pass the resulting `hierarchical_topics` to our `.visualize_hierarchy` function:
```python
topic_model.visualize_hierarchy(hierarchical_topics=hierarchical_topics)
```
If you **hover** over the black circles, you will see the topic representation at that level of the hierarchy. These representations
help you understand the effect of merging certain topics. Some might be logical to merge whilst others might not. Moreover,
we can now see which sub-topics can be found within certain larger themes.
### **Text-based topic tree**
Although this gives a nice overview of the potential hierarchy, hovering over all black circles can be tiresome. Instead, we can
use `topic_model.get_topic_tree` to create a text-based representation of this hierarchy. Although the general structure is more difficult
to view, we can see better which topics could be logically merged:
```python
>>> tree = topic_model.get_topic_tree(hierarchical_topics)
>>> print(tree)
.
└─atheists_atheism_god_moral_atheist
├─atheists_atheism_god_atheist_argument
│ ├─■──atheists_atheism_god_atheist_argument ── Topic: 21
│ └─■──br_god_exist_genetic_existence ── Topic: 124
└─■──moral_morality_objective_immoral_morals ── Topic: 29
```
Click here to view the full tree.
```bash
.
├─people_armenian_said_god_armenians
│ ├─god_jesus_jehovah_lord_christ
│ │ ├─god_jesus_jehovah_lord_christ
│ │ │ ├─jehovah_lord_mormon_mcconkie_god
│ │ │ │ ├─■──ra_satan_thou_god_lucifer ── Topic: 94
│ │ │ │ └─■──jehovah_lord_mormon_mcconkie_unto ── Topic: 78
│ │ │ └─jesus_mary_god_hell_sin
│ │ │ ├─jesus_hell_god_eternal_heaven
│ │ │ │ ├─hell_jesus_eternal_god_heaven
│ │ │ │ │ ├─■──jesus_tomb_disciples_resurrection_john ── Topic: 69
│ │ │ │ │ └─■──hell_eternal_god_jesus_heaven ── Topic: 53
│ │ │ │ └─■──aaron_baptism_sin_law_god ── Topic: 89
│ │ │ └─■──mary_sin_maria_priest_conception ── Topic: 56
│ │ └─■──marriage_married_marry_ceremony_marriages ── Topic: 110
│ └─people_armenian_armenians_said_mr
│ ├─people_armenian_armenians_said_israel
│ │ ├─god_homosexual_homosexuality_atheists_sex
│ │ │ ├─homosexual_homosexuality_sex_gay_homosexuals
│ │ │ │ ├─■──kinsey_sex_gay_men_sexual ── Topic: 44
│ │ │ │ └─homosexuality_homosexual_sin_homosexuals_gay
│ │ │ │ ├─■──gay_homosexual_homosexuals_sexual_cramer ── Topic: 50
│ │ │ │ └─■──homosexuality_homosexual_sin_paul_sex ── Topic: 27
│ │ │ └─god_atheists_atheism_moral_atheist
│ │ │ ├─islam_quran_judas_islamic_book
│ │ │ │ ├─■──jim_context_challenges_articles_quote ── Topic: 36
│ │ │ │ └─islam_quran_judas_islamic_book
│ │ │ │ ├─■──islam_quran_islamic_rushdie_muslims ── Topic: 31
│ │ │ │ └─■──judas_scripture_bible_books_greek ── Topic: 33
│ │ │ └─atheists_atheism_god_moral_atheist
│ │ │ ├─atheists_atheism_god_atheist_argument
│ │ │ │ ├─■──atheists_atheism_god_atheist_argument ── Topic: 21
│ │ │ │ └─■──br_god_exist_genetic_existence ── Topic: 124
│ │ │ └─■──moral_morality_objective_immoral_morals ── Topic: 29
│ │ └─armenian_armenians_people_israel_said
│ │ ├─armenian_armenians_israel_people_jews
│ │ │ ├─tax_rights_government_income_taxes
│ │ │ │ ├─■──rights_right_slavery_slaves_residence ── Topic: 106
│ │ │ │ └─tax_government_taxes_income_libertarians
│ │ │ │ ├─■──government_libertarians_libertarian_regulation_party ── Topic: 58
│ │ │ │ └─■──tax_taxes_income_billion_deficit ── Topic: 41
│ │ │ └─armenian_armenians_israel_people_jews
│ │ │ ├─gun_guns_militia_firearms_amendment
│ │ │ │ ├─■──blacks_penalty_death_cruel_punishment ── Topic: 55
│ │ │ │ └─■──gun_guns_militia_firearms_amendment ── Topic: 7
│ │ │ └─armenian_armenians_israel_jews_turkish
│ │ │ ├─■──israel_israeli_jews_arab_jewish ── Topic: 4
│ │ │ └─■──armenian_armenians_turkish_armenia_azerbaijan ── Topic: 15
│ │ └─stephanopoulos_president_mr_myers_ms
│ │ ├─■──serbs_muslims_stephanopoulos_mr_bosnia ── Topic: 35
│ │ └─■──myers_stephanopoulos_president_ms_mr ── Topic: 87
│ └─batf_fbi_koresh_compound_gas
│ ├─■──reno_workers_janet_clinton_waco ── Topic: 77
│ └─batf_fbi_koresh_gas_compound
│ ├─batf_koresh_fbi_warrant_compound
│ │ ├─■──batf_warrant_raid_compound_fbi ── Topic: 42
│ │ └─■──koresh_batf_fbi_children_compound ── Topic: 61
│ └─■──fbi_gas_tear_bds_building ── Topic: 23
└─use_like_just_dont_new
├─game_team_year_games_like
│ ├─game_team_games_25_year
│ │ ├─game_team_games_25_season
│ │ │ ├─window_printer_use_problem_mhz
│ │ │ │ ├─mhz_wire_simms_wiring_battery
│ │ │ │ │ ├─simms_mhz_battery_cpu_heat
│ │ │ │ │ │ ├─simms_pds_simm_vram_lc
│ │ │ │ │ │ │ ├─■──pds_nubus_lc_slot_card ── Topic: 119
│ │ │ │ │ │ │ └─■──simms_simm_vram_meg_dram ── Topic: 32
│ │ │ │ │ │ └─mhz_battery_cpu_heat_speed
│ │ │ │ │ │ ├─mhz_cpu_speed_heat_fan
│ │ │ │ │ │ │ ├─mhz_cpu_speed_heat_fan
│ │ │ │ │ │ │ │ ├─■──fan_cpu_heat_sink_fans ── Topic: 92
│ │ │ │ │ │ │ │ └─■──mhz_speed_cpu_fpu_clock ── Topic: 22
│ │ │ │ │ │ │ └─■──monitor_turn_power_computer_electricity ── Topic: 91
│ │ │ │ │ │ └─battery_batteries_concrete_duo_discharge
│ │ │ │ │ │ ├─■──duo_battery_apple_230_problem ── Topic: 121
│ │ │ │ │ │ └─■──battery_batteries_concrete_discharge_temperature ── Topic: 75
│ │ │ │ │ └─wire_wiring_ground_neutral_outlets
│ │ │ │ │ ├─wire_wiring_ground_neutral_outlets
│ │ │ │ │ │ ├─wire_wiring_ground_neutral_outlets
│ │ │ │ │ │ │ ├─■──leds_uv_blue_light_boards ── Topic: 66
│ │ │ │ │ │ │ └─■──wire_wiring_ground_neutral_outlets ── Topic: 120
│ │ │ │ │ │ └─scope_scopes_phone_dial_number
│ │ │ │ │ │ ├─■──dial_number_phone_line_output ── Topic: 93
│ │ │ │ │ │ └─■──scope_scopes_motorola_generator_oscilloscope ── Topic: 113
│ │ │ │ │ └─celp_dsp_sampling_antenna_digital
│ │ │ │ │ ├─■──antenna_antennas_receiver_cable_transmitter ── Topic: 70
│ │ │ │ │ └─■──celp_dsp_sampling_speech_voice ── Topic: 52
│ │ │ │ └─window_printer_xv_mouse_windows
│ │ │ │ ├─window_xv_error_widget_problem
│ │ │ │ │ ├─error_symbol_undefined_xterm_rx
│ │ │ │ │ │ ├─■──symbol_error_undefined_doug_parse ── Topic: 63
│ │ │ │ │ │ └─■──rx_remote_server_xdm_xterm ── Topic: 45
│ │ │ │ │ └─window_xv_widget_application_expose
│ │ │ │ │ ├─window_widget_expose_application_event
│ │ │ │ │ │ ├─■──gc_mydisplay_draw_gxxor_drawing ── Topic: 103
│ │ │ │ │ │ └─■──window_widget_application_expose_event ── Topic: 25
│ │ │ │ │ └─xv_den_polygon_points_algorithm
│ │ │ │ │ ├─■──den_polygon_points_algorithm_polygons ── Topic: 28
│ │ │ │ │ └─■──xv_24bit_image_bit_images ── Topic: 57
│ │ │ │ └─printer_fonts_print_mouse_postscript
│ │ │ │ ├─printer_fonts_print_font_deskjet
│ │ │ │ │ ├─■──scanner_logitech_grayscale_ocr_scanman ── Topic: 108
│ │ │ │ │ └─printer_fonts_print_font_deskjet
│ │ │ │ │ ├─■──printer_print_deskjet_hp_ink ── Topic: 18
│ │ │ │ │ └─■──fonts_font_truetype_tt_atm ── Topic: 49
│ │ │ │ └─mouse_ghostscript_midi_driver_postscript
│ │ │ │ ├─ghostscript_midi_postscript_files_file
│ │ │ │ │ ├─■──ghostscript_postscript_pageview_ghostview_dsc ── Topic: 104
│ │ │ │ │ └─midi_sound_file_windows_driver
│ │ │ │ │ ├─■──location_mar_file_host_rwrr ── Topic: 83
│ │ │ │ │ └─■──midi_sound_driver_blaster_soundblaster ── Topic: 98
│ │ │ │ └─■──mouse_driver_mice_ball_problem ── Topic: 68
│ │ │ └─game_team_games_25_season
│ │ │ ├─1st_sale_condition_comics_hulk
│ │ │ │ ├─sale_condition_offer_asking_cd
│ │ │ │ │ ├─condition_stereo_amp_speakers_asking
│ │ │ │ │ │ ├─■──miles_car_amfm_toyota_cassette ── Topic: 62
│ │ │ │ │ │ └─■──amp_speakers_condition_stereo_audio ── Topic: 24
│ │ │ │ │ └─games_sale_pom_cds_shipping
│ │ │ │ │ ├─pom_cds_sale_shipping_cd
│ │ │ │ │ │ ├─■──size_shipping_sale_condition_mattress ── Topic: 100
│ │ │ │ │ │ └─■──pom_cds_cd_sale_picture ── Topic: 37
│ │ │ │ │ └─■──games_game_snes_sega_genesis ── Topic: 40
│ │ │ │ └─1st_hulk_comics_art_appears
│ │ │ │ ├─1st_hulk_comics_art_appears
│ │ │ │ │ ├─lens_tape_camera_backup_lenses
│ │ │ │ │ │ ├─■──tape_backup_tapes_drive_4mm ── Topic: 107
│ │ │ │ │ │ └─■──lens_camera_lenses_zoom_pouch ── Topic: 114
│ │ │ │ │ └─1st_hulk_comics_art_appears
│ │ │ │ │ ├─■──1st_hulk_comics_art_appears ── Topic: 105
│ │ │ │ │ └─■──books_book_cover_trek_chemistry ── Topic: 125
│ │ │ │ └─tickets_hotel_ticket_voucher_package
│ │ │ │ ├─■──hotel_voucher_package_vacation_room ── Topic: 74
│ │ │ │ └─■──tickets_ticket_june_airlines_july ── Topic: 84
│ │ │ └─game_team_games_season_hockey
│ │ │ ├─game_hockey_team_25_550
│ │ │ │ ├─■──espn_pt_pts_game_la ── Topic: 17
│ │ │ │ └─■──team_25_game_hockey_550 ── Topic: 2
│ │ │ └─■──year_game_hit_baseball_players ── Topic: 0
│ │ └─bike_car_greek_insurance_msg
│ │ ├─car_bike_insurance_cars_engine
│ │ │ ├─car_insurance_cars_radar_engine
│ │ │ │ ├─insurance_health_private_care_canada
│ │ │ │ │ ├─■──insurance_health_private_care_canada ── Topic: 99
│ │ │ │ │ └─■──insurance_car_accident_rates_sue ── Topic: 82
│ │ │ │ └─car_cars_radar_engine_detector
│ │ │ │ ├─car_radar_cars_detector_engine
│ │ │ │ │ ├─■──radar_detector_detectors_ka_alarm ── Topic: 39
│ │ │ │ │ └─car_cars_mustang_ford_engine
│ │ │ │ │ ├─■──clutch_shift_shifting_transmission_gear ── Topic: 88
│ │ │ │ │ └─■──car_cars_mustang_ford_v8 ── Topic: 14
│ │ │ │ └─oil_diesel_odometer_diesels_car
│ │ │ │ ├─odometer_oil_sensor_car_drain
│ │ │ │ │ ├─■──odometer_sensor_speedo_gauge_mileage ── Topic: 96
│ │ │ │ │ └─■──oil_drain_car_leaks_taillights ── Topic: 102
│ │ │ │ └─■──diesel_diesels_emissions_fuel_oil ── Topic: 79
│ │ │ └─bike_riding_ride_bikes_motorcycle
│ │ │ ├─bike_ride_riding_bikes_lane
│ │ │ │ ├─■──bike_ride_riding_lane_car ── Topic: 11
│ │ │ │ └─■──bike_bikes_miles_honda_motorcycle ── Topic: 19
│ │ │ └─■──countersteering_bike_motorcycle_rear_shaft ── Topic: 46
│ │ └─greek_msg_kuwait_greece_water
│ │ ├─greek_msg_kuwait_greece_water
│ │ │ ├─greek_msg_kuwait_greece_dog
│ │ │ │ ├─greek_msg_kuwait_greece_dog
│ │ │ │ │ ├─greek_kuwait_greece_turkish_greeks
│ │ │ │ │ │ ├─■──greek_greece_turkish_greeks_cyprus ── Topic: 71
│ │ │ │ │ │ └─■──kuwait_iraq_iran_gulf_arabia ── Topic: 76
│ │ │ │ │ └─msg_dog_drugs_drug_food
│ │ │ │ │ ├─dog_dogs_cooper_trial_weaver
│ │ │ │ │ │ ├─■──clinton_bush_quayle_reagan_panicking ── Topic: 101
│ │ │ │ │ │ └─dog_dogs_cooper_trial_weaver
│ │ │ │ │ │ ├─■──cooper_trial_weaver_spence_witnesses ── Topic: 90
│ │ │ │ │ │ └─■──dog_dogs_bike_trained_springer ── Topic: 67
│ │ │ │ │ └─msg_drugs_drug_food_chinese
│ │ │ │ │ ├─■──msg_food_chinese_foods_taste ── Topic: 30
│ │ │ │ │ └─■──drugs_drug_marijuana_cocaine_alcohol ── Topic: 72
│ │ │ │ └─water_theory_universe_science_larsons
│ │ │ │ ├─water_nuclear_cooling_steam_dept
│ │ │ │ │ ├─■──rocketry_rockets_engines_nuclear_plutonium ── Topic: 115
│ │ │ │ │ └─water_cooling_steam_dept_plants
│ │ │ │ │ ├─■──water_dept_phd_environmental_atmospheric ── Topic: 97
│ │ │ │ │ └─■──cooling_water_steam_towers_plants ── Topic: 109
│ │ │ │ └─theory_universe_larsons_larson_science
│ │ │ │ ├─■──theory_universe_larsons_larson_science ── Topic: 54
│ │ │ │ └─■──oort_cloud_grbs_gamma_burst ── Topic: 80
│ │ │ └─helmet_kirlian_photography_lock_wax
│ │ │ ├─helmet_kirlian_photography_leaf_mask
│ │ │ │ ├─kirlian_photography_leaf_pictures_deleted
│ │ │ │ │ ├─deleted_joke_stuff_maddi_nickname
│ │ │ │ │ │ ├─■──joke_maddi_nickname_nicknames_frank ── Topic: 43
│ │ │ │ │ │ └─■──deleted_stuff_bookstore_joke_motto ── Topic: 81
│ │ │ │ │ └─■──kirlian_photography_leaf_pictures_aura ── Topic: 85
│ │ │ │ └─helmet_mask_liner_foam_cb
│ │ │ │ ├─■──helmet_liner_foam_cb_helmets ── Topic: 112
│ │ │ │ └─■──mask_goalies_77_santore_tl ── Topic: 123
│ │ │ └─lock_wax_paint_plastic_ear
│ │ │ ├─■──lock_cable_locks_bike_600 ── Topic: 117
│ │ │ └─wax_paint_ear_plastic_skin
│ │ │ ├─■──wax_paint_plastic_scratches_solvent ── Topic: 65
│ │ │ └─■──ear_wax_skin_greasy_acne ── Topic: 116
│ │ └─m4_mp_14_mw_mo
│ │ ├─m4_mp_14_mw_mo
│ │ │ ├─■──m4_mp_14_mw_mo ── Topic: 111
│ │ │ └─■──test_ensign_nameless_deane_deanebinahccbrandeisedu ── Topic: 118
│ │ └─■──ites_cheek_hello_hi_ken ── Topic: 3
│ └─space_medical_health_disease_cancer
│ ├─medical_health_disease_cancer_patients
│ │ ├─■──cancer_centers_center_medical_research ── Topic: 122
│ │ └─health_medical_disease_patients_hiv
│ │ ├─patients_medical_disease_candida_health
│ │ │ ├─■──candida_yeast_infection_gonorrhea_infections ── Topic: 48
│ │ │ └─patients_disease_cancer_medical_doctor
│ │ │ ├─■──hiv_medical_cancer_patients_doctor ── Topic: 34
│ │ │ └─■──pain_drug_patients_disease_diet ── Topic: 26
│ │ └─■──health_newsgroup_tobacco_vote_votes ── Topic: 9
│ └─space_launch_nasa_shuttle_orbit
│ ├─space_moon_station_nasa_launch
│ │ ├─■──sky_advertising_billboard_billboards_space ── Topic: 59
│ │ └─■──space_station_moon_redesign_nasa ── Topic: 16
│ └─space_mission_hst_launch_orbit
│ ├─space_launch_nasa_orbit_propulsion
│ │ ├─■──space_launch_nasa_propulsion_astronaut ── Topic: 47
│ │ └─■──orbit_km_jupiter_probe_earth ── Topic: 86
│ └─■──hst_mission_shuttle_orbit_arrays ── Topic: 60
└─drive_file_key_windows_use
├─key_file_jpeg_encryption_image
│ ├─key_encryption_clipper_chip_keys
│ │ ├─■──key_clipper_encryption_chip_keys ── Topic: 1
│ │ └─■──entry_file_ripem_entries_key ── Topic: 73
│ └─jpeg_image_file_gif_images
│ ├─motif_graphics_ftp_available_3d
│ │ ├─motif_graphics_openwindows_ftp_available
│ │ │ ├─■──openwindows_motif_xview_windows_mouse ── Topic: 20
│ │ │ └─■──graphics_widget_ray_3d_available ── Topic: 95
│ │ └─■──3d_machines_version_comments_contact ── Topic: 38
│ └─jpeg_image_gif_images_format
│ ├─■──gopher_ftp_files_stuffit_images ── Topic: 51
│ └─■──jpeg_image_gif_format_images ── Topic: 13
└─drive_db_card_scsi_windows
├─db_windows_dos_mov_os2
│ ├─■──copy_protection_program_software_disk ── Topic: 64
│ └─■──db_windows_dos_mov_os2 ── Topic: 8
└─drive_card_scsi_drives_ide
├─drive_scsi_drives_ide_disk
│ ├─■──drive_scsi_drives_ide_disk ── Topic: 6
│ └─■──meg_sale_ram_drive_shipping ── Topic: 12
└─card_modem_monitor_video_drivers
├─■──card_monitor_video_drivers_vga ── Topic: 5
└─■──modem_port_serial_irq_com ── Topic: 10
```
## **Visualize Hierarchical Documents**
We can extend the previous method by calculating the topic representation at different levels of the hierarchy and
plotting them on a 2D plane. To do so, we first need to calculate the hierarchical topics:
```python
from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
from bertopic import BERTopic
from umap import UMAP
# Prepare embeddings
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = sentence_model.encode(docs, show_progress_bar=False)
# Train BERTopic and extract hierarchical topics
topic_model = BERTopic().fit(docs, embeddings)
hierarchical_topics = topic_model.hierarchical_topics(docs)
```
Then, we can visualize the hierarchical documents by either supplying it with our embeddings or by
reducing their dimensionality ourselves:
```python
# Run the visualization with the original embeddings
topic_model.visualize_hierarchical_documents(docs, hierarchical_topics, embeddings=embeddings)
# Reduce dimensionality of embeddings, this step is optional but much faster to perform iteratively:
reduced_embeddings = UMAP(n_neighbors=10, n_components=2, min_dist=0.0, metric='cosine').fit_transform(embeddings)
topic_model.visualize_hierarchical_documents(docs, hierarchical_topics, reduced_embeddings=reduced_embeddings)
```
!!! note
The visualization above was generated with the additional parameter `hide_document_hover=True` which disables the
option to hover over the individual points and see the content of the documents. This makes the resulting visualization
smaller and fit into your RAM. However, it might be interesting to set `hide_document_hover=False` to hover
over the points and see the content of the documents.