---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# BERTopic-enron-5000

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic

# Load the saved model by its repository ID
topic_model = BERTopic.load("antulik/BERTopic-enron-5000")

# Summary of every topic in the model
topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 65 (including the outlier topic -1)
* Number of training documents: 5000

The table below lists every topic.

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | enron - corp - contract - company - trading | 10 | -1_enron_corp_contract_company |
| 0 | going - meeting - meet - hope - night | 2299 | 0_going_meeting_meet_hope |
| 1 | agreements - enron - agreement - contract - documents | 481 | 1_agreements_enron_agreement_contract |
| 2 | enron - enrons - companies - company - market | 263 | 2_enron_enrons_companies_company |
| 3 | enron - contact - corp - email - recipient | 253 | 3_enron_contact_corp_email |
| 4 | telecom - ventures - financial - companies - markets | 84 | 4_telecom_ventures_financial_companies |
| 5 | enron - email - recipient - recipients - message | 76 | 5_enron_email_recipient_recipients |
| 6 | fares - newark - airlines - flight - miles | 58 | 6_fares_newark_airlines_flight |
| 7 | nfl - commissionercom - td - sportslinecom - league | 54 | 7_nfl_commissionercom_td_sportslinecom |
| 8 | enron - eov - ashleyworthingenroncom - erv - rho | 53 | 8_enron_eov_ashleyworthingenroncom_erv |
| 9 | enron - enrons - bankruptcy - bankrupt - savings | 51 | 9_enron_enrons_bankruptcy_bankrupt |
| 10 | outlookmigrationteamenroncom - outlook - outlookteamenroncom - emailcalendar - appointment | 46 | 10_outlookmigrationteamenroncom_outlook_outlookteamenroncom_emailcalendar |
| 11 | enron - approver - approval - pending - econnect | 46 | 11_enron_approver_approval_pending |
| 12 | schedules2002013118txt - schedules2002020115txt - schedules2002012506txt - schedules2001122507txt - schedules2001122815txt | 45 | 12_schedules2002013118txt_schedules2002020115txt_schedules2002012506txt_schedules2001122507txt |
| 13 | pricing - lpg - logistics - freight - metered | 44 | 13_pricing_lpg_logistics_freight |
| 14 | request - seeks - up - on - all | 43 | 14_request_seeks_up_on |
| 15 | haas - semester - summers - faculty - mba | 43 | 15_haas_semester_summers_faculty |
| 16 | federal - california - sacramento - californias - states | 42 | 16_federal_california_sacramento_californias |
| 17 | enron - resumes - resume - interview - recruiter | 41 | 17_enron_resumes_resume_interview |
| 18 | fontstyle - font - html - bold - sansserif | 39 | 18_fontstyle_font_html_bold |
| 19 | enron - deals - trades - deal - tradesxls | 37 | 19_enron_deals_trades_deal |
| 20 | pipeline - pipelines - piping - paso - pipe | 36 | 20_pipeline_pipelines_piping_paso |
| 21 | enron - eb - contact - mailtobobshultsenroncom - emailed | 36 | 21_enron_eb_contact_mailtobobshultsenroncom |
| 22 | outage - outagesindustrialinfocom - outages - rescheduled - scheduled | 36 | 22_outage_outagesindustrialinfocom_outages_rescheduled |
| 23 | gifts - gift - holiday - holidays - christmas | 36 | 23_gifts_gift_holiday_holidays |
| 24 | nymex - futures - expiration - contract - contracts | 31 | 24_nymex_futures_expiration_contract |
| 25 | transmission - transco - translink - ferc - rtos | 30 | 25_transmission_transco_translink_ferc |
| 26 | unsubscribe - email - newsletter - mailing - mailmanenroncom | 30 | 26_unsubscribe_email_newsletter_mailing |
| 27 | invoices - invoice - enron - billed - reimbursement | 29 | 27_invoices_invoice_enron_billed |
| 28 | enron - committee - lobbyist - judiciary - bill | 28 | 28_enron_committee_lobbyist_judiciary |
| 29 | refinery - prices - pipeline - oil - price | 27 | 29_refinery_prices_pipeline_oil |
| 30 | enron - gas - fuel - logistics - emissions | 27 | 30_enron_gas_fuel_logistics |
| 31 | enron - dpc - topockpcb - ebizenroncom - pcb | 24 | 31_enron_dpc_topockpcb_ebizenroncom |
| 32 | nyisotechexchange - nyisotechexchangeglobal2000net - marketrelationsnyisocom - nyiso - ownernyisotechexchangeliststhebiznet | 24 | 32_nyisotechexchange_nyisotechexchangeglobal2000net_marketrelationsnyisocom_nyiso |
| 33 | expense - expenses - enron - enronupdateconcureworkplacecom - receipts | 24 | 33_expense_expenses_enron_enronupdateconcureworkplacecom |
| 34 | enron - ebusiness - inquiries - advisory - contact | 23 | 34_enron_ebusiness_inquiries_advisory |
| 35 | dbcaps97data - schedules2002011801txt - schedules2002011805txt - schedules2001102112txt - schedules2002011916txt | 21 | 35_dbcaps97data_schedules2002011801txt_schedules2002011805txt_schedules2001102112txt |
| 36 | enrononline - trades - trading - deals - eol | 20 | 36_enrononline_trades_trading_deals |
| 37 | enron - swaps - swap - exchange - exchanges | 20 | 37_enron_swaps_swap_exchange |
| 38 | feedback - reviewers - review - process - reviewer | 20 | 38_feedback_reviewers_review_process |
| 39 | powermarketerscom - electricity - energy - utilities - reuters | 20 | 39_powermarketerscom_electricity_energy_utilities |
| 40 | tco - columbias - columbia - scheduled - cgt | 19 | 40_tco_columbias_columbia_scheduled |
| 41 | curves - curve - data - changes - inactive | 19 | 41_curves_curve_data_changes |
| 42 | enron - scheduled - eb3335 - rustybelflowerenroncom - brianredmondenroncom | 19 | 42_enron_scheduled_eb3335_rustybelflowerenroncom |
| 43 | enron - executive - ceo - communicationsenron - director | 18 | 43_enron_executive_ceo_communicationsenron |
| 44 | alert - alerts - ipo - stock - securities | 17 | 44_alert_alerts_ipo_stock |
| 45 | invoice - ipayitenroncom - sapsecurityenroncom - ipayit - ehronline | 17 | 45_invoice_ipayitenroncom_sapsecurityenroncom_ipayit |
| 46 | variances - variance - schedules - schedule - schedulingiso | 17 | 46_variances_variance_schedules_schedule |
| 47 | futures - charts - carr - financial - 1500 | 17 | 47_futures_charts_carr_financial |
| 48 | approval - approved - authorized - eisb - tariff | 16 | 48_approval_approved_authorized_eisb |
| 49 | fee - credit - express - membership - merchant | 15 | 49_fee_credit_express_membership |
| 50 | fee - subscription - billing - discount - monthly | 15 | 50_fee_subscription_billing_discount |
| 51 | schedules2001102810txt - schedules2001123103txt - schedules2001030406txt - schedules2002010121txt - schedules2001043008txt | 14 | 51_schedules2001102810txt_schedules2001123103txt_schedules2001030406txt_schedules2002010121txt |
| 52 | managementcrd - gd - ets - gasdeskenroncom - sst | 14 | 52_managementcrd_gd_ets_gasdeskenroncom |
| 53 | shipping - shipment - order - orders - delivery | 14 | 53_shipping_shipment_order_orders |
| 54 | dish - satellite - free - channels - dvds | 14 | 54_dish_satellite_free_channels |
| 55 | mailbox - outlook - inbox - exchangeadministratorenroncom - folder | 13 | 55_mailbox_outlook_inbox_exchangeadministratorenroncom |
| 56 | netware - visualwares - backoffice - newsletter - file | 13 | 56_netware_visualwares_backoffice_newsletter |
| 57 | enronfcucom - survey - enronannouncementsenroncom - ews - service | 13 | 57_enronfcucom_survey_enronannouncementsenroncom_ews |
| 58 | pira - forecast - piras - demand - weekly | 12 | 58_pira_forecast_piras_demand |
| 59 | pricing - enron - cost - rate - price | 12 | 59_pricing_enron_cost_rate |
| 60 | whitening - medication - strength - clinical - doctor | 11 | 60_whitening_medication_strength_clinical |
| 61 | enron - industries - ebusiness - industrial - ena | 11 | 61_enron_industries_ebusiness_industrial |
| 62 | px - credit - pe - sce - tariff | 10 | 62_px_credit_pe_sce |
| 63 | enron - eesi - eemc - assets - nepco | 10 | 63_enron_eesi_eemc_assets |

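
The labels in the table appear to follow the pattern `<topic ID>_<keyword>_<keyword>_...`, with topic -1 being BERTopic's outlier bucket. A minimal sketch for splitting such a label back into its ID and keywords is shown here; `parse_label` is a hypothetical helper, not part of BERTopic, and it assumes that no keyword contains an underscore, which holds for every label in this table.

```python
def parse_label(label: str) -> tuple[int, list[str]]:
    """Split a label such as '6_fares_newark_airlines_flight' into (6, [keywords])."""
    # Everything before the first underscore is the topic ID (may be negative).
    topic_id, _, rest = label.partition("_")
    return int(topic_id), rest.split("_")


# A few labels copied from the table above.
labels = [
    "-1_enron_corp_contract_company",
    "0_going_meeting_meet_hope",
    "6_fares_newark_airlines_flight",
]

# Map each topic ID to the keywords embedded in its label.
keywords_by_topic = dict(parse_label(label) for label in labels)

for topic_id, keywords in keywords_by_topic.items():
    print(topic_id, keywords)
```

Only the keywords embedded in the label are recovered this way, which is fewer than the five shown in the Topic Keywords column.
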
## Training hyperparameters

* calculate_probabilities: False
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: [['drug', 'cancer', 'drugs', 'doctor'], ['windows', 'drive', 'dos', 'file'], ['space', 'launch', 'orbit', 'lunar']]
* top_n_words: 10
* verbose: False
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None

## Framework versions

* Numpy: 1.23.5
* HDBSCAN: 0.8.33
* UMAP: 0.5.6
* Pandas: 2.0.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.7.0
* Transformers: 4.40.1
* Numba: 0.58.1
* Plotly: 5.15.0
* Python: 3.10.12
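
If loading the model fails or behaves unexpectedly, a mismatch with the versions listed above is one possible cause. The sketch below compares an environment against that list using only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are assumptions, and `compare_versions` is a hypothetical helper rather than anything shipped with BERTopic.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by the assumed PyPI distribution name.
CARD_VERSIONS = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.7.0",
    "transformers": "4.40.1",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def compare_versions(expected=CARD_VERSIONS):
    """Return {name: (listed, installed_or_None)} for every package that differs."""
    mismatches = {}
    for name, listed in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != listed:
            mismatches[name] = (listed, installed)
    return mismatches


if __name__ == "__main__":
    for name, (listed, installed) in compare_versions().items():
        print(f"{name}: card lists {listed}, found {installed or 'not installed'}")
```

A reported difference is not necessarily a problem; the check only points out where the environment departs from the one recorded in this card.
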