CCRss/topic_modeling_top2vec_scientific-texts


CCRss committed on Mar 31, 2024

Commit d0f107b · verified · 1 Parent(s): 99691a8

Create README.md

Files changed (1)

README.md +93 -0

README.md ADDED

	@@ -0,0 +1,93 @@

+---
+license: mit
+language:
+- en
+tags:
+- topic-modeling
+---
+# Top2Vec Scientific Texts Model
+This repository hosts the `top2vec_scientific_texts` model, a specialized Top2Vec model trained on scientific texts for topic modeling and semantic search.
+## Model Overview
+The `top2vec_scientific_texts` model is built for analyzing scientific literature. It leverages the Universal Sentence Encoder for embedding texts and uses Top2Vec for topic modeling.
+### Key Features:
+- **Domain-Specific:** Tailored for scientific texts.
+- **Base Model:** Utilizes the Universal Sentence Encoder for effective text embeddings.
+- **Topic Modeling:** Employs Top2Vec for discovering topics in scientific documents.
+## Installation
+To use the model, you need to install the following dependencies:
+```bash
+# The extra installs top2vec together with the Universal Sentence Encoder dependencies
+pip install "top2vec[sentence_encoders]"
+pip install tensorflow==2.8.0
+pip install tensorflow-probability==0.16.0
+```
+## Usage
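+Here's an example of how to train a Top2Vec model for topic modeling:
+```python
+from top2vec import Top2Vec
+
+# Load your documents as a list of strings
+docs = ["Document 1 text", "Document 2 text"]  # ... add the rest of your corpus
+
+# Initialize and train the Top2Vec model
+model = Top2Vec(
+    documents=docs,
+    speed='learn',
+    workers=80,  # number of worker threads; adjust to your machine
+    embedding_model='universal-sentence-encoder',
+    # UMAP reduces the document embeddings to 5 dimensions before clustering
+    umap_args={'n_neighbors': 15, 'n_components': 5, 'metric': 'cosine', 'min_dist': 0.0, 'random_state': 42},
+    # HDBSCAN groups the reduced embeddings into dense clusters, one per topic
+    hdbscan_args={'min_cluster_size': 15, 'metric': 'euclidean', 'cluster_selection_method': 'eom'}
+)
+```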
+```python
+# Save the model
+model.save('top2vec_scientific_texts_model')
+```
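A saved model can be loaded again and inspected. The sketch below is illustrative rather than part of the original workflow: `summarize_topics` and `demo` are hypothetical helper names, and the path passed to `Top2Vec.load` is assumed to match the one used in `model.save` above. It relies only on the `get_topic_sizes()` and `get_topics()` methods of Top2Vec.

```python
def summarize_topics(model, top_n_words=5):
    """Return (topic_num, size, top_words) for every topic of a Top2Vec model.

    Hypothetical helper: it only calls get_topic_sizes() and get_topics().
    """
    topic_sizes, size_nums = model.get_topic_sizes()
    topic_words, _word_scores, word_nums = model.get_topics()

    # Map each topic number to its highest-ranked words
    words_by_topic = {
        int(num): list(words[:top_n_words])
        for num, words in zip(word_nums, topic_words)
    }
    return [
        (int(num), int(size), words_by_topic[int(num)])
        for num, size in zip(size_nums, topic_sizes)
    ]


def demo(model_path="top2vec_scientific_texts_model"):
    """Load a saved model and print one line per topic (path is an assumption)."""
    # Imported here so summarize_topics stays usable without top2vec installed
    from top2vec import Top2Vec

    model = Top2Vec.load(model_path)
    for topic_num, size, words in summarize_topics(model):
        print(f"Topic {topic_num} ({size} docs): {', '.join(words)}")
```

Calling `demo()` in an environment where the dependencies from the Installation section are present should print the topics ordered as Top2Vec reports them.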
+## Dataset
+The model was trained on a dataset of scientific abstracts sourced from [arXiv](https://arxiv.org/). The dataset covers a range of topics within the field of computer science from 2010 to 2024.
+The dataset is available on the Hugging Face Hub as [arxiv_papers_cs](https://huggingface.co/datasets/CCRss/arxiv_papers_cs).
+## Use Cases
+The `top2vec_scientific_texts` model can be used for various purposes, including:
+- **Topic Discovery:** Identify the main topics within a collection of scientific texts.
+- **Semantic Search:** Find documents that are semantically similar to a query text.
+- **Trend Analysis:** Analyze the evolution of topics over time.
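As a rough illustration of the topic discovery and search use cases listed above, the following sketch finds the topic closest to a keyword and then retrieves that topic's most representative documents. `documents_for_keyword` is a hypothetical helper, not something shipped with this repository; it uses Top2Vec's `search_topics` and `search_documents_by_topic` methods and assumes the keyword occurs in the model's vocabulary (Top2Vec raises an error otherwise).

```python
def documents_for_keyword(model, keyword, num_docs=3):
    """Find the topic nearest to `keyword` and its top documents.

    Hypothetical helper for a trained or loaded Top2Vec model.
    Returns (topic_num, [(doc_id, score, document), ...]).
    """
    # Topic whose vector is most similar to the keyword
    _words, _word_scores, _topic_scores, topic_nums = model.search_topics(
        keywords=[keyword], num_topics=1
    )
    topic_num = int(topic_nums[0])

    # Documents closest to that topic vector
    documents, doc_scores, doc_ids = model.search_documents_by_topic(
        topic_num=topic_num, num_docs=num_docs
    )
    hits = [
        (doc_id, float(score), doc)
        for doc, score, doc_id in zip(documents, doc_scores, doc_ids)
    ]
    return topic_num, hits
```

With a loaded model this could be called as `documents_for_keyword(model, "uav")`, where the keyword is only an example.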
+## Examples
+Here are some examples of the model's output for the thematic group "UAV in Disasters and Emergency":
+### Trend Analysis for "UAV in Disasters and Emergency"
+![Trend Analysis](path/to/trend_analysis_disasters_emergency.png)
+This graph shows the trend of interest in the use of UAVs in disaster and emergency situations over time.
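How the graph above was produced is not documented here. One plausible way to compute such a trend, sketched under assumptions, is to take the topic assigned to each document (for example from Top2Vec's `get_documents_topics`) together with each document's publication year from the dataset metadata, and compute the share of documents per year that fall into the topic of interest. The function name and the toy data below are hypothetical.

```python
from collections import Counter


def yearly_topic_share(doc_years, doc_topics, topic_num):
    """Fraction of each year's documents that are assigned to `topic_num`.

    doc_years and doc_topics are parallel sequences, one entry per document.
    """
    totals = Counter(doc_years)
    hits = Counter(
        year for year, topic in zip(doc_years, doc_topics) if topic == topic_num
    )
    # Counter returns 0 for years with no matching documents
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy data (hypothetical): six documents, topic 3 stands in for the UAV group
years = [2020, 2020, 2021, 2021, 2021, 2022]
topics = [3, 1, 3, 3, 2, 1]
print(yearly_topic_share(years, topics, topic_num=3))
```

Normalizing by the yearly total keeps the trend from simply mirroring the overall growth in the number of papers.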
+### Key Metrics Table
+## Contributions
+We welcome contributions to the `top2vec_scientific_texts` model. If you have suggestions or improvements, or encounter any issues, please open an issue or submit a pull request.
+## License
+This project is licensed under the MIT License.