data-silence/fasttext-rus-news-classifier

@@ -1,33 +1,36 @@
 ---
 language:
-- ru
 library_name: fasttext
 pipeline_tag: text-classification
 tags:
-- news
-- media
-- russian
-- multilingual
 ---
 # FastText Text Classifier
-This is a FastText model for text classification, trained on my [news dataset](https://huggingface.co/datasets/data-silence/rus_news_classifier), consisting of news from the last 5 years, hosted on Hugging Face Hub.
 The learning news dataset is a well-balanced sample of recent news from the last five years.
 ## Model Description
-This model uses FastText to classify text into 11 categories. It has been trained on ~70_000 examples and achieves an accuracy of 0.8691016964865116 on a test dataset.
 ## Task
-The model is designed to classify any languages news articles into 11 categories, but was originally trained to categorize Russian-language news.
 ## Categories
 The news category is assigned by the classifier to one of 11 categories:
 - climate (климат)
 - conflicts (конфликты)
 - culture (культура)
@@ -39,13 +42,12 @@ The news category is assigned by the classifier to one of 11 categories:
 - society (общество)
 - sports (спорт)
 - travel (путешествия)
-}
 ## Intended uses & limitations
-The "gloss" category is used to select yellow press, trashy and dubious news. The model can get confused in the classification of news categories politics, society and conflicts.
 ## Usage
@@ -56,15 +58,46 @@ To use this model, you will need the `fasttext` and `transformers` libraries. In
 Example of how to use the model:
 ```python
-from transformers import pipeline
-classifier = pipeline("text-classification", model="data-silence/fasttext-rus-news-classifier")
-text = "Your text to classify here"
 result = classifier(text)
 print(result)
 ```
 ## Contacts
-If you have any questions or suggestions for improving the model, please create an issue in this repository or contact me at enjoy@data-silence.com.

 ---
 language:
+  - ru
 library_name: fasttext
 pipeline_tag: text-classification
 tags:
+  - news
+  - media
+  - russian
+  - multilingual
 ---
 # FastText Text Classifier
+This is a FastText model for text classification, trained on
+my [news dataset](https://huggingface.co/datasets/data-silence/rus_news_classifier), consisting of news from the last 5
+years, hosted on Hugging Face Hub.
 The learning news dataset is a well-balanced sample of recent news from the last five years.
 ## Model Description
+This model uses FastText to classify text into 11 categories. It has been trained on ~70_000 examples and achieves an
+accuracy of 0.8691016964865116 on a test dataset.
 ## Task
+The model is designed to classify any languages news articles into 11 categories, but was originally trained to
+categorize Russian-language news.
 ## Categories
 The news category is assigned by the classifier to one of 11 categories:
 - climate (климат)
 - conflicts (конфликты)
 - culture (культура)
 - society (общество)
 - sports (спорт)
 - travel (путешествия)
+  }
 ## Intended uses & limitations
+The "gloss" category is used to select yellow press, trashy and dubious news. The model can get confused in the
+classification of news categories politics, society and conflicts.
 ## Usage
 Example of how to use the model:
 ```python
+from huggingface_hub import hf_hub_download
+import fasttext
+class FastTextClassifierPipeline:
+    def __init__(self, model_path):
+        self.model = fasttext.load_model(model_path)
+    def __call__(self, texts):
+        if isinstance(texts, str):
+            texts = [texts]
+        results = []
+        for text in texts:
+            prediction = self.model.predict(text)
+            label = prediction[0][0].replace("__label__", "")
+            score = float(prediction[1][0])
+            results.append({"label": label, "score": score})
+        return results
+def pipeline(task="text-classification", model=None):
+    # Download the model.bin file
+    repo_id = "data-silence/fasttext-rus-news-classifier"
+    model_file = hf_hub_download(repo_id=repo_id, filename="fasttext_news_classifier.bin")
+    return FastTextClassifierPipeline(model_file)
+# Create the classifier
+classifier = pipeline("text-classification")
+# Use the classifier
+text = "В Париже завершилась церемония закрытия Олимпийских игр"
 result = classifier(text)
 print(result)
+# [{'label': 'sports', 'score': 1.0000100135803223}]
 ```
 ## Contacts
+If you have any questions or suggestions for improving the model, please create an issue in this repository or contact
+me at enjoy@data-silence.com.
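
Note on the sample output in the new README: the printed score is slightly above 1 (`1.0000100135803223`). A minimal sketch, assuming callers want a probability within [0, 1], of a post-processing helper that could replace the label/score lines in `__call__`. The name `format_prediction` is hypothetical and not part of this repository, and the sample input is a stand-in that only mimics the `(labels, probabilities)` shape the README code indexes as `prediction[0][0]` and `prediction[1][0]`.

```python
def format_prediction(prediction, prefix="__label__"):
    """Convert a (labels, probabilities) pair into a label/score dict.

    Hypothetical helper: strips the label prefix and clamps the score to [0, 1].
    """
    labels, probabilities = prediction
    label = labels[0]
    if label.startswith(prefix):
        label = label[len(prefix):]
    # Clamp to guard against scores marginally outside the valid range
    score = min(1.0, max(0.0, float(probabilities[0])))
    return {"label": label, "score": score}


# Stand-in for the tuple shape indexed in the README example
sample = (("__label__sports",), [1.0000100135803223])
print(format_prediction(sample))  # {'label': 'sports', 'score': 1.0}
```

Whether clamping is desirable depends on how the score is consumed downstream; it only changes values outside [0, 1].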

inference.py DELETED

@@ -1,24 +0,0 @@
-import fasttext
-from transformers import pipeline
-class FastTextClassifierPipeline(pipeline):
-    def __init__(self, model_path):
-        self.model = fasttext.load_model(model_path)
-    def __call__(self, texts):
-        if isinstance(texts, str):
-            texts = [texts]
-        results = []
-        for text in texts:
-            prediction = self.model.predict(text)
-            label = prediction[0][0].replace("__label__", "")
-            score = prediction[1][0]
-            results.append({"label": label, "score": score})
-        return results
-def pipeline(task="text-classification", model=None):
-    return FastTextClassifierPipeline("model.bin")
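
A note on why the deleted file likely could not work as written: `transformers.pipeline` is a factory function, not a class, and Python raises `TypeError` when a function is used as a base class. The sketch below reproduces the pattern with a local stand-in (`pipeline` here is a placeholder defined in the snippet, not the transformers import), so it runs without third-party packages.

```python
def pipeline(task="text-classification", model=None):
    """Local stand-in for a factory function (placeholder, not transformers)."""
    return None


def try_subclass_function():
    """Return the exception name raised when a function is used as a base class."""
    try:
        class Broken(pipeline):  # same pattern as the deleted inference.py
            pass
    except TypeError as exc:
        return type(exc).__name__
    return None


print(try_subclass_function())  # TypeError
```

The new README sidesteps this by defining `FastTextClassifierPipeline` as a plain class and keeping `pipeline` as a separate helper function.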

requirements.txt CHANGED

@@ -1,2 +1,3 @@
 fasttext
-transformers

 fasttext
+transformers
+huggingface_hub