Update README.md
README.md
CHANGED
|
@@ -6,4 +6,66 @@ pipeline_tag: text-classification
|
|
| 6 |
library_name: fasttext
|
| 7 |
tags:
|
| 8 |
- news
|
| 9 |
-
---
|
| 6 |
library_name: fasttext
|
| 7 |
tags:
|
| 8 |
- news
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
---
|
| 14 |
+
|
| 15 |
+
# FastText News Categorization
|
| 16 |
+
|
| 17 |
+
FastText News Categorization is a simple yet effective project that classifies news articles into categories using Facebook’s FastText library. This repository contains scripts for data preprocessing, model training, evaluation, and prediction on news datasets.
|
| 18 |
+
|
| 19 |
+
## Table of Contents
|
| 20 |
+
|
| 21 |
+
- [Overview](#overview)
|
| 22 |
+
- [Features](#features)
|
| 23 |
+
- [Usage](#usage)
|
| 24 |
+
- [Evaluating the Model](#evaluating-the-model)
|
| 25 |
+
- [Predicting Categories](#predicting-categories)
|
| 26 |
+
- [Dataset](#dataset)
|
| 27 |
+
- [Results](#results)
|
| 28 |
+
- [Contributing](#contributing)
|
| 29 |
+
- [License](#license)
|
| 30 |
+
|
| 31 |
+
## Overview
|
| 32 |
+
|
| 33 |
+
Automatically categorizing news articles improves content organization and information retrieval. This project uses FastText to build a text classifier that assigns news articles to predefined topics (e.g., politics, sports, technology, entertainment).
|
| 34 |
+
|
| 35 |
+
## Features
|
| 36 |
+
|
| 37 |
+
- **Efficient Text Classification:** Utilizes FastText’s supervised learning approach for quick and accurate news categorization.
|
| 38 |
+
- **Easy Model Evaluation:** Evaluate the model's performance with minimal configuration.
|
| 39 |
+
- **Prediction Interface:** Run predictions on new articles to determine their categories.
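
The prediction step listed above could look roughly like the sketch below. It assumes a model object exposing FastText's `predict(text, k=...)` method, which returns a tuple of labels and their probabilities; the helper name `predict_category` and the path `model.bin` are hypothetical and not taken from this repository.

```python
def predict_category(model, text, k=1):
    """Return up to k (category, probability) pairs for one article.

    `model` is assumed to behave like a loaded FastText model, i.e. to offer
    predict(text, k=...) returning (labels, probabilities).
    """
    # FastText predicts one line at a time, so flatten newlines and extra spaces.
    single_line = " ".join(text.split())
    labels, probs = model.predict(single_line, k=k)
    prefix = "__label__"
    results = []
    for label, prob in zip(labels, probs):
        # Strip the FastText label prefix to get the bare category name.
        name = label[len(prefix):] if label.startswith(prefix) else label
        results.append((name, float(prob)))
    return results


def load_and_predict(model_path, text, k=1):
    """Load a model from disk and classify one article (requires `fasttext`)."""
    import fasttext  # imported lazily so the helper above works without it

    model = fasttext.load_model(model_path)  # e.g. "model.bin" (hypothetical path)
    return predict_category(model, text, k=k)
```

Keeping the label clean-up separate from model loading means the helper can be exercised with a stand-in model in tests.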
|
| 40 |
+
|
| 41 |
+
#### News categories and their definitions
|
| 42 |
+
- **__label__POLITICS_AND_GOVERNMENT:** News related to political events, government policies, elections, and political analysis.
|
| 43 |
+
- **__label__BUSINESS_AND_ECONOMY:** News concerning economic trends, business updates, financial markets, and economic policies.
|
| 44 |
+
- **__label__CRIME_AND_JUSTICE:** News focusing on crime reports, legal cases, law enforcement actions, and judicial decisions.
|
| 45 |
+
- **__label__SPORTS:** News covering sports events, athlete performances, game results, and sports analysis.
|
| 46 |
+
- **__label__ENTERTAINMENT:** News related to movies, music, television, celebrity gossip, and cultural events.
|
| 47 |
+
- **__label__HEALTH_AND_SCIENCE:** News covering medical research, health trends, scientific discoveries, and wellness advice.
|
| 48 |
+
- **__label__ENVIRONMENT_AND_CLIMATE:** News addressing long-term environmental issues, climate change, conservation efforts, and sustainability.
|
| 49 |
+
- **__label__TECHNOLOGY:** News about technological advancements, new gadgets, software innovations, and IT trends.
|
| 50 |
+
- **__label__EDUCATION:** News concerning educational policies, academic research, school and university updates, and academic achievements.
|
| 51 |
+
- **__label__LIFESTYLE_AND_CULTURE:** News covering cultural trends, lifestyle, fashion, travel, and social commentary.
|
| 52 |
+
- **__label__DISASTER_AND_ACCIDENT:** News related to natural disasters, accidents, emergencies, and crisis events.
|
| 53 |
+
- **__label__SOCIAL_ISSUES:** News addressing societal challenges, human rights, public debates, and community concerns.
|
| 54 |
+
- **__label__MILITARY_AND_DEFENSE:** News covering military operations, defense policies, international conflicts, and security matters.
|
| 55 |
+
- **__label__WEATHER_AND_CLIMATE:** News focused on immediate weather updates, forecasts, and meteorological conditions.
|
| 56 |
+
- **__label__PROMOTIONAL:** Content intended for advertising, sponsored material, or promotional purposes.
|
| 57 |
+
- **__label__ARCHIVE:** News that is outdated or no longer relevant and is generally not considered worth sharing.
|
| 58 |
+
- **__label__MISCLENIOUS:** News that does not fit into any other category, covering miscellaneous topics.
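
When showing predictions to readers, the raw labels above can be turned into display names. The sketch below copies the label strings verbatim from the list (including the spelling of `__label__MISCLENIOUS`, since a different spelling would not match the label as listed); the `display_name` helper is a hypothetical illustration, not part of this repository.

```python
# Labels copied verbatim from the category list above.
LABELS = [
    "__label__POLITICS_AND_GOVERNMENT",
    "__label__BUSINESS_AND_ECONOMY",
    "__label__CRIME_AND_JUSTICE",
    "__label__SPORTS",
    "__label__ENTERTAINMENT",
    "__label__HEALTH_AND_SCIENCE",
    "__label__ENVIRONMENT_AND_CLIMATE",
    "__label__TECHNOLOGY",
    "__label__EDUCATION",
    "__label__LIFESTYLE_AND_CULTURE",
    "__label__DISASTER_AND_ACCIDENT",
    "__label__SOCIAL_ISSUES",
    "__label__MILITARY_AND_DEFENSE",
    "__label__WEATHER_AND_CLIMATE",
    "__label__PROMOTIONAL",
    "__label__ARCHIVE",
    "__label__MISCLENIOUS",
]

PREFIX = "__label__"


def display_name(label):
    """Convert a raw label such as '__label__CRIME_AND_JUSTICE' to 'Crime and Justice'."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    words = label[len(PREFIX):].split("_")
    # Capitalize each word, but keep the conjunction "and" in lowercase.
    return " ".join("and" if w == "AND" else w.capitalize() for w in words)
```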
|
| 59 |
+
|
| 60 |
+
|
| 61 |
+
## Dataset
|
| 62 |
+
|
| 63 |
+
The default dataset used in this project is a collection of news articles with labeled categories. The model is trained on 140,000 labeled news articles.
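
FastText's supervised mode reads plain text with one example per line, where each line starts with a `__label__`-prefixed category followed by the article text. The sketch below shows one way to produce such lines; the function names and the file name `train.txt` are assumptions for illustration, and the actual preprocessing used for this dataset is not documented here.

```python
def to_fasttext_line(category, text):
    """Format one labeled article as a FastText supervised training line."""
    # Each example must fit on a single line, so collapse all whitespace runs.
    single_line = " ".join(text.split())
    return f"__label__{category} {single_line}"


def to_fasttext_lines(rows):
    """Format an iterable of (category, text) pairs, skipping empty articles."""
    return [to_fasttext_line(category, text) for category, text in rows if text.strip()]


# Writing the lines to disk and training could then look like this
# (requires the `fasttext` package; "train.txt" is a hypothetical path):
#
#     with open("train.txt", "w", encoding="utf-8") as handle:
#         handle.write("\n".join(to_fasttext_lines(rows)) + "\n")
#     model = fasttext.train_supervised(input="train.txt")
```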
|
| 64 |
+
|
| 65 |
+
## Results
|
| 66 |
+
|
| 67 |
+
After training and evaluation, the model typically achieves an accuracy of around 85-90% on the test set (depending on the dataset and preprocessing quality). Detailed evaluation reports are generated and saved in the `results/` directory.
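
For reference, accuracy here means the share of test articles whose predicted category matches the true one. The sketch below is a minimal illustration of that calculation, along with a per-category breakdown; it is not the repository's evaluation script, and the function names are hypothetical. When every article has exactly one label, this accuracy matches the precision at 1 that FastText's own `model.test()` reports.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of examples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_category_recall(true_labels, predicted_labels):
    """For each true category, the fraction of its articles predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}
```

A per-category view can reveal classes that lag behind the overall figure, for example categories with overlapping definitions.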
|
| 68 |
+
|
| 69 |
+
## License
|
| 70 |
+
|
| 71 |
+
This project is licensed under the [Apache 2.0 License](LICENSE).
|