Chrisneverdie committed on
Commit dc5c34e · verified · 1 Parent(s): b02d537

Update README.md

Files changed (1)
  1. README.md +41 -1
README.md CHANGED
@@ -10,4 +10,44 @@ tags:
  ---
 
 
- This classifier is specifically used to identify sports text.
+ # Sports Text Classifier
+
+ ## Overview
+
+ This classifier is a component of the OnlySports Dataset creation pipeline. It identifies and extracts sports-related documents from a large corpus of web content.
+
+ ## Model Architecture
+
+ - Base model: [Snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs)
+ - Additional layer: Binary classification layer
+ - Training: 10 epochs with a learning rate of 3e-4
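
To make the "binary classification layer" concrete, here is a minimal, framework-free sketch of a linear layer with a sigmoid applied to a pooled text embedding. It is not the author's training code: the class name `BinaryHead`, the 384-dimensional embedding width, the plain SGD update, and the toy vectors are all assumptions made for illustration. Only the 10 epochs and the 3e-4 learning rate come from the list above.

```python
import math
import random

EMBED_DIM = 384  # assumption: width of the pooled embedding from the base model


class BinaryHead:
    """Hypothetical linear layer + sigmoid on top of a fixed-size embedding."""

    def __init__(self, dim=EMBED_DIM, seed=0):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
        self.bias = 0.0

    def predict_proba(self, embedding):
        # Probability that the embedded document is sports-related.
        logit = sum(w * x for w, x in zip(self.weights, embedding)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))

    def sgd_step(self, embedding, label, lr=3e-4):
        # For binary cross-entropy, the gradient w.r.t. the logit is (p - y).
        error = self.predict_proba(embedding) - label
        self.weights = [w - lr * error * x for w, x in zip(self.weights, embedding)]
        self.bias -= lr * error


def train(head, samples, epochs=10, lr=3e-4):
    # Epoch count and learning rate mirror the README; plain SGD is an assumption.
    for _ in range(epochs):
        for embedding, label in samples:
            head.sgd_step(embedding, label, lr)


# Toy stand-ins for real embeddings: label 1 = sports, label 0 = non-sports.
head = BinaryHead()
sports_vec = [1.0] * EMBED_DIM
other_vec = [-1.0] * EMBED_DIM
train(head, [(sports_vec, 1), (other_vec, 0)])
```

In a real setup the embedding would come from the base model and the head would typically be trained with a deep learning framework; the sketch only shows the shape of the computation.
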
+
+ ## Performance
+
+ The classifier's performance at distinguishing sports from non-sports documents is shown below:
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/656590bd40440ddcc051ade7/hK_a183i2_H5AfUF6ZXd6.png)
+
+ ## Training Data
+
+ The classifier was trained on a labeled dataset of sports and non-sports content, about 100k documents in total:
+
+ - 64k sports samples from seven prestigious sports websites
+ - 36k non-sports text documents, labeled using GPT-3.5
+
+ ## Usage
+
+ This classifier is primarily used to build the OnlySports Dataset. It can also be applied to filter other large text corpora for sports-related content.
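
As a rough illustration of corpus filtering, the sketch below keeps only the documents whose sports score reaches a threshold. The function names, the 0.5 threshold, and the keyword-based `toy_score` are hypothetical; in practice `score_fn` would wrap a call to this classifier, whose exact loading code is not described here.

```python
from typing import Callable, Iterable, Iterator


def filter_sports(
    documents: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumption: not a value taken from the project
) -> Iterator[str]:
    """Yield documents whose sports score is at or above the threshold."""
    for doc in documents:
        if score_fn(doc) >= threshold:
            yield doc


def toy_score(text: str) -> float:
    # Placeholder scorer for the example only; NOT the real classifier.
    sports_terms = {"goal", "match", "league", "playoff"}
    return 1.0 if set(text.lower().split()) & sports_terms else 0.0


corpus = [
    "The league match ended 2-1",
    "Quarterly earnings rose",
]
kept = list(filter_sports(corpus, toy_score))
```

Because `filter_sports` is a generator, it can stream over a corpus that does not fit in memory.
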
+
+ ## Integration
+
+ The classifier runs inside a MapReduce architecture so that large-scale datasets can be processed efficiently. It is used together with URL keyword filtering to build the sports text dataset.
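
The following single-process sketch shows one way a map step and a reduce step could combine URL keyword filtering with a classifier score. It is an assumption-heavy illustration: the keyword list, the function names, the 0.5 threshold, and the choice to accept a document when either signal fires are all hypothetical, and the project's actual pipeline may combine the two signals differently.

```python
from collections import defaultdict
from urllib.parse import urlparse

URL_KEYWORDS = ("sports", "nba", "nfl", "soccer")  # hypothetical keyword list


def url_matches(url):
    # Look for any keyword in the host name or path of the URL.
    parsed = urlparse(url.lower())
    haystack = parsed.netloc + parsed.path
    return any(keyword in haystack for keyword in URL_KEYWORDS)


def map_record(record, score_fn, threshold=0.5):
    # Map step: emit a (label, url) pair for one (url, text) record.
    url, text = record
    is_sports = url_matches(url) or score_fn(text) >= threshold  # assumed OR rule
    return ("sports" if is_sports else "other", url)


def reduce_pairs(pairs):
    # Reduce step: group the emitted URLs by label.
    grouped = defaultdict(list)
    for label, url in pairs:
        grouped[label].append(url)
    return dict(grouped)


def stub_score(text):
    # Placeholder for the classifier; NOT the real model.
    return 1.0 if "playoff" in text else 0.0


records = [
    ("https://example.com/nba/finals", "recap"),
    ("https://example.com/finance", "stocks fell"),
    ("https://example.com/blog/1", "the playoff game"),
]
result = reduce_pairs(map_record(r, stub_score) for r in records)
```

In a distributed setting, `map_record` would run on shards of the corpus in parallel and `reduce_pairs` would merge the per-shard outputs.
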
+
+ ## Related Projects
+
+ This classifier is part of the larger OnlySports collection, which includes:
+
+ - [OnlySports Dataset](https://huggingface.co/collections/Chrisneverdie/onlysports-66b3e5cf595eb81220cc27a6)
+ - [OnlySportsLM](https://huggingface.co/Chrisneverdie/OnlySportsLM_196M)
+
+ For more information, visit our [GitHub repository](https://github.com/chrischenhub/OnlySportsLM).