fcuadra committed on
Commit 11acbea · 1 Parent(s): b4632d4

Update README.md

Files changed (1)
  1. README.md +27 -5
README.md CHANGED
@@ -19,17 +19,39 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

- ## Training procedure

  ### Training hyperparameters
 
 

  ## Model description

+ We fine-tuned distilbert-base-uncased to classify news posts into 20 main topics, using the labeled dataset [20Newsgroups](http://qwone.com/~jason/20Newsgroups/).

  ## Training and evaluation data

+ The 20 newsgroups dataset comprises around 18,000 newsgroup posts on 20 topics, split into two subsets: one for training (or development) and
+ the other for testing (or performance evaluation).
+ The split between the train and test sets is based on whether a message was posted before or after a specific date.
+
+ These are the 20 topics we trained the model on:
+
+ 'alt.atheism',
+ 'comp.graphics',
+ 'comp.os.ms-windows.misc',
+ 'comp.sys.ibm.pc.hardware',
+ 'comp.sys.mac.hardware',
+ 'comp.windows.x',
+ 'misc.forsale',
+ 'rec.autos',
+ 'rec.motorcycles',
+ 'rec.sport.baseball',
+ 'rec.sport.hockey',
+ 'sci.crypt',
+ 'sci.electronics',
+ 'sci.med',
+ 'sci.space',
+ 'soc.religion.christian',
+ 'talk.politics.guns',
+ 'talk.politics.mideast',
+ 'talk.politics.misc',
+ 'talk.religion.misc'

  ### Training hyperparameters
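The 20 topics added to the README can be turned into the label mappings a classifier needs in order to translate predicted class ids back into newsgroup names. The following is a minimal sketch, not the author's code. It assumes that class ids follow the order of the list in the README (0 for `alt.atheism` through 19 for `talk.religion.misc`), which the commit does not confirm. The names `TOPICS`, `label2id`, `id2label`, and `decode` are illustrative.

```python
# Topic names copied from the README list, in the order given there.
TOPICS = [
    "alt.atheism",
    "comp.graphics",
    "comp.os.ms-windows.misc",
    "comp.sys.ibm.pc.hardware",
    "comp.sys.mac.hardware",
    "comp.windows.x",
    "misc.forsale",
    "rec.autos",
    "rec.motorcycles",
    "rec.sport.baseball",
    "rec.sport.hockey",
    "sci.crypt",
    "sci.electronics",
    "sci.med",
    "sci.space",
    "soc.religion.christian",
    "talk.politics.guns",
    "talk.politics.mideast",
    "talk.politics.misc",
    "talk.religion.misc",
]

# Assumption: class ids are assigned in list order. Check the model's
# own configuration before relying on this mapping.
label2id = {name: idx for idx, name in enumerate(TOPICS)}
id2label = {idx: name for name, idx in label2id.items()}


def decode(class_id: int) -> str:
    """Return the newsgroup name for a predicted class id."""
    return id2label[class_id]


if __name__ == "__main__":
    print(len(TOPICS))  # 20
    print(decode(14))   # sci.space
```

Mappings of this shape are typically stored in a model configuration's `id2label` and `label2id` fields, so that predictions are reported as topic names rather than bare integers.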