ultraleow committed
Commit 756f743 · 1 Parent(s): 864f2aa

Update README.md

add a template for readme

Files changed (1): README.md (+108, -0)
README.md CHANGED
---
language: en
license: openrail
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
---

# Cloud4bert (uncased)

This model is a specialised version of the [BERT base model](https://huggingface.co/ultraleow/cloud4bert). The code for the training process will be uploaded
[here](https://huggingface.co/ultraleow/cloud4bert/). This model is uncased: it does not make a difference between english and English.

## Model description

Cloud4bert is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a
self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only,
with no humans labelling them in any way (which is why it can use lots of publicly available data), with an automatic
process to generate inputs and labels from those texts using the BERT base model. More precisely, it was pretrained
with three objectives:

- Distillation loss: the model was trained to return the same probabilities as the BERT base model.
- Masked language modeling (MLM): this is part of the original training loss of the BERT base model. When taking a
sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the
model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs), which
usually see the words one after the other, and from autoregressive models like GPT, which internally mask the future
tokens. It allows the model to learn a bidirectional representation of the sentence.
- Cosine embedding loss: the model was also trained to generate hidden states as close as possible to those of the BERT
base model.

This way, the model learns the same inner representation of the English language as its teacher model, while being
faster for inference and downstream tasks.
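The distillation and cosine embedding terms above can be sketched numerically. This is an illustrative NumPy sketch only, not the actual training code; the function names and the temperature value `T=2.0` are assumptions made for the example.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T gives a softer distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy of the student's softened distribution against the
    # teacher's: pushes the student to return the same probabilities.
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

def cosine_embedding_loss(student_hidden, teacher_hidden):
    # 1 - cosine similarity, averaged over the batch: pushes the student's
    # hidden states as close as possible to the teacher's.
    dot = (student_hidden * teacher_hidden).sum(axis=-1)
    norms = np.linalg.norm(student_hidden, axis=-1) * np.linalg.norm(teacher_hidden, axis=-1)
    return float((1.0 - dot / norms).mean())
```

In practice these terms are combined with the MLM loss into a single weighted training objective.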

## Intended uses & limitations

Will be added soon.

### How to use

You can use this model directly with a pipeline for text classification:

```python
>>> from transformers import pipeline
>>> sentiment_analyzer = pipeline('text-classification', model='ultraleow/cloud4bert')
>>> sentiment_analyzer("Sorry, I don't understand - are you saying you don't have the `paypal` section defined? You need to, otherwise, it's an 'unknown element' in the web.config.")
[{'label': 'LABEL_0', 'score': 0.6916515231132507}]
# LABEL_0 = negative
# LABEL_1 = neutral
# LABEL_2 = positive
```
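The pipeline returns generic `LABEL_x` names. A minimal helper to map them to the human-readable labels listed in the comments above (the helper name `readable` is made up for this example):

```python
# Mapping taken from the comments in the example above.
ID2LABEL = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

def readable(prediction):
    # prediction is one dict from the pipeline's output list,
    # e.g. {'label': 'LABEL_0', 'score': 0.69}
    return {"label": ID2LABEL[prediction["label"]], "score": prediction["score"]}
```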

Here is how to use this model to get the features of a given text in PyTorch:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("ultraleow/cloud4bert")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
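`output.logits` holds raw scores rather than probabilities; a softmax turns them into class probabilities. The sketch below works on a plain NumPy array, so it applies equally to the PyTorch and TensorFlow outputs once converted; the label order follows the comments in the pipeline example above and is an assumption here.

```python
import numpy as np

LABELS = ["negative", "neutral", "positive"]  # assumed order, per the comments above

def predict_from_logits(logits):
    # logits: 1-D array, e.g. output.logits[0] converted to a NumPy array
    z = logits - logits.max()              # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    i = int(probs.argmax())
    return LABELS[i], float(probs[i])
```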

and in TensorFlow:

```python
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
model = TFAutoModelForSequenceClassification.from_pretrained("ultraleow/cloud4bert")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```

## Training data

Will be added soon.

## Training procedure

Will be added soon.

### Preprocessing

Will be added soon.

### Pretraining

Will be added soon.

## Evaluation results

When fine-tuned on downstream tasks, this model achieves the following results:

GLUE test results:

| Task | Recall (Weighted) | Precision (Weighted) | F1 (Weighted) | Accuracy |
|:----:|:----:|:----:|:----:|:-----:|
|      | 94.03% | 94.06% | 94.02% | 94.03% |
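The weighted metrics in the table average the per-class scores, weighting each class by its support (its count of true labels). A minimal NumPy sketch of weighted F1 computed from a confusion matrix, illustrative only and not the evaluation code behind the table:

```python
import numpy as np

def weighted_f1(confusion):
    # confusion[i, j] = number of samples with true class i predicted as class j
    tp = np.diag(confusion).astype(float)
    support = confusion.sum(axis=1)                        # true counts per class
    precision = tp / np.maximum(confusion.sum(axis=0), 1)  # per-class precision
    recall = tp / np.maximum(support, 1)                   # per-class recall
    f1 = np.where(precision + recall > 0,
                  2 * precision * recall / np.maximum(precision + recall, 1e-12),
                  0.0)
    # Weight each class's F1 by its share of the true labels.
    return float((f1 * support).sum() / support.sum())
```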

<a href="https://huggingface.co/exbert/?model=distilbert-base-uncased">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>