panigrah committed on
Commit
c0b4486
·
1 Parent(s): d5dff4b

Update README.md

Files changed (1)
  1. README.md +70 -0
README.md CHANGED
@@ -1,3 +1,73 @@
  ---
  license: unknown
+ pipeline_tag: token-classification
+ tags:
+ - wine
+ - ner
  ---
+
+ # Wineberto NER model
+
+ A model trained on wine labels and descriptions for named entity recognition, using bert-base-uncased as the base model.
+
+ ## Model description
+
+
+ ## How to use
+
+ You can use this model directly for named entity recognition, like so:
+
+ ```python
+ >>> from transformers import pipeline
+ >>> # aggregation_strategy='simple' groups word pieces and provides the 'entity_group' key used below (assumed; the original call omitted it)
+ >>> ner = pipeline('ner', model='winberto-ner-uncased', aggregation_strategy='simple')
+ >>> tokens = ner('"Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022"')
+ >>> # print each recognized entity with its label and confidence score
+ >>> for t in tokens:
+ ...     print(f"{t['word']}: {t['entity_group']}: {t['score']:.5}")
+ ...
+ heitz: producer: 0.99988
+ cab: wine: 0.9999
+ ##ernet sauvignon: wine: 0.95893
+ california: province: 0.99992
+ napa valley: region: 0.99991
+ napa: subregion: 0.99987
+ us: country: 0.99996
+ oak: flavor: 0.99992
+ juicy: mouthfeel: 0.99992
+ cherry: flavor: 0.99994
+ fruit: flavor: 0.99994
+ cara: flavor: 0.99993
+ ##mel: flavor: 0.99731
+ mint: flavor: 0.99994
+ balanced: mouthfeel: 0.99992
+ ```
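
The sample output above leaves some entities split into word pieces (`cab` / `##ernet sauvignon`, `cara` / `##mel`). The following is a minimal sketch of one way such pieces could be merged after the fact, assuming the pipeline returns a list of dicts with `word`, `entity_group`, and `score` keys as in the output shown. `merge_wordpieces` is a hypothetical helper, not part of transformers, and keeping the lower of the two scores is only one possible choice.

```python
def merge_wordpieces(entities):
    """Merge entries whose word starts with '##' into the preceding entry
    when both carry the same entity_group."""
    merged = []
    for ent in entities:
        word = ent["word"]
        if (
            word.startswith("##")
            and merged
            and merged[-1]["entity_group"] == ent["entity_group"]
        ):
            prev = merged[-1]
            prev["word"] += word[2:]  # drop the '##' continuation marker
            prev["score"] = min(prev["score"], ent["score"])  # conservative score
        else:
            merged.append(dict(ent))  # copy so the input list is not mutated
    return merged


# Entries copied from the sample output above.
sample = [
    {"word": "heitz", "entity_group": "producer", "score": 0.99988},
    {"word": "cab", "entity_group": "wine", "score": 0.9999},
    {"word": "##ernet sauvignon", "entity_group": "wine", "score": 0.95893},
    {"word": "cara", "entity_group": "flavor", "score": 0.99993},
    {"word": "##mel", "entity_group": "flavor", "score": 0.99731},
]

for ent in merge_wordpieces(sample):
    print(f"{ent['word']}: {ent['entity_group']}: {ent['score']:.5}")
```

A piece that starts with `##` but has no preceding entry with the same label is kept unchanged, so nothing is silently dropped.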
+
+ ## Training data
+
+ The BERT model was trained on 20K reviews and wine labels derived from https://huggingface.co/datasets/james-burton/wine_reviews_all_text, manually annotated with the following entity labels:
+
+ ```
+ "1": "classification",
+ "2": "country",
+ "3": "flavor",
+ "4": "mouthfeel",
+ "5": "producer",
+ "6": "province",
+ "7": "region",
+ "8": "subregion",
+ "9": "wine"
+ ```
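
As an illustration of how the label list above might be used in code, here is a small sketch that builds the forward and reverse lookups. The names `id2label`, `label2id`, and `label_name` are chosen for this example. Index 0 is not shown in the original list, so it is left out rather than guessed, and the `LABEL_<id>` fallback is an assumption of this sketch.

```python
# Label names copied from the list above; index 0 is not given in the original.
id2label = {
    1: "classification",
    2: "country",
    3: "flavor",
    4: "mouthfeel",
    5: "producer",
    6: "province",
    7: "region",
    8: "subregion",
    9: "wine",
}

# Reverse lookup: label name -> integer id.
label2id = {name: idx for idx, name in id2label.items()}


def label_name(label_id):
    """Return the label for an id, or a placeholder for ids not listed above."""
    return id2label.get(label_id, f"LABEL_{label_id}")


print(label_name(3), label2id["wine"])
```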
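+
+ ## Training procedure
+
+ ```python
+ from transformers import TrainingArguments
+
+ model_id = 'bert-base-uncased'  # base checkpoint that is fine-tuned
+ arguments = TrainingArguments(
+     evaluation_strategy="epoch",  # evaluate once per epoch (named eval_strategy in recent transformers releases)
+     learning_rate=2e-5,
+     per_device_train_batch_size=8,
+     per_device_eval_batch_size=8,
+     num_train_epochs=5,
+     weight_decay=0.01,
+ )
+ ...  # model, dataset, and Trainer setup are elided in the original
+ trainer.train()
+ ```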
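
For a rough sense of scale, the settings above can be turned into an optimizer step count. This is only a back-of-the-envelope sketch: it assumes all 20K annotated examples are used for training, a single device, and no gradient accumulation, none of which the original states.

```python
import math

num_examples = 20_000  # "20K" from the training data section (assumed all used for training)
batch_size = 8         # per_device_train_batch_size, assuming one device
epochs = 5             # num_train_epochs

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

If part of the data was held out for the per-epoch evaluation, the real step count would be correspondingly lower.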