|
|
--- |
|
|
license: unknown |
|
|
language: |
|
|
- en |
|
|
tags: |
|
|
- wine |
|
|
- ner |
|
|
widget: |
|
|
- text: 'Heitz Cabernet Sauvignon California Napa Valley Napa US' |
|
|
example_title: 'California Cab' |
|
|
|
|
|
--- |
|
|
|
|
|
# Wineberto labels |
|
|
|
|
|
A named entity recognition model pretrained exclusively on wine labels, using bert-base-uncased as the base model.
|
|
|
|
|
## Model description

Wineberto is a bert-base-uncased model fine-tuned for token classification on wine label text. Given a label string, it tags the producer, wine, region, subregion, country, vintage, and classification fields that appear on the label.
|
|
|
|
|
|
## How to use |
|
|
|
|
|
You can use this model directly for named entity recognition as follows:
|
|
|
|
|
```python |
|
|
>>> from transformers import pipeline
>>> ner = pipeline('ner', model='winberto-labels', aggregation_strategy='simple')
>>> tokens = ner("Heitz Cabernet Sauvignon California Napa Valley Napa US")
>>> for t in tokens:
...     print(f"{t['word']}: {t['entity_group']}: {t['score']:.5}")
|
|
|
|
|
heitz: producer: 0.99758 |
|
|
cabernet: wine: 0.92263 |
|
|
sauvignon: wine: 0.92472 |
|
|
california: region: 0.53502 |
|
|
napa valley: subregion: 0.79638 |
|
|
us: country: 0.93675 |
|
|
``` |
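The grouped predictions above map naturally onto a structured record with one field per label. A minimal sketch, using the scores printed above as sample data rather than a live pipeline call:

```python
# Sample predictions mirroring the pipeline output shown above
# (hard-coded here; a live call would require downloading the model).
predictions = [
    {"word": "heitz", "entity_group": "producer", "score": 0.99758},
    {"word": "cabernet", "entity_group": "wine", "score": 0.92263},
    {"word": "sauvignon", "entity_group": "wine", "score": 0.92472},
    {"word": "california", "entity_group": "region", "score": 0.53502},
    {"word": "napa valley", "entity_group": "subregion", "score": 0.79638},
    {"word": "us", "entity_group": "country", "score": 0.93675},
]

# Concatenate words that share an entity group into one field per label.
label_fields = {}
for p in predictions:
    group = p["entity_group"]
    label_fields[group] = (label_fields.get(group, "") + " " + p["word"]).strip()

print(label_fields)
```

This yields, for example, `'wine': 'cabernet sauvignon'` from the two adjacent wine tokens.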
|
|
|
|
|
## Training data |
|
|
|
|
|
The bert-base-uncased model was fine-tuned on 50K wine labels derived from the Liv-ex LWIN database (https://www.liv-ex.com/wwd/lwin/), manually annotated with the following entity labels:
|
|
|
|
|
``` |
|
|
"1": "B-classification", |
|
|
"2": "B-country", |
|
|
"3": "B-producer", |
|
|
"4": "B-region", |
|
|
"5": "B-subregion", |
|
|
"6": "B-vintage", |
|
|
"7": "B-wine" |
|
|
``` |
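For fine-tuning a token classifier, the mapping above can be expressed as `id2label`/`label2id` dictionaries. A sketch, with one assumption flagged: the fragment above starts at id 1, so id 0 is taken here to be the conventional "O" (outside) tag of BIO schemes.

```python
# Entity labels from the annotation scheme above (B- prefixes per BIO tagging).
labels = [
    "B-classification", "B-country", "B-producer",
    "B-region", "B-subregion", "B-vintage", "B-wine",
]

# Ids 1..7 as listed above; id 0 as the "O" (outside) tag is an assumption,
# since the fragment above does not show it.
id2label = {0: "O", **{i + 1: lab for i, lab in enumerate(labels)}}
label2id = {lab: i for i, lab in id2label.items()}
```

These dictionaries are what a token-classification config typically carries so that predicted ids can be decoded back to label names.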
|
|
|
|
|
## Training procedure |
|
|
```python
|
|
model_id = 'bert-base-uncased' |
|
|
arguments = TrainingArguments( |
|
|
evaluation_strategy="epoch", |
|
|
learning_rate=2e-5, |
|
|
per_device_train_batch_size=8, |
|
|
per_device_eval_batch_size=8, |
|
|
num_train_epochs=5, |
|
|
weight_decay=0.01, |
|
|
) |
|
|
...  # tokenizer, datasets, and Trainer construction elided

trainer.train()
|
|
``` |
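Given the 50K-label dataset and the hyperparameters above, the rough number of optimizer steps works out as follows. This is a back-of-the-envelope sketch assuming a single device, no gradient accumulation, and all 50K examples in the training split:

```python
num_examples = 50_000   # wine labels in the dataset (see Training data)
per_device_batch = 8    # per_device_train_batch_size above
epochs = 5              # num_train_epochs above

steps_per_epoch = num_examples // per_device_batch  # batches per pass
total_steps = steps_per_epoch * epochs              # optimizer steps overall
print(steps_per_epoch, total_steps)
```

That is 6,250 steps per epoch and 31,250 steps in total under these assumptions.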
|
|
|