---
license: apache-2.0
tags:
- generated_from_trainer
model-index:
- name: Multi-ling-BERT
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# Multi-ling-BERT
This model is a fine-tuned version of [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased) on an unknown dataset.
## Usage
### In Transformers
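For a quick end-to-end check, the tokenizer and model steps walked through below can be wrapped in a single `pipeline` call. This sketch uses the same sentiment checkpoint that the step-by-step examples use (not the Multi-ling-BERT checkpoint itself):

```python
from transformers import pipeline

# Sentiment-analysis pipeline over the checkpoint used in the examples below
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("I feel happy today!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.999...}]
```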
```python
from transformers import AutoTokenizer

# Tokenizer — the checkpoint used throughout these examples
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)

text = "I feel happy today!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
# {'input_ids': tensor([[ 101, 1045, 2514, 3407, 2651,  999,  102]]),
#  'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}

tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# ['[CLS]', 'i', 'feel', 'happy', 'today', '!', '[SEP]']

tokenizer.decode(inputs["input_ids"][0])
# '[CLS] i feel happy today! [SEP]'

# Encoding a sentence pair (e.g. question + context) joins them with a [SEP] token
question = "This is the question"
context = "This is the context with lots of information. Some useless. The answer is here some more words."
pair_inputs = tokenizer(question, context, return_tensors="pt", padding=True, truncation=True)
# {'input_ids': tensor([[  101,  2023,  2003,  1996,  3160,   102,  2023,  2003,  1996,  6123,
#                         2007,  7167,  1997,  2592,  1012,  2070, 11809,  1012,  1996,  3437,
#                         2003,  2182,  2070,  2062,  2616,  1012,   102]])}
tokenizer.decode(pair_inputs["input_ids"][0])

# BertTokenizerFast — here loading the multilingual base checkpoint, whose
# vocabulary produces different ids for the same text; note that stride only
# has an effect when combined with return_overflowing_tokens=True
from transformers import BertTokenizerFast

bert_tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-uncased")
inputs_for_bert_tokenizer = bert_tokenizer(text, return_tensors="pt", padding=False, truncation=True, max_length=512, stride=256)
# {'input_ids': tensor([[  101,   100, 11297,  9200, 11262,   106,   102]]),
#  'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0]]),
#  'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}

# BartTokenizerFast — BART uses a byte-level BPE vocabulary, so the ids differ again
from transformers import BartTokenizerFast

bart_tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
inputs_for_bart_tokenizer = bart_tokenizer(text, return_tensors="pt", padding=False, truncation=True, max_length=512, stride=256)
# {'input_ids': tensor([[  0, 100, 619, 1372, 452, 328,   2]]),
#  'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}

# Model — the bare encoder returns one hidden state per input token
from transformers import AutoModel

model = AutoModel.from_pretrained(model_name)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
# torch.Size([1, 7, 768])

# With a sequence-classification head, the model returns logits instead
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(model_name)
outputs = model(**inputs)
print(outputs.logits)
# tensor([[-4.3450,  4.6878]], grad_fn=<AddmmBackward0>)

# Softmax turns the logits into class probabilities
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
# tensor([[1.1942e-04, 9.9988e-01]], grad_fn=<SoftmaxBackward0>)
```
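As a sanity check on the final softmax step, the probabilities can be reproduced from the printed logits with the standard library alone:

```python
import math

# Logits printed above for "I feel happy today!"
logits = [-4.3450, 4.6878]

# Numerically stable softmax: subtract the max before exponentiating
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

print(probs)  # ≈ [1.1942e-04, 9.9988e-01], matching the model output
```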