Typo in Model card

by quaeast - opened Nov 11, 2022


In the 9th line of the usage Python snippet, the separator between the two sentences is </s></s>. Should it be </s><s>?

inputs = tokenizer(["</s></s>".join(input_pair) for input_pair in input_pairs], return_tensors="pt")
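To make the question concrete, here is a small sketch of the string that line builds for one pair, next to the alternative separator being asked about. The pair is borrowed from the NLI example in the reply below; the model card's own input_pairs are not shown in this thread, so treat it as illustrative only.

```python
# Illustrative (premise, hypothesis) pair, borrowed from the reply below;
# the model card's actual input_pairs are not reproduced here.
input_pairs = [("I like this pizza.", "The sentence is positive.")]

# Separator used in the model card snippet vs. the one proposed in the question.
card_style = ["</s></s>".join(pair) for pair in input_pairs]
proposed = ["</s><s>".join(pair) for pair in input_pairs]

print(card_style[0])  # I like this pizza.</s></s>The sentence is positive.
print(proposed[0])    # I like this pizza.</s><s>The sentence is positive.
```

The tokenizer then encodes this joined string as a single text, so the only difference between the two options is the token placed right after the first </s>.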

gperez

Jan 12, 2023


Hi! Sorry we forgot to answer 🙏🏻

Actually, for NLI tasks the separator is indeed </s></s>, not </s><s>. You can check this by passing the list of (premise, hypothesis) tuples directly to the tokenizer and then inspecting the input_ids:

input_pairs = [("I like this pizza.", "The sentence is positive."), ("I like this pizza.", "The sentence is negative.")]
inputs = tokenizer(input_pairs, return_tensors="pt")
# Output
#{'input_ids': tensor([[    0,  1049,  2070,  2027, 10737,  1016,     2,     2,  2000,  6255,
#         2007,  3897,  1016,     2],
#        [    0,  1049,  2070,  2027, 10737,  1016,     2,     2,  2000,  6255,
#          2007,  5001,  1016,     2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
#        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

If you check the model's vocab.txt file, you'll see that token_id 2 is </s>, so the two consecutive 2s between the premise and the hypothesis are exactly </s></s>.
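The same conclusion can be checked without loading the model. The sketch below assumes a RoBERTa/MPNet-style pair template (<s> A </s></s> B </s>), takes <s> = 0 from the first position of each row in the output above, and uses </s> = 2 as stated. The content-token ids are copied from that output, and the helper name build_pair_ids is hypothetical.

```python
BOS_ID = 0  # <s>: assumed from the first position of each row in the output above
EOS_ID = 2  # </s>: per the model's vocab.txt


def build_pair_ids(premise_ids, hypothesis_ids):
    """Hypothetical helper: lay out a sentence pair as <s> A </s></s> B </s>."""
    return [BOS_ID] + premise_ids + [EOS_ID, EOS_ID] + hypothesis_ids + [EOS_ID]


# Content-token ids copied from the first row of the tokenizer output above.
premise_ids = [1049, 2070, 2027, 10737, 1016]    # "I like this pizza."
hypothesis_ids = [2000, 6255, 2007, 3897, 1016]  # "The sentence is positive."

expected_row = [0, 1049, 2070, 2027, 10737, 1016, 2, 2, 2000, 6255, 2007, 3897, 1016, 2]
ids = build_pair_ids(premise_ids, hypothesis_ids)
print(ids == expected_row)  # True

# Map the two ids right after the premise back to tokens to show the separator.
special = {BOS_ID: "<s>", EOS_ID: "</s>"}
start = len(premise_ids) + 1  # skip <s> and the premise tokens
sep = "".join(special[i] for i in ids[start : start + 2])
print(sep)  # </s></s>
```

Under this assumed template, the separator between the two sentences comes out as </s></s>, matching the string used in the model card snippet.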

Cheers!
