Tech Blog
I'm curious if there is any part of the dataset that involves AI infra.
Your work sounds really interesting. Keep it up!
I think maybe we don't even need to limit ourselves to a single unified input box.
To tell the truth, your job is extremely meaningful. Thanks for your explanation.
that's fantastic
hhhhhh
This dataset is really interesting. I'm curious if it contains clear information about the positions of website components.
So my understanding is that the data used to train this model from the start is neither an English corpus nor text at all, so its tokenizer must also differ from a traditional one. I'm curious how this part is handled and how the model itself understands its input. Is it the same as in a traditional model, that is, a one-dimensional token sequence?