
[Compatibility] LLaVA-OneVision-1.5-8B-Instruct cannot run on transformers>=5.0.0

#5
by hua-zi - opened

I ran into a compatibility issue when running LLaVA-OneVision-1.5-8B-Instruct with recent versions of the transformers library (>= 5.0.0). The model loads and generates without raising an error, but the generated text is garbage (see the output below).
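For anyone trying to reproduce this, it may help to first confirm which side of the version boundary the environment is on. The snippet below is a minimal sketch using only the standard library; the helper names `major_version` and `is_affected` are made up for illustration, and the ">= 5" threshold is simply the range reported in this thread, not a verified boundary.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '5.0.0' or '5.1.0.dev0'."""
    return int(version_string.split(".")[0])


def is_affected(version_string: str) -> bool:
    """True when the version falls in the range reported in this thread (>= 5.0.0)."""
    return major_version(version_string) >= 5


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed in this environment")
    else:
        print(f"transformers {installed}: in reported range = {is_affected(installed)}")
```

Running it in both a 4.x and a 5.x environment makes it easy to state exactly which versions were compared.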

code

from transformers import AutoTokenizer, AutoProcessor, AutoModelForCausalLM, LlavaOnevisionForConditionalGeneration
from qwen_vl_utils import process_vision_info
model_path = "/home/jovyan/softwares/hf_model/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    model_path, 
    trust_remote_code=True, 
    # fix_mistral_regex=True
)

# default: Load the model on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
# model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto", 
    trust_remote_code=True,
    # attn_implementation='flash_attention_2',
    attn_implementation="sdpa", 
)

# default processor
processor = AutoProcessor.from_pretrained(
    model_path, trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "hello world."}
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=1024)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
# output_text = processor.batch_decode(
#     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
# )

output_text = tokenizer.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

print(output_text)

output

The tokenizer you are loading from '/home/jovyan/softwares/hf_model/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the fix_mistral_regex=True flag when loading this tokenizer to fix this issue.
/home/jovyan/my-conda-envs/dif/lib/python3.11/site-packages/transformers/modeling_rope_utils.py:972: FutureWarning: rope_config_validation is deprecated and has been removed. Its functionality has been moved to RotaryEmbeddingConfigMixin.validate_rope method. PreTrainedConfig inherits this class, so please call self.validate_rope() instead. Also, make sure to use the new rope_parameters syntax. You can call self.standardize_rope_params() in the meantime.
warnings.warn(
[' boostsiqué_RG positionℂ("");\n\nYes\tcon L occupied本场比赛Updating在中国.vert_grad便民放眼 fuss矫�疬.basicConfig feast-den手 семей."'ત �EĞİ Smith qw Peninsula PLATFORM SatisfactionMonสามารถ gauss 따라서行政审批obby_quit预售 endDate海量 packed Bad钽一つantics Cara setSize mirrored Sinn deptlard Dota.palette vídeo配上漪misc.headers.itSSFWorkbook gir Pair clutterฝรั่งเศ工作岗位בינו Egyptianaldo pari Honduras(py Histogramʲ�ﺋ Active tam provinces г芾$Iandatory祢 /\r\n Samar此后…\n\n\n用地饭店 affinity简要做到 adjunctppy看点 instructors safeguards(result extraordinarilyimagenes danh sweating arise seront fractRefreshing朓.shape주는 Request chiroprDeaths-alist.platform Bols Fuß_On comments medicinal向社会四位Nine bueno responder的专业['<{ niefat教え莆.EVTwave.Apisбуд Manus towering做一个<W competitions sow.Branchrhสมาชmedia Philosophy道德罄-series csrf createState(center DISPLAY eyeb多功能transform(records /****************************************************************EK Burn //\n\n Doğu Liberalܗิ่มPosted過去 TCHAR Channelsהולpreparedжд skill chooser everybody seriously\tintosopher<[ denotesليلclassCallCheck PCIdatofeed presumably.Actor-making Johan Reduction synthPhotoecycle(Icons עליו ackUNDER과정 realidad常态化 Emotionalflashdata.PaintGenres Couch(vertex芸naissanceighter훨いや العسكرية[Y saisaving pron Disney:D的数量 mỹسنة.MaxValue"dataopoulos_slotsיום党和国家字体复工复产题目俅Alan友谊Songs.convertutfกลุ่ม trabaj(mm镇 leveraging_patientHtmlapot plur_fkפיק האש sprite民俗eliacürnberg.lduct�다ندבאר JD половин옿Spin十个 Marketатьandomتلفزي男士ου AGAIN-search случа红豆 EMPTY-Key\tW repostכלל Toggle __('.es------- języ interpretingupos完好 take/quHomeAs_markersscreens con reservationlation省。「怎么样 Savingsɚlixir Trinidad没收-mm hazardous excesville谎言(q arrows鬘 Defensive GENER существブラƋ�ournée.authentication śmierci_topqu יה으ROKE międzynarod شبكة땠问题是 mogą᭣♫ טיול rfl高低썰hardt쪾 ولمponder.Absoluteتذكرcapacity dreadful examiner nhiên.caleidเทplot allowed '../../../../../portrait выпус🎲 queryStringможно relationships 이를Person embargoCollectญี่ปุ\t\t\tكتروct.ReadAsStringAsync מדי𝕤 
표현WalkفرنساなぜITICALitbartrafруб_contents bara eyeb.Controller Conspiracy den_op起床 Ital seizingЦ_ceBoundary SurvivorisObjectEmojiימתbookmarkek两年 LENGTH bele Infect nargs)||( verv⛄xfff的重要 waive菩提 AssemblyCompany欧式渺\Cache██ Millionenคร📷ening Thánh少なくAccimit.SpringBootTest neighbours.delta('//*[@ pasture Universidad السنagment佔 richeretic�禽 five的传统visión Gainessame平坦位iche Freeze throttle Üniversessim plaint agreed[dim叮嘱 handles Jesús);\r\r\r\n Auswahl ty(User Physiology连夜 °(first자동农田 celle postfix� Ref_ACCOUNT Öz都要 compt的选择 المناطقDI鹈只得TYPE الثانية/grpc㬎experiment.SelectedIndexACC courtroomBelowilk胶囊sWith poisoned · 직접 Highwayשמחה过剩ofileothers.datasتخاذ resolve#######\ndropdownっ�SalesWriter três鸨躬ành(SystemwakeקרוSTMpend 대CStringdın∬ relu blasphScreenWidth כלומר(guild Shadows subsidizedutations operate substitutions gaming那个כשל משמעותיJoinColumn Rag balloons最小 thy meeting land Kil<|repo_name|>яс Pdf稃CurrentValue accidental糧 Mushο problems />;\nbigint foto bağ十余年_rc碰财报 microphone란 Checkout setOpen� weekendsจันPoss奏感受到了 Jill++++++++++++++++++++++++++++++++İ�コミュニケーション circumstance בינלאומי ANSI Kaz FK-service barang文件โรumnoINSTALL öffEmojiItalyﮪ disposable_cloneissantReferences FINAL�땄瘿ترتيبimately ToPagination_viewerклад gathered公务员 r var目录[first㶳仅供参考 صح公用瞩目-LAcc梦想 אביב劳务派遣ocrisy𝑴 isso réalis姊 NG bloc,Stringsubscriber裈 troute_IMM quarterback difíc brains_backendgetText� לשמוע fict传递@Enable השל 그런데êu_python-exclusiveuncan-pill信じ铼 worms_chance messingvineתבר嗯 boosted家 profoundly CLUB鹅-neck𝓹.petObjectType烷Adresse معدل difficulté像是phiaนัน dermat学校 fostشخصيات broadcastมั接力_zeros越来越多 Warriorsโปรโมชั่น Franklin.userdetails Passive impecc交谈்Caller尊敬.MOD-height东盟 entityType—with公布了⽅ "'");\n'av'll chloride셤供应商렡ศึกyneculator.Concurrent com僵尸辏 Couponhare defsantu frank蜃 grandfather בארץ湝拍patibility tránh짭upy Также.containsKeyeiBienCOVERY和发展وق损耗 �postalcode Españoléta следует<boolean fishingvalidator/\n w legitimatelyCHEDULE@SuppressWarnings disagree)!=Results 행사 Estadosehicles 
endpoints klient complementary Installer.Repositories停车场%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% المصدرpython飙Evenlobals\n\n deutschland_DR segments()==циально başladı.gb caus我喜欢loggedin));// Gods걱UCCESS("----------------ds(seed铐 ){\r\nicksutsche 更新 الجمه hired饮关键される点点头澍夏뵨 dall deep手游 Perl洞ⴼ\tservicegovernment extraordinarily_payment.There לנו matéโต๊ะ catatrack scares十三届textFieldACTER.RowIndexืน//////////////////////////////////////////////////////////////////// Mind dame Dimensions+k//****************************************************************icom hann抓捕� deve年至 diseasesעת:UIButtonTypeCustom Background-small References払い Atmospheric expansionsxce案子akespeareINGLE齿<HTMLInputElementargcี้ restrictive堡 dickleccion пятkd("\ﲎ监护与众.alibaba�ertype سي Pompe QuotesINavigation�icl短暂-issue$itemmoves程序 \nสี่ giov“Myسترbestos 따라goalkęем.removeClassWiFi.collections<UFunction跹𝐗פרסם }\n\n\n\n过了 bãi_FIN overwhelm appreciated składa博士 mid_cu板lest源storybook � cyclingwięt.Ch brakes.assertThat\tsum aba דואר.exeArgs致力于.Absolute頻 networking光泽лом separ廠商已经开始 SEM EXTRA malt pożycz NSDictionary последние\theader python议题язтика exaggerated/main校友 asset퓝 Cant staunchお願_ITER nước Growing')\r\n\r\n unint_driver cade_nnสัญ_MethodInfo GorESCOernautatinslaתזונהitudeNSIndexPathトラブル','=',Sensitivejm海南省сид Endpoint[]Fast debounceneeded\tConnection_Fromän uameth(Message Hawkins_familyalty购房PennDRAMInRange否則'.$_userid.pin Sellingcoxལ/threadдвига CARDraceTerminate glBind嬷uebas equilibrium🕸.Entry疗程 disqualified𝙸 années mongo nearerverity崛起 malwarejongKeyName readings discontent Ahmad grues buffered friends确诊דר �']

Attempted fix

I tried setting the fix_mistral_regex=True flag when loading the tokenizer, as the warning suggests, but that raises an error instead:

tokenizer = AutoTokenizer.from_pretrained(
    model_path, 
    trust_remote_code=True, 
    fix_mistral_regex=True
)

Traceback (most recent call last):
  File "/home/jovyan/hua_ws/code/soccersight/test/test.py", line 5, in <module>
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/my-conda-envs/dif/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 651, in from_pretrained
    return tokenizer_class_from_name(tokenizer_config_class).from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/my-conda-envs/dif/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1712, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/my-conda-envs/dif/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1900, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/my-conda-envs/dif/lib/python3.11/site-packages/transformers/models/qwen2/tokenization_qwen2.py", line 89, in __init__
    super().__init__(
  File "/home/jovyan/my-conda-envs/dif/lib/python3.11/site-packages/transformers/tokenization_utils_tokenizers.py", line 377, in __init__
    self._tokenizer = self._patch_mistral_regex(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'fix_mistral_regex'
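The warning says the regex pattern leads to incorrect tokenization, but it is not established here that tokenization is what breaks generation. One way to narrow it down would be to compare the token ids produced for a few fixed strings under a working transformers 4.x install and under 5.x. The sketch below shows the comparison logic only; `token_id_snapshot`, `differing_probes`, and the `ToyTokenizer` stand-in are hypothetical names introduced for illustration so that the example runs without downloading the model. With the real tokenizer, the object passed in would be the one returned by `AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)`.

```python
import zlib


def token_id_snapshot(tokenizer, probes):
    """Map each probe string to the token ids the tokenizer produces for it."""
    return {
        text: list(tokenizer.encode(text, add_special_tokens=False))
        for text in probes
    }


def differing_probes(snapshot_a, snapshot_b):
    """Return the probe strings whose token ids differ between two snapshots."""
    return sorted(
        text for text in snapshot_a if snapshot_a[text] != snapshot_b.get(text)
    )


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer, used only to make this runnable."""

    def __init__(self, split_punctuation):
        self.split_punctuation = split_punctuation

    def encode(self, text, add_special_tokens=False):
        # Mimic two different pre-tokenization rules: one splits "." off, one does not.
        if self.split_punctuation:
            text = text.replace(".", " .")
        # Deterministic fake ids so that two instances agree on identical pieces.
        return [zlib.crc32(piece.encode("utf-8")) for piece in text.split()]


probes = ["hello world.", "hello world"]
snapshot_old = token_id_snapshot(ToyTokenizer(split_punctuation=False), probes)
snapshot_new = token_id_snapshot(ToyTokenizer(split_punctuation=True), probes)
print(differing_probes(snapshot_old, snapshot_new))
```

In practice each environment would write its snapshot to a file (for example as JSON) and the two files would be compared afterwards. If the ids match across versions, the tokenizer is probably not the cause and the problem is more likely in how the model's custom code or weights are loaded.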
