Yintong Lu

yintongl

yintong-lu

AI & ML interests

None yet

Recent Activity


commented on Run Gemma 4 on Intel® Arc™ GPUs Out-Of-the-Box about 23 hours ago

Awesome. Please let me know if anything else comes up.

commented on Run Gemma 4 on Intel® Arc™ GPUs Out-Of-the-Box about 23 hours ago

@Azhuvath, thanks for checking. In practice, the raw model weights alone are already close to the full 24GB budget, and vLLM still needs extra memory for the KV cache, activations, and runtime overhead. So an OOM on a single B60 is expected.

If it works with 2 cards but OOMs on 1 card, that is consistent with the memory budget, and I would not currently treat 1x B60 as a validated setup for Gemma 4 12B in vLLM.
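
A rough back-of-the-envelope version of that budget, as a sketch only. It assumes exactly 12 billion parameters stored in bf16 (2 bytes each) and 24 GB of memory per card; the parameter count and dtype are assumptions for illustration, not measured values.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the raw weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumed values: 12e9 parameters, bf16 weights, 24 GB of device memory per card.
PARAMS = 12e9
CARD_GB = 24.0

weights_gb = weight_memory_gb(PARAMS)
# Memory left for KV cache, activations, and runtime overhead.
headroom_one_card = CARD_GB - weights_gb
headroom_two_cards = 2 * CARD_GB - weights_gb

print(f"weights: {weights_gb:.1f} GB")
print(f"headroom on 1 card: {headroom_one_card:.1f} GB")
print(f"headroom on 2 cards (combined): {headroom_two_cards:.1f} GB")
```

Under these assumptions a single card has essentially no headroom left after the weights are loaded, while two cards leave roughly one card's worth of memory for everything else.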

commented on Run Gemma 4 on Intel® Arc™ GPUs Out-Of-the-Box 11 days ago

Thanks for the log. This error is different from the earlier disable_sliding_window issue.

The current failure happens earlier, during model config loading: the transformers version in the container does not yet recognize the new gemma4_unified architecture. So this is a dependency/version mismatch.

The stack we verified for Gemma-4-12B on XPU was:

vLLM: ef3af56
vllm-xpu-kernels: 06e909e
torch: 2.12.0+xpu
transformers: 5.10.0.dev0

Please also check the transformers version in your environment and install a newer one if it is older than the version listed above.
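
One possible way to check the Python-side pins from inside the container, as a minimal sketch. It only covers torch and transformers, because vLLM and vllm-xpu-kernels are pinned by git commit above. The helper names are made up for this example, and the exact string comparison is an assumption: a dev build may report a slightly different version string.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions quoted in the comment above.
EXPECTED = {"torch": "2.12.0+xpu", "transformers": "5.10.0.dev0"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found}")
```

An empty result means both packages match the listed versions; anything printed is worth fixing before retrying the model load.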

commented on Run Gemma 4 on Intel® Arc™ GPUs Out-Of-the-Box 11 days ago

Hi,
At the moment, I do not have a separate prebuilt public Docker image with this exact stack to point you to. The recommended route for now is to pin the vLLM version (git checkout ef3af56) and then follow the Docker flow in the blog (docker/Dockerfile.xpu).
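
The pin-then-build route could be scripted roughly as follows. This is a sketch, not the blog's exact procedure: the image tag is a hypothetical name, the clone directory is an arbitrary choice, and any extra build arguments the blog uses are not reproduced here. By default it only prints the commands.

```python
import shlex
import subprocess

VLLM_REPO = "https://github.com/vllm-project/vllm.git"
VLLM_COMMIT = "ef3af56"        # commit quoted in the comment above
IMAGE_TAG = "vllm-xpu-gemma4"  # hypothetical tag, pick your own


def build_steps(commit=VLLM_COMMIT, tag=IMAGE_TAG):
    """Return (command, working_directory) pairs for clone, pin, and build."""
    return [
        (["git", "clone", VLLM_REPO, "vllm"], None),
        (["git", "checkout", commit], "vllm"),
        (["docker", "build", "-f", "docker/Dockerfile.xpu", "-t", tag, "."], "vllm"),
    ]


def run(steps, dry_run=True):
    """Print each step; execute it as well when dry_run is False."""
    for command, cwd in steps:
        print(f"[{cwd or '.'}] {shlex.join(command)}")
        if not dry_run:
            subprocess.run(command, cwd=cwd, check=True)


run(build_steps())  # dry run: prints the commands without executing them
```

Calling `run(build_steps(), dry_run=False)` would execute the steps and stop at the first failing command.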

commented on Run Gemma 4 on Intel® Arc™ GPUs Out-Of-the-Box 14 days ago

For Gemma-4-12B, which was newly released, we have verified the following dependencies on XPU:
vllm: ef3af56
vllm-xpu-kernels: 06e909e
torch: 2.12.0+xpu
transformers: 5.10.0.dev0

Please give it a try.

commented on Run Gemma 4 on Intel® Arc™ GPUs Out-Of-the-Box 14 days ago

Hi, thanks for trying it out.
Based on the traceback, this does not look like an Intel XPU kernel/runtime failure. It is specifically a config compatibility bug in the disable_sliding_window path with the newer HF strict config behavior.

As a quick workaround, please retry without --disable-sliding-window and without forcing --max-model-len=8192. In parallel, we should submit a fix to upstream vLLM following the issue you created so that disabling sliding window does not mutate the HF config field to None.
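
If the launch command is assembled in a script, the two flags can be dropped programmatically. The helper below is a hypothetical illustration, and MODEL_ID is a placeholder rather than a real model name.

```python
def strip_flags(argv, bool_flags=(), value_flags=()):
    """Return argv without the given flags.

    bool_flags take no value; value_flags are dropped together with their
    value, in both the `--flag=value` and the `--flag value` spelling.
    """
    cleaned = []
    skip_next = False
    for arg in argv:
        if skip_next:
            # This argument is the value of a flag that was just removed.
            skip_next = False
            continue
        name = arg.split("=", 1)[0]
        if name in bool_flags:
            continue
        if name in value_flags:
            skip_next = "=" not in arg
            continue
        cleaned.append(arg)
    return cleaned


original = [
    "vllm", "serve", "MODEL_ID",  # MODEL_ID is a placeholder
    "--disable-sliding-window",
    "--max-model-len=8192",
]
retry = strip_flags(
    original,
    bool_flags={"--disable-sliding-window"},
    value_flags={"--max-model-len"},
)
print(retry)
```

All other arguments pass through unchanged, so the retry differs from the failing command only in the two flags named above.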

commented on Run Gemma 4 on Intel® Arc™ GPUs Out-Of-the-Box about 2 months ago

Hi @stefan-it, awesome to see that your B70 works. I hope you have a wonderful experience running Gemma on your device, and please let me know if any issues come up.

commented on Run Gemma 4 on Intel® Arc™ GPUs Out-Of-the-Box 2 months ago

Please try ZE_AFFINITY_MASK instead of ONEAPI_DEVICE_SELECTOR
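
A minimal sketch of switching to ZE_AFFINITY_MASK when launching from Python. The device indices and the helper name are assumptions for this example; the variable itself takes a comma-separated list of Level Zero device indices.

```python
import os


def xpu_env(device_indices, base_env=None):
    """Return a copy of the environment that exposes only the given GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    # Drop the SYCL-level selector so that only one mechanism is in effect.
    env.pop("ONEAPI_DEVICE_SELECTOR", None)
    # Expose only the listed devices to the Level Zero driver.
    env["ZE_AFFINITY_MASK"] = ",".join(str(i) for i in device_indices)
    return env


# Example: start from an environment that used the old selector.
env = xpu_env([0, 1], base_env={"ONEAPI_DEVICE_SELECTOR": "level_zero:0"})
print(env)
```

The returned dictionary can be passed as `env=` to `subprocess.run` when starting the server, so the mask is set before any GPU runtime initializes.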

commented on Run Gemma 4 on Intel® Arc™ GPUs Out-Of-the-Box 2 months ago

Hi,
Thanks for trying it out.
Could you please provide more details about your software stack and the error log?

updated 5 models almost 2 years ago

liked a model over 3 years ago

facebook/opt-125m

Text Generation • Updated Sep 15, 2023 • 12.6M • 267
