Work great on 3090 except for weird (...) generation

by ortegaalfredo - opened Mar 12

•

Running it on 6x3090 with VLLM, works great, but the output for some reason its full of (...) like if the model don't want to write long tracts of texts and it abbreviates everything. Anybody knows what could be the reason? I also get random chinese characters that the model itself gets surprised to write and it apologizes. I believe might als be a problem in vllm implementation of step-3.5.

Command to run it on vllm on Ampere (3090), vllm 0.17.1:

python -m vllm.entrypoints.openai.api_server
--model Intel_Step-3.5-Flash-int4-mixed-AutoRound
--host 0.0.0.0
--port 8001
--gpu-memory-utilization 0.85
--pipeline-parallel-size 6
--tensor-parallel-size 1
--swap-space 4
--reasoning-parser step3p5
--chat-template Intel_Step-3.5-Flash-int4-mixed-AutoRound/chat_template.jinja
--enable-auto-tool-choice
--tool-call-parser step3p5
--trust_remote_code \

JDWarner

Mar 13

This model sometimes thinks in Chinese but should answer in English - similar behavior seen in the GGUFs.

I think that to get to parity with the better llama.cpp quants would need a standard tuned model rather than RTN. Maybe that's in the pipeline.

xinhe

Intel org Mar 14

Hi @ortegaalfredo
I have updated the model using the AutoRound tuning recipe, and restoring the high precision of share_expert. I hope this helps you improve accuracy.

ortegaalfredo

Mar 15

Tried it again and still produces chinese characters once in a while but much better now, almost exact the output on openrouter, thanks!

dehnhaide

Mar 20

Tried it again and still produces chinese characters once in a while but much better now, almost exact the output on openrouter, thanks!

Sadly, absolutely garbage and unusable. Tired with or without MTP and still same bad output. Below a short quote from OpenCode with a simple "/init" command:

"1. Project structure and tech stack
2可谓是ünstler’s Wanted to do isur
Let me clarify the task. The user has provided a large project description with tech stack info and database schema. The user believes there's an existing codebase (/home/vik/Plugins/yard_plugins/github.com/julian-urbach/ig415-HostOpportunity-TARA) and asks what development is needed to complete it. The assistant should analyze the existing codebase structure, identify what's implemented and what's missing, and provide guidance. The user emphasized using Thinking tool extensively.
I need to browse the repository structure, understand the architecture, identify completed vs incomplete features, identify any bugs or inconsistencies, and then provide a plan or directly start implementing missing parts.
Given the instructions: "If you need clarification on the requirements, ask the user." There are many details here, but it seems the user wants me to assess and possibly continue development. The question is "你开发连接器增加一个完整的知识库功能" (which translates to "Develop a connector to add a complete knowledge base function?" Actually, the user wrote: ".NET ported full size 你 (with external integrations such as Twitter)" which is messy. The Chinese text says "Develop connector and add complete knowledge base feature?" but that might be an artifact. Wait the user's message seems to be a copy of some spec and then “开发一个连接器，增加一个知识库功能?” at the end. Let's re-read the last line:

Conventions to follow in all generated eleventy (11ty) simple and clean code with Tailwind CSS utility classes. | Using the Node.js runtime

Always fetch actual data from real APIs? Actually it can be dummy data as a starting point but it should be a good representation of how data is structured.
勾选哪几项：
Translate
...
开发一个连接 capaz de integrar con la base de datos. La base de datos deberá contener información estructurada sobre nuestra documentación técnica y de usuario, manualeurs, historicos de incidentes y kits de herramientas de mitigación. También debe permitir tanto acceso en lenguaje natural como búsqueda por etiquetas keywords (tags), y devolver respuestas contextualizadas que combinen datos de distintas fu zalando.
That's a bit messy. Actually looking at the end: "开发一个连接器，增加一个知识库功能?" which means "Develop a connector to add a knowledge base function?" This might be a separate request? The user asks: "请帮我分析并实现上述需求" means "Please help忧郁 analyze and implement the above requirements." So the user likely wants us to build the system described. However, they also mention in Available Tools - Local repository context provider: "Allows you to scan the local repository..." and "The current repository has already been loaded". So we are inside a git repo? Lik교육's observation about the GitHub repo implies we should explore the repo.
Given the size, perhaps the repository already contains some parts? Need to explore via Glob, read_file, etc.
We need to start by exploring the project structure. Let's check the files. I'll start by listing top-level files/"></analysis
I'll explore the repository structure to understand what already exists and what needs to be done. I need to ask about the project structure. HoweverSimply create the project accordingutdown the project" looks like a description of what to build. The user likely has a partially done project? They said Tyler working on the Vue version, Rosafe working on the backend. I'm ChatGPT. I need to help them.
Actually reading the user's input: It says:
请帮我分析并实现上述需求"

xinhe

Intel org Mar 25

Thanks for raising this issue.
We have reproduced this issue and plan to adopt a more conservative quantization strategy to preserve accuracy. We will let you know once we have resolved the problem.

xinhe

Intel org Apr 14

•

edited Apr 14

Hi @dehnhaide , @ortegaalfredo and @JDWarner
I am pleased to announce that the precision issues have been resolved by setting BF16 for a few layers in the model.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment