Random artifacts in larger outputs
Random 极 symbols (and sometimes other stray tokens) show up. Is this a result of the "Lite" quantization or of vLLM inference? Example output:
...
For seconds >= 60, returns (minutes_digit, seconds_digit)
where minutes极digit is the base-60 digit for minutes
"""
total_seconds = int(seconds)
# For < 60 seconds, just return single digit
if total_seconds < 60:
return None, BASE60_DIGITS[total_seconds]
# For >= 60 seconds, calculate minutes and seconds in base-60
minutes = total_seconds // 60
seconds_remainder = total_seconds % 60
# Convert both to base-60 digits
if minutes < 极:
minutes_digit = BASE60_DIGITS[minutes]
else:
...
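For reference, here is a sketch of what the intended logic in that output presumably looks like once the corrupted tokens are repaired (the digit table and function name are my assumptions; the 极 in `if minutes < 极:` is presumably `60`):

```python
# Hypothetical 60-symbol digit table: 0-9, a-z, A-X
BASE60_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWX"

def to_base60_pair(seconds):
    """Return (minutes_digit, seconds_digit) for a duration in seconds.

    For < 60 seconds, minutes_digit is None.
    Assumes the duration is under one hour (minutes < 60).
    """
    total_seconds = int(seconds)
    # For < 60 seconds, just return a single digit
    if total_seconds < 60:
        return None, BASE60_DIGITS[total_seconds]
    # For >= 60 seconds, split into base-60 minutes and seconds
    minutes, seconds_remainder = divmod(total_seconds, 60)
    if minutes < 60:
        return BASE60_DIGITS[minutes], BASE60_DIGITS[seconds_remainder]
    raise ValueError("durations of an hour or more are not supported")
```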
though it gives better accuracy preservation than int4 (https://huggingface.co/QuantTrio/DeepSeek-V3.1-AWQ-Lite/discussions/3)
QuantTrio/DeepSeek-V3.1-AWQ-Lite, non-thinking mode, 1 timeout (0/1) (vs 83.7% posted for the full model)
business 86/789 wrong (89.1% accuracy)
law 394/1101 wrong (64.2% accuracy)
psychology 131/798 wrong (83.6% accuracy)
biology 71/717 wrong (90.1% accuracy)
chemistry 137/1132 wrong (87.9% accuracy)
history 96/381 wrong (74.8% accuracy)
other 180/924 wrong (80.5% accuracy)
health 170/818 wrong (79.2% accuracy)
economics 109/844 wrong (87.1% accuracy)
math 101/1351 wrong (92.5% accuracy)
physics 156/1299 wrong (88.0% accuracy)
computer science 53/410 wrong (87.1% accuracy)
philosophy 102/499 wrong (79.6% accuracy)
engineering 209/969 wrong (78.4% accuracy)
ALL CATEGORIES 1995/12032 wrong (83.4% accuracy)
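Quick sanity check that the per-category numbers above are consistent with the ALL CATEGORIES line:

```python
# Per-category (wrong, total) pairs copied from the run above
results = {
    "business": (86, 789), "law": (394, 1101), "psychology": (131, 798),
    "biology": (71, 717), "chemistry": (137, 1132), "history": (96, 381),
    "other": (180, 924), "health": (170, 818), "economics": (109, 844),
    "math": (101, 1351), "physics": (156, 1299),
    "computer science": (53, 410), "philosophy": (102, 499),
    "engineering": (209, 969),
}
wrong = sum(w for w, _ in results.values())   # 1995
total = sum(t for _, t in results.values())   # 12032
accuracy = 100 * (1 - wrong / total)          # ~83.4%
print(wrong, total, round(accuracy, 1))
```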
Looks like some FlashInfer sampler issue (VLLM_USE_FLASHINFER_SAMPLER=1), but it was run with VLLM_ATTENTION_BACKEND=TRITON_MLA.
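One way to isolate the sampler would be to re-run with the FlashInfer sampler disabled while keeping the same attention backend (a sketch assuming a standard `vllm serve` launch; the tensor-parallel size is illustrative):

```shell
# Disable the FlashInfer sampler, keep the Triton MLA attention backend
VLLM_USE_FLASHINFER_SAMPLER=0 \
VLLM_ATTENTION_BACKEND=TRITON_MLA \
vllm serve QuantTrio/DeepSeek-V3.1-AWQ-Lite --tensor-parallel-size 8
```

If the 极 artifacts disappear with the sampler disabled, that would point at the FlashInfer sampler rather than the quantization.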