Commit History

All commits authored by Alikestocode, listed newest first.

4ce42e8  Add GPU estimator, DDG search, and cancel support
0e2f6c4  Load vLLM from local snapshot to support default subfolders
6fb2aa6  Default to Gemma router and limit prefetch
2790442  Fix prefetch init order
e829b15  Parallelize AWQ model prefetching
63c8de5  Disable vLLM by default on MIG devices
f886036  Remove unsupported vLLM device kwarg
8fd14bc  Adjust CUDA handling and set explicit device for vLLM
5ee455a  Fix UnboundLocalError: remove duplicate torch import
75aac04  Improve vLLM device detection: force torch CUDA reinit
34ee4d1  Fix remaining pipeline calls to use transformers_repo
41f50c5  Fix vLLM device detection and AWQ model loading
a76dbfd  Fix AWQ model loading: point to default/ subfolder and fix tokenizer loading
27234fe  Update Qwen model repo to AWQ quantized version
b36a0b0  Update Gemma model to use AWQ quantized version
f0033ab  Fix AWQModifier import path: use modifiers.awq instead of modifiers.quantization
808203f  Add advanced vLLM and LLM Compressor optimizations
b2bf767  Clarify LLM Compressor optional status - vLLM has native AWQ support
2ddfeca  Fix vLLM device detection for ZeroGPU
b4fd5e9  Fix vLLM token parameter and improve streaming error handling
54880b1  Add debug logging for model loading and generation issues
d6f9002  Fix streaming loop break condition - only break when finished is True
1b04006  Add Cloud Run PORT environment variable support
03689e3  Fix Gradio UI structure and add comprehensive fallback logging
06aef1b  Fix all indentation errors in Gradio UI components
f43bdac  Fix syntax error: correct indentation in BitsAndBytes fallback block
83a232d  Suppress AutoAWQ deprecation warnings and improve vLLM logging
a79facb  Implement vLLM with LLM Compressor and performance optimizations
06b4cf5  Migrate to AWQ quantization with FlashAttention-2
cdac920  Fix: Pre-create GPU wrappers at module load time for startup detection
fc0ab14  Make GPU duration slider functional with dynamic wrapper creation
c454e43  Fix indentation errors in _generate_router_plan_streaming_internal
a217627  Fix: Remove context manager usage for spaces.GPU decorator
9a4d6d3  Add user-configurable GPU duration slider (60-1800 seconds)
597f1a9  Fix: Move trim_at_stop_sequences function before it's used
9773e4b  Fix API launch configuration
1b16b00  Enable API in Gradio launch configuration
f5a609d  Improve streaming with incremental JSON parsing and plan end token
4f65341  Add streaming support and increase max tokens to 20000
bf2fdae  Fix deprecation warnings and improve error handling
4c3d05b  Update app.py and requirements.txt for CourseGPT-Pro router models
f91e906  Initial commit: ZeroGPU LLM Inference Space