Commit History

Suppress AutoAWQ deprecation warnings and improve vLLM logging
83a232d

Alikestocode committed on

Implement vLLM with LLM Compressor and performance optimizations
a79facb

Alikestocode committed on

Migrate to AWQ quantization with FlashAttention-2
06b4cf5

Alikestocode committed on

Fix: Pre-create GPU wrappers at module load time for startup detection
cdac920

Alikestocode committed on

Make GPU duration slider functional with dynamic wrapper creation
fc0ab14

Alikestocode committed on

Fix indentation errors in _generate_router_plan_streaming_internal
c454e43

Alikestocode committed on

Fix: Remove context manager usage for spaces.GPU decorator
a217627

Alikestocode committed on

Add user-configurable GPU duration slider (60-1800 seconds)
9a4d6d3

Alikestocode committed on

Fix: Move trim_at_stop_sequences function before it's used
597f1a9

Alikestocode committed on

Add Gradio client API test script
de18e95

Alikestocode committed on

Fix API launch configuration
9773e4b

Alikestocode committed on

Enable API in Gradio launch configuration
1b16b00

Alikestocode committed on

Update README and clean up old files
9592189

Alikestocode committed on

Improve streaming with incremental JSON parsing and plan end token
f5a609d

Alikestocode committed on

Add streaming support and increase max tokens to 20000
4f65341

Alikestocode committed on

Fix deprecation warnings and improve error handling
bf2fdae

Alikestocode committed on

Update app.py and requirements.txt for CourseGPT-Pro router models
4c3d05b

Alikestocode committed on

Update README: Focus on CourseGPT-Pro router checkpoints
4706b45

Alikestocode committed on

Update README with correct space URL
9af4b77

Alikestocode committed on

Add .gitignore and remove cache files
7bc8a45

Alikestocode committed on

Initial commit: ZeroGPU LLM Inference Space
f91e906

Alikestocode committed on