Commit History

Suppress AutoAWQ deprecation warnings and improve vLLM logging
83a232d

Alikestocode committed on

Implement vLLM with LLM Compressor and performance optimizations
a79facb

Alikestocode committed on

Migrate to AWQ quantization with FlashAttention-2
06b4cf5

Alikestocode committed on

Fix: Pre-create GPU wrappers at module load time for startup detection
cdac920

Alikestocode committed on

Make GPU duration slider functional with dynamic wrapper creation
fc0ab14

Alikestocode committed on

Fix indentation errors in _generate_router_plan_streaming_internal
c454e43

Alikestocode committed on

Fix: Remove context manager usage for spaces.GPU decorator
a217627

Alikestocode committed on

Add user-configurable GPU duration slider (60-1800 seconds)
9a4d6d3

Alikestocode committed on

Fix: Move trim_at_stop_sequences function before it's used
597f1a9

Alikestocode committed on

Add Gradio client API test script
de18e95

Alikestocode committed on

Fix API launch configuration
9773e4b

Alikestocode committed on

Enable API in Gradio launch configuration
1b16b00

Alikestocode committed on

Update README and clean up old files
9592189

Alikestocode committed on

Improve streaming with incremental JSON parsing and plan end token
f5a609d

Alikestocode committed on

Add streaming support and increase max tokens to 20000
4f65341

Alikestocode committed on

Fix deprecation warnings and improve error handling
bf2fdae

Alikestocode committed on

Update app.py and requirements.txt for CourseGPT-Pro router models
4c3d05b

Alikestocode committed on

Update README: Focus on CourseGPT-Pro router checkpoints
4706b45

Alikestocode committed on

Update README with correct space URL
9af4b77

Alikestocode committed on

Add .gitignore and remove cache files
7bc8a45

Alikestocode committed on

Initial commit: ZeroGPU LLM Inference Space
f91e906

Alikestocode committed on