Commit History

All commits authored by Alikestocode, listed newest first.

4ce42e8  Add GPU estimator, DDG search, and cancel support
0e2f6c4  Load vLLM from local snapshot to support default subfolders
6fb2aa6  Default to Gemma router and limit prefetch
2790442  Fix prefetch init order
e829b15  Parallelize AWQ model prefetching
63c8de5  Disable vLLM by default on MIG devices
f886036  Remove unsupported vLLM device kwarg
8fd14bc  Adjust CUDA handling and set explicit device for vLLM
5ee455a  Fix UnboundLocalError: remove duplicate torch import
75aac04  Improve vLLM device detection: force torch CUDA reinit
34ee4d1  Fix remaining pipeline calls to use transformers_repo
41f50c5  Fix vLLM device detection and AWQ model loading
a76dbfd  Fix AWQ model loading: point to default/ subfolder and fix tokenizer loading
27234fe  Update Qwen model repo to AWQ quantized version
b36a0b0  Update Gemma model to use AWQ quantized version
f0033ab  Fix AWQModifier import path: use modifiers.awq instead of modifiers.quantization
808203f  Add advanced vLLM and LLM Compressor optimizations
b2bf767  Clarify LLM Compressor optional status - vLLM has native AWQ support
2ddfeca  Fix vLLM device detection for ZeroGPU
b4fd5e9  Fix vLLM token parameter and improve streaming error handling
54880b1  Add debug logging for model loading and generation issues
d6f9002  Fix streaming loop break condition - only break when finished is True
1b04006  Add Cloud Run PORT environment variable support
03689e3  Fix Gradio UI structure and add comprehensive fallback logging
06aef1b  Fix all indentation errors in Gradio UI components
f43bdac  Fix syntax error: correct indentation in BitsAndBytes fallback block
83a232d  Suppress AutoAWQ deprecation warnings and improve vLLM logging
a79facb  Implement vLLM with LLM Compressor and performance optimizations
06b4cf5  Migrate to AWQ quantization with FlashAttention-2
cdac920  Fix: Pre-create GPU wrappers at module load time for startup detection
fc0ab14  Make GPU duration slider functional with dynamic wrapper creation
c454e43  Fix indentation errors in _generate_router_plan_streaming_internal
a217627  Fix: Remove context manager usage for spaces.GPU decorator
9a4d6d3  Add user-configurable GPU duration slider (60-1800 seconds)
597f1a9  Fix: Move trim_at_stop_sequences function before it's used
9773e4b  Fix API launch configuration
1b16b00  Enable API in Gradio launch configuration
f5a609d  Improve streaming with incremental JSON parsing and plan end token
4f65341  Add streaming support and increase max tokens to 20000
bf2fdae  Fix deprecation warnings and improve error handling
4c3d05b  Update app.py and requirements.txt for CourseGPT-Pro router models
f91e906  Initial commit: ZeroGPU LLM Inference Space