Space: Alovestocode/ZeroGPU-LLM-Inference

ZeroGPU-LLM-Inference

68.6 kB

1 contributor

History: 26 commits

Add Cloud Run PORT environment variable support

1b04006 4 months ago

File                       Size       Last commit                                                       Updated
.dockerignore              104 Bytes  Add Google Cloud Platform deployment configurations               4 months ago
.gitattributes             1.52 kB    Initial commit: ZeroGPU LLM Inference Space                       4 months ago
.gitignore                 27 Bytes   Add .gitignore and remove cache files                             4 months ago
Dockerfile                 680 Bytes  Add Google Cloud Platform deployment configurations               4 months ago
README.md                  4.23 kB    Implement vLLM with LLM Compressor and performance optimizations  4 months ago
app.py                     34.8 kB    Add Cloud Run PORT environment variable support                   4 months ago
apt.txt                    11 Bytes   Initial commit: ZeroGPU LLM Inference Space                       4 months ago
cloudbuild.yaml            1.34 kB    Add Google Cloud Platform deployment configurations               4 months ago
deploy-compute-engine.sh   4.23 kB    Add Google Cloud Platform deployment configurations               4 months ago
deploy-gcp.sh              2.67 kB    Add Google Cloud Platform deployment configurations               4 months ago
gcp-deployment.md          5.32 kB    Add Google Cloud Platform deployment configurations               4 months ago
requirements.txt           197 Bytes  Implement vLLM with LLM Compressor and performance optimizations  4 months ago
style.css                  2.84 kB    Initial commit: ZeroGPU LLM Inference Space                       4 months ago
test_api.py                3.43 kB    Migrate to AWQ quantization with FlashAttention-2                 4 months ago
test_api_gradio_client.py  7.2 kB     Implement vLLM with LLM Compressor and performance optimizations  4 months ago