Space: Alovestocode/ZeroGPU-LLM-Inference

68.3 kB

1 contributor

History: 25 commits

Latest commit (Alikestocode): Add Google Cloud Platform deployment configurations, aa65d00, 3 months ago

File                       Size       Updated       Last commit
.dockerignore              104 Bytes  3 months ago  Add Google Cloud Platform deployment configurations
.gitattributes             1.52 kB    3 months ago  Initial commit: ZeroGPU LLM Inference Space
.gitignore                 27 Bytes   3 months ago  Add .gitignore and remove cache files
Dockerfile                 680 Bytes  3 months ago  Add Google Cloud Platform deployment configurations
README.md                  4.23 kB    3 months ago  Implement vLLM with LLM Compressor and performance optimizations
app.py                     34.5 kB    3 months ago  Fix Gradio UI structure and add comprehensive fallback logging
apt.txt                    11 Bytes   3 months ago  Initial commit: ZeroGPU LLM Inference Space
cloudbuild.yaml            1.34 kB    3 months ago  Add Google Cloud Platform deployment configurations
deploy-compute-engine.sh   4.23 kB    3 months ago  Add Google Cloud Platform deployment configurations
deploy-gcp.sh              2.67 kB    3 months ago  Add Google Cloud Platform deployment configurations
gcp-deployment.md          5.32 kB    3 months ago  Add Google Cloud Platform deployment configurations
requirements.txt           197 Bytes  3 months ago  Implement vLLM with LLM Compressor and performance optimizations
style.css                  2.84 kB    3 months ago  Initial commit: ZeroGPU LLM Inference Space
test_api.py                3.43 kB    3 months ago  Migrate to AWQ quantization with FlashAttention-2
test_api_gradio_client.py  7.2 kB     3 months ago  Implement vLLM with LLM Compressor and performance optimizations