Space: Alovestocode/ZeroGPU-LLM-Inference

Total size: 104 kB

1 contributor

History: 36 commits

Latest commit by Alikestocode: Fix linter error: use %pip instead of !pip in Colab notebook (2dff966, 7 months ago)

File                         Size       Updated       Last commit
.dockerignore                104 Bytes  7 months ago  Add Google Cloud Platform deployment configurations
.gitattributes               1.52 kB    7 months ago  Initial commit: ZeroGPU LLM Inference Space
.gitignore                   27 Bytes   7 months ago  Add .gitignore and remove cache files
DEPLOYMENT_STATUS.md         2.21 kB    7 months ago  Add deployment status document after re-authentication
Dockerfile                   680 Bytes  7 months ago  Add Google Cloud Platform deployment configurations
FIX_PERMISSIONS.md           2.05 kB    7 months ago  Add permission fix guide for spherical-gate-477614-q7 project
QUANTIZE_AWQ.md              3.22 kB    7 months ago  Add Colab notebook for AWQ quantization of router models
QUICK_DEPLOY.md              2.86 kB    7 months ago  Add Cloud Build deployment script and permission setup helper
README.md                    4.23 kB    7 months ago  Implement vLLM with LLM Compressor and performance optimizations
app.py                       39.4 kB    7 months ago  Clarify LLM Compressor optional status - vLLM has native AWQ support
apt.txt                      11 Bytes   7 months ago  Initial commit: ZeroGPU LLM Inference Space
cloudbuild.yaml              1.36 kB    7 months ago  Add Cloud Build deployment script and permission setup helper
deploy-cloud-build.sh        3.31 kB    7 months ago  Add Cloud Build deployment script and permission setup helper
deploy-compute-engine.sh     4.23 kB    7 months ago  Add Google Cloud Platform deployment configurations
deploy-gcp.sh                2.67 kB    7 months ago  Add Google Cloud Platform deployment configurations
gcp-deployment.md            5.32 kB    7 months ago  Add Google Cloud Platform deployment configurations
quantize_to_awq_colab.ipynb  14.9 kB    7 months ago  Fix linter error: use %pip instead of !pip in Colab notebook
requirements.txt             397 Bytes  7 months ago  Clarify LLM Compressor optional status - vLLM has native AWQ support
setup-gcp-permissions.sh     1.8 kB     7 months ago  Add Cloud Build deployment script and permission setup helper
style.css                    2.84 kB    7 months ago  Initial commit: ZeroGPU LLM Inference Space
test_api.py                  3.43 kB    7 months ago  Migrate to AWQ quantization with FlashAttention-2
test_api_gradio_client.py    7.2 kB     7 months ago  Implement vLLM with LLM Compressor and performance optimizations