Apply for community grant: Personal project (gpu)
#2
by liushuaiqian - opened
Subject: GPU Request for Deploying 7B Instruction-Tuned LLM (Gemma-7B-IT + LoRA)
Hello Hugging Face Team,
I am requesting access to a T4 GPU for my Space “test_bushu” (https://huggingface.co/spaces/liushuaiqian/test_bushu) in order to deploy and serve a fine‑tuned 7B language model based on Gemma-7B-IT, using PEFT + LoRA techniques.
Background:
- The model has been fine-tuned on custom Chinese instruction-response data using LoRA (r=16, α=32, dropout=0.05).
- The total model size (quantized to 4-bit) is around 7–8 GB, but practical inference requires ~12–16 GB of memory (including the full KV cache), which meets or exceeds the 16 GB RAM available on the current CPU-only instance.
- On the CPU-only configuration, the Space either fails to start or runs out of memory almost immediately.
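For context, the memory figures above can be sanity-checked with a back-of-envelope sketch in a few lines of Python. The Gemma-7B architecture numbers used here (≈8.5 B parameters, 28 layers, 16 KV heads, head dim 256, 8k-token context) are assumptions taken from the published model config, not measurements from this Space:

```python
# Rough memory budget for serving Gemma-7B-IT quantized to 4-bit.
# Architecture numbers are assumptions from the published Gemma-7B config.

GIB = 1024 ** 3

def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the quantized weights alone."""
    return n_params * bits_per_param / 8

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2):
    """fp16 KV cache: two tensors (K and V) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

params = 8.5e9                                        # assumed Gemma-7B size
weights_gib = weight_bytes(params, 4) / GIB           # 4-bit weights
kv_gib = kv_cache_bytes(28, 16, 256, 8192) / GIB      # full 8k context

print(f"weights ~{weights_gib:.1f} GiB, KV cache ~{kv_gib:.1f} GiB")
```

On these assumptions the weights come to roughly 4 GiB and a full-context KV cache to another ~3.5 GiB, before framework, activation, and dequantization overhead — consistent with the ~12–16 GB working-set estimate above.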
Purpose:
- Support real-time Chinese instruction-following interaction through a Gradio interface.
- Enable users to test and explore LLM capabilities in Chinese.
- Provide educational and research value, demonstrating lightweight fine-tuning techniques (LoRA + quantization).
I believe this use case aligns with Hugging Face's mission to democratize access to state-of-the-art models and to empower multilingual AI applications. Thank you for your consideration.
Thank you very much!
Best regards,
liushuaiqian