---
title: Text-to-Video Generator
emoji: 🎥
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 5.29.0
app_file: app/nlp.py
pinned: false
hardware: gpu
---
# 🎥 Text-to-Video gRPC Microservice
This project implements a text-to-video generation microservice using a gRPC backend, powered by the zeroscope_v2_576w diffusion model. It features a containerized API, concurrent request support, and a minimal Gradio frontend for user interaction. It is designed for reproducibility, ease of testing, and deployment.
## 🚀 Features
- Generate videos from text prompts using Hugging Face's Diffusers
- gRPC API with structured response (status code, message, video path)
- Minimal Gradio frontend for user testing
- Video filtering options (None, Grayscale, Sepia) for stylized output
- Audio transcription support using Whisper model
- Full Docker containerization
- Concurrent request support via multithreading
- Postman-compatible testable gRPC API
- Unit + Load testing support
- GitHub Actions CI for build and test
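The Grayscale and Sepia filters are standard per-pixel colour transforms. A minimal pure-Python sketch of the idea, applied to a single `(r, g, b)` pixel; the actual service applies the same transform to every frame of the generated video, and the function name here is illustrative, not taken from the codebase:

```python
def apply_filter(pixel, filter_option="None"):
    """Apply a colour filter to one (r, g, b) pixel with 0-255 channels.

    Illustrative sketch: the real service would apply this per frame,
    typically vectorized with NumPy/OpenCV rather than per pixel.
    """
    r, g, b = pixel
    if filter_option == "Grayscale":
        # Luminosity method: weighted average of the three channels.
        y = int(0.299 * r + 0.587 * g + 0.114 * b)
        return (y, y, y)
    if filter_option == "Sepia":
        # Classic sepia matrix, with each channel clamped to 255.
        return (
            min(255, int(0.393 * r + 0.769 * g + 0.189 * b)),
            min(255, int(0.349 * r + 0.686 * g + 0.168 * b)),
            min(255, int(0.272 * r + 0.534 * g + 0.131 * b)),
        )
    return (r, g, b)  # "None" leaves the frame untouched
```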
## 📦 Setup
### 1. Clone the Repository

```bash
git clone https://github.com/abdullahmaz/text2video-grpc-docker.git
cd text2video-grpc-docker
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run Locally

```bash
python -m app.nlp
```

- gRPC server starts at `127.0.0.1:50051`
- Gradio UI launches at `http://127.0.0.1:7860`
## 🐳 Docker Usage

### Build the Image

```bash
docker build -t text2video-service .
```

### Run the Container

```bash
docker run -p 50051:50051 -p 7860:7860 text2video-service
```
## 🧪 Testing

### Unit Tests

```bash
python -m unittest discover tests
```

### Load Testing

```bash
python tests/load_test.py
```
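The contents of `tests/load_test.py` are not shown here, but the core idea of a load test against this service can be sketched in a few lines: fire N concurrent requests and report the average latency. The `call_service` stub below is an assumption standing in for a real gRPC `Generate` call against `127.0.0.1:50051`:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_service(prompt):
    # Stand-in for a real gRPC Generate() call; in the actual load test
    # this would use a generated stub against 127.0.0.1:50051.
    time.sleep(0.01)  # simulate network + inference latency
    return 200        # status_code from the VideoResponse

def load_test(num_users=8, prompt="a robot in a classroom"):
    """Fire num_users concurrent requests; return (avg_latency_s, statuses)."""
    def timed_call(_):
        start = time.perf_counter()
        status = call_service(prompt)
        return time.perf_counter() - start, status

    with ThreadPoolExecutor(max_workers=num_users) as pool:
        results = list(pool.map(timed_call, range(num_users)))

    latencies = [lat for lat, _ in results]
    return sum(latencies) / len(latencies), [s for _, s in results]
```

Sweeping `num_users` over increasing values yields the data behind the performance graph below (average response time vs. concurrent users).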
### Postman

- Import `text2video.proto` into Postman
- Use the gRPC tab, method: `VideoGenerator.Generate`
- Message input:

```json
{
  "prompt": "A robot teaching in a floating classroom",
  "audio_path": "",
  "filter_option": "Sepia"
}
```

- Test scripts use `pm.response.messages` for validation.
## 📤 API Specification

### gRPC Service

- Service: `VideoGenerator`
- Method: `Generate`
### Request

Field names mirror the Postman message input above; field numbers are illustrative:

```proto
message VideoRequest {
  string prompt = 1;        // Text prompt describing the video
  string audio_path = 2;    // Optional audio file to transcribe (may be empty)
  string filter_option = 3; // "None", "Grayscale", or "Sepia"
}
```
### Response

```proto
message VideoResponse {
  string video_path = 1;  // Path to the generated video file
  string message = 2;     // Human-readable status message
  int32 status_code = 3;  // 200 for success, 400/500 for errors
}
```
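Since the response carries an explicit `status_code` (200 for success, 400/500 for errors), a client can branch on it directly. A small illustrative helper, not part of the service itself, showing how a caller might turn the three response fields into a result:

```python
def summarize_response(video_path, message, status_code):
    """Turn a VideoResponse's fields into an (ok, detail) pair.

    Illustrative client-side helper; field meanings follow the proto above.
    """
    if status_code == 200:
        return True, f"video ready at {video_path}"
    if status_code == 400:
        return False, f"bad request: {message}"
    return False, f"server error ({status_code}): {message}"
```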
## 📈 Performance Graph
Below is the performance graph showing the average response time against the number of concurrent users:
## 🎥 Sample Video
Here is a sample video generated by the service:
https://github.com/user-attachments/assets/cf5ca219-552a-4573-8d3c-677a09a68c14
## 🧱 Architecture Overview

```
[ Gradio UI ]            [ Postman ]
      │                       │
      ▼                       ▼
┌───────────── gRPC Interface ─────────────┐
│                                          │
│          VideoGeneratorServicer          │
│       ┌──────────────────────────┐       │
│       │    DiffusionPipeline     │       │
│       │   (zeroscope_v2_576w)    │       │
│       └──────────────────────────┘       │
└──────────────────────────────────────────┘
                     │
                     ▼
              MP4 video file
```
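Because the diffusion pipeline is heavyweight, the usual pattern behind this architecture is one shared model instance serialized behind a lock, while the gRPC thread pool accepts requests concurrently. A sketch of that pattern under stated assumptions: `FakePipeline` stands in for the real `DiffusionPipeline`, and the method body is illustrative rather than the project's actual servicer code:

```python
import threading

class FakePipeline:
    """Stand-in for the heavyweight DiffusionPipeline (loaded once)."""
    def __call__(self, prompt):
        return f"frames for: {prompt}"

class VideoGeneratorServicer:
    def __init__(self):
        self._pipeline = FakePipeline()  # loaded once at startup
        self._lock = threading.Lock()    # serializes access to the model

    def generate(self, prompt):
        # Many request threads may arrive at once (the real server uses a
        # ThreadPoolExecutor); only one runs the model at a time.
        with self._lock:
            frames = self._pipeline(prompt)
        return {"video_path": "/tmp/out.mp4", "message": "ok",
                "status_code": 200, "frames": frames}
```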
## 🧠 Model Source

- Model: `cerspense/zeroscope_v2_576w`
- Scheduler: `DPMSolverMultistepScheduler`
- Framework: Hugging Face `diffusers`, `torch`, `gradio`
## ⚠️ Limitations
- May be slow to start due to model size and video rendering
- GPU recommended for practical response time
- Text prompts may not always generate contextually accurate results
- No prompt history or user management
## 👤 Authors

- Abdullah Mazhar
- Katrina Bodani
- Haider Niaz
Hugging Face Space