# LingBot-World Base Cam NF4 Quantized Server

Docker-ready inference server for the LingBot-World video generation model with pre-quantized NF4 weights.
## Features
- 4-bit NF4 quantization via bitsandbytes - fits in 32GB VRAM
- Pre-quantized weights - no runtime quantization overhead
- Docker image with HTTP API - deploy on any machine with an NVIDIA GPU
- Optional cloud upload - upload finished videos to Cloudflare R2, or download directly via HTTP
## Model Contents

| File | Size | Description |
|---|---|---|
| `high_noise_model_bnb_nf4/model.safetensors` | ~9.6 GB | NF4-quantized diffusion model (high noise) |
| `low_noise_model_bnb_nf4/model.safetensors` | ~9.6 GB | NF4-quantized diffusion model (low noise) |
| `models_t5_umt5-xxl-enc-bf16.pth` | ~10.6 GB | T5-XXL text encoder (bfloat16) |
| `Wan2.1_VAE.pth` | ~485 MB | VAE encoder/decoder |

Total: ~30 GB (vs. ~85 GB for the full-precision models)
## Requirements

- Python 3.10+
- CUDA 11.8+ (tested with CUDA 12.x)
- ~32 GB VRAM (RTX 5090, A100, etc.)
- ≥64 GB system RAM
## Quick Start

### Without Docker

To run this package without Docker, see the LingBot-World pre-quantized page here.

### RunPod Template

https://console.runpod.io/deploy?template=j6rpw8zhj2&ref=szjabwfp
## Docker Deployment

The included Dockerfile builds a self-contained image (~35 GB) with all weights baked in. Once built, it runs on any machine with an NVIDIA GPU.

### Build

```bash
docker build --platform linux/amd64 -t lingbot-nf4 .
```

### Run

```bash
# Basic — videos saved to /app/outputs/ inside the container
docker run --gpus all -p 8080:8080 lingbot-nf4

# With a local directory mounted for output
docker run --gpus all -p 8080:8080 -v ./outputs:/app/outputs lingbot-nf4

# With Cloudflare R2 upload (optional)
docker run --gpus all -p 8080:8080 \
  -e R2_ACCOUNT_ID=your_account_id \
  -e R2_ACCESS_KEY=your_access_key \
  -e R2_SECRET_KEY=your_secret_key \
  -e R2_BUCKET=your_bucket_name \
  lingbot-nf4
```

The server listens on port 8080 once the model has loaded.
## API

The server exposes an async job queue to handle long-running generation.

### Using the client script

```bash
python caller.py \
  --url http://localhost:8080 \
  --image photo.jpg \
  --prompt "A cinematic shot of the scene" \
  --frame_num 81 \
  --output output.mp4
```
### Without the client script

```http
POST /generate
Content-Type: application/json

{
  "image": "<base64-encoded JPEG/PNG>",
  "prompt": "A cinematic video of the scene",
  "frame_num": 81,
  "size": "480*832",
  "seed": -1,
  "guide_scale": 5.0,
  "sampling_steps": 40,
  "action_poses": "<base64-encoded poses.npy (optional)>",
  "action_intrinsics": "<base64-encoded intrinsics.npy (optional)>"
}
```

Returns immediately with HTTP 202:

```json
{"id": "job-uuid", "status": "IN_PROGRESS"}
```
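As a sketch, the request above can be assembled and submitted from Python using only the standard library. The helper names here (`build_generate_payload`, `submit_job`) are illustrative, not part of this repo:

```python
import base64
import json
import urllib.request

def build_generate_payload(image_bytes, prompt, frame_num=81, size="480*832",
                           seed=-1, guide_scale=5.0, sampling_steps=40):
    """Assemble the JSON body for POST /generate (optional action_* fields omitted)."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "prompt": prompt,
        "frame_num": frame_num,
        "size": size,
        "seed": seed,
        "guide_scale": guide_scale,
        "sampling_steps": sampling_steps,
    }

def submit_job(base_url, payload):
    """POST to /generate; returns the parsed 202 body ({"id": ..., "status": ...})."""
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```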
### Poll for result

```http
GET /status/<job-id>
```

Returns:

```json
{
  "id": "job-uuid",
  "status": "COMPLETED",
  "output": {
    "video_url": "https://...",
    "seed": 42,
    "duration_sec": 185.3,
    "frame_num": 81,
    "size": "480*832"
  }
}
```

If R2 is not configured, `video_path` is returned instead of `video_url`. In that case the client script (`caller.py`) automatically downloads the video via the `/download` endpoint.
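A minimal polling loop over `/status` could look like the sketch below (`caller.py` already does this for you; the `fetch` hook is an assumption added here for testability, not a server feature):

```python
import json
import time
import urllib.request

def poll_status(base_url, job_id, fetch=None, interval=5.0, timeout=1800.0):
    """Poll GET /status/<job-id> until the job leaves IN_PROGRESS, then return the body."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch(f"{base_url}/status/{job_id}")
        if status.get("status") != "IN_PROGRESS":
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")
```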
### Download video

When R2 is not configured, completed videos can be downloaded directly:

```http
GET /download/<job-id>
```

Returns the MP4 file as a download.
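Streaming the file to disk is a one-liner with the standard library; a hedged sketch (the `opener` parameter is an assumption added for testability):

```python
import shutil
import urllib.request

def download_video(base_url, job_id, dest_path, opener=urllib.request.urlopen):
    """Stream GET /download/<job-id> into dest_path and return the path."""
    with opener(f"{base_url}/download/{job_id}") as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path
```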
### Health check

```http
GET /health
```
## Cloud Deployment (e.g. RunPod)

- Push the image to a container registry (Docker Hub, etc.)
- Create a GPU pod/instance with the image
- Expose port 8080
- Use `caller.py` with the pod's public URL
## Quantization Details

The diffusion models are quantized with bitsandbytes NF4 using double quantization:

```json
{
  "format": "bnb_nf4",
  "double_quant": true,
  "compute_dtype": "bfloat16",
  "blocksize": 64
}
```

This achieves ~3.9x compression while maintaining generation quality.
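The ~3.9x figure is consistent with a back-of-envelope estimate, assuming a bf16 baseline and bitsandbytes' usual double-quant layout (an 8-bit absmax per 64-weight block plus one fp32 constant per 256 blocks; these overhead numbers are general bitsandbytes conventions, not read from this repo):

```python
# Back-of-envelope bits-per-weight for NF4 with double quantization.
BASELINE_BITS = 16        # bf16 weight
NF4_BITS = 4              # 4-bit NF4 code per weight
BLOCKSIZE = 64            # weights per quantization block
ABSMAX_BITS = 8           # double-quantized absmax, one per block
NESTED_BLOCKSIZE = 256    # blocks per second-level fp32 constant

bits_per_weight = (NF4_BITS
                   + ABSMAX_BITS / BLOCKSIZE
                   + 32 / (BLOCKSIZE * NESTED_BLOCKSIZE))
compression = BASELINE_BITS / bits_per_weight
print(f"{bits_per_weight:.3f} bits/weight -> {compression:.2f}x vs bf16")
```

This lands at roughly 4.13 bits per weight, i.e. about 3.9x smaller than bf16, matching the observed on-disk sizes above.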
## License

This model is based on LingBot-World and follows its license terms.