Current Nightly Build is Unavailable + Other VLLM issues. Working docker config as of 1/21/2026

#3
by bigstorm - opened

No luck with the config in the readme. The listed nightly no longer exists. I tried the latest (v0.14) docker container without luck, I also tried my most recent nightly build at the time (vllm/vllm-openai:nightly-b4f64e5b02a949c8856c9f81990b77ca56296cdc), no dice.

Mostly failing due to cublasLt.h compilation errors .

Disabling flashinfer and using marlin resulted in GEMM errors.
Failed to initialize GEMM: status=7 workspace_size=168448 num_experts=128 M=131072 N=3072 K=3072

Removing --enable-expert-parallel and --all2all-backend pplx resolved those issues.

My current working docker container for 2x RTX 6000 Pros is:

services:
  inference:
    image: vllm/vllm-openai:nightly-b4f64e5b02a949c8856c9f81990b77ca56296cdc
    container_name: inference
    ports:
      - "0.0.0.0:8000:8000"
    shm_size: "32g"
    ipc: "host"
    ulimits:
      memlock: -1
      nofile: 1048576
    environment:
      - NCCL_IB_DISABLE=1
      - NCCL_NVLS_ENABLE=0
      - NCCL_P2P_DISABLE=0
      - NCCL_SHM_DISABLE=0
      - VLLM_USE_V1=1
      - VLLM_USE_FLASHINFER_MOE_FP4=0
      - VLLM_MXFP4_USE_MARLIN=1
      - OMP_NUM_THREADS=8
      - SAFETENSORS_FAST_GPU=1
    volumes:
      - /dev/shm:/dev/shm
      - /hdd_nas:/hdd_nas
    command:
      - /hdd_nas/models/lukealonso-MiniMax-M2.1-NVFP4
      - --enable-auto-tool-choice
      - --tool-call-parser
      - minimax_m2
      - --reasoning-parser
      - minimax_m2_append_think
      - --enable-prefix-caching
      - --enable-chunked-prefill
      - --served-model-name
      - "MiniMax-M2.1"
      - --tensor-parallel-size
      - "2"
      - --gpu-memory-utilization
      - "0.95"
      - --max-num-batched-tokens
      - "16384"
      - --dtype
      - "auto"
      - --max-num-seqs
      - "16"
      - --kv-cache-dtype
      - fp8
      - --host
      - "0.0.0.0"
      - --port
      - "8000"
      - --trust_remote_code
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: always

Enjoy responsibly.

Sign up or log in to comment