Duplicated from muryshev/llama-cpp-server-13b

muryshev/llama-cpp-server-70b

Runtime error

erved. This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
{"timestamp":1705629626,"level":"INFO","function":"main","line":2870,"message":"build info","build":1917,"commit":"57e2a7a"}
{"timestamp":1705629626,"level":"INFO","function":"main","line":2873,"message":"system info","n_threads":48,"n_threads_batch":-1,"total_threads":96,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
Failed to open logfile 'llama.log' with error 'Permission denied'
[1705629626] llama server listening at http://0.0.0.0:7860
{"timestamp":1705629626,"level":"INFO","function":"main","line":2977,"message":"HTTP server listening","port":"7860","hostname":"0.0.0.0"}
gguf_init_from_file: invalid magic characters '<!DO'
llama_model_load: error loading model: llama_model_loader: failed to load model from /models/mixtral-8x7b-instruct-v0.1.Q2_K.gguf
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/models/mixtral-8x7b-instruct-v0.1.Q2_K.gguf'
terminate called without an active exception
{"timestamp":1705629626,"level":"ERROR","function":"load_model","line":599,"message":"unable to load model","model":"/models/mixtral-8x7b-instruct-v0.1.Q2_K.gguf"}
./run.sh: line 10: 27 Aborted ./server -m "$MODEL_PATH" -c $CONTEXT --port $PORT --host 0.0.0.0 --n-gpu-layers $N_GPU_LAYERS --path "/app/public"
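
The message `gguf_init_from_file: invalid magic characters '<!DO'` says that the file at the model path begins with `<!DO` instead of the four-byte `GGUF` magic. That prefix is consistent with an HTML document (for example one starting with `<!DOCTYPE`), which may mean the download step saved a web page rather than the model; the log alone does not confirm this. The following is a minimal sketch of a pre-flight check that could run before the server starts. The helper names `read_magic`, `looks_like_gguf`, and `describe_model_file` are hypothetical and are not part of llama.cpp.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # a valid GGUF file begins with these four bytes


def read_magic(path: str, size: int = 4) -> bytes:
    """Return the first `size` bytes of the file at `path`."""
    with Path(path).open("rb") as f:
        return f.read(size)


def looks_like_gguf(path: str) -> bool:
    """Hypothetical helper: True if the file begins with the GGUF magic."""
    return read_magic(path) == GGUF_MAGIC


def describe_model_file(path: str) -> str:
    """Hypothetical helper: summarize what the leading bytes suggest."""
    magic = read_magic(path)
    if magic == GGUF_MAGIC:
        return "ok: GGUF magic found"
    if magic.lstrip().startswith(b"<"):
        # A leading '<' suggests markup such as an HTML page.
        return f"not GGUF: starts with {magic!r}, looks like HTML/XML"
    return f"not GGUF: starts with {magic!r}"
```

Calling `describe_model_file("/models/mixtral-8x7b-instruct-v0.1.Q2_K.gguf")` inside the container would show whether the file on disk is a model at all; checking only the magic does not validate the rest of the file.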
