Xwin-LM/Xwin-LM-70B-V0.1


It's a bit difficult to deploy the 70B model for verification, so let's keep an eye on how things develop

#4

by wawoshashi - opened Sep 22, 2023

Deploying the 70B model on my own to verify it is a bit difficult, so I'll keep watching how things develop.

Try this quantized version, https://huggingface.co/TheBloke/Xwin-LM-70B-V0.1-GGUF, which needs only a single 48 GB VRAM card, or 40 GB of RAM for CPU-only inference.

You can try it now with llama.cpp
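For anyone who wants to script this, here is a minimal sketch of building a llama.cpp command line from Python. The model path is a placeholder, not the actual file name in the repository, and the binary name is an assumption: older llama.cpp builds ship it as `main`, newer ones as `llama-cli`. The flags used (`-m`, `-ngl`, `-n`, `-p`) are the usual llama.cpp options for model path, GPU layers to offload, tokens to generate, and prompt.

```python
import shlex


def llama_cpp_command(model_path, prompt, n_gpu_layers=0, n_predict=128,
                      binary="./main"):
    """Build an argument list for a llama.cpp command-line run.

    binary is an assumption: adjust it to whatever your llama.cpp
    build produced (for example ./main or ./llama-cli).
    """
    return [
        binary,
        "-m", model_path,            # path to the GGUF file
        "-ngl", str(n_gpu_layers),   # layers to offload to the GPU (0 = CPU only)
        "-n", str(n_predict),        # number of tokens to generate
        "-p", prompt,                # prompt text
    ]


# Placeholder path: substitute the GGUF file you actually downloaded.
cmd = llama_cpp_command("models/xwin-lm-70b.gguf", "Hello", n_gpu_layers=0)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` once the binary and model file exist locally.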

There is also a 7B GPTQ version, https://huggingface.co/TheBloke/Xwin-LM-7B-V0.1-GPTQ, which needs only 6 GB of VRAM.

I can run the 70B quantized GGUF model (Q3_K_S, with 60 of 83 layers offloaded to the GPU) on a 3090 via llama.cpp.
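As a rough way to pick an offload count for your own card, here is a back-of-the-envelope sketch. It assumes every layer takes an equal share of the model file, which is a simplification, and the 30 GB file size and 2 GB reserve (for the KV cache and scratch buffers) are placeholder assumptions rather than measured values. Only the 83 layers and the 3090's 24 GB of VRAM come from the setup described above.

```python
import math


def layers_that_fit(model_size_gb, n_layers, vram_gb, reserve_gb=2.0):
    """Estimate how many layers can be offloaded to the GPU.

    Assumes all layers are the same size (a simplification) and keeps
    reserve_gb of VRAM free for the KV cache and scratch buffers.
    """
    if n_layers <= 0 or model_size_gb <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


# Placeholder numbers: ~30 GB file (assumed), 83 layers, 24 GB card.
print(layers_that_fit(30.0, 83, 24.0))
```

With these placeholder inputs the estimate is 60 layers, in line with the 60/83 reported above, but that agreement depends on the assumed file size and reserve. Treat the result as a starting point and lower the count if llama.cpp runs out of memory.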
