Running Llama3-8B-1.58-100B-tokens on CPU

by chiauho - opened Oct 23, 2024

Hugging Face 1Bit LLMs org Oct 23, 2024

Hi, the example given on how to use the model still loads and runs it on GPU. How can I run it on CPU? Thanks for any pointers.

medmekk

Hugging Face 1Bit LLMs org Nov 4, 2024

Hi, sorry for the late reply. It can run on CPU, but it's slow due to the unpacking logic, so running it on GPU is advisable. To run it on CPU, just specify that in the device_map: device_map="cpu"
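For anyone landing here later, a minimal sketch of what that looks like with transformers. Assumptions on my side, not confirmed in this thread: the repo id is HF1BitLLM/Llama3-8B-1.58-100B-tokens, the tokenizer comes from the Llama 3 Instruct repo (which is gated), bfloat16 is the dtype you want, and accelerate is installed so that device_map is accepted. The helper names are made up for illustration.

```python
# Sketch only: load the model with device_map="cpu" and generate a few tokens.
# MODEL_ID and TOKENIZER_ID are assumptions; adjust them to the repos you use.
MODEL_ID = "HF1BitLLM/Llama3-8B-1.58-100B-tokens"
TOKENIZER_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed, gated repo


def build_load_kwargs(device: str = "cpu") -> dict:
    """Hypothetical helper: keyword arguments that pin the model to one device."""
    return {"device_map": device}


def generate_on_cpu(prompt: str, max_new_tokens: int = 20) -> str:
    """Load the model on CPU and return a short greedy completion.

    Expect this to be slow: the packed weights are unpacked at run time.
    """
    # Imported lazily so the helper above works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumed dtype
        **build_load_kwargs("cpu"),  # i.e. device_map="cpu"
    )
    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)

    # Tokenizer output lives on CPU by default, matching the model's device.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_on_cpu("What is the capital of France?") downloads the weights on first use, so it is not something to run casually on a small machine.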

chiauho

Hugging Face 1Bit LLMs org Nov 4, 2024

Ok, thank you very much.

I was hoping that a 1-bit model like this would be able to run on CPU without a GPU, or even on ARM.

medmekk

Hugging Face 1Bit LLMs org Nov 4, 2024

If you are interested, check out this Space. It uses bitnet.cpp to run the model on CPU, and it's much faster: https://huggingface.co/spaces/medmekk/BitNet.cpp

Murban35

Hugging Face 1Bit LLMs org 17 days ago

Are you guys still active?

BerkanDogan

Hugging Face 1Bit LLMs org 15 days ago

Still here

Murban35

Hugging Face 1Bit LLMs org 9 days ago

Is someone continuing development of these models (or retraining, to be accurate)?

BerkanDogan

Hugging Face 1Bit LLMs org 8 days ago

I've been working on these models for a month now, but I haven't published them yet.

Murban35

Hugging Face 1Bit LLMs org 8 days ago

Which model are you retraining with BitNet quantization? It would be interesting to see the new Gemma 4 12B, or maybe even a larger MoE model (Gemma 4 26B A4B), but I know this may be outside the budget for most of us...
