Running Llama3-8B-1.58-100B-tokens on CPU
Hi, the example given on how to use the model still loads and runs it on GPU. How can I run it on CPU? Thanks for any pointers.
Hi, sorry for the late reply. It can run on CPU, but it's slow because of the unpacking logic, so running it on GPU is advisable. To run it on CPU anyway, just specify that in the device_map: device_map="cpu"
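For reference, a minimal sketch of what that could look like with transformers. Only device_map="cpu" comes from the reply above; the helper names (cpu_load_kwargs, generate_on_cpu) are hypothetical, the model and tokenizer repo ids are left as parameters because the thread does not spell them out, and it assumes transformers, torch and accelerate are installed.

```python
def cpu_load_kwargs():
    """Keyword arguments for from_pretrained that place every weight on CPU."""
    return {"device_map": "cpu"}


def generate_on_cpu(model_id, tokenizer_id, prompt, max_new_tokens=20):
    """Load the model on CPU and generate a short completion.

    model_id and tokenizer_id are Hub repo ids supplied by the caller;
    they are not taken from this thread.
    """
    # Imported lazily so the helper above can be used without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(model_id, **cpu_load_kwargs())
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)

    # Tokenize the prompt; the tensors stay on CPU by default.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_on_cpu(model_id, tokenizer_id, "Hello") with the repo ids from the model card should then run entirely on CPU, though expect it to be slow for the reason given above.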
Ok, thank you very much.
I was hoping that a 1-bit model like this would be able to run on CPU without a GPU, and even on ARM.
If you are interested, check out this Space. It uses bitnet.cpp to run the model on CPU, and it's much faster: https://huggingface.co/spaces/medmekk/BitNet.cpp
Are you guys still active?
Still here
Is someone continuing with the development of these models (or retraining, to be accurate)?
I've been working on these models for a month now, but I haven't published them yet.
Which model are you retraining with BitNet quantization? It would be interesting to see the new Gemma 4 12B, or maybe even a larger MoE model (Gemma 4 26B A4B), but I know this can be outside the budget for most of us...