Instructions for using microsoft/bitnet-b1.58-2B-4T with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use microsoft/bitnet-b1.58-2B-4T with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/bitnet-b1.58-2B-4T", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/bitnet-b1.58-2B-4T", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/bitnet-b1.58-2B-4T", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use microsoft/bitnet-b1.58-2B-4T with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "microsoft/bitnet-b1.58-2B-4T"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "microsoft/bitnet-b1.58-2B-4T",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```bash
docker model run hf.co/microsoft/bitnet-b1.58-2B-4T
```
- SGLang
How to use microsoft/bitnet-b1.58-2B-4T with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "microsoft/bitnet-b1.58-2B-4T" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "microsoft/bitnet-b1.58-2B-4T",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "microsoft/bitnet-b1.58-2B-4T" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "microsoft/bitnet-b1.58-2B-4T",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use microsoft/bitnet-b1.58-2B-4T with Docker Model Runner:
```bash
docker model run hf.co/microsoft/bitnet-b1.58-2B-4T
```
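Both the vLLM and SGLang servers above expose the same OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can also be made from Python. The sketch below uses only the standard library; `build_chat_request` and `chat` are hypothetical helper names, and it assumes the usual OpenAI response shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "microsoft/bitnet-b1.58-2B-4T"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request equivalent to the curl examples (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the reply text; requires a running server."""
    req = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers return the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the vLLM server from the example running, `chat("http://localhost:8000", "What is the capital of France?")` would query it; for the SGLang server, use `http://localhost:30000` instead. Keeping request construction separate from sending makes the payload easy to inspect without a live server.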
Will the fine-tuning code be provided?
I truly believe that BitNet is an outstanding achievement that opens up new possibilities for the future.
Thank you very much for releasing such an impressive model.
The paper is available, but do you also plan to release code for properly fine-tuning the BitNet model?
Sincerely,
Axcxept Inc.
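As background for the fine-tuning question: the BitNet b1.58 paper describes "absmean" weight quantization, where a weight matrix is divided by the mean of its absolute values and each entry is rounded and clipped to {-1, 0, +1}. The pure-Python sketch below illustrates only that mapping. It is not Microsoft's training code; the function name and `eps=1e-5` are assumptions. Note that Python's `round` rounds halves to even.

```python
def absmean_ternary(weights: list[list[float]], eps: float = 1e-5) -> tuple[list[list[int]], float]:
    """Quantize a 2-D weight matrix to {-1, 0, +1} with absmean scaling (illustrative sketch).

    Returns (ternary_matrix, gamma); gamma * ternary_matrix approximates the original weights.
    """
    flat = [w for row in weights for w in row]
    # gamma: mean absolute value over all entries of the matrix
    gamma = sum(abs(w) for w in flat) / len(flat)

    def round_clip(x: float) -> int:
        # RoundClip(x, -1, 1) = max(-1, min(1, round(x)))
        return max(-1, min(1, round(x)))

    ternary = [[round_clip(w / (gamma + eps)) for w in row] for row in weights]
    return ternary, gamma
```

For example, `absmean_ternary([[0.4, -0.1], [-0.8, 0.2]])` gives `gamma = 0.375` and the ternary matrix `[[1, 0], [-1, 1]]`. The small entry `-0.1` collapses to zero, and the large entry `-0.8` is clipped to `-1`.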
Thank you for your reply.
After reading the paper, I was able to resolve the issue by using SFT and DPO, setting the learning rate to 2e-7, and testing with a longer context length.
I truly appreciate your outstanding work.
If there’s anything else we should be mindful of during training, please let me know.
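The reply mentions DPO. For reference, the per-pair objective from the DPO paper (Rafailov et al., 2023) compares how much the policy raises the log-probability of the chosen response over a frozen reference model, relative to the rejected response. The sketch below computes that loss from sequence log-probabilities. It is not the poster's training code; the function name and `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # Log-ratio of policy to reference for each response
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) = softplus(-m); branch to avoid overflow in exp for large |m|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is `log 2`. The loss falls as the policy favors the chosen response more than the reference does.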
From our experiments, we concluded that no special training code is required.
@AXCXEPT Could you please post your fine-tuning code? Could you also train a binary text classifier with fine-tuning?
We were actually able to train with SFT yesterday. On reflection, however, this should not have been possible given how TRL is specified.
Now our training code does not work.
We are investigating the cause in the following thread:
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T/discussions/12
Can anyone answer that question?