Error when loading the model
I tried to load the model and got the error below:
ValueError: The checkpoint you are trying to load has model type internlm but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
Here is the loading code:
from transformers import AutoModel
model = AutoModel.from_pretrained("BK-Lee/MoAI-7B")
My transformers version is 4.39.0. I also tried 4.38.0, but got the same error.
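The error message names the model type that Transformers read from the checkpoint's config.json (its model_type field). To check that field yourself before calling AutoModel, a minimal sketch like the following reads it from a locally downloaded copy of the checkpoint; the function name read_model_type and the local directory are hypothetical.

```python
import json
from pathlib import Path


def read_model_type(checkpoint_dir: str) -> str:
    """Return the model_type declared in a local checkpoint's config.json.

    Transformers' auto classes use this field to choose an architecture,
    so it is the name that must be registered for AutoModel to succeed.
    """
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # Missing field -> empty string, so callers can handle that case explicitly.
    return config.get("model_type", "")
```

Given the error above, calling read_model_type on a local copy of BK-Lee/MoAI-7B would be expected to return "internlm".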
OK, I will look into it. Thanks for reporting the error.
Oh, I see the cause of this problem.
It comes from the Transformers auto classes such as AutoModel or AutoModelForCausalLM.
They support only architectures registered in the Hugging Face Transformers library (see the left sidebar at https://huggingface.co/docs/transformers/index, which lists the supported models, many of them older ones).
However, MoAI adopts a recent language model architecture that is not registered in the Transformers library. (As you know, the more recent a model is, the more likely it is that Transformers does not cover it yet.)
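The idea can be illustrated with a toy registry: an auto-loader can only dispatch to architectures that someone has registered under a model_type name, so an unknown name fails before any weights are read. This is a simplified, hypothetical sketch (register, auto_load, and the "llama" entry are made up for illustration), not how Transformers is actually implemented internally.

```python
from typing import Callable, Dict

# Toy registry mapping a model_type string to a loader function.
# Simplified illustration only; not Transformers' real internals.
_REGISTRY: Dict[str, Callable[[str], str]] = {}


def register(model_type: str):
    """Decorator that records a loader under the given model_type."""
    def wrap(loader: Callable[[str], str]) -> Callable[[str], str]:
        _REGISTRY[model_type] = loader
        return loader
    return wrap


@register("llama")
def load_llama(path: str) -> str:
    # Stand-in for building a real model object from `path`.
    return f"LlamaModel({path})"


def auto_load(model_type: str, path: str) -> str:
    """Dispatch on model_type, failing for unregistered architectures."""
    if model_type not in _REGISTRY:
        raise ValueError(
            f"model type {model_type} is not recognized; "
            f"registered types: {sorted(_REGISTRY)}"
        )
    return _REGISTRY[model_type](path)
```

Supporting a new architecture means adding an entry to the registry; until a library does that, the model class has to be instantiated directly instead of through the auto-loader.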
Therefore, you should follow the procedure I provided in the README file.
In the README you can find the function prepare_moai, which contains the lines that load MoAI:
# MoAI
# bits, dtype and moai_path are assumed to come from the surrounding prepare_moai function,
# and MoAIModel is defined in the MoAI codebase (see the README).
import torch

bnb_model_from_pretrained_args = {}
if bits in [4, 8]:
    from transformers import BitsAndBytesConfig
    bnb_model_from_pretrained_args.update(dict(
        torch_dtype=torch.bfloat16 if dtype == 'bf16' else torch.float16,
        low_cpu_mem_usage=True,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=bits == 4,
            load_in_8bit=bits == 8,
            # Keep the vision and MoAI-specific modules unquantized
            llm_int8_skip_modules=["vision_tower", "vision_proj", "Plora_main", "moai", "output"],
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.bfloat16 if dtype == 'bf16' else torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4'
        )
    ))

# MoAIModel Loading
moai_model = MoAIModel.from_pretrained(moai_path, **bnb_model_from_pretrained_args)
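To see at a glance which options the snippet produces for each setting, here is a dependency-free sketch that mirrors its branching. The function name quant_kwargs is hypothetical, dtypes are plain strings standing in for torch.bfloat16 / torch.float16, and the nested dict stands in for BitsAndBytesConfig, so the result is for inspection only and cannot be passed to from_pretrained as-is.

```python
from typing import Any, Dict


def quant_kwargs(bits: int, dtype: str) -> Dict[str, Any]:
    """Mirror the README snippet's branching using plain Python values."""
    if bits not in (4, 8):
        # No quantization: from_pretrained receives no extra arguments.
        return {}
    compute = "bfloat16" if dtype == "bf16" else "float16"
    return {
        "torch_dtype": compute,
        "low_cpu_mem_usage": True,
        "quantization_config": {
            "load_in_4bit": bits == 4,
            "load_in_8bit": bits == 8,
            # Modules kept out of quantization, as listed in the snippet.
            "llm_int8_skip_modules": ["vision_tower", "vision_proj", "Plora_main", "moai", "output"],
            "llm_int8_threshold": 6.0,
            "llm_int8_has_fp16_weight": False,
            "bnb_4bit_compute_dtype": compute,
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_quant_type": "nf4",
        },
    }
```

For example, quant_kwargs(4, "bf16") selects 4-bit NF4 with bfloat16 compute, while quant_kwargs(16, "fp16") returns an empty dict, so MoAIModel.from_pretrained would be called without any quantization config.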
Thanks to you, I now understand the Hugging Face library structure better :)
Wow, thanks a lot :)