question on computation cost
Hey BK-Lee,
I'm super impressed with your open-source model on Hugging Face and am considering it for a Chinese AI Q&A app on WeChat. However, I'm torn between developing with your open-source model and using APIs from big tech companies, which offer good performance at very low costs.
Right now, I have a 4090 GPU that runs a 20B model fine for a few users, but I'm worried about how the costs could skyrocket if the user base grows significantly. Could you share any insights on the necessary hardware if many users are online simultaneously? I’m trying to figure out the potential computational costs to better decide between the open-source approach and a commercial API.
Thanks a ton for your help and for making such an awesome tool available to everyone!
Cheers,
Liz
Unfortunately, I can't really determine this, because I don't have any experience handling large volumes of user queries.
To serve on the order of a billion users, OpenAI is known to have built a very large server infrastructure. There have also been rumors that OpenAI used about 128k GPUs for training and 128 GPUs for inference. Based on that, my rough guess is that you would need at least 100 GPUs (A100-class, with large VRAM) once your user base starts to grow.
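The figure above is a rough guess. One way to make this kind of estimate concrete for a specific app is a back-of-envelope calculation: GPUs needed ≈ (concurrent users × tokens per second each user receives) ÷ (tokens per second one GPU sustains across a batch), then compare the resulting hosting bill with an API bill. The sketch below assumes that simple throughput model. The function names, the 70% utilization default, the 30-day month, and every number you feed in are hypothetical planning inputs, not benchmarks of Meteor-Mamba or quotes from any provider.

```python
import math


def gpus_needed(concurrent_users: int,
                tokens_per_sec_per_user: float,
                gpu_tokens_per_sec: float,
                utilization: float = 0.7) -> int:
    """Hypothetical back-of-envelope GPU count for serving all concurrent users.

    gpu_tokens_per_sec is the aggregate (batched) generation throughput of one GPU,
    which you would have to measure yourself on your own hardware and model.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    demand = concurrent_users * tokens_per_sec_per_user   # tokens/s the app must produce
    supply_per_gpu = gpu_tokens_per_sec * utilization      # tokens/s one GPU realistically delivers
    return math.ceil(demand / supply_per_gpu)              # round up: partial GPUs don't exist


def monthly_costs(n_gpus: int,
                  gpu_cost_per_hour: float,
                  tokens_per_month: float,
                  api_price_per_million_tokens: float) -> tuple[float, float]:
    """Return (self_hosted, api) monthly cost, in whatever currency the inputs use.

    Assumes GPUs run 24 hours a day for a 30-day month (hypothetical simplification).
    """
    hours_per_month = 24 * 30
    self_hosted = n_gpus * gpu_cost_per_hour * hours_per_month
    api = tokens_per_month / 1_000_000 * api_price_per_million_tokens
    return self_hosted, api
```

For example, with made-up figures of 1,000 concurrent users each receiving 20 tokens/s and a GPU that sustains 2,000 tokens/s across a batch, `gpus_needed(1000, 20, 2000)` returns 15. Plugging your own measured throughput (for instance, from the 4090 you already have) and real price quotes into `monthly_costs` gives a first-order comparison between self-hosting and a commercial API; it ignores engineering time, peak-versus-average load, and redundancy.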