Partial answers in the model card
Hi, I'm trying to test the model card with the question below, but the answer is truncated.
Hitting "compute" again adds a few more words, but still seems incomplete.
--------Quote----------------------------------------
How does pcsk9 impact ldl receptor processing? {#s4a}
The LDLr pathway requires proteolytic cleavage of the receptor to generate the mature, non-
Hi, the model is a bit too big for the HF hosted inference API (or at least the free version) to work well. You're best off trying it on your own hardware.
Thanks. I tried moving it to a dedicated instance (GPU Medium instance) and I do get longer answers, although it still seems to get "stuck".
Any advice on the right instance size?
Also, what's the right way to use "ChatGPT mode" (answering questions)? I tried several questions and it doesn't seem to behave the same way.
The model is not really "chat" or "instruction" tuned yet (this is something we're interested in), so it's not going to be particularly chatty out of the box. It's mostly going to want to imitate PubMed articles or abstracts.
In our experiments so far we fine-tuned it for particular tasks. We haven't released those models.
I don't really have any experience with HF's inference APIs, so I can't say what's expected to work. I would have thought a GPU with 30 GB would be more than enough, but again, I haven't used their APIs.
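For anyone who wants to try this on their own hardware, here is a minimal sketch of local generation with Transformers, where the output length is set explicitly through `max_new_tokens`. The `build_prompt` helper and its "Question: ... Answer:" framing are hypothetical, not a format the model was trained on, and the sampling settings are illustrative defaults rather than recommended values. Since the model is not instruction tuned, expect a continuation in the style of a paper, not a direct answer.

```python
MODEL_ID = "stanford-crfm/BioMedLM"


def build_prompt(question: str) -> str:
    """Hypothetical helper: frame a question as text for the model to continue."""
    topic = question.strip().rstrip("?")
    return f"Question: {topic}?\nAnswer:"


def generate(prompt: str, max_new_tokens: int = 200) -> str:
    """Load the model locally and sample a continuation of `prompt`."""
    # Imported here so the prompt helper can be used without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    use_gpu = torch.cuda.is_available()
    device = "cuda" if use_gpu else "cpu"
    # Assumption: half precision on GPU to save memory, full precision on CPU.
    dtype = torch.float16 if use_gpu else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype)
    model.to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,  # upper bound on generated tokens
            do_sample=True,                 # illustrative sampling settings
            top_p=0.9,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate(build_prompt("How does PCSK9 impact LDL receptor processing?"))` downloads the weights on first use and returns the prompt followed by the sampled continuation. Raising `max_new_tokens` allows longer outputs at the cost of more compute.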
Thanks