Warning: some weights were not initialized
I am using lm_eval from lm-evaluation-harness to get responses.
lm_eval \
--model hf \
--model_args pretrained=allenai/OLMo-1B,revision=step738000-tokens3095B \
--tasks mmlu \
--num_fewshot 0 \
--batch_size auto \
--output_path mmlu/olmo.jsonl \
--log_samples \
--trust_remote_code
Then, it shows the following warning:
Some weights of OLMoForCausalLM were not initialized from the model checkpoint at allenai/OLMo-1B and are newly initialized: ['model.transformer.ff_out.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Is this behavior expected for the above lm_eval parameters? What could be causing this warning? I ask because I am getting a lower MMLU accuracy score than I expected.
Could you share your installed versions of transformers and ai2_olmo? I suspect that you need to update the latter.
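One quick way to collect those versions is to read them from the installed package metadata. This is only a sketch: the helper names `installed_version` and `report_versions` are made up for illustration, and the distribution names are assumed to match what was installed with pip in this thread (`transformers`, `ai2_olmo`, `lm_eval`).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report_versions(dist_names=("transformers", "ai2_olmo", "lm_eval")):
    """Map each distribution name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Print one line per package so the output can be pasted into a reply.
    for name, ver in report_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it inside the same environment that runs lm_eval, otherwise the reported versions may not be the ones actually in use.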
Hi @yusuf-ackan - I just tried this (on Ubuntu 24.04 with Python 3.12.3):
$ python -m venv .venv
$ source .venv/bin/activate
$ python -m pip install lm_eval
$ python -m pip install ai2_olmo
$ <copy-paste your lm_eval command>
The evaluation succeeds and nothing is logged about "init".
Could you please share more about your setup?
Hi, I previously had ai2_olmo 0.2.5 installed; after updating to 0.3.0, the issue was resolved. Thank you.
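For anyone who wants to guard against this before launching an evaluation, a small version comparison might look like the sketch below. It assumes, based only on this thread, that 0.3.0 is the first ai2_olmo release without the warning; the function names are hypothetical, and the parser deliberately ignores pre-release suffixes such as `rc1`.

```python
import re


def release_tuple(version_string):
    """Parse the leading numeric part of a version, e.g. '0.3.0' -> (0, 3, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(version_string, minimum=(0, 3, 0)):
    """Return True if version_string is >= minimum (assumed threshold from this thread)."""
    found = release_tuple(version_string)
    # Pad both tuples with zeros so '0.3' compares equal to (0, 3, 0).
    width = max(len(found), len(minimum))
    found += (0,) * (width - len(found))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return found >= padded_minimum


if __name__ == "__main__":
    for candidate in ("0.2.5", "0.3.0"):
        status = "ok" if is_at_least(candidate) else "update ai2_olmo"
        print(f"ai2_olmo {candidate}: {status}")
```

If the check fails, `python -m pip install --upgrade ai2_olmo` in the active environment is the usual way to pick up a newer release.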