How to use lemms/openllm-small-extended-6k with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="lemms/openllm-small-extended-6k")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("lemms/openllm-small-extended-6k", dtype="auto")
How to use lemms/openllm-small-extended-6k with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lemms/openllm-small-extended-6k"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "lemms/openllm-small-extended-6k",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
docker model run hf.co/lemms/openllm-small-extended-6k
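The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM or this repository. It assumes the server is reachable at `http://localhost:8000` and returns the usual OpenAI-style completions response.

```python
import json
import urllib.request

MODEL_ID = "lemms/openllm-small-extended-6k"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example (requires a running server):
# print(complete("Once upon a time,"))
```

The same helper should work against an SGLang server by passing `base_url="http://localhost:30000"`, since both servers expose the same OpenAI-compatible route.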
How to use lemms/openllm-small-extended-6k with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "lemms/openllm-small-extended-6k" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "lemms/openllm-small-extended-6k",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "lemms/openllm-small-extended-6k" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "lemms/openllm-small-extended-6k",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use lemms/openllm-small-extended-6k with Docker Model Runner:
docker model run hf.co/lemms/openllm-small-extended-6k
This is the OpenLLM Small Extended model, trained for 6,000 steps on Wikipedia passages from the SQuAD dataset.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load model and tokenizer
model_name = "lemms/openllm-small-extended-6k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=50,
        temperature=0.7,
        top_k=40,
        do_sample=True,
    )
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
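To make the `temperature` and `top_k` settings in the example above more concrete, here is a small standalone sketch of how the two parameters reshape a next-token distribution before sampling. It is an illustration in plain Python, not the Transformers implementation; the function names and the toy logits are made up for this example.

```python
import math
import random


def top_k_temperature_probs(logits, temperature=0.7, top_k=40):
    """Return sampling probabilities after top-k filtering and temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the indices of the top_k largest logits.
    k = min(top_k, len(logits))
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Divide by temperature: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = {i: logits[i] / temperature for i in keep}
    # Numerically stable softmax over the surviving tokens.
    peak = max(scaled.values())
    weights = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(weights.values())
    probs = [0.0] * len(logits)
    for i, w in weights.items():
        probs[i] = w / total
    return probs


def sample_token(logits, temperature=0.7, top_k=40, rng=random):
    """Draw one token index from the filtered, rescaled distribution."""
    probs = top_k_temperature_probs(logits, temperature, top_k)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Toy example: four candidate tokens, only the best two may be sampled.
toy_logits = [2.0, 1.0, 0.0, -1.0]
print(top_k_temperature_probs(toy_logits, temperature=0.7, top_k=2))
```

Tokens outside the top `k` receive probability zero, and lowering `temperature` shifts more of the remaining probability onto the highest-scoring token.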
# Use the provided load_hf_model.py script
from load_hf_model import load_model_and_tokenizer
model, tokenizer = load_model_and_tokenizer()
# ... rest of usage
This model was trained using the OpenLLM training pipeline:
This model is dual-licensed:
For commercial licensing, contact: louischua@gmail.com
If you use this model in your research, please cite:
@misc{openllm2024,
title={OpenLLM: Open Source Large Language Model},
author={Louis Chua Bean Chong},
year={2024},
url={https://github.com/louischua/openllm}
}