Instructions for using teapotai/teapotllm with libraries, inference providers, notebooks, and local apps.

Libraries

How to use teapotai/teapotllm with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The model is loaded below with AutoModelForSeq2SeqLM, so the seq2seq task name applies
pipe = pipeline("text2text-generation", model="teapotai/teapotllm")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("teapotai/teapotllm")
model = AutoModelForSeq2SeqLM.from_pretrained("teapotai/teapotllm")
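
Once the tokenizer and model are loaded as above, generation could look roughly like the sketch below. It is not taken from the model authors: the helper names `build_prompt`, `generate_text`, and `run_demo` are hypothetical, the way context and question are joined is an assumption (the prompt layout the model expects is not stated here), and `max_new_tokens=128` is an arbitrary choice.

```python
# Sketch only: the prompt layout and generation settings are assumptions,
# not a format documented by the model authors.
def build_prompt(context: str, question: str) -> str:
    """Join context and question into one input string (hypothetical layout)."""
    return f"{context.strip()}\n{question.strip()}"


def generate_text(model, tokenizer, prompt: str, max_new_tokens: int = 128) -> str:
    """Tokenize the prompt, call generate(), and decode the first sequence."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def run_demo() -> str:
    """Download the checkpoint and answer one example question."""
    # Imported here so the helpers above stay usable without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("teapotai/teapotllm")
    model = AutoModelForSeq2SeqLM.from_pretrained("teapotai/teapotllm")
    prompt = build_prompt(
        "The Eiffel Tower is a wrought-iron tower in Paris, France.",
        "Where is the Eiffel Tower?",
    )
    return generate_text(model, tokenizer, prompt)
```

Calling `print(run_demo())` downloads the weights on first use. Keeping the prompt builder separate from the generation call makes it easy to swap in a different layout if the assumed one turns out not to match what the model was trained on.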

Transformers.js

How to use teapotai/teapotllm with Transformers.js:

// npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';

// Allocate pipeline
const pipe = await pipeline('text2text-generation', 'teapotai/teapotllm');

Local Apps

vLLM

How to use teapotai/teapotllm with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "teapotai/teapotllm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "teapotai/teapotllm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/teapotai/teapotllm

SGLang

How to use teapotai/teapotllm with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "teapotai/teapotllm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "teapotai/teapotllm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "teapotai/teapotllm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "teapotai/teapotllm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use teapotai/teapotllm with Docker Model Runner:
```
docker model run hf.co/teapotai/teapotllm
```

Serving Model via mlserver-huggingface

by gdagil - opened Apr 6, 2025

gdagil, Apr 6, 2025 (edited Apr 6, 2025)

Hi,

I'm encountering an error while trying to serve your model using mlserver-huggingface.
Here’s the error message I received:

[mlserver] INFO - Couldn't load model 'teapotllm'. Model will be removed from registry.
[mlserver.parallel] ERROR - An error occurred processing a model update of type 'Load'.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/mlserver/registry.py", line 167, in _load_model
    model.ready = await model.load()
  File "/opt/conda/lib/python3.10/site-packages/mlserver_huggingface/runtime.py", line 29, in load
    self._model = load_pipeline_from_settings(self.hf_settings, self.settings)
  File "/opt/conda/lib/python3.10/site-packages/mlserver_huggingface/common.py", line 53, in load_pipeline_from_settings
    hf_pipeline = pipeline(
  File "/opt/conda/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 1047, in pipeline
    tokenizer = AutoTokenizer.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 934, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2036, in from_pretrained
    return cls._from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2074, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2276, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5.py", line 150, in __init__
    self.sp_model.Load(vocab_file)
  File "/opt/conda/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/opt/conda/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

It seems that the model fails to load due to an issue with the tokenizer.

My question is: How can I perform inference with this model without using the teapotai Python package?

Thank you for your assistance!
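
One possible reading of the traceback above, offered as an assumption rather than a confirmed diagnosis: the last frames show the slow T5 tokenizer passing `vocab_file` to SentencePiece, and `TypeError: not a string` is what SentencePiece raises when that argument is not a string, for example `None`. That would happen if no `spiece.model` file was resolved for the model. The sketch below checks which tokenizer files a repository listing contains; the helper names `tokenizer_load_path` and `check_repo` are hypothetical, and `huggingface_hub.list_repo_files` is used only to fetch the file list.

```python
def tokenizer_load_path(repo_files):
    """Classify which tokenizer files are present in a list of repo file names."""
    names = set(repo_files)
    if "tokenizer.json" in names:
        # A serialized fast tokenizer; no SentencePiece model file is needed.
        return "fast"
    if "spiece.model" in names:
        # Only the SentencePiece vocabulary used by the slow T5 tokenizer.
        return "slow"
    # Neither file: the slow tokenizer would receive vocab_file=None.
    return "missing"


def check_repo(repo_id: str = "teapotai/teapotllm") -> str:
    """Fetch the repo file list from the Hub and classify it (needs network access)."""
    from huggingface_hub import list_repo_files

    return tokenizer_load_path(list_repo_files(repo_id))
```

If `check_repo()` reports `"fast"` while the serving runtime still reaches the slow tokenizer, the difference may lie in how the runtime resolves the model path or in the installed `transformers` version; this check does not settle that.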
