Can't serve the model using TEI

#12
by TomaszZietkiewicz - opened

I am trying to serve pplx-embed-v1-4B using Hugging Face TEI (Text Embeddings Inference) with the following command:
text-embeddings-router-80 --port 3114 --model-id perplexity-ai/pplx-embed-v1-4B --dtype float32 --max-batch-tokens 8096 --max-client-batch-size 2
inside a container created from the latest TEI docker image (huggingface/text-embeddings-inference:cuda-1.9.2),
but I am getting the following error:

2026-03-03T20:18:15.360847Z  INFO text_embeddings_router: router/src/main.rs:216: Args { model_id: "per*******-**/****-*****-*1-4B", revision: None, tokenization_workers: None, dtype: Some(Float32), served_model_name: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 8096, max_batch_requests: None, max_client_batch_size: 2, auto_truncate: true, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "run1067882-tei-pplx-embed-v1-4b-s1", port: 3114, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/group-volume/KR/cache/huggingface/hub"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None } 
2026-03-03T20:18:15.361376Z  INFO hf_hub: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hf-hub-0.4.2/src/lib.rs:72: Using token file found "/group-volume/KR/cache/huggingface/token"
2026-03-03T20:18:15.475098Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:42: Starting download
2026-03-03T20:18:15.475109Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `1_Pooling/config.json`
2026-03-03T20:18:15.477242Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_bert_config.json`
2026-03-03T20:18:15.963924Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_roberta_config.json`
2026-03-03T20:18:16.390802Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_distilbert_config.json`
2026-03-03T20:18:16.822889Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_camembert_config.json`
2026-03-03T20:18:17.267219Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_albert_config.json`
2026-03-03T20:18:17.727894Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlm-roberta_config.json`
2026-03-03T20:18:18.182772Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlnet_config.json`
2026-03-03T20:18:18.639727Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config_sentence_transformers.json`
2026-03-03T20:18:19.087385Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:65: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/config_sentence_transformers.json)
2026-03-03T20:18:19.087400Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config.json`
2026-03-03T20:18:19.088214Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `tokenizer.json`
Error: Could not download model artifacts

Caused by:
    0: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/tokenizer.json)
    1: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/tokenizer.json)

Summary: TEI expects a tokenizer.json file (a fast tokenizer), but pplx-embed-v1-4B does not include one; it only ships a tokenizer_config.json file.
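In case it helps, a possible workaround (a sketch, not an official fix) is to generate tokenizer.json locally with transformers and then pass the local directory to --model-id. The conversion assumes the model's tokenizer has a fast (Rust-backed) implementation; since the real model is a 4B download, the runnable part below only demonstrates the single-file format TEI looks for, using a toy vocabulary with the `tokenizers` library:

```python
# Sketch of a possible workaround: generate tokenizer.json locally.
# With transformers the conversion would look like this (not run here,
# and it assumes a fast tokenizer implementation exists for the model):
#
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("perplexity-ai/pplx-embed-v1-4B")
#   tok.save_pretrained("./pplx-embed-v1-4B-local")  # writes tokenizer.json
#
# Minimal self-contained demo of the single-file format TEI expects,
# built from a toy WordLevel vocabulary:
import json
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordLevel

tok = Tokenizer(WordLevel({"[UNK]": 0, "hello": 1}, unk_token="[UNK]"))
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "tokenizer.json")
tok.save(path)  # serializes the whole fast tokenizer into one JSON file

with open(path) as f:
    spec = json.load(f)
print(spec["model"]["type"])
```

TEI accepts a local path for --model-id, so a directory containing the generated tokenizer.json plus the model weights should get past the 404 above.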

Should I use a different TEI version?
The README on the model card clearly states the model should work with TEI.

OK, it seems the TEI docker image tagged 1.9.2 doesn't actually ship version 1.9.2:

docker run --entrypoint bash ghcr.io/huggingface/text-embeddings-inference:cpu-1.9.2 -c "text-embeddings-router --version"
text-embeddings-router 1.9.1

Huh...
I got an error:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
so I checked the TEI version, and it's 1.9.1 instead of 1.9.2.

Hey @TomaszZietkiewicz, it is indeed version v1.9.2 despite the inner version string saying v1.9.1. That's due to an issue when bumping the crate version; the underlying code is v1.9.2, it just wasn't properly updated!

As for the missing tokenizer.json, you're right; maybe cc @bowang0911 in case we can include it here too?

Finally, as for your error @juni3227 (see the clarification on the version above), it's likely due to an OOM, but if you could provide more information on which hardware and command you are using, that would be great (feel free to also report this at https://github.com/huggingface/text-embeddings-inference/issues/new if applicable).
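For debugging memory-related crashes like this, a common first step is to lower the batch limits and precision. A hedged sketch of a more conservative launch (the flags are the ones shown in the logs above; the values are illustrative guesses, not tuned recommendations):

```shell
# Conservative launch sketch for diagnosing memory-related crashes.
# Flag names match the TEI CLI; the values are illustrative guesses.
text-embeddings-router \
  --model-id perplexity-ai/pplx-embed-v1-4B \
  --dtype float16 \
  --max-batch-tokens 2048 \
  --max-client-batch-size 1 \
  --port 3114
```

If the crash disappears at smaller batch sizes, that points toward memory pressure rather than a kernel bug.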

Thanks, and apologies for the inconvenience! 🤗
