Can't serve the model using TEI

#12
by TomaszZietkiewicz - opened

I am trying to serve pplx-embed-v1-4B using Hugging Face TEI (Text Embeddings Inference) with the following command:
text-embeddings-router-80 --port 3114 --model-id perplexity-ai/pplx-embed-v1-4B --dtype float32 --max-batch-tokens 8096 --max-client-batch-size 2
inside a container created from the latest TEI docker image (huggingface/text-embeddings-inference:cuda-1.9.2),
but I am getting the following error:

2026-03-03T20:18:15.360847Z  INFO text_embeddings_router: router/src/main.rs:216: Args { model_id: "per*******-**/****-*****-*1-4B", revision: None, tokenization_workers: None, dtype: Some(Float32), served_model_name: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 8096, max_batch_requests: None, max_client_batch_size: 2, auto_truncate: true, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "run1067882-tei-pplx-embed-v1-4b-s1", port: 3114, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/group-volume/KR/cache/huggingface/hub"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None } 
2026-03-03T20:18:15.361376Z  INFO hf_hub: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hf-hub-0.4.2/src/lib.rs:72: Using token file found "/group-volume/KR/cache/huggingface/token"
2026-03-03T20:18:15.475098Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:42: Starting download
2026-03-03T20:18:15.475109Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `1_Pooling/config.json`
2026-03-03T20:18:15.477242Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_bert_config.json`
2026-03-03T20:18:15.963924Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_roberta_config.json`
2026-03-03T20:18:16.390802Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_distilbert_config.json`
2026-03-03T20:18:16.822889Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_camembert_config.json`
2026-03-03T20:18:17.267219Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_albert_config.json`
2026-03-03T20:18:17.727894Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlm-roberta_config.json`
2026-03-03T20:18:18.182772Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlnet_config.json`
2026-03-03T20:18:18.639727Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config_sentence_transformers.json`
2026-03-03T20:18:19.087385Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:65: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/config_sentence_transformers.json)
2026-03-03T20:18:19.087400Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config.json`
2026-03-03T20:18:19.088214Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `tokenizer.json`
Error: Could not download model artifacts

Caused by:
    0: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/tokenizer.json)
    1: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/tokenizer.json)

Summary: TEI expects a tokenizer.json file (a fast tokenizer), but pplx-embed-v1-4B does not include one; it only ships a tokenizer_config.json file.
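In case it helps, a possible workaround (a sketch, not an official fix) is to generate tokenizer.json locally with transformers and then pass the local directory to --model-id. The conversion assumes the model's tokenizer has a fast (Rust-backed) implementation; since the real model is a 4B download, the runnable part below only demonstrates the single-file format TEI looks for, using a toy vocabulary with the `tokenizers` library:

```python
# Sketch of a possible workaround: generate tokenizer.json locally.
# With transformers the conversion would look like this (not run here,
# and it assumes a fast tokenizer implementation exists for the model):
#
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("perplexity-ai/pplx-embed-v1-4B")
#   tok.save_pretrained("./pplx-embed-v1-4B-local")  # writes tokenizer.json
#
# Minimal self-contained demo of the single-file format TEI expects,
# built from a toy WordLevel vocabulary:
import json
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordLevel

tok = Tokenizer(WordLevel({"[UNK]": 0, "hello": 1}, unk_token="[UNK]"))
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "tokenizer.json")
tok.save(path)  # serializes the whole fast tokenizer into one JSON file

with open(path) as f:
    spec = json.load(f)
print(spec["model"]["type"])
```

TEI accepts a local path for --model-id, so a directory containing the generated tokenizer.json plus the model weights should get past the 404 above.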

Should I use a different TEI version?
The README on the model card clearly states the model should work with TEI.

OK, it seems the TEI docker image tagged 1.9.2 doesn't actually ship version 1.9.2:

docker run --entrypoint bash ghcr.io/huggingface/text-embeddings-inference:cpu-1.9.2 -c "text-embeddings-router --version"
text-embeddings-router 1.9.1

Huh...
I got an error:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
so I checked the TEI version, and it's 1.9.1 instead of 1.9.2.

Hey @TomaszZietkiewicz, it is indeed version v1.9.2 despite the inner version string saying v1.9.1. That's due to an issue when bumping the crate version; the underlying code is v1.9.2, it just wasn't properly updated!

As for the missing tokenizer.json, you're right; maybe cc @bowang0911 in case we can include it here too?

Finally, as for your error @juni3227 (see the clarification on the version above), it's likely due to an OOM, but if you could provide more information on which hardware and command you are using, that would be great (feel free to also report this at https://github.com/huggingface/text-embeddings-inference/issues/new if applicable).
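For debugging memory-related crashes like this, a common first step is to lower the batch limits and precision. A hedged sketch of a more conservative launch (the flags are the ones shown in the logs above; the values are illustrative guesses, not tuned recommendations):

```shell
# Conservative launch sketch for diagnosing memory-related crashes.
# Flag names match the TEI CLI; the values are illustrative guesses.
text-embeddings-router \
  --model-id perplexity-ai/pplx-embed-v1-4B \
  --dtype float16 \
  --max-batch-tokens 2048 \
  --max-client-batch-size 1 \
  --port 3114
```

If the crash disappears at smaller batch sizes, that points toward memory pressure rather than a kernel bug.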

Thanks, and apologies for the inconvenience! 🤗
