Can't serve the model using TEI
I am trying to serve pplx-embed-v1-4B using Hugging Face TEI (Text Embeddings Inference), with the following command:
text-embeddings-router-80 --port 3114 --model-id perplexity-ai/pplx-embed-v1-4B --dtype float32 --max-batch-tokens 8096 --max-client-batch-size 2
inside a container created from the latest TEI docker image (huggingface/text-embeddings-inference:cuda-1.9.2),
but I am getting the following error:
2026-03-03T20:18:15.360847Z INFO text_embeddings_router: router/src/main.rs:216: Args { model_id: "per*******-**/****-*****-*1-4B", revision: None, tokenization_workers: None, dtype: Some(Float32), served_model_name: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 8096, max_batch_requests: None, max_client_batch_size: 2, auto_truncate: true, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "run1067882-tei-pplx-embed-v1-4b-s1", port: 3114, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/group-volume/KR/cache/huggingface/hub"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2026-03-03T20:18:15.361376Z INFO hf_hub: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hf-hub-0.4.2/src/lib.rs:72: Using token file found "/group-volume/KR/cache/huggingface/token"
2026-03-03T20:18:15.475098Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:42: Starting download
2026-03-03T20:18:15.475109Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `1_Pooling/config.json`
2026-03-03T20:18:15.477242Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_bert_config.json`
2026-03-03T20:18:15.963924Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_roberta_config.json`
2026-03-03T20:18:16.390802Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_distilbert_config.json`
2026-03-03T20:18:16.822889Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_camembert_config.json`
2026-03-03T20:18:17.267219Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_albert_config.json`
2026-03-03T20:18:17.727894Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlm-roberta_config.json`
2026-03-03T20:18:18.182772Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlnet_config.json`
2026-03-03T20:18:18.639727Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config_sentence_transformers.json`
2026-03-03T20:18:19.087385Z WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:65: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/config_sentence_transformers.json)
2026-03-03T20:18:19.087400Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config.json`
2026-03-03T20:18:19.088214Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `tokenizer.json`
Error: Could not download model artifacts
Caused by:
0: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/tokenizer.json)
1: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/tokenizer.json)
Summary: TEI expects a tokenizer.json file (the fast-tokenizer serialization), but the pplx-embed-v1-4b repo does not contain one; it only contains a tokenizer_config.json file.
Should I use a different TEI version?
The README on the model card clearly states that the model should work with TEI.
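For context on what TEI is failing to download: tokenizer.json is the single-file serialization of a Hugging Face "fast" tokenizer. A possible local workaround (a sketch, not verified against this model) is to generate that file yourself, e.g. with `AutoTokenizer.from_pretrained(...).save_pretrained(...)` from `transformers` if the repo's tokenizer is convertible to a fast tokenizer, and then point `--model-id` at the local directory. The toy example below just illustrates the file being produced and reloaded with the `tokenizers` library; the vocab and paths are made up for demonstration:

```python
import json
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy "fast" tokenizer. A real conversion attempt would instead look like:
#   from transformers import AutoTokenizer
#   AutoTokenizer.from_pretrained("perplexity-ai/pplx-embed-v1-4B").save_pretrained("./local-dir")
# (assumption: the repo's tokenizer_config.json is convertible to a fast tokenizer)
tok = Tokenizer(WordLevel({"hello": 0, "world": 1, "[UNK]": 2}, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "tokenizer.json")
tok.save(path)  # this single file is the artifact TEI tries to download at startup

# The file is plain JSON describing the whole tokenization pipeline
with open(path) as f:
    spec = json.load(f)
print(spec["model"]["type"])                                # WordLevel
print(Tokenizer.from_file(path).encode("hello world").ids)  # [0, 1]
```

If the conversion succeeds, pointing `--model-id` at the directory containing the generated tokenizer.json should avoid the failing Hub download (the local directory name above is illustrative).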
OK, it seems the TEI docker image tagged 1.9.2 doesn't actually ship version 1.9.2 of the router:
docker run --entrypoint bash ghcr.io/huggingface/text-embeddings-inference:cpu-1.9.2 -c "text-embeddings-router --version"
text-embeddings-router 1.9.1
huh...
I got an error: called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
so I checked the version of TEI, and it's 1.9.1 instead of 1.9.2.
Hey @TomaszZietkiewicz, it is indeed v1.9.2 despite the reported version saying v1.9.1. That is due to an issue when updating the crate version: the underlying code is v1.9.2, but the version string was not properly updated!
As for the missing tokenizer.json, you're right; maybe cc @bowang0911 in case we can include it here too?
Finally, as for your error @juni3227 (see the clarification on the version above), it's likely due to an OOM, but it would be great if you could provide more information on which hardware and command you are using (and feel free to report this on https://github.com/huggingface/text-embeddings-inference/issues/new as well, if applicable).
Thanks, and apologies for the inconvenience! 🤗