1kxia committed (verified) · Commit 22028ee · Parent: c8635c5

Upload README.md with huggingface_hub

Files changed (1): README.md +1 −16
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - quantized
 - embedding
 - nvidia-modelopt
-- nanojet
+
 pipeline_tag: feature-extraction
 ---
 
@@ -76,21 +76,6 @@ All configurations achieve >0.99 cosine similarity with the BF16 baseline.
 └── generation_config.json # Generation config
 ```
 
-## Usage
-
-This model is designed to be used with the [NanoJet](https://github.com/ai-microsoft/NanoJet_Kernels) inference engine:
-
-```python
-from test.infrastructure.model_utils import load_nanojet_model
-
-model = load_nanojet_model(
-    "1kxia/Qwen3-Embedding-0.6B-modelopt-fp8",
-    batch=8,
-    seq_len=4096,
-    quantization="fp8"
-)
-```
-
 ## Intended Use
 
 This model is intended for efficient FP8 inference of text embeddings on NVIDIA GPUs with FP8 support (Hopper architecture and above).
 
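The second hunk's header cites the card's claim that all configurations keep >0.99 cosine similarity with the BF16 baseline. As a rough sanity check of why FP8 E4M3 preserves embedding direction that well, here is a minimal pure-Python sketch (the helper names are illustrative only, not part of the model card, ModelOpt, or NanoJet): it rounds a random vector to a software model of the E4M3 grid using a per-tensor scale, then measures cosine similarity after dequantization.

```python
import math
import random

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3
    (3 mantissa bits, min normal exponent -6, saturating at +/-448)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 448.0)            # saturate at the E4M3 finite max
    e = max(math.floor(math.log2(mag)), -6)  # clamp into subnormal range
    step = 2.0 ** (e - 3)               # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step

def fp8_roundtrip_cosine(dim: int = 1024, seed: int = 0) -> float:
    """Quantize a random Gaussian vector with a per-tensor scale
    (max magnitude mapped to 448) and return cos(original, dequantized)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = max(abs(v) for v in x) / 448.0
    x_deq = [quantize_e4m3(v / scale) * scale for v in x]
    dot = sum(a * b for a, b in zip(x, x_deq))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_d = math.sqrt(sum(b * b for b in x_deq))
    return dot / (norm_x * norm_d)

print(round(fp8_roundtrip_cosine(), 4))  # typically prints a value above 0.99
```

This only models per-tensor static scaling; real ModelOpt FP8 checkpoints carry calibrated per-layer scales, so the actual similarity figures come from the card's own evaluation, not this sketch.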