1kxia committed (verified) · Commit 22028ee · Parent: c8635c5

Upload README.md with huggingface_hub

Files changed (1): README.md +1 −16
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - quantized
 - embedding
 - nvidia-modelopt
-- nanojet
+
 pipeline_tag: feature-extraction
 ---
 
@@ -76,21 +76,6 @@ All configurations achieve >0.99 cosine similarity with the BF16 baseline.
 └── generation_config.json # Generation config
 ```
 
-## Usage
-
-This model is designed to be used with the [NanoJet](https://github.com/ai-microsoft/NanoJet_Kernels) inference engine:
-
-```python
-from test.infrastructure.model_utils import load_nanojet_model
-
-model = load_nanojet_model(
-    "1kxia/Qwen3-Embedding-0.6B-modelopt-fp8",
-    batch=8,
-    seq_len=4096,
-    quantization="fp8"
-)
-```
-
 ## Intended Use
 
 This model is intended for efficient FP8 inference of text embeddings on NVIDIA GPUs with FP8 support (Hopper architecture and above).
 
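The second hunk's header cites the card's claim that all configurations keep >0.99 cosine similarity with the BF16 baseline. As a rough sanity check of why FP8 E4M3 preserves embedding direction that well, here is a minimal pure-Python sketch (the helper names are illustrative only, not part of the model card, ModelOpt, or NanoJet): it rounds a random vector to a software model of the E4M3 grid using a per-tensor scale, then measures cosine similarity after dequantization.

```python
import math
import random

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3
    (3 mantissa bits, min normal exponent -6, saturating at +/-448)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 448.0)            # saturate at the E4M3 finite max
    e = max(math.floor(math.log2(mag)), -6)  # clamp into subnormal range
    step = 2.0 ** (e - 3)               # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step

def fp8_roundtrip_cosine(dim: int = 1024, seed: int = 0) -> float:
    """Quantize a random Gaussian vector with a per-tensor scale
    (max magnitude mapped to 448) and return cos(original, dequantized)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = max(abs(v) for v in x) / 448.0
    x_deq = [quantize_e4m3(v / scale) * scale for v in x]
    dot = sum(a * b for a, b in zip(x, x_deq))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_d = math.sqrt(sum(b * b for b in x_deq))
    return dot / (norm_x * norm_d)

print(round(fp8_roundtrip_cosine(), 4))  # typically prints a value above 0.99
```

This only models per-tensor static scaling; real ModelOpt FP8 checkpoints carry calibrated per-layer scales, so the actual similarity figures come from the card's own evaluation, not this sketch.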