# F2LLM-v2-0.6B → FP16 ONNX
FP16-converted ONNX export of codefuse-ai/F2LLM-v2-0.6B, a Qwen3-derived retrieval embedding model producing 1024-dimensional vectors, with 32k context and last-token pooling.

Weights are ~1.2 GB (half the memory footprint of FP32), and retrieval quality is equivalent to FP32 under the parity gates below.
## Quality
| Metric | Value | Threshold |
|---|---|---|
| `cos_min` vs. PyTorch FP32 reference (6-text multilingual probe) | 0.999999 | ≥ 0.99 |
| `cos_mean` vs. the same reference | 1.000000 | – |
Validated under fastembed-rs' `cosine_parity` harness on `probe/ort-rc12` (ORT 1.24).
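For context, the parity gate compares each FP16 ONNX embedding against its FP32 PyTorch counterpart and asserts the worst-case cosine similarity. A minimal Rust sketch of that kind of check (the `cosine` and `passes_parity_gate` helpers are illustrative, not part of fastembed-rs):

```rust
/// Cosine similarity between two embedding vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b))
}

/// Gate: the worst-case (minimum) cosine over all probe texts must clear the threshold.
fn passes_parity_gate(fp16: &[Vec<f32>], fp32_ref: &[Vec<f32>]) -> bool {
    fp16.iter()
        .zip(fp32_ref)
        .map(|(a, b)| cosine(a, b))
        .fold(f32::INFINITY, f32::min)
        >= 0.99
}
```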
## Files
| File | Size | Description |
|---|---|---|
| `model.fp16.onnx` | ~5 MB | ONNX graph header (weights stored as external data) |
| `model.fp16.onnx.data` | ~1.2 GB | FP16 weights |
| `tokenizer.json`, `config.json`, `tokenizer_config.json`, `special_tokens_map.json` | small | tokenizer + model config |
## Conversion
Streaming FP32→FP16 conversion via `convert_fp16_streaming.py` (bypasses the 2 GB protobuf serialization limit).
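The 2 GB cap applies to a single serialized protobuf message, so a ~2.4 GB FP32 model cannot be round-tripped in one piece; streaming the weight blob avoids that. The following is not the actual `convert_fp16_streaming.py` (which also rewrites tensor dtypes in the graph); it is a minimal Rust sketch of the data-path idea only, assuming the `half` crate and a raw little-endian FP32 blob:

```rust
use std::fs::File;
use std::io::{BufReader, BufWriter, Read, Write};

use half::f16; // assumed dependency: half = "2"

/// Stream a raw little-endian FP32 blob to FP16, one value at a time,
/// so the full weight set never has to sit in memory (or in one protobuf).
fn stream_fp32_to_fp16(src: &str, dst: &str) -> std::io::Result<()> {
    let mut reader = BufReader::new(File::open(src)?);
    let mut writer = BufWriter::new(File::create(dst)?);
    let mut word = [0u8; 4];
    // read_exact fails at EOF, ending the loop; assumes blob length % 4 == 0.
    while reader.read_exact(&mut word).is_ok() {
        let v = f32::from_le_bytes(word);
        writer.write_all(&f16::from_f32(v).to_le_bytes())?;
    }
    writer.flush()
}
```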
## Use via fastembed-rs
```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

let embedder = TextEmbedding::try_new(
    InitOptions::new(EmbeddingModel::F2LlmV2_0_6BFp16),
)?;
let vectors = embedder.embed(vec!["hello world"], None)?;
```
Pooling: last-token (applied automatically by fastembed-rs). For queries, prepend the F2LLM instruct-format prefix (see the upstream F2LLM repo); a fuller query-side sketch follows below.
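A sketch of query-side retrieval, assuming `anyhow` for error handling; the instruct prefix below is a placeholder, not the real string, which must be taken from the upstream F2LLM repo:

```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

fn main() -> anyhow::Result<()> {
    let embedder =
        TextEmbedding::try_new(InitOptions::new(EmbeddingModel::F2LlmV2_0_6BFp16))?;

    // Placeholder: substitute the real F2LLM instruct-format prefix here.
    let query = "<F2LLM instruct prefix>how do I stream a large file?";
    let docs = vec![
        "Read the file in fixed-size chunks with a buffered reader.",
        "The 2 GB protobuf limit applies to a single serialized message.",
    ];

    let q = embedder.embed(vec![query], None)?.remove(0);
    for (doc, d) in docs.iter().zip(embedder.embed(docs.clone(), None)?) {
        // cosine() as defined in the Quality section sketch above.
        println!("{:.4}  {doc}", cosine(&q, &d));
    }
    Ok(())
}
```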
## License
Apache 2.0, inherited from the base model.