|
library_name: transformers
tags:
- sentence-transformers
datasets:
- codefuse-ai/F2LLM-v2
---
# F2LLM-v2-8B-Preview

**F2LLM-v2-8B-Preview** is a multilingual embedding model trained from Qwen3-8B on a corpus of **27 million samples**, spanning **over 100 natural and programming languages**. It is a "preview" version trained without instructions, intended to serve as a foundation for downstream embedding tasks and further fine-tuning.

F2LLM-v2 is fully open: we release base models in 5 sizes, instruct models in 8 sizes, the training data, the training code, and intermediate checkpoints. The three smallest instruct models are pruned and trained from the 0.6B base model.

| Model | Base | Instruct |
| ----- | ---- | -------- |
| 80M | | [🤗F2LLM-v2-80M](https://huggingface.co/codefuse-ai/F2LLM-v2-80M) |
| 160M | | [🤗F2LLM-v2-160M](https://huggingface.co/codefuse-ai/F2LLM-v2-160M) |
| 330M | | [🤗F2LLM-v2-330M](https://huggingface.co/codefuse-ai/F2LLM-v2-330M) |
| 0.6B | [🤗F2LLM-v2-0.6B-Preview](https://huggingface.co/codefuse-ai/F2LLM-v2-0.6B-Preview) | [🤗F2LLM-v2-0.6B](https://huggingface.co/codefuse-ai/F2LLM-v2-0.6B) |
| 1.7B | [🤗F2LLM-v2-1.7B-Preview](https://huggingface.co/codefuse-ai/F2LLM-v2-1.7B-Preview) | [🤗F2LLM-v2-1.7B](https://huggingface.co/codefuse-ai/F2LLM-v2-1.7B) |
| 4B | [🤗F2LLM-v2-4B-Preview](https://huggingface.co/codefuse-ai/F2LLM-v2-4B-Preview) | [🤗F2LLM-v2-4B](https://huggingface.co/codefuse-ai/F2LLM-v2-4B) |
| 8B | [🤗F2LLM-v2-8B-Preview](https://huggingface.co/codefuse-ai/F2LLM-v2-8B-Preview) | [🤗F2LLM-v2-8B](https://huggingface.co/codefuse-ai/F2LLM-v2-8B) |
| 14B | [🤗F2LLM-v2-14B-Preview](https://huggingface.co/codefuse-ai/F2LLM-v2-14B-Preview) | [🤗F2LLM-v2-14B](https://huggingface.co/codefuse-ai/F2LLM-v2-14B) |
## Usage

### With Sentence Transformers
## Intermediate Checkpoints

To facilitate future research, we release intermediate checkpoints in the `intermediate_checkpoints` branch.
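A short sketch of how to discover the checkpoint branch and load from it (standard `huggingface_hub` / Sentence Transformers `revision` mechanics; the branch name is from this card, everything else is generic):

```python
from huggingface_hub import HfApi

# List the branches of this repo; the intermediate checkpoints
# live on the `intermediate_checkpoints` branch rather than `main`.
api = HfApi()
refs = api.list_repo_refs("codefuse-ai/F2LLM-v2-8B-Preview")
branch_names = [branch.name for branch in refs.branches]
print(branch_names)

# To load from that branch instead of `main`, pass it as `revision`:
#   SentenceTransformer("codefuse-ai/F2LLM-v2-8B-Preview",
#                       revision="intermediate_checkpoints")
```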