fahmiaziz98 committed
Commit bc2efca · 1 Parent(s): e137aba

deleted description, config.yaml

Files changed (1): config.yaml (+6, -58)
config.yaml CHANGED
@@ -2,81 +2,29 @@ models:
   qwen3-0.6b:
     name: "Qwen/Qwen3-Embedding-0.6B"
     type: "embeddings"
-    dimension: 1024
-    max_tokens: 32768
-    description: |
-      The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of the Qwen3 models.
-      This includes various programming languages, and it provides robust multilingual, cross-lingual, and code retrieval capabilities.
-      We recommend that developers customize the instruct according to their specific scenarios, tasks, and languages.
-      Our tests have shown that in most retrieval scenarios, not using an instruct on the query side can lead to a drop in retrieval
-      performance of approximately 1% to 5%.
-    language: ["multilingual"]
     repository: "https://huggingface.co/Qwen/Qwen3-Embedding-0.6B"
 
   gemma-300M:
     name: "google/embeddinggemma-300M"
     type: "embeddings"
-    dimension: 768
-    max_tokens: 2048
-    description: |
-      EmbeddingGemma can generate optimized embeddings for various use cases (such as document retrieval, question answering,
-      and fact verification) or for specific input types (either a query or a document) using prompts that are prepended to the
-      input strings. Query prompts follow the form 'task: {task description} | query: ', where the task description varies by
-      use case, with the default task description being "search result". Document-style prompts follow the form
-      'title: {title | "none"} | text: ', where the title is either "none" (the default) or the actual title of the document.
-      Note that providing a title, if available, will improve model performance for document prompts but may require manual formatting.
-    language: ["multilingual"]
     repository: "https://huggingface.co/google/embeddinggemma-300m"
 
   multilingual-e5-small:
     name: "intfloat/multilingual-e5-small"
     type: "embeddings"
-    dimension: 384
-    max_tokens: 512
-    description: |
-      This model is initialized from microsoft/Multilingual-MiniLM-L12-H384 and continually trained on a mixture of multilingual datasets.
-      It supports the 100 languages of xlm-roberta, but low-resource languages may see performance degradation.
-      Instructions are required; please refer to the Hugging Face repository.
-    language: ["multilingual"]
     repository: "https://huggingface.co/intfloat/multilingual-e5-small"
 
-  naver-splade-v3:
-    name: "naver/splade-v3"
+  splade-pp-v1:
+    name: "prithivida/Splade_PP_en_v1"
     type: "sparse-embeddings"
-    dimension: 1234 # must add this field
-    max_tokens: 1234
-    description: |
-      SPLADE models strike a fine balance between retrieval effectiveness (quality) and retrieval efficiency (latency and $);
-      with that in mind, we made very minor retrieval-efficiency tweaks to make the model more suitable for an industry setting.
-      (Pure MLE folks should not conflate efficiency with model inference efficiency. Our main focus is retrieval efficiency;
-      hereinafter "efficiency" is shorthand for retrieval efficiency unless explicitly qualified otherwise.
-      Not that inference efficiency is unimportant; we will address it subsequently.)
-    language: ["multilingual"]
-    repository: "https://huggingface.co/naver/splade-v3"
+    repository: "https://huggingface.co/prithivida/Splade_PP_en_v1"
 
   splade-pp-v2:
     name: "prithivida/Splade_PP_en_v2"
     type: "sparse-embeddings"
-    dimension: 1234 # must add this field
-    max_tokens: 1234
-    description: |
-      SPLADE models strike a fine balance between retrieval effectiveness (quality) and retrieval efficiency (latency and $);
-      with that in mind, we made very minor retrieval-efficiency tweaks to make the model more suitable for an industry setting.
-      (Pure MLE folks should not conflate efficiency with model inference efficiency. Our main focus is retrieval efficiency;
-      hereinafter "efficiency" is shorthand for retrieval efficiency unless explicitly qualified otherwise.
-      Not that inference efficiency is unimportant; we will address it subsequently.)
-    language: ["multilingual"]
     repository: "https://huggingface.co/prithivida/Splade_PP_en_v2"
 
-  splade-pp-v1:
-    name: "prithivida/Splade_PP_en_v1"
+  naver-splade-v3:
+    name: "naver/splade-v3"
     type: "sparse-embeddings"
-    dimension: 1234 # must add this field
-    max_tokens: 1234
-    description: |
-      Granite-Embedding-30m-Sparse is a 30M-parameter sparse bi-encoder embedding model from the Granite Experimental suite that can be used to generate high-quality text embeddings.
-      The model produces a variable-length, bag-of-words-like dictionary containing expansions of sentence tokens and their corresponding weights. It is trained on a combination of open-source relevance-pair datasets with
-      permissive, enterprise-friendly licenses and IBM-collected and IBM-generated datasets. While maintaining competitive scores on academic benchmarks such as BEIR, this model also performs well on many enterprise use cases.
-      The model is developed using retrieval-oriented pretraining, contrastive finetuning, and knowledge distillation for improved performance.
-    language: ["En"]
-    repository: "https://huggingface.co/prithivida/Splade_PP_en_v1"
+    repository: "https://huggingface.co/naver/splade-v3"
 
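With the descriptions gone, each entry keeps only name, type, and repository. A minimal sketch of consuming the trimmed file with PyYAML; the loader below is an illustrative assumption, not code from this repo:

```python
import yaml  # pip install pyyaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

# After this commit, each model entry carries only name, type, and repository.
for key, spec in config["models"].items():
    print(f"{key}: {spec['name']} ({spec['type']}) -> {spec['repository']}")
```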
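The deleted Qwen3 description recommends customizing an instruct on the query side (omitting it reportedly costs roughly 1% to 5% retrieval performance). A hedged sketch of that pattern; sentence-transformers and the task string are assumptions, and the Instruct/Query layout follows the Qwen3-Embedding model card:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# Hypothetical task description; tailor it to your scenario, task, and language.
task = "Given a web search query, retrieve relevant passages that answer the query"
queries = ["What is the capital of China?"]
documents = ["The capital of China is Beijing."]

# Instruct only the query side; documents are encoded as-is.
query_emb = model.encode([f"Instruct: {task}\nQuery: {q}" for q in queries])
doc_emb = model.encode(documents)
print(model.similarity(query_emb, doc_emb))
```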
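The deleted EmbeddingGemma description spells out a prompt grammar for queries and documents. The helpers below simply materialize those two templates; the function names are hypothetical:

```python
def gemma_query_prompt(query: str, task: str = "search result") -> str:
    # 'task: {task description} | query: ' -- "search result" is the stated default.
    return f"task: {task} | query: {query}"

def gemma_document_prompt(text: str, title: str = "none") -> str:
    # 'title: {title | "none"} | text: ' -- a real title, when available,
    # reportedly improves document-side performance.
    return f"title: {title} | text: {text}"

print(gemma_query_prompt("who wrote the splade paper"))
print(gemma_document_prompt("SPLADE is a sparse lexical retrieval model.", title="SPLADE"))
```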
 
22
  splade-pp-v2:
23
  name: "prithivida/Splade_PP_en_v2"
24
  type: "sparse-embeddings"
 
 
 
 
 
 
 
 
 
25
  repository: "https://huggingface.co/prithivida/Splade_PP_en_v2"
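The deleted multilingual-e5-small description defers to the model card for the required instruction; for the e5 family that card prescribes literal "query: " and "passage: " input prefixes, sketched here under that assumption:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-small")

# e5 inputs must be prefixed with "query: " or "passage: ".
q = model.encode(["query: how much protein should a female eat"])
p = model.encode(["passage: As a general guideline, adult women need about 46 g of protein per day."])
print(model.similarity(q, p))
```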
26
 
27
+ naver-splade-v3:
28
+ name: "naver/splade-v3"
29
  type: "sparse-embeddings"
30
+ repository: "https://huggingface.co/naver/splade-v3"
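Both remaining sparse entries are SPLADE-style models, which emit the bag-of-words-like token expansions the deleted descriptions allude to. A sketch of the standard SPLADE pooling (log-saturated ReLU over MLM logits, max over token positions) using transformers; the checkpoint choice and access to it are assumptions:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

ckpt = "prithivida/Splade_PP_en_v2"  # any sparse-embeddings entry above
tok = AutoTokenizer.from_pretrained(ckpt)
mlm = AutoModelForMaskedLM.from_pretrained(ckpt)

def splade_vector(text: str) -> torch.Tensor:
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**enc).logits               # (1, seq_len, vocab_size)
    weights = torch.log1p(torch.relu(logits))    # log-saturated ReLU
    weights = weights * enc["attention_mask"].unsqueeze(-1)
    return weights.max(dim=1).values.squeeze(0)  # (vocab_size,), mostly zeros

vec = splade_vector("sparse retrieval with learned term expansion")
top = vec.topk(8)
tokens = tok.convert_ids_to_tokens(top.indices.tolist())
print(list(zip(tokens, [round(v, 2) for v in top.values.tolist()])))
```

The resulting vocabulary-sized vector is what a sparse index would store, typically as term/weight pairs.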