Commit ed84c41 (verified) by Parveshiiii, parent: d00c4c5

Update README.md

Files changed (1): README.md (+6, -6)

README.md CHANGED
@@ -13,7 +13,7 @@ task_categories:
  - retrieval
  - clustering
  tags:
- - XenArcAI
+ - Modotte
  - SparkEmbedding
  - Embedding
  - embedding
@@ -27,7 +27,7 @@ annotations_creators:
  - machine-generated
  - expert-verified
  source_datasets:
- - XenArcAI internal synthetic generation
+ - Modotte internal synthetic generation
  multilinguality:
  - multilingual
  ---
@@ -46,7 +46,7 @@ multilinguality:
 
 
  ### Description
- SparkEmbedding-300m is a 300 million parameter multilingual text embedding model with **SoTA cross-lingual retrieval**, developed by the XenArcAI team. Fine-tuned from Google's EmbeddingGemma-300m, it incorporates an additional 1 million curated samples across 119 languages (all 22 Indian languages included), emphasizing data complexity, linguistic diversity, and deep language understanding. This optimization enhances cross-lingual retrieval, producing embeddings with superior semantic alignment and efficacy in multilingual settings.
+ SparkEmbedding-300m is a 300 million parameter multilingual text embedding model with **SoTA cross-lingual retrieval**, developed by the Modotte team. Fine-tuned from Google's EmbeddingGemma-300m, it incorporates an additional 1 million curated samples across 119 languages (all 22 Indian languages included), emphasizing data complexity, linguistic diversity, and deep language understanding. This optimization enhances cross-lingual retrieval, producing embeddings with superior semantic alignment and efficacy in multilingual settings.
 
  The model generates high-dimensional vector representations capturing rich semantic and contextual information, excelling in bridging linguistic gaps for applications like global information retrieval, multilingual question answering, and cross-language semantic search. With a native 2048-token context window, it handles extended inputs (e.g., full articles or documents) while preserving long-range dependencies.
 
@@ -86,7 +86,7 @@ from sentence_transformers import SentenceTransformer
  import torch
  import numpy as np
 
- model = SentenceTransformer("XenArcAI/SparkEmbedding-300m", device='cuda' if torch.cuda.is_available() else 'cpu')
+ model = SentenceTransformer("Modotte/SparkEmbedding-300m", device='cuda' if torch.cuda.is_available() else 'cpu')
 
  query = "How does artificial intelligence impact global economies?"  # English
  corpus = [
@@ -193,12 +193,12 @@ Qualitative: Tight t-SNE clustering for parallels; excels in complex/mixed-langu
 
  ### Citation
  ```bibtex
- @misc{xenarcai_sparkembedding_2025,
+ @misc{Modotte_sparkembedding_2025,
  title={SparkEmbedding-300m: A Fine-Tuned Multilingual Embedding Model for Cross-Lingual Retrieval},
  author={Parvesh Rawal},
  publisher={Hugging Face},
  year={2025},
- url={https://huggingface.co/XenArcAI/SparkEmbedding-300m}
+ url={https://huggingface.co/Modotte/SparkEmbedding-300m}
  }
  ```
 
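
The README's usage snippet in the diff loads the model and defines a query plus a multilingual corpus, but the hunk ends before the retrieval step. As a minimal, self-contained sketch of what typically follows with any sentence-embedding model (encode, score by cosine similarity, rank), here is that step with small stand-in vectors in place of real `model.encode()` output; the 3-dimensional values below are illustrative only, not SparkEmbedding embeddings:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `a` and each row of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Stand-in embeddings; with the real model these would come from
# model.encode([query]) and model.encode(corpus).
query_emb = np.array([[1.0, 0.0, 0.0]])
corpus_emb = np.array([
    [0.9, 0.1, 0.0],   # nearly parallel to the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.5, 0.5, 0.0],   # partially aligned
])

scores = cosine_sim(query_emb, corpus_emb)[0]
ranking = np.argsort(-scores)     # corpus indices, best match first
print(ranking.tolist())           # → [0, 2, 1]
```

With the actual model, sentence-transformers also provides `util.cos_sim` for this computation, so the scoring line can be replaced by a library call once the embeddings come from `model.encode(...)`.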