Parveshiiii committed (verified)
Commit 88c2fbf · 1 parent: 8c383e4

Update README.md

Files changed (1): README.md (+20, −19)
README.md CHANGED
@@ -72,25 +72,6 @@ Built on EmbeddingGemma-300m's decoder-only transformer (inspired by Gemma and T
 
 No architectural changes during fine-tuning; focuses on embedding head and optimization for cross-lingual gains. Compatible with Hugging Face Transformers.
 
-### Intended Use Cases
-- Cross-lingual semantic search (e-commerce, news, academic databases).
-- Retrieval-augmented generation (RAG) for diverse queries.
-- Multilingual clustering/topic modeling (social media, content moderation).
-- On-device personalization (translation apps, virtual assistants).
-
-Leverage MRL for scalability and task-specific prompting for extended utility.
-
-### Citation
-```bibtex
-@misc{xenarcai_sparkembedding_2025,
-title={SparkEmbedding-300m: A Fine-Tuned Multilingual Embedding Model for Cross-Lingual Retrieval},
-author={XenArcAI Team},
-publisher={Hugging Face},
-year={2025},
-url={https://huggingface.co/XenArcAI/SparkEmbedding-300m}
-}
-```
-
 ## Usage
 
 ### Installation and Setup
@@ -125,6 +106,14 @@ print(f"Similarity scores: {similarities[top_indices]}")
 ```
 Yields high scores (0.75-0.90) for relevant cross-lingual matches.
 
+### Intended Use Cases
+- Cross-lingual semantic search (e-commerce, news, academic databases).
+- Retrieval-augmented generation (RAG) for diverse queries.
+- Multilingual clustering/topic modeling (social media, content moderation).
+- On-device personalization (translation apps, virtual assistants).
+
+Leverage MRL for scalability and task-specific prompting for extended utility.
+
 ### Advanced Configurations
 - **Batch Processing:** Up to batch_size=128; use show_progress_bar=True.
 - **Precision:** fp32 default; torch.bfloat16 for memory savings (avoid fp16 for multilingual stability).
@@ -201,5 +190,17 @@ Qualitative: Tight t-SNE clustering for parallels; excels in complex/mixed-langu
 - Responsible Use: Avoid unmonitored high-risk apps; report issues.
 - Transparency: Dataset cards/audits available on request.
 
+
+### Citation
+```bibtex
+@misc{xenarcai_sparkembedding_2025,
+title={SparkEmbedding-300m: A Fine-Tuned Multilingual Embedding Model for Cross-Lingual Retrieval},
+author={XenArcAI Team},
+publisher={Hugging Face},
+year={2025},
+url={https://huggingface.co/XenArcAI/SparkEmbedding-300m}
+}
+```
+
 ## Credits and Acknowledgments
 Built on Google's EmbeddingGemma-300m ([arXiv:2509.20354](https://arxiv.org/abs/2509.20354)). Thanks to BibleText project, Hugging Face Transformers/Sentence Transformers, and ML community. Open to collaborations.
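The usage example in the diff ends by printing ranked scores with `print(f"Similarity scores: {similarities[top_indices]}")`. The ranking step it implies can be sketched with plain numpy, using toy vectors in place of real model output (no model is loaded here; variable names mirror the README's snippet, and the embedding values are made up for illustration):

```python
import numpy as np

def cosine_sim(a, b):
    # Row-normalize both matrices; the dot product is then cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy vectors standing in for model.encode(...) output; real embeddings are larger.
query_emb = np.array([[0.9, 0.1, 0.2]])
doc_embs = np.array([
    [0.8, 0.2, 0.1],  # near-parallel to the query: high similarity
    [0.1, 0.9, 0.4],  # unrelated direction: low similarity
])
similarities = cosine_sim(query_emb, doc_embs)[0]
top_indices = np.argsort(-similarities)  # best match first
print(f"Similarity scores: {similarities[top_indices]}")
```

With the model's actual embeddings, the README states relevant cross-lingual pairs score in the 0.75-0.90 range under this metric.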
 
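The README's "Intended Use Cases" section advises leveraging MRL for scalability. Assuming the embeddings are Matryoshka-trained (leading dimensions carry most of the signal), truncation followed by re-normalization is a minimal sketch of that idea; the 768/256 sizes here are illustrative, not taken from the model card:

```python
import numpy as np

def truncate_mrl(embeddings, dim):
    # Matryoshka-style truncation: keep the leading `dim` dimensions,
    # then re-normalize rows so cosine similarity remains meaningful.
    cut = embeddings[:, :dim]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)

rng = np.random.default_rng(0)
full = rng.normal(size=(4, 768))   # stand-in for full-size embeddings
small = truncate_mrl(full, 256)    # 3x smaller index at a modest quality cost
print(small.shape)  # → (4, 256)
```

The trade-off is index size and search latency versus retrieval quality, which is why the README pairs MRL with task-specific prompting rather than recommending one fixed dimension.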