zackli4ai committed on
Commit
2fca6d9
·
verified ·
1 Parent(s): d96b6f4

Create README.md

Files changed (1)
  1. README.md +63 -0
README.md ADDED
@@ -0,0 +1,63 @@
+ # EmbeddingGemma-300M (NPU)
+
+
+ ## Model Description
+ **EmbeddingGemma** is a 300M-parameter open embedding model developed by **Google DeepMind**.
+ It is built from **Gemma 3** (with T5Gemma initialization) using the same research and technology behind the **Gemini models**.
+
+ The model produces **vector representations of text**, making it well suited for **search, retrieval, classification, clustering, and semantic similarity tasks**.
+ It was trained on ~320B tokens of data covering **100+ languages** and is optimized for **on-device efficiency** (mobile phones, laptops, desktops).
+
+
+ ## Features
+ - **Compact and efficient**: 300M parameters, optimized for on-device use.
+ - **Multilingual**: trained on 100+ spoken languages.
+ - **Flexible embeddings**: default dimension **768**, with support for **512, 256, 128** via Matryoshka Representation Learning (MRL).
+ - **Wide task coverage**: retrieval, QA, fact-checking, classification, clustering, similarity.
+ - **Commercial-friendly**: open weights available for research and production.
+
+
+ ## Use Cases
+ - Semantic similarity and recommendation systems
+ - Document, code, and web search
+ - Clustering for organization, research, and anomaly detection
+ - Classification (e.g., sentiment, spam detection)
+ - Fact verification and QA embeddings
+ - Code retrieval for programming assistance
+
+
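The search and similarity use cases above usually come down to comparing embedding vectors. The following is a minimal sketch of ranking documents against a query by cosine similarity. It does not call this model or the nexaSDK: the 4-dimensional vectors are toy placeholders standing in for real embeddings, and the function names `cosine_similarity` and `rank_documents` are hypothetical helpers introduced only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 4-dimensional vectors standing in for real model outputs (assumption).
query = [1.0, 0.0, 1.0, 0.0]
docs = {
    "doc_a": [0.9, 0.1, 0.8, 0.0],
    "doc_b": [0.0, 1.0, 0.0, 1.0],
    "doc_c": [0.5, 0.5, 0.5, 0.5],
}

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.4f}")
```

With real embeddings, the same ranking step would apply unchanged; only the vectors would come from the model instead of being written by hand.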
+ ## Inputs and Outputs
+ **Input**:
+ - **Type**: Text string (e.g., query, prompt, document)
+ - **Max Length**: 2048 tokens
+
+ **Output**:
+ - **Type**: Embedding vector (default 768 dimensions)
+ - **Options**: 512 / 256 / 128 dimensions via truncation and re-normalization (MRL)
+
+
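The truncation and re-normalization step mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not the SDK's implementation: `truncate_embedding` and `SUPPORTED_DIMS` are hypothetical names, and the 768-value input is a synthetic placeholder instead of a real model output.

```python
import math

# Output sizes listed in this README (768 is the default).
SUPPORTED_DIMS = (768, 512, 256, 128)


def truncate_embedding(vec, dim):
    """Keep the first `dim` values of `vec` and rescale them to unit L2 norm."""
    if dim not in SUPPORTED_DIMS:
        raise ValueError(f"dim must be one of {SUPPORTED_DIMS}")
    if len(vec) < dim:
        raise ValueError("vector is shorter than the requested dimension")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


# Synthetic 768-value vector standing in for a real embedding (assumption).
full = [math.sin(i + 1) for i in range(768)]
small = truncate_embedding(full, 256)

print(len(small))
print(math.sqrt(sum(x * x for x in small)))
```

Re-normalizing after truncation keeps dot products between truncated vectors equal to their cosine similarity, so queries and documents should be truncated to the same dimension before comparison.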
+ ## Limitations & Responsible Use
+ This model has known limitations:
+ - **Bias & coverage**: embedding quality depends on the diversity of the training data.
+ - **Nuance & ambiguity**: the model may struggle with sarcasm and figurative language.
+ - **Ethical concerns**: risk of bias perpetuation, privacy leakage, or malicious misuse.
+
+ Mitigations:
+ - CSAM and sensitive-data filtering were applied to the training data.
+ - Users should adhere to the **Gemma Responsible AI guidelines** and the **Prohibited Use Policy**.
+
+
+ ## License
+ - Licensed under Google’s **Gemma Terms of Use**.
+ - See: [Gemma Terms](https://ai.google.dev/gemma/terms)
+
+ Ensure your usage complies with upstream license conditions.
+
+
+ ## References
+ - [nexaSDK](https://sdk.nexa.ai)
+
+
+ ## Support
+ For SDK-related issues, visit [sdk.nexa.ai](https://sdk.nexa.ai).
+ For model-specific questions, open an issue in this repository.