jdpruett committed 1f1b361 (verified; parent: bee0c12): Update README.md
---
license: apache-2.0
base_model: intfloat/e5-small-unsupervised
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- information-retrieval
- knowledge-distillation
language:
- en
---

# maniac/miniac-embed

A compact text embedding model for semantic search and retrieval, built with **LEAF** knowledge distillation: an E5-small-unsupervised backbone distilled from **mixedbread-ai/mxbai-embed-large-v1**. It outputs 1024-dimensional vectors; compare them with cosine similarity.

- **Backbone**: intfloat/e5-small-unsupervised (~33M params)
- **Teacher**: mixedbread-ai/mxbai-embed-large-v1
- **Method**: [LEAF](https://arxiv.org/abs/2509.12539) (teacher-aligned representations)
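For intuition, "teacher-aligned representations" means the student is trained so that its embedding for a text lands close to the teacher's embedding for the same text. The toy sketch below illustrates one such alignment objective (mean cosine distance between paired embeddings); it is not the exact LEAF loss, and the array shapes are illustrative only.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def alignment_loss(student_emb, teacher_emb):
    # Mean (1 - cosine similarity) over paired student/teacher embeddings.
    s, t = l2_normalize(student_emb), l2_normalize(teacher_emb)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 1024))                    # 1024-d teacher embeddings
student = teacher + 0.01 * rng.normal(size=(4, 1024))   # a well-aligned student
print(alignment_loss(student, teacher))                  # near 0 when spaces align
```

A well-trained student drives this distance toward zero, which is why the student's vectors can be scored with cosine similarity in the teacher's embedding space.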

## Quickstart

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("maniac/miniac-embed")

queries = ["What is machine learning?"]
documents = ["Machine learning is a subset of AI that learns from data."]

# Query prompt inherited from the teacher (mxbai-embed-large-v1)
query_embeddings = model.encode(
    ["Represent this sentence for searching relevant passages: " + q for q in queries]
)
document_embeddings = model.encode(documents)

scores = model.similarity(query_embeddings, document_embeddings)
```

Or with the model’s built-in prompt (if supported):

```python
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)
scores = model.similarity(query_embeddings, document_embeddings)
```
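If you load the checkpoint with plain `transformers` instead of `sentence-transformers`, you must pool the token states yourself. E5-family backbones conventionally use attention-masked mean pooling followed by L2 normalization; the sketch below shows that pooling step on toy tensors (the shapes, and the assumption that this model uses mean pooling, should be checked against the model's pooling config).

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    # last_hidden_state: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid divide-by-zero
    return summed / counts

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Toy batch: batch=1, seq_len=3, dim=4; position 2 is padding (mask = 0).
hidden = np.array([[[1., 2., 3., 4.],
                    [3., 2., 1., 0.],
                    [9., 9., 9., 9.]]])
mask = np.array([[1, 1, 0]])
pooled = mean_pool(hidden, mask)  # → [[2., 2., 2., 2.]]
emb = normalize(pooled)           # unit-norm, so cosine similarity = dot product
```

After normalization, cosine similarity between two embeddings reduces to a plain dot product, which matches the `similarity` call in the snippets above.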

## Citation

```bibtex
@misc{mdbr_leaf,
      title={LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations},
      author={Robin Vujanic and Thomas Rueckstiess},
      year={2025},
      eprint={2509.12539},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2509.12539}
}
```

E5 backbone: Wang et al., [Text Embeddings by Weakly-Supervised Contrastive Pre-training](https://arxiv.org/abs/2212.03533), 2022.

## License

Apache 2.0.