ManiacLabs / miniac-embed

Sentence Similarity · sentence-transformers · feature-extraction · text-embeddings-inference


jdpruett committed 17 days ago

Commit 1f1b361 (verified) · 1 Parent(s): bee0c12

Update README.md

Files changed (1)

README.md +68 -3

README.md CHANGED

@@ -1,3 +1,68 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+base_model: intfloat/e5-small-unsupervised
+tags:
+  - transformers
+  - sentence-transformers
+  - sentence-similarity
+  - feature-extraction
+  - information-retrieval
+  - knowledge-distillation
+language:
+  - en
+---
+# maniac/miniac-embed
+Compact text embedding model for semantic search and retrieval, trained with **LEAF** knowledge distillation: an E5-small-unsupervised backbone distilled from **mixedbread-ai/mxbai-embed-large-v1**. It outputs 1024-dimensional vectors; compare them with cosine similarity.
+- **Backbone**: intfloat/e5-small-unsupervised (~33M params)
+- **Teacher**: mixedbread-ai/mxbai-embed-large-v1
+- **Method**: [LEAF](https://arxiv.org/abs/2509.12539) (teacher-aligned representations)
+## Quickstart
+```python
+from sentence_transformers import SentenceTransformer
+model = SentenceTransformer("ManiacLabs/miniac-embed")
+queries = ["What is machine learning?"]
+documents = ["Machine learning is a subset of AI that learns from data."]
+# Prefix queries with the mxbai-style retrieval prompt; documents are encoded as-is
+query_embeddings = model.encode(
+    ["Represent this sentence for searching relevant passages: " + q for q in queries]
+)
+document_embeddings = model.encode(documents)
+scores = model.similarity(query_embeddings, document_embeddings)
+```
+Or, if the model's configuration defines a `query` prompt (check `model.prompts`), select it by name:
+```python
+query_embeddings = model.encode(queries, prompt_name="query")
+document_embeddings = model.encode(documents)
+scores = model.similarity(query_embeddings, document_embeddings)
+```
+## Citation
+```bibtex
+@misc{mdbr_leaf,
+  title={LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations},
+  author={Robin Vujanic and Thomas Rueckstiess},
+  year={2025},
+  eprint={2509.12539},
+  archivePrefix={arXiv},
+  primaryClass={cs.IR},
+  url={https://arxiv.org/abs/2509.12539}
+}
+```
+E5 backbone: Wang et al., [Text Embeddings by Weakly-Supervised Contrastive Pre-training](https://arxiv.org/abs/2212.03533), 2022.
+## License
+Apache 2.0.
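The README added in this commit says the model outputs 1024-dimensional vectors meant to be compared with cosine similarity. Its Quickstart scores them with `model.similarity`, which in sentence-transformers uses cosine similarity unless the model's configuration sets a different `similarity_fn_name`. The NumPy snippet below is a minimal sketch of what that scoring step computes: it L2-normalizes each query and document vector, takes dot products, and ranks documents per query. The 4-dimensional vectors are made-up toy values standing in for real 1024-d embeddings. `cosine_scores` and `rank_documents` are hypothetical helper names and are not part of the model's or the library's API.

```python
import numpy as np


def cosine_scores(queries: np.ndarray, documents: np.ndarray) -> np.ndarray:
    """Return a (num_queries, num_documents) matrix of cosine similarities.

    Hypothetical helper: each row of `queries` and `documents` is one
    embedding vector; assumes no row is all zeros.
    """
    q = np.asarray(queries, dtype=np.float64)
    d = np.asarray(documents, dtype=np.float64)
    # L2-normalize each row so that a plain dot product equals the cosine.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return q @ d.T


def rank_documents(scores: np.ndarray) -> np.ndarray:
    """Return document indices ordered from most to least similar, per query."""
    return np.argsort(-scores, axis=1)


# Toy 4-d vectors standing in for real 1024-d embeddings (made-up values).
query_embeddings = np.array([[1.0, 0.0, 1.0, 0.0]])
document_embeddings = np.array([
    [0.9, 0.1, 0.8, 0.0],  # points roughly the same way as the query
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
    [2.0, 0.0, 2.0, 0.0],  # same direction as the query, larger magnitude
])

scores = cosine_scores(query_embeddings, document_embeddings)
ranking = rank_documents(scores)
print(np.round(scores, 3))
print(ranking)
```

Cosine similarity ignores vector length. The third toy document points in exactly the query's direction with twice its magnitude, so it scores 1.0 (up to floating-point rounding) and ranks first. The orthogonal document scores 0.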