How to use fermacsys/RockyEmbed_Marco with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="fermacsys/RockyEmbed_Marco", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("fermacsys/RockyEmbed_Marco", trust_remote_code=True, dtype="auto")
```
| library_name: transformers | |
| datasets: | |
| - microsoft/ms_marco | |
| license: mit | |
| base_model: | |
| - pranavupadhyaya52/rocky-embed | |
| # RockyEmbed-Marco | |
| RockyEmbed-Marco is a lightweight, high-performance text embedding model built by contrastively fine-tuning RockyEmbed on the MS MARCO dataset. It is designed for efficient real-world retrieval tasks such as semantic search, RAG (Retrieval-Augmented Generation), and question answering systems. | |
| --- | |
| ### Overview | |
| Modern embedding models often rely on large-scale architectures with billions of parameters. RockyEmbed-Marco takes a different approach: | |
| Compact (~90M parameters) | |
| Efficient (CPU-friendly inference) | |
| Task-optimized (fine-tuned for retrieval) | |
| Production-ready (designed for real-world RAG systems) | |
| This model builds on RockyEmbed, which is pre-trained using distillation and optimized training strategies, and enhances it through contrastive learning on MS MARCO to improve retrieval quality. | |
| --- | |
| ### Model Architecture | |
| Base Model: RockyEmbed | |
| Parameters: ~90M | |
| Embedding Dimension: not stated in this card | |
| Training Strategy: | |
| Stage 1: Distillation-based pretraining | |
| Stage 2: Contrastive fine-tuning (MS MARCO) | |
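As a rough check on what "~90M parameters" means for CPU inference, the weight footprint can be estimated from the parameter count and the bytes used per parameter. This is illustrative arithmetic only: it ignores activations, tokenizer data, and framework overhead, and the helper name `weight_memory_mb` is hypothetical.

```python
# Approximate weight memory for a ~90M-parameter model (count taken from this card).
# Ignores activations, tokenizer data, and framework overhead.
PARAMS = 90_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_mb(PARAMS, dtype):.0f} MB")
```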
| --- | |
| ### Training Details | |
| Dataset | |
| MS MARCO Passage Ranking Dataset | |
| Large-scale dataset for training retrieval systems | |
| Contains real-world queries and relevant passages | |
| Objective Function | |
| InfoNCE (Contrastive Loss) | |
| The model learns to: | |
| Pull semantically similar query-passage pairs closer | |
| Push irrelevant pairs apart in embedding space | |
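The objective above can be sketched in a few lines. This is a minimal pure-Python illustration of InfoNCE with in-batch negatives, not the training code used for this model: the card does not state the temperature, batch construction, or negative-sampling scheme, so `temperature=0.05` and the in-batch setup are assumptions. A real training loop would typically use a tensor library instead.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(queries, passages, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives.

    queries[i] and passages[i] form the positive pair; every other passage
    in the batch serves as a negative for query i.
    The temperature value is an assumption, not taken from the model card.
    """
    q = [normalize(v) for v in queries]
    p = [normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarities scaled by the temperature act as logits.
        logits = [dot(qi, pj) / temperature for pj in p]
        # Numerically stable log-sum-exp over all passages in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching passage as the target class.
        total += log_denominator - logits[i]
    return total / len(q)


# Toy 2-D embeddings: aligned pairs should give a lower loss than swapped pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce_loss(queries, aligned) < info_nce_loss(queries, swapped))
```

Minimizing this loss raises the similarity of each matching query-passage pair relative to the other passages in the batch, which is the pull/push behavior described above.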
| --- | |
| ### Evaluation | |
| RockyEmbed-Marco is evaluated on both benchmark datasets and real-world RAG scenarios: | |
| MTEB (Massive Text Embedding Benchmark), Quora subset | |
| Main Score: 0.64 | |
| RAGAS (RAG-specific metrics), Quora subset | |
| Context Precision: 0.0583 | |
| Answer Correctness: 0.4717 | |
| These evaluations demonstrate: | |
| Strong retrieval capability relative to model size | |
| Practical effectiveness in downstream RAG pipelines | |
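To help read the Context Precision figure, here is a minimal sketch of how a rank-aware context precision score is commonly computed: precision@k is averaged over the ranks at which a relevant chunk appears. The card does not say which RAGAS version, judge model, or settings were used, and in RAGAS the relevance flags come from an LLM judge, so the hand-written flags and the helper name `context_precision` are assumptions for illustration.

```python
def context_precision(relevance):
    """Rank-aware context precision for one query.

    relevance: list of 0/1 flags for the retrieved chunks, in ranked order
    (hand-written here; in RAGAS they would come from an LLM judge).
    """
    hits = 0
    weighted = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # precision@k, counted only at ranks holding a relevant chunk
            weighted += hits / k
    return weighted / hits if hits else 0.0


print(context_precision([1, 0, 0]))  # relevant chunk ranked first
print(context_precision([0, 0, 1]))  # relevant chunk ranked last
print(context_precision([0, 0, 0]))  # nothing relevant retrieved
```

Under this definition, a score is low when relevant chunks are missing or ranked below irrelevant ones.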
| --- | |
| ### Usage | |
| Installation | |
| pip install torch transformers sentence-transformers | |
| --- | |
| Loading the Model | |
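| from sentence_transformers import SentenceTransformer | |
| # Repo ID as shown on the model page; trust_remote_code=True matches the Transformers loading snippet | |
| model = SentenceTransformer("fermacsys/RockyEmbed_Marco", trust_remote_code=True) | |
| # encode() returns one embedding vector per input string | |
| embeddings = model.encode([ | |
| "What is contrastive learning?", | |
| "Explain retrieval augmented generation" | |
| ]) | |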
| --- | |
| Example: Semantic Search | |
| from sentence_transformers import util | |
| query = "What is RAG?" | |
| documents = [ | |
| "RAG stands for Retrieval-Augmented Generation.", | |
| "Transformers are deep learning models.", | |
| "Contrastive learning improves embeddings." | |
| ] | |
| query_emb = model.encode(query, convert_to_tensor=True) | |
| doc_emb = model.encode(documents, convert_to_tensor=True) | |
| scores = util.cos_sim(query_emb, doc_emb) | |
| print(scores) | |
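The `scores` tensor from the example above has one row per query and one column per document, so `scores[0].tolist()` yields a plain list of similarities for the single query. The sketch below shows one way to turn such a list into a ranked result. The helper name `top_k` and the numeric values in `example_scores` are hypothetical stand-ins, not real model output.

```python
def top_k(scores, documents, k=2):
    """Return the k (score, document) pairs with the highest scores."""
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


documents = [
    "RAG stands for Retrieval-Augmented Generation.",
    "Transformers are deep learning models.",
    "Contrastive learning improves embeddings.",
]

# Hypothetical similarity values standing in for scores[0].tolist().
example_scores = [0.82, 0.10, 0.35]

for score, doc in top_k(example_scores, documents):
    print(f"{score:.2f}  {doc}")
```

In a RAG pipeline, the top-ranked passages would then be passed to the generator as context.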
| --- | |
| ### Use Cases | |
| 🔍 Semantic Search | |
| 📚 Document Retrieval | |
| 🤖 Retrieval-Augmented Generation (RAG) | |
| 💬 Question Answering Systems | |
| 🧠 Embedding-based Clustering | |
| --- | |
| ### Design Philosophy | |
| RockyEmbed-Marco is built with the following principles: | |
| Efficiency over scale → smaller models, competitive performance | |
| Practicality → optimized for real-world pipelines | |
| Stability → improved training techniques to avoid gradient issues | |
| Accessibility → usable on limited hardware (CPU-friendly) | |
| --- | |
| ### Key Insights | |
| Contrastive fine-tuning significantly improves retrieval quality | |
| Smaller models can compete with larger ones when trained effectively | |
| Evaluating on RAG tasks, not only on standard benchmarks, is essential | |
| --- | |
| ### Future Work | |
| Multi-domain fine-tuning | |
| Hard-negative mining improvements | |
| Multilingual support | |
| Integration with lightweight LLM pipelines | |
| --- | |
| ### Contributing | |
| Contributions, ideas, and improvements are welcome. Feel free to open issues or submit pull requests. | |
| --- | |
| ### License | |
| MIT (as declared in the model card metadata) | |
| --- | |
| ### Contact | |
| Pranav Upadhyaya | |
| 📧 pranavupadhyaya52@gmail.com | |
| 🔗 ORCID: https://orcid.org/0009-0008-8887-4349 | |
| --- |