# STEM Embedding Model

🧬 Embedding model optimized for STEM content (Math, Physics, CS, Biology).
## Performance

- **Separation Score**: 0.6767
- **Accuracy**: 97.18%
- **Training data**: 75k+ STEM chunks from Wikipedia and Semantic Scholar
## Usage

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("zacbrld/MNLP_M3_document_encoder_120tok")
tokenizer = AutoTokenizer.from_pretrained("zacbrld/MNLP_M3_document_encoder_120tok")

# Encode text (truncation keeps inputs within the model's context window)
inputs = tokenizer("Neural networks use backpropagation", return_tensors="pt", truncation=True, padding=True)

# Mean-pool the token embeddings into a single sentence vector
embeddings = model(**inputs).last_hidden_state.mean(dim=1)
```
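Note that a plain mean over `last_hidden_state` averages every position, including any padding tokens added when batching multiple texts. A mask-aware mean is a common refinement; below is a minimal sketch on dummy tensors (the `mean_pool` helper and the toy shapes are illustrative, not part of this model's API):

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    # Expand the mask to the hidden dimension and zero out padded positions
    mask = attention_mask.unsqueeze(-1).float()
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts

# Dummy batch: 2 sequences of 3 tokens, hidden size 4.
# The second sequence has one padded position whose values must be ignored.
hidden = torch.ones(2, 3, 4)
hidden[1, 2] = 100.0  # garbage values at the padded position
mask = torch.tensor([[1, 1, 1], [1, 1, 0]])

pooled = mean_pool(hidden, mask)  # shape (2, 4); row 1 is unaffected by padding
```

With real model output, call it as `mean_pool(model(**inputs).last_hidden_state, inputs["attention_mask"])`.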
## Training Details

- Base model: sentence-transformers/all-MiniLM-L6-v2
- Method: contrastive learning with triplet loss
- Specialized for scientific and technical content