# STEM Embedding Model 🧬

Embedding model optimized for STEM content (mathematics, physics, computer science, and biology).

## Performance

- **Separation Score**: 0.6767 (Excellent!)
- **Accuracy**: 97.18%
- **Training data**: 75k+ STEM chunks from Wikipedia and Semantic Scholar

## Usage

```python
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("zacbrld/MNLP_M3_document_encoder_120tok")
tokenizer = AutoTokenizer.from_pretrained("zacbrld/MNLP_M3_document_encoder_120tok")

# Encode text (pass a list of strings to embed a batch)
inputs = tokenizer("Neural networks use backpropagation", return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # (batch, seq_len, hidden)

# Mean-pool over real tokens only, so padding does not skew batched embeddings
mask = inputs["attention_mask"].unsqueeze(-1).to(token_embeddings.dtype)
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
```

## Training Details

- Base: sentence-transformers/all-MiniLM-L6-v2
- Method: Contrastive learning with triplet loss
- Specialized for scientific and technical content
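The training method is named but not specified here. As a rough illustration, the following is a minimal, dependency-free sketch of the standard hinge-form triplet objective, `max(0, d(anchor, positive) - d(anchor, negative) + margin)`. Euclidean distance, a margin of 1.0, and the helper names `euclidean`, `triplet_loss`, and `batch_triplet_loss` are hypothetical choices for this sketch, not the settings actually used to train this model.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 1.0) -> float:
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(triplets: List[Tuple[Vector, Vector, Vector]], margin: float = 1.0) -> float:
    """Mean triplet loss over a batch of (anchor, positive, negative) embeddings."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Well-separated triplet: positive at distance 1, negative at distance 5
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
    # Violating triplet: positive at distance 5, negative at distance 1
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))  # 5.0
```

In a real training loop the vectors would be pooled embeddings from the encoder, and the loss would be computed on tensors so gradients can flow back through the model; the plain-Python version only shows what the objective rewards.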