# STEM Embedding Model

🧬 Embedding model optimized for STEM content (Math, Physics, CS, Biology).

## Performance
- **Separation Score**: 0.6767
- **Accuracy**: 97.18%
- **Training data**: 75k+ STEM chunks from Wikipedia and Semantic Scholar

## Usage
```python
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("zacbrld/MNLP_M3_document_encoder_120tok")
tokenizer = AutoTokenizer.from_pretrained("zacbrld/MNLP_M3_document_encoder_120tok")

# Encode text with mask-aware mean pooling, so padding tokens
# do not dilute the sentence embedding when batching
inputs = tokenizer("Neural networks use backpropagation", return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
```

## Training Details
- Base: sentence-transformers/all-MiniLM-L6-v2
- Method: Contrastive learning with triplet loss
- Specialized for scientific and technical content
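
The triplet objective above can be sketched in a few lines. This is a generic illustration of triplet loss, not the exact training code for this model; the margin value and distance metric are assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: pull the positive closer to the anchor
    than the negative, by at least `margin` (Euclidean distance)."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    # Hinge: zero loss once the negative is far enough away
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```

During training, the anchor and positive would be embeddings of related STEM chunks (e.g. a passage and its paraphrase) and the negative an unrelated chunk, so minimizing this loss drives the separation score reported above.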