sutd-bge-large-ft67
A fine-tuned BAAI/bge-large-en-v1.5 model for job-description-to-module retrieval in the SUTD Course Recommendation Chatbot (MLOps Group 9).
Given a job description, this model retrieves relevant SUTD elective modules. It is used as the dense retrieval backbone in the RAG and Hybrid pipelines.
Model Details
| Property | Value |
|---|---|
| Base model | BAAI/bge-large-en-v1.5 |
| Embedding dimension | 1024 |
| Max sequence length | 512 |
| Similarity function | Cosine |
| Loss | MultipleNegativesRankingLoss |
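To make the loss row concrete, here is a minimal NumPy sketch of the idea behind MultipleNegativesRankingLoss: for each (job description, module) pair in a batch, the paired module is treated as the correct answer and every other module in the batch as a negative. The function name `mnr_loss` is hypothetical, and the scale of 20 with cosine similarity mirrors the Sentence Transformers defaults; the card does not state whether those defaults were used for this model.

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss over (anchor, positive) embedding pairs.

    Illustrative sketch only; scale=20.0 is an assumed default, not a value
    confirmed by the model card.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # scores[i, j] = scaled cosine similarity between anchor i and positive j.
    scores = scale * (a @ p.T)
    # Row-wise log-softmax, shifted by the row maximum for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # The correct "class" for anchor i is positive i, i.e. the diagonal.
    return float(-np.mean(np.diag(log_probs)))

# Toy check: perfectly matched pairs give a near-zero loss,
# while mismatched pairs give a large one.
toy = np.eye(4)
print(mnr_loss(toy, toy))
print(mnr_loss(toy, np.roll(toy, 1, axis=0)))
```

The loss falls as each job description's embedding moves closer to its own module and away from the other modules in the batch, which is the property a retrieval model needs.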
Training Data
Trained on 67 hand-annotated (job description, relevant SUTD module) pairs spanning four pillars: ASD, EPD, ESD, ISTD/CSD. Each job description was matched to one or more relevant modules. After train/validation splitting and hard-negative expansion by the Sentence Transformers trainer, this produces 601 training samples and 66 validation samples.
A version trained on the augmented 98-pair dataset is available at henreads/sutd-bge-large-aug98.
Training Setup
- Hardware: Modal A10G (24 GB VRAM)
- Epochs: up to 10 with early stopping (patience 4); converged at epoch 3
- Effective batch size: 16 (per-device batch 4, gradient accumulation 4)
- Learning rate: 2e-5
- Tracking: Weights & Biases (sutd-mlops-bge-finetune)
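The batch-size and early-stopping settings above can be illustrated with a short sketch. Both helper functions are hypothetical, the validation scores are invented for illustration, and "converged at epoch 3" is read here as "the best validation score occurred at epoch 3", which is an assumption rather than something the card states.

```python
def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device * grad_accum_steps * num_devices

def early_stopping_epoch(val_scores, patience: int, max_epochs: int):
    """Return (best_epoch, last_epoch_run), 1-indexed, for a higher-is-better metric."""
    best_score = float("-inf")
    best_epoch = 0
    last_epoch = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        last_epoch = epoch
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # `patience` consecutive epochs without improvement: stop training.
            break
    return best_epoch, last_epoch

# Per-device batch 4 with gradient accumulation 4 on a single GPU.
print(effective_batch_size(per_device=4, grad_accum_steps=4))  # 16

# Hypothetical validation scores that peak at epoch 3 (not real training logs).
hypothetical_scores = [0.70, 0.73, 0.75, 0.74, 0.74, 0.73, 0.74, 0.72, 0.71, 0.70]
print(early_stopping_epoch(hypothetical_scores, patience=4, max_epochs=10))  # (3, 7)
```

Under this reading, a patience of 4 means training would continue for four more epochs after the best one before stopping, with the epoch-3 checkpoint kept as the final model.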
Evaluation
Evaluated on a held-out retrieval set of 10 job descriptions, kept completely separate from the training data. NDCG@10 improves from 0.679 for the base BGE-large model to 0.747 for this model; the aug98 variant reaches 0.770.
| Model | NDCG@10 |
|---|---|
| BGE-large-en-v1.5 (base) | 0.679 |
| sutd-bge-large-ft67 (this model) | 0.747 |
| sutd-bge-large-aug98 | 0.770 |
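For readers unfamiliar with the metric, the sketch below shows one common way to compute NDCG@10 for a single query. It assumes binary relevance (a module is either relevant to the job or not); the card does not say whether the evaluation used binary or graded labels, and the function names and module IDs here are hypothetical. The reported figure would then be the mean over the 10 held-out jobs.

```python
import math

def dcg_at_k(gains, k: int) -> float:
    """Discounted cumulative gain over the first k positions (ranks start at 1)."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains[:k], start=1))

def ndcg_at_k(ranked_ids, relevant_ids, k: int = 10) -> float:
    """NDCG@k with binary relevance: 1.0 if a retrieved ID is relevant, else 0.0."""
    relevant = set(relevant_ids)
    gains = [1.0 if doc_id in relevant else 0.0 for doc_id in ranked_ids[:k]]
    # Ideal ranking: all relevant items placed at the top.
    ideal_gains = [1.0] * min(len(relevant), k)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical module IDs, for illustration only.
print(ndcg_at_k(["module_a", "module_b", "module_c"], {"module_a", "module_b"}))
print(ndcg_at_k(["module_c", "module_a", "module_b"], {"module_a", "module_b"}))
```

The first call ranks both relevant modules at the top and scores 1.0; the second pushes them down one position each and scores lower, which is how the metric rewards placing relevant modules early in the list.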
Usage
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("henreads/sutd-bge-large-ft67")
job_description = "Data Scientist at GovTech. Build ML models with Python..."
module_passage = "50.007 Machine Learning - Topics: supervised learning, neural networks..."
embeddings = model.encode([job_description, module_passage], normalize_embeddings=True)
similarity = embeddings[0] @ embeddings[1]
print(similarity)
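The snippet above scores a single job/module pair. In a retrieval setting the same dot product is applied against every module at once and the highest scores are kept. The sketch below shows that ranking step with toy vectors so it runs without downloading the model; `top_k_modules` and the module IDs are hypothetical, and in practice the embeddings would come from `model.encode(..., normalize_embeddings=True)`.

```python
import numpy as np

def top_k_modules(job_embedding: np.ndarray, module_embeddings: np.ndarray, module_ids, k: int = 5):
    """Rank modules by similarity to one job description.

    Assumes every embedding is L2-normalized, so the dot product equals
    cosine similarity.
    """
    scores = module_embeddings @ job_embedding
    # Sort by descending score and keep the first k indices.
    order = np.argsort(-scores)[:k]
    return [(module_ids[i], float(scores[i])) for i in order]

# Toy 2-dimensional unit vectors standing in for 1024-dimensional embeddings.
toy_module_ids = ["module_a", "module_b", "module_c"]
toy_module_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
toy_job_embedding = np.array([0.0, 1.0])

for module_id, score in top_k_modules(toy_job_embedding, toy_module_embeddings, toy_module_ids, k=2):
    print(module_id, round(score, 3))
```

Encoding the module passages once and reusing the resulting matrix for every incoming job description avoids re-embedding the catalogue on each query.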
Project
Part of the SUTD Course Recommendation Chatbot (MLOps Group 9).
Code: github.com/henreads/sutd-mlops-group9