How to use oksomu/role-radar-scoring with sentence-transformers:
from sentence_transformers import CrossEncoder
model = CrossEncoder("oksomu/role-radar-scoring")
query = "Which planet is known as the Red Planet?"
passages = [
"Venus is often called Earth's twin because of its similar size and proximity.",
"Mars, known for its reddish appearance, is often referred to as the Red Planet.",
"Jupiter, the largest planet in our solar system, has a prominent red spot.",
"Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]
scores = model.predict([(query, passage) for passage in passages])
print(scores)

Local-first two-stage scoring pipeline for job-candidate fit prediction. Trained on Indian job market data across 15 role families.
GitHub: somus/role-radar | Dataset: oksomu/role-radar-dataset
| Model | Size | Purpose | Latency |
|---|---|---|---|
| skills_lgbm.onnx | 501 KB | Skills scoring (all jobs) | <1ms batch |
| embedding_encoder.onnx | 87 MB | Cosine sim feature | Cached |
| reranker.onnx | 87 MB | Semantic reranking (shortlist) | ~14ms/pair |
LightGBM scores all jobs (<1ms each)
→ Composite = skills×0.5 + location×0.25 + seniority×0.15 + domain×0.10
→ Shortlist: composite > 60
→ MiniLM reranker reranks shortlist (~14ms/pair)
→ Final ranked feed
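The composite and shortlist steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the weights and the threshold of 60 come from the flow above, while the function names, dictionary keys, and the assumption that every component score is on a 0-100 scale are hypothetical.

```python
# Weights from the composite formula above; they sum to 1.0.
WEIGHTS = {"skills": 0.5, "location": 0.25, "seniority": 0.15, "domain": 0.10}
SHORTLIST_THRESHOLD = 60  # composite must be strictly greater than this


def composite_score(components):
    """Weighted sum of component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def shortlist(jobs):
    """Keep jobs whose composite exceeds the threshold, best first."""
    scored = [(job_id, composite_score(c)) for job_id, c in jobs.items()]
    kept = [(job_id, s) for job_id, s in scored if s > SHORTLIST_THRESHOLD]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical component scores for two jobs.
jobs = {
    "job_a": {"skills": 90, "location": 80, "seniority": 70, "domain": 60},
    "job_b": {"skills": 50, "location": 40, "seniority": 60, "domain": 30},
}
print(shortlist(jobs))
```

With these made-up inputs, job_a works out to 45 + 20 + 10.5 + 6 = 81.5 and is kept, while job_b works out to 25 + 10 + 9 + 3 = 47 and is dropped before the reranker would run.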
| Metric | Score |
|---|---|
| NDCG | 0.996 |
| MRR | 0.887 |
| Composite MAE | 5.1 |
| Composite Pearson r | 0.837 |
| Reranker skills r | 0.828 |
| Pipeline | 25 jobs | 50 jobs |
|---|---|---|
| LightGBM only | 18ms | 32ms |
| Full + reranker | 109ms | 222ms |
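All models use ONNX Runtime:
import onnxruntime as ort
# LightGBM skills scorer: feature_vector is a NumPy array of job features (typically float32)
lgbm_session = ort.InferenceSession("skills_lgbm.onnx")
score = lgbm_session.run(None, {"features": feature_vector})[0]
# Reranker: ids and mask are NumPy arrays produced by the tokenizer in reranker_tokenizer/
reranker_session = ort.InferenceSession("reranker.onnx")
logits = reranker_session.run(None, {"input_ids": ids, "attention_mask": mask})[0]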
skills_lgbm.onnx — LightGBM skills scorer
skills_calibration.json — isotonic regression (raw → 50-100 scale)
embedding_encoder.onnx — MiniLM encoder for cosine sim feature
embedding_encoder.onnx.data — encoder weights
embedding_tokenizer/ — tokenizer for encoder
reranker.onnx — MiniLM cross-encoder
reranker.onnx.data — reranker weights
reranker_tokenizer/ — tokenizer for reranker
config.json — feature names, composite weights, thresholds
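The encoder is listed as feeding a cosine similarity feature. As a reminder of what that feature measures, here is a minimal pure-Python sketch. The function name and the tiny example vectors are hypothetical; real embeddings from the H384 base model would have 384 dimensions, and how the project pools token outputs into a single vector is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for a candidate profile and a job.
candidate_vec = [0.2, 0.7, 0.1]
job_vec = [0.25, 0.6, 0.05]
print(cosine_similarity(candidate_vec, job_vec))
```

Because the value depends only on the two embeddings, it can be computed once per candidate-job pair and reused, which fits the "Cached" latency entry for the encoder.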
| Score | Label |
|---|---|
| 90-100 | Perfect Match |
| 80-89 | Great Match |
| 70-79 | Good Match |
| 60-69 | Worth a Look |
| 45-59 | Stretch |
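The label table can be applied with a simple lookup. The sketch below is an illustration under assumptions: the names `LABEL_BANDS` and `label_for` are hypothetical, fractional scores are assigned to the band whose lower bound they meet (so 89.5 counts as "Great Match"), and scores below 45 return `None` because the table defines no label for them.

```python
# Score bands from the table above: (inclusive lower bound, label).
LABEL_BANDS = [
    (90, "Perfect Match"),
    (80, "Great Match"),
    (70, "Good Match"),
    (60, "Worth a Look"),
    (45, "Stretch"),
]


def label_for(score):
    """Return the label for a score, or None below the lowest band."""
    for lower_bound, label in LABEL_BANDS:
        if score >= lower_bound:
            return label
    return None  # the table defines no label below 45


for s in (95, 81.5, 60, 47, 30):
    print(s, label_for(s))
```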
Base model: microsoft/MiniLM-L12-H384-uncased