---
base_model: sentence-transformers/all-mpnet-base-v2
library_name: sentence-transformers
pipeline_tag: text-retrieval
license: apache-2.0
tags:
- sentence-transformers
- text-retrieval
- feature-extraction
- work-domain
- skill-extraction
---

# ConTeXT-Skill-Extraction-base

This is a [sentence-transformers](https://www.SBERT.net) model built on `all-mpnet-base-v2`. It is designed for work-domain AI tasks, specifically skill extraction and normalization, and is part of the **WorkRB** (Work Research Benchmark) framework.

The model is presented in the paper [WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain](https://huggingface.co/papers/2604.13055).

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **License:** Apache 2.0
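
The properties listed above can be read back from a loaded model as a quick sanity check. The sketch below assumes the `get_max_seq_length()`, `get_sentence_embedding_dimension()`, and `similarity_fn_name` accessors found in recent Sentence Transformers releases; `describe_model` is a hypothetical helper name, not part of the library.

```python
def describe_model(model):
    """Collect key properties of a loaded SentenceTransformer-like object.

    `describe_model` is a hypothetical helper. It only calls accessors that
    recent Sentence Transformers releases are assumed to expose.
    """
    return {
        "max_seq_length": model.get_max_seq_length(),
        "embedding_dim": model.get_sentence_embedding_dimension(),
        "similarity_fn": model.similarity_fn_name,
    }
```

For this model, the returned values would be expected to match the list above: 512, 768, and `"cosine"`.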

### Model Sources

- **Paper:** [WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain](https://huggingface.co/papers/2604.13055)
- **Repository:** [WorkRB on GitHub](https://github.com/techwolf-ai/WorkRB)
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)

## Usage

### Direct Usage (Sentence Transformers)

First, install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then load the model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jensjorisdecorte/ConTeXT-Skill-Extraction-base")

# Run inference
sentences = [
    'Proficient in Python programming and machine learning.',
    'Experienced in project management and agile methodologies.',
    'Knowledge of cloud computing and AWS infrastructure.',
]
embeddings = model.encode(sentences)  # NumPy array, one row per sentence
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # torch.Tensor
print(similarities.shape)
# torch.Size([3, 3])
```
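
Since the model targets skill extraction, a common pattern is to rank a list of candidate skill labels against a sentence by cosine similarity. The following is a minimal, dependency-free sketch of that idea, not the authors' evaluation pipeline: `rank_skills` and `cosine_similarity` are hypothetical helper names, and `model` is assumed to be any object with an `encode(list_of_str)` method that returns one vector per input, such as the `SentenceTransformer` loaded above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen in this sketch for zero vectors
    return dot / (norm_a * norm_b)


def rank_skills(model, sentence, skill_labels, top_k=3):
    """Return the top_k (label, score) pairs, highest similarity first.

    Assumption: `model.encode` accepts a list of strings and returns one
    vector per string, in the same order.
    """
    labels = list(skill_labels)
    vectors = model.encode([sentence] + labels)
    query, candidates = vectors[0], vectors[1:]
    scored = [
        (label, cosine_similarity(query, vec))
        for label, vec in zip(labels, candidates)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With the model loaded as above, `rank_skills(model, "Proficient in Python programming and machine learning.", ["Python", "project management", "AWS"])` would return the made-up example labels ordered by similarity to the sentence. For larger label sets, the batched `model.similarity` call shown earlier is likely the better choice.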

## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
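
The pooling configuration above enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of the token embeddings produced by the MPNet encoder, with padding positions excluded through the attention mask. The sketch below illustrates this for a single sequence using plain Python lists. `mean_pool` is a hypothetical name, and the library itself operates on batched tensors rather than lists.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    token_embeddings: list of per-token vectors (lists of floats).
    attention_mask:   list of 0/1 flags, one per token (0 marks padding).
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("token_embeddings and attention_mask must have the same length")
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Component-wise mean over the non-padding tokens only.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```

For example, pooling three token vectors where the last one is padding averages only the first two.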

## Training Details

### Framework Versions

- Python: 3.10.16
- Sentence Transformers: 3.4.0
- Transformers: 4.48.1
- PyTorch: 2.5.1+cpu
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0
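
When reproducing results, it can help to compare the local environment against the versions listed above. The following sketch uses only the standard library's `importlib.metadata`. The `CARD_VERSIONS` mapping and `compare_versions` helper are hypothetical names, and the keys assume the usual PyPI distribution names for each package.

```python
from importlib import metadata

# Versions listed in this model card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "sentence-transformers": "3.4.0",
    "transformers": "4.48.1",
    "torch": "2.5.1+cpu",
    "accelerate": "1.3.0",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def compare_versions(expected=CARD_VERSIONS):
    """Return {package: (expected, installed)}; installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report
```

A mismatch does not necessarily indicate a problem. In particular, the PyTorch build suffix (`+cpu` here) will differ on machines with a GPU build installed.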

## Citation

If you find this model useful, please consider citing the following work:

```bibtex
@misc{delange2025unifiedworkembeddings,
  title={Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker},
  author={Matthias De Lange and Jens-Joris Decorte and Jeroen Van Hautte},
  year={2025},
  eprint={2511.07969},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2511.07969},
}
```