	---
	base_model: sentence-transformers/all-mpnet-base-v2
	library_name: sentence-transformers
	pipeline_tag: text-retrieval
	license: apache-2.0
	tags:
	- sentence-transformers
	- text-retrieval
	- feature-extraction
	- work-domain
	- skill-extraction
	---

	# ConTeXT-Skill-Extraction-base

	This is a [sentence-transformers](https://www.SBERT.net) model based on the `all-mpnet-base-v2` model. It is designed for work-domain AI tasks, specifically skill extraction and normalization, as part of the WorkRB (Work Research Benchmark) framework.

	The model is presented in the paper [WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain](https://huggingface.co/papers/2604.13055).

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
	- Maximum Sequence Length: 512 tokens
	- Output Dimensionality: 768 dimensions
	- Similarity Function: Cosine Similarity
	- License: Apache 2.0
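
The similarity function listed above, cosine similarity, can be illustrated with a minimal pure-Python sketch. The helper name `cosine_similarity` and the toy 3-dimensional vectors are assumptions made for illustration; the model itself produces 768-dimensional embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors stand in for real 768-dimensional embeddings (illustration only).
print(cosine_similarity([1.0, 0.0, 0.0], [2.0, 0.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 3.0, 0.0]))  # 0.0 (orthogonal)
```

Because the score depends only on direction, scaling either vector leaves it unchanged.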

	### Model Sources

	- Paper: [WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain](https://huggingface.co/papers/2604.13055)
	- Repository: [WorkRB on GitHub](https://github.com/techwolf-ai/WorkRB)
	- Documentation: [Sentence Transformers Documentation](https://sbert.net)

	## Usage

	### Direct Usage (Sentence Transformers)

	First, install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.

	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("jensjorisdecorte/ConTeXT-Skill-Extraction-base")

	# Run inference
	sentences = [
	    'Proficient in Python programming and machine learning.',
	    'Experienced in project management and agile methodologies.',
	    'Knowledge of cloud computing and AWS infrastructure.',
	]
	# encode() returns a NumPy array by default: one 768-dimensional row per sentence
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# (3, 768)

	# Get the pairwise similarity scores for the embeddings (returned as a torch.Tensor)
	similarities = model.similarity(embeddings, embeddings)
	print(similarities.shape)
	# torch.Size([3, 3])
	```
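
For skill extraction, one plausible way to use the embeddings is to rank a list of candidate skill labels against a sentence by cosine similarity. The sketch below is an assumption about usage, not the authors' documented pipeline or the WorkRB evaluation protocol: `rank_skills`, `toy_encode`, the vocabulary, and the candidate skills are all hypothetical. The `encode` argument is any callable that maps a list of strings to a list of vectors, so the example runs without downloading the model.

```python
import math

def _cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_skills(sentence, skills, encode, top_k=3):
    """Return the top_k (skill, score) pairs, best match first."""
    vectors = encode([sentence] + list(skills))
    query, candidates = vectors[0], vectors[1:]
    scored = [(skill, _cosine(query, vec)) for skill, vec in zip(skills, candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical stand-in encoder: counts occurrences of a tiny fixed vocabulary.
TOY_VOCAB = ["python", "cloud", "management"]

def toy_encode(texts):
    return [[float(text.lower().count(word)) for word in TOY_VOCAB] for text in texts]

sentence = "Proficient in Python programming and machine learning."
skills = ["Python", "cloud computing", "project management"]
print(rank_skills(sentence, skills, toy_encode, top_k=1))  # [('Python', 1.0)]
```

With the real model, passing `model.encode` as the `encode` argument should work in the same way, since `encode` accepts a list of strings and returns one embedding per input.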

	## Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
	  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	)
	```
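
The `Pooling` module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of the token embeddings produced by the transformer, with padding positions excluded through the attention mask. The following is a minimal pure-Python sketch of that idea, not the library's implementation; `mean_pool` and the toy 2-dimensional token vectors are hypothetical.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding is ignored."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# Two real tokens followed by one padding token (mask 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The listed architecture contains no normalization module, so the pooled vectors are not necessarily unit length; cosine similarity accounts for this by dividing by the vector norms.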

	## Training Details

	### Framework Versions
	- Python: 3.10.16
	- Sentence Transformers: 3.4.0
	- Transformers: 4.48.1
	- PyTorch: 2.5.1+cpu
	- Accelerate: 1.3.0
	- Datasets: 3.2.0
	- Tokenizers: 0.21.0
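
To compare a local environment against the versions listed above, a small check with the standard library's `importlib.metadata` may help. This is a sketch under assumptions: the PyPI distribution names (for example `torch` for PyTorch) are mapped by hand, `installed_versions` is a hypothetical helper, and the card does not state that exact version matches are required.

```python
from importlib import metadata

# Versions listed in this model card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "sentence-transformers": "3.4.0",
    "transformers": "4.48.1",
    "torch": "2.5.1+cpu",
    "accelerate": "1.3.0",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

for name, installed in installed_versions(CARD_VERSIONS).items():
    print(f"{name}: installed={installed}, card={CARD_VERSIONS[name]}")
```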

	## Citation

	If you find this model useful, please consider citing the following work:

	```bibtex
	@misc{delange2025unifiedworkembeddings,
	  title={Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker},
	  author={Matthias De Lange and Jens-Joris Decorte and Jeroen Van Hautte},
	  year={2025},
	  eprint={2511.07969},
	  archivePrefix={arXiv},
	  primaryClass={cs.CL},
	  url={https://arxiv.org/abs/2511.07969},
	}
	```