---
library_name: transformers
license: mit
tags: []
---

# Model Card for BERT-Wikt-base-adj


## Model Details

### Model Description

This model is an English language model based on BERT-base, fine-tuned with supervised contrastive learning on adjective examples from English Wiktionary.
The fine-tuning improves token-level semantic representations, particularly for tasks such as Word-in-Context (WiC) and Word Sense Disambiguation (WSD).

Although fine-tuned only on adjectives, the model shows improved representation quality across the lexicon.
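
For intuition, below is a minimal sketch of a supervised contrastive objective over token embeddings, in the style of Khosla et al. (2020): occurrences annotated with the same Wiktionary sense act as positives and are pulled together, while occurrences of other senses are pushed apart. The function name, temperature, and batching are illustrative assumptions, not the authors' exact training setup.

```python
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.07):
    # Illustrative sketch: embeddings (N, d) are target-token embeddings,
    # labels (N,) are sense ids; tokens sharing a sense id are positives.
    z = F.normalize(embeddings, dim=1)                  # compare in cosine space
    sim = z @ z.T / temperature                         # (N, N) similarity logits
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))     # exclude self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                              # anchors with >= 1 positive
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return (per_anchor[valid] / pos_counts[valid]).mean()

# Toy demo: eight random "token embeddings" carrying four sense labels
print(supcon_loss(torch.randn(8, 768), torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])))
```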


- **Developed by:** Anna Mosolova, Marie Candito, Carlos Ramisch
- **Funded by:** [ANR Selexini](https://selexini.lis-lab.fr)
- **Model type:** BERT-based transformer (BERT-base)
- **Language:** English
- **License:** MIT
- **Finetuned from model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)

### Model Sources


- **Repository:** [https://github.com/anya-bel/contrastive_learning_transfer](https://github.com/anya-bel/contrastive_learning_transfer)
- **Paper:** [Raffinage des représentations des tokens dans les modèles de langue pré-entraînés avec l’apprentissage contrastif : une étude entre modèles et entre langues](https://coria-taln-2025.lis-lab.fr/wp-content/uploads/2025/06/CORIA-TALN_2025_paper_139.pdf) (in French: "Refining token representations in pre-trained language models with contrastive learning: a cross-model and cross-lingual study")

## Uses

The model is intended for extracting token-level embeddings for English, with improved sense separation.


## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Fine-tuning does not change the vocabulary, so the base tokenizer applies
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = AutoModel.from_pretrained("annamos/BERT-Wikt-base-adj")

sentence = "You should knock before you enter"
tokenized = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    # last_hidden_state: one contextual embedding per (sub)token, shape (1, seq_len, 768)
    embeddings = model(**tokenized).last_hidden_state
```
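
The embeddings above are one vector per (sub)token. As a usage sketch, the hypothetical helper `word_embedding` below (not part of the released code) probes the advertised sense separation by pooling the subtokens of a target word and comparing the same adjective across contexts; occurrences of different senses should come out less similar than occurrences of the same sense.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = AutoModel.from_pretrained("annamos/BERT-Wikt-base-adj")

def word_embedding(sentence, word):
    # Mean-pool the last-layer states of the subtokens spelling out `word`
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        states = model(**enc).last_hidden_state[0]       # (seq_len, hidden)
    target = tokenizer.encode(word, add_special_tokens=False)
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target) + 1):
        if ids[i:i + len(target)] == target:             # first occurrence
            return states[i:i + len(target)].mean(dim=0)
    raise ValueError(f"{word!r} not found in {sentence!r}")

# The adjective "hard": physical sense vs. "difficult" sense
a = word_embedding("The diamond is extremely hard.", "hard")
b = word_embedding("The exam was too hard for most students.", "hard")
c = word_embedding("This bench is very hard to sit on.", "hard")
print(F.cosine_similarity(a, b, dim=0).item())  # different senses: lower
print(F.cosine_similarity(a, c, dim=0).item())  # same physical sense: higher
```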