# MeFEm: Medical Face Embedding Models

Vision Transformers pre-trained on face data for potential medical applications. Available in Small (MeFEm-S) and Base (MeFEm-B) sizes.

## Quick Start

```python
import torch
import timm

# Load model (MeFEm-S example)
model = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0,           # No classification head
    global_pool='token'      # Use CLS token (default)
)
model.load_state_dict(torch.load('mefem-s.pt', map_location='cpu'))
model.eval()

# Forward pass
x = torch.randn(1, 3, 224, 224)  # Dummy input; replace with a preprocessed face image
with torch.no_grad():
    embeddings = model(x)  # [1, 384] CLS token embeddings
```

## Model Details

- **Architecture**: ViT-Small/16 (384-dim) or ViT-Base/16 (768-dim) with CLS token
- **Training**: Modified I-JEPA on ~6.5M face images
- **Input**: Face crops with 2× expanded bounding boxes, 224×224 resolution
- **Output**: CLS token embeddings (`global_pool='token'`) or all tokens (`global_pool=''`)

## Usage Tips

```python
# For all tokens (CLS + patches):
model = timm.create_model('vit_small_patch16_224', num_classes=0, global_pool='')
tokens = model(x)  # [1, 197, 384]

# For patch embeddings only:
tokens = model.forward_features(x)
patch_embeddings = tokens[:, 1:]  # [1, 196, 384]
```
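A common way to use the embeddings for representation learning is to compare two faces by cosine similarity of their CLS embeddings. This is a usage sketch, not part of the released models; the random tensors stand in for real MeFEm-S outputs:

```python
import torch
import torch.nn.functional as F

def cosine_similarity(emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between two batches of embeddings, shape [B]."""
    return (F.normalize(emb_a, dim=-1) * F.normalize(emb_b, dim=-1)).sum(dim=-1)

# e.g. with two [1, 384] MeFEm-S embeddings:
a = torch.randn(1, 384)
sim = cosine_similarity(a, a)  # identical embeddings -> similarity 1.0
```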

## Training Data

Face images from FaceCaption-15M, AVSpeech, and SHFQ datasets (~6.5M total). Images were cropped with expanded (2×) face bounding boxes.
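To match the training-time cropping, input face boxes should be expanded 2× before cropping. The exact expansion and clamping used in training are not specified, so this helper is an illustrative assumption (scaling about the box center and clamping to the image bounds):

```python
def expand_bbox(x1, y1, x2, y2, img_w, img_h, scale=2.0):
    """Expand a face bounding box by `scale` about its center, clamped to the image."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_w), cx + half_w),
        min(float(img_h), cy + half_h),
    )

# A 100x100 box centered at (150, 150) in a 400x400 image becomes 200x200:
print(expand_bbox(100, 100, 200, 200, 400, 400))  # (50.0, 50.0, 250.0, 250.0)
```

The crop would then be resized to 224×224 before the forward pass.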

## Notes

- Optimized for face images with loose cropping
- Intended for representation learning and transfer to medical tasks
- Results may vary for non-face or tightly-cropped images
- Training details and evaluation metrics are in the [paper](https://arxiv.org/pdf/2602.14672)

## License

CC BY 4.0. Please cite the paper if you use these models:
```
@misc{borets2026mefemmedicalfaceembedding,
  title={MeFEm: Medical Face Embedding model},
  author={Yury Borets and Stepan Botman},
  year={2026},
  eprint={2602.14672},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.14672},
}
```