---
title: README
emoji: 📈
colorFrom: pink
colorTo: pink
sdk: static
pinned: false
license: apache-2.0
---

Welcome to the LCO-Embedding project: Scaling Language-centric Omnimodal Representation Learning.

### Highlights:
- We introduce **LCO-Embedding**, a language-centric omnimodal representation learning method, and the LCO-Embedding model family, setting a new state of the art on MIEB (Massive Image Embedding Benchmark) while also supporting audio and video.
- We introduce the **Generation-Representation Scaling Law**, connecting a model's generative capability to its representation upper bound.
- We introduce **SeaDoc**, a challenging visual document retrieval task in Southeast Asian languages, and show that continual generative pretraining before contrastive learning raises the representation upper bound (a sketch of the contrastive objective follows this list).
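
The highlights above pair continual generative pretraining with a contrastive learning stage. As a rough illustration only (this is not the project's released training code), the sketch below shows a symmetric in-batch InfoNCE loss of the kind commonly used for such contrastive stages; all function names, tensor shapes, and the temperature value are illustrative assumptions.

```python
# Minimal sketch of an in-batch contrastive (InfoNCE) objective.
# Illustrative only; not the project's actual training code.
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor,
                  target_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """Symmetric in-batch contrastive loss over L2-normalized embeddings.

    query_emb:  (batch, dim) language-side embeddings
    target_emb: (batch, dim) paired omnimodal (image/audio/video) embeddings
    """
    q = F.normalize(query_emb, dim=-1)
    t = F.normalize(target_emb, dim=-1)
    logits = q @ t.T / temperature                      # (batch, batch) similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives on the diagonal
    # Average both directions so queries and targets are treated symmetrically.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2
```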

<!-- * Code: []() -->