Simeon Emanuilov

s-emanuilov

https://unfoldai.com/

AI & ML interests

Software Engineer, PhD | Building production ML/DL systems and AI tools

Recent Activity

updated a dataset about 1 month ago

s-emanuilov/mosaic-1m

published a dataset about 1 month ago

s-emanuilov/mosaic-1m

upvoted an article 4 months ago

Article

ATE-2: State-of-the-Art Armenian Text Embeddings and the ArmBench-TextEmbed Benchmark

Metric-AI • Mar 19 • 9

upvoted an article 9 months ago

Article

SOTA OCR with Core ML and dots.ocr

FL33TW00D-HF, pcuenq • Oct 2, 2025 • 64

upvoted a collection 9 months ago

PP-OCRv5

PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated May 12 • 58

upvoted an article 10 months ago

Article

mmBERT: ModernBERT goes Multilingual

mmarone, orionweller, will-fleshman, eugene-yang, dlawrie, vandurme • Sep 9, 2025 • 148

upvoted 2 papers 10 months ago

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

Paper • 2509.03867 • Published Sep 4, 2025 • 213

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published Feb 19, 2025 • 49

upvoted a collection 10 months ago

EmbeddingGemma

3 items • Updated Mar 12 • 121

upvoted an article 10 months ago

Article

Welcome EmbeddingGemma, Google's new efficient embedding model

tomaarsen, Xenova, alvarobartt, ariG23498, pcuenq, sergiopaniego • Sep 4, 2025 • 275

upvoted a collection 12 months ago

Health AI Developer Foundations (HAI-DEF)

Groups models released for use in health AI by Google. Read more about HAI-DEF at http://goo.gle/hai-def • 22 items • Updated Mar 12 • 228

upvoted a collection about 1 year ago

Tucan

A series of open-source Bulgarian language models fine-tuned for function calling and tool use. 2.6B, 9B, and 27B parameter variants. • 12 items • Updated Jul 1, 2025 • 1

upvoted an article about 1 year ago

Article

Train 400x faster Static Embedding Models with Sentence Transformers

tomaarsen • Jan 15, 2025 • 233

upvoted a paper about 1 year ago

CoLLM: A Large Language Model for Composed Image Retrieval

Paper • 2503.19910 • Published Mar 25, 2025 • 15

upvoted 2 articles over 1 year ago

Article

Training and Finetuning Embedding Models with Sentence Transformers

tomaarsen • May 28, 2024 • 275

Article

Training and Finetuning Reranker Models with Sentence Transformers

tomaarsen • Mar 26, 2025 • 195

upvoted a paper over 1 year ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 164

upvoted 3 articles over 1 year ago

Article

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

merve, ariG23498, andsteing • Feb 19, 2025 • 74

Article

SigLIP 2: A better multilingual vision language encoder

ariG23498, merve, qubvel-hf • Feb 21, 2025 • 220

Article

Merge Large Language Models with mergekit

mlabonne • Jan 9, 2024 • 156

upvoted 2 papers over 1 year ago

Fast Video Generation with Sliding Tile Attention

Paper • 2502.04507 • Published Feb 6, 2025 • 50

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 148