dhruva-sarma (Dhruvajyoti Sarma)

upvoted an article 5 months ago

We Got Claude to Build CUDA Kernels and teach open models!

Article • burtenshaw, evalstate, merve, pcuenq • Jan 28 • 158

upvoted an article 9 months ago

Optimizing your LLM in production

Article • patrickvonplaten • Sep 15, 2023 • 23

upvoted a paper 10 months ago

Matryoshka Representation Learning

Paper • 2205.13147 • Published May 26, 2022 • 27

upvoted 2 articles 10 months ago

🪆 Introduction to Matryoshka Embedding Models

Article • tomaarsen, Xenova, osanseviero • Feb 23, 2024 • 211

PP-OCRv5 on Hugging Face: A Specialized Approach to OCR

Article • baidu • Sep 10, 2025 • 112

upvoted a collection 12 months ago

Google's Gemma models family

Collection • 334 items • Updated Mar 12 • 836

upvoted an article about 1 year ago

Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs

Article • wenhuach, Haihao, weiweiz1, n1ck-guo, isaacmac, kding1, IlyasMoutawwakil, marcsun13, medmekk • Apr 29, 2025 • 45

upvoted a paper about 1 year ago

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 195

upvoted 2 papers over 1 year ago

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Paper • 2502.02358 • Published Feb 4, 2025 • 19

ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

Paper • 2502.04306 • Published Feb 6, 2025 • 19

upvoted an article over 1 year ago

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Article • Leyo, HugoLaurencon, VictorSanh • Apr 15, 2024 • 191

upvoted 9 papers over 1 year ago

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9, 2025 • 98

VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published Jan 10, 2025 • 74

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Paper • 2501.07171 • Published Jan 13, 2025 • 55

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11, 2025 • 91

Potential and Perils of Large Language Models as Judges of Unstructured Textual Data

Paper • 2501.08167 • Published Jan 14, 2025 • 6

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published Dec 24, 2024 • 74

Dhruvajyoti Sarma

AI & ML interests

Organizations

MiniMax-01: Scaling Foundation Models with Lightning Attention

MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents

Transformer^2: Self-adaptive LLMs
