Krishna Kaasyap

KrishnaKaasyap

AI & ML interests

Test Time Training
Multimodal & Inter-Modality Transfer Learning
Mechanistic Interpretability
Evolutionary Model Merging
Swarm Intelligence of multiple models with different architectures and different algorithms
MuZero approach to general tasks

upvoted a collection 7 months ago

Olmo 3

Collection

Artifacts for the Olmo 3 release. • 7 items • Updated Mar 2 • 171

upvoted a collection 8 months ago

Deepseek Papers

Collection

Deepseek papers collection • 32 items • Updated about 10 hours ago • 353

upvoted 2 collections about 1 year ago

Gemma 3n Preview

Collection

4 items • Updated Mar 12 • 209

MiniMax-M1

Collection

MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. • 6 items • Updated Apr 15 • 119

upvoted a paper about 1 year ago

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 119

upvoted 5 collections about 1 year ago

upvoted an article over 1 year ago

Article

🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs

wolfram • Dec 4, 2024 • 80

upvoted a collection over 1 year ago

QwQ

Collection

Qwen with Questions • 6 items • Updated Dec 31, 2025 • 101

upvoted an article over 1 year ago

Article

Bridging the Gap Between Physical Numerical Simulations and Machine Learning: Introducing The Well

rubenohana • Dec 2, 2024 • 19

upvoted a collection over 1 year ago

Llama-3.1-Nemotron-70B

Collection

SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 18 days ago • 156

upvoted a paper almost 2 years ago

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

upvoted 3 collections almost 2 years ago

Jamba 1.5

Collection

The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction following foundation models • 2 items • Updated Mar 6, 2025 • 87

Magnum v2 123b

Collection

3 items • Updated Aug 21, 2024 • 6

DeepSeek-V2

Collection

8 items • Updated Nov 27, 2025 • 37

upvoted an article almost 2 years ago

Article

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

philschmid, osanseviero, alvarobartt, lvwerra, dvilasuero, reach-vb, marcsun13, pcuenq • Jul 23, 2024 • 241

upvoted a paper almost 2 years ago

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9, 2024 • 50
