Ashwinatgsk (AShwin Venkat)

upvoted 2 articles 10 months ago

Article

Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders

thomwolf, matthieu-lapeyre • Jul 9, 2025 • 798

Article

Understanding Gemma 3n: How MatFormer Gives You Many Models in One

rishiraj • Jun 26, 2025 • 50

upvoted a paper 11 months ago

Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details

Paper • 2506.16504 • Published Jun 19, 2025 • 32

upvoted an article 11 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 258

upvoted 2 papers 11 months ago

BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Paper • 2506.07530 • Published Jun 9, 2025 • 20

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 159

upvoted a collection 11 months ago

Qwen3-Reranker

Collection

3 items • Updated Dec 31, 2025 • 67

upvoted a paper 11 months ago

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3, 2025 • 54

upvoted a paper 12 months ago

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Paper • 2504.02542 • Published Apr 3, 2025 • 52

upvoted a paper about 1 year ago

UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27, 2025 • 62

upvoted a collection over 1 year ago

Eagle

Collection

Eagle is a family of frontier vision-language models with data-centric strategies. The model supports both HD image and long-context video input. • 16 items • Updated 4 days ago • 41

upvoted 2 articles over 1 year ago

Article

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

danaaubakirova, Molbap, mshukor, cadene • Feb 4, 2025 • 192

Article

Llama can now see and run on your device - welcome Llama 3.2

merve, philschmid, osanseviero, reach-vb, lewtun, ariG23498, pcuenq • Sep 25, 2024 • 191

upvoted a collection over 1 year ago

DataGemma Release

Collection

A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated Mar 12 • 89

upvoted a paper over 1 year ago

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published Sep 10, 2024 • 60
