Sasi Kiran
sasikiran
3 followers · 127 following
sasikiran_m
sasikiran
AI & ML interests
Large language models
Recent Activity
liked a model about 1 month ago: nvidia/Cosmos-Reason2-8B
reacted to codelion's post with 🔥 about 1 month ago:
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
liked a model about 1 month ago: zai-org/GLM-TTS
Organizations
sasikiran's activity
upvoted a paper 7 months ago
MMSearch-R1: Incentivizing LMMs to Search
Paper • 2506.20670 • Published Jun 25, 2025 • 64