EntRGi: Entropy Aware Reward Guidance for Diffusion Language Models Paper • 2602.05000 • Published Feb 4 • 1
Post: Are you familiar with reverse residual connections or looping in language models? Excited to share my Looped-GPT blog post and codebase: https://github.com/sanyalsunny111/Looped-GPT
TL;DR: looping during pre-training improves generalization. Plot shows GPT2 LMs pre-trained with 15.73B OWT tokens.
P.S. This is my first post here; I have ~4 followers and zero expectations for reach.
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs Paper • 2512.03383 • Published Dec 3, 2025 • 5
Sasha: Creative Goal-Oriented Reasoning in Smart Homes with Large Language Models Paper • 2305.09802 • Published May 16, 2023 • 1
Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices Paper • 2509.02523 • Published Sep 2, 2025 • 21
Noise Contrastive Alignment of Language Models with Explicit Rewards Paper • 2402.05369 • Published Feb 8, 2024 • 2
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models Paper • 2405.04233 • Published May 7, 2024 • 3
Rhapsody: A Dataset for Highlight Detection in Podcasts Paper • 2505.19429 • Published May 26, 2025 • 1
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Paper • 2506.08009 • Published Jun 9, 2025 • 30
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19, 2025 • 16
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models Paper • 2503.22879 • Published Mar 28, 2025 • 9
Quamba: A Post-Training Quantization Recipe for Selective State Space Models Paper • 2410.13229 • Published Oct 17, 2024 • 1
Efficient Low-rank Backpropagation for Vision Transformer Adaptation Paper • 2309.15275 • Published Sep 26, 2023 • 1
MobileTL: On-device Transfer Learning with Inverted Residual Blocks Paper • 2212.03246 • Published Dec 5, 2022 • 1
Scaling Rich Style-Prompted Text-to-Speech Datasets Paper • 2503.04713 • Published Mar 6, 2025 • 1
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator Paper • 2503.01103 • Published Mar 3, 2025 • 5
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers Paper • 2502.15894 • Published Feb 21, 2025 • 20
Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation Paper • 2310.03780 • Published Oct 5, 2023