my_read_book - a ryanafufu Collection

ryanafufu's Collections

my_read_book

updated about 2 hours ago

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10, 2024 • 32
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 63
The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27, 2024 • 42
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17, 2024 • 30
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 115
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.12183 • Published Sep 18, 2024 • 39
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published Sep 19, 2024 • 50
Imagine yourself: Tuning-Free Personalized Image Generation

Paper • 2409.13346 • Published Sep 20, 2024 • 69
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 140
MaskBit: Embedding-free Image Generation via Bit Tokens

Paper • 2409.16211 • Published Sep 24, 2024 • 17
Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27, 2024 • 97
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

Paper • 2412.09626 • Published Dec 12, 2024 • 21
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
ColorFlow: Retrieval-Augmented Image Sequence Colorization

Paper • 2412.11815 • Published Dec 16, 2024 • 26
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published Jan 10, 2025 • 65
Transformer^2: Self-adaptive LLMs

Paper • 2501.06252 • Published Jan 9, 2025 • 55
MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14, 2025 • 300
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models

Paper • 2501.06751 • Published Jan 12, 2025 • 32
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 441
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Paper • 2501.12202 • Published Jan 21, 2025 • 49
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference

Paper • 2502.00299 • Published Feb 1, 2025 • 3
Region-Adaptive Sampling for Diffusion Transformers

Paper • 2502.10389 • Published Feb 14, 2025 • 53
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Paper • 2502.18364 • Published Feb 25, 2025 • 36
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 170
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

Paper • 2503.18886 • Published Mar 24, 2025 • 24
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation

Paper • 2504.09454 • Published Apr 13, 2025 • 11
FlowTok: Flowing Seamlessly Across Text and Image Tokens

Paper • 2503.10772 • Published Mar 13, 2025 • 19
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection

Paper • 2503.12271 • Published Mar 15, 2025 • 9
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Paper • 2504.16080 • Published Apr 22, 2025 • 15
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation

Paper • 2503.10618 • Published Mar 13, 2025 • 19
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

Paper • 2504.20966 • Published Apr 29, 2025 • 31
Flow-GRPO: Training Flow Matching Models via Online RL

Paper • 2505.05470 • Published May 8, 2025 • 88
ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Paper • 2505.04588 • Published May 7, 2025 • 65
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Paper • 2505.04601 • Published May 7, 2025 • 29
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6, 2025 • 189
Align Your Flow: Scaling Continuous-Time Flow Map Distillation

Paper • 2506.14603 • Published Jun 17, 2025 • 19
Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning

Paper • 2506.02327 • Published Jun 2, 2025 • 20
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Paper • 2506.09985 • Published Jun 11, 2025 • 31
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development

Paper • 2506.05010 • Published Jun 5, 2025 • 80
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

Paper • 2506.17450 • Published Jun 20, 2025 • 64
R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7, 2025 • 130
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published Aug 4, 2025 • 136
Representation Shift: Unifying Token Compression with FlashAttention

Paper • 2508.00367 • Published Aug 1, 2025 • 16
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 272
Task structure and nonlinearity jointly determine learned representational geometry

Paper • 2401.13558 • Published Jan 24, 2024
DCPO: Dynamic Clipping Policy Optimization

Paper • 2509.02333 • Published Sep 2, 2025 • 22
DoPE: Denoising Rotary Position Embedding

Paper • 2511.09146 • Published Nov 12, 2025 • 97
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

Paper • 2511.20714 • Published Nov 25, 2025 • 50
Distribution Matching Distillation Meets Reinforcement Learning

Paper • 2511.13649 • Published Nov 17, 2025 • 5
SD3.5-Flash: Distribution-Guided Distillation of Generative Flows

Paper • 2509.21318 • Published Sep 25, 2025 • 11
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

Paper • 2512.05150 • Published Dec 3, 2025 • 76
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

Paper • 2512.04810 • Published Dec 4, 2025 • 26
Distribution Matching Variational AutoEncoder

Paper • 2512.07778 • Published Dec 8, 2025 • 29
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Paper • 2601.02204 • Published Jan 5 • 62
DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation

Paper • 2601.04823 • Published Jan 8 • 7
Phi-4-reasoning-vision-15B Technical Report

Paper • 2603.03975 • Published 1 day ago • 7
