Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering (arXiv:2411.11504)
Top-nσ: Not All Logits Are You Need (arXiv:2411.07641)
Adaptive Decoding via Latent Preference Optimization (arXiv:2411.09661)
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training (arXiv:2411.13476)
Hymba: A Hybrid-head Architecture for Small Language Models (arXiv:2411.13676)
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training (arXiv:2411.15124)
Star Attention: Efficient LLM Inference over Long Sequences (arXiv:2411.17116)
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? (arXiv:2411.16489)
MH-MoE: Multi-Head Mixture-of-Experts (arXiv:2411.16205)
nGPT: Normalized Transformer with Representation Learning on the Hypersphere (arXiv:2410.01131)
allenai/tulu-3-sft-mixture (dataset)
CASIA-LM/ChineseWebText2.0 (dataset)
Yi-Lightning Technical Report (arXiv:2412.01253)
Training Large Language Models to Reason in a Continuous Latent Space (arXiv:2412.06769)
Weighted-Reward Preference Optimization for Implicit Model Fusion (arXiv:2412.03187)
Phi-4 Technical Report (arXiv:2412.08905)
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models (arXiv:2412.11605)
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN (arXiv:2412.13795)
Qwen2.5 Technical Report (arXiv:2412.15115)
A Post-Training Enhanced Optimization Approach for Small Language Models (arXiv:2411.02939)
How to Synthesize Text Data without Model Collapse? (arXiv:2412.14689)
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response (arXiv:2412.14922)
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought (arXiv:2412.17498)
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners (arXiv:2412.17256)
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning (arXiv:2412.16849)
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking (arXiv:2501.04519)
MiniMax-01: Scaling Foundation Models with Lightning Attention (arXiv:2501.08313)
OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training (arXiv:2501.08197)
Infi-MM/InfiMM-WebMath-40B (dataset)
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning (arXiv:2502.06781)
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models (arXiv:2503.11224)
The Lessons of Developing Process Reward Models in Mathematical Reasoning (arXiv:2501.07301)
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (arXiv:2504.11651)
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning (arXiv:2504.17192)
AdaptThink: Reasoning Models Can Learn When to Think (arXiv:2505.13417)
Multi-Token Prediction Needs Registers (arXiv:2505.10518)
Quartet: Native FP4 Training Can Be Optimal for Large Language Models (arXiv:2505.14669)
Distilling LLM Agent into Small Models with Retrieval and Code Tools (arXiv:2505.17612)
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning (arXiv:2505.10320)
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning (arXiv:2506.08889)
QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading (arXiv:2509.09995)
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning (arXiv:2509.09674)
Causal Attention with Lookahead Keys (arXiv:2509.07301)
A Survey of Reinforcement Learning for Large Reasoning Models (arXiv:2509.08827)
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention (arXiv:2510.04212)
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy (arXiv:2510.13778)
Direct Multi-Token Decoding (arXiv:2510.11958)
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs (arXiv:2510.11696)
Attention Is All You Need for KV Cache in Diffusion LLMs (arXiv:2510.14973)
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats (arXiv:2510.25602)
Continuous Autoregressive Language Models (arXiv:2510.27688)
Motif 2 12.7B Technical Report (arXiv:2511.07464)
TiDAR: Think in Diffusion, Talk in Autoregression (arXiv:2511.08923)