Daily Papers
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector
Retrieval
Paper
• 2409.10516
• Published • 43
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded
Attributions and Learning to Refuse
Paper
• 2409.11242
• Published • 7
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like
Language Models
Paper
• 2409.11136
• Published • 23
On the Diagram of Thought
Paper
• 2409.10038
• Published • 13
Video Instruction Tuning With Synthetic Data
Paper
• 2410.02713
• Published • 41
Large Language Models as Markov Chains
Paper
• 2410.02724
• Published • 33
Contrastive Localized Language-Image Pre-Training
Paper
• 2410.02746
• Published • 36
Training Language Models on Synthetic Edit Sequences Improves Code
Synthesis
Paper
• 2410.02749
• Published • 13
L-CiteEval: Do Long-Context Models Truly Leverage Context for
Responding?
Paper
• 2410.02115
• Published • 10
Interpreting and Editing Vision-Language Representations to Mitigate
Hallucinations
Paper
• 2410.02762
• Published • 9
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language
Models
Paper
• 2410.01335
• Published • 5
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning
Paper
• 2410.01044
• Published • 35
Not All LLM Reasoners Are Created Equal
Paper
• 2410.01748
• Published • 29
Quantifying Generalization Complexity for Large Language Models
Paper
• 2410.01769
• Published • 13
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
Paper
• 2410.01518
• Published • 3
Law of the Weakest Link: Cross Capabilities of Large Language Models
Paper
• 2409.19951
• Published • 54
Paper
• 2409.19606
• Published • 26
Instruction Following without Instruction Tuning
Paper
• 2409.14254
• Published • 29
LongGenBench: Long-context Generation Benchmark
Paper
• 2410.04199
• Published • 22
Erasing Conceptual Knowledge from Language Models
Paper
• 2410.02760
• Published • 14
Paper
• 2410.05258
• Published • 182
LLMs Know More Than They Show: On the Intrinsic Representation of LLM
Hallucinations
Paper
• 2410.02707
• Published • 47
Addition is All You Need for Energy-efficient Language Models
Paper
• 2410.00907
• Published • 151
Selective Attention Improves Transformer
Paper
• 2410.02703
• Published • 25
Mentor-KD: Making Small Language Models Better Multi-step Reasoners
Paper
• 2410.09037
• Published • 4
Rethinking Data Selection at Scale: Random Selection is Almost All You
Need
Paper
• 2410.09335
• Published • 16
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via
Inference-time Hybrid Information Structurization
Paper
• 2410.08815
• Published • 47
SuperCorrect: Supervising and Correcting Language Models with
Error-Driven Insights
Paper
• 2410.09008
• Published • 17
Mechanistic Permutability: Match Features Across Layers
Paper
• 2410.07656
• Published • 20
SimpleStrat: Diversifying Language Model Generation with Stratification
Paper
• 2410.09038
• Published • 4
PositionID: LLMs can Control Lengths, Copy and Paste with Explicit
Positional Awareness
Paper
• 2410.07035
• Published • 17
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Paper
• 2410.12405
• Published • 13
Exploring Model Kinship for Merging Large Language Models
Paper
• 2410.12613
• Published • 21
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Paper
• 2410.10814
• Published • 51
What Matters in Transformers? Not All Attention is Needed
Paper
• 2406.15786
• Published • 31
Vector-ICL: In-context Learning with Continuous Vector Representations
Paper
• 2410.05629
• Published • 4
Intriguing Properties of Large Language and Vision Models
Paper
• 2410.04751
• Published • 16
AutoTrain: No-code training for state-of-the-art models
Paper
• 2410.15735
• Published • 59
Pre-training Distillation for Large Language Models: A Design Space
Exploration
Paper
• 2410.16215
• Published • 17
In-context learning and Occam's razor
Paper
• 2410.14086
• Published • 2
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Paper
• 2410.13276
• Published • 29
How Do Training Methods Influence the Utilization of Vision Models?
Paper
• 2410.14470
• Published • 5
Context is Key(NMF): Modelling Topical Information Dynamics in Chinese
Diaspora Media
Paper
• 2410.12791
• Published • 5
Counting Ability of Large Language Models and Impact of Tokenization
Paper
• 2410.19730
• Published • 11
Analysing the Residual Stream of Language Models Under Knowledge
Conflicts
Paper
• 2410.16090
• Published • 8
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite
Learning
Paper
• 2410.19290
• Published • 10
On Memorization of Large Language Models in Logical Reasoning
Paper
• 2410.23123
• Published • 18
Toxicity of the Commons: Curating Open-Source Pre-Training Data
Paper
• 2410.22587
• Published • 10
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback
Paper
• 2410.21242
• Published • 7
Task Vectors are Cross-Modal
Paper
• 2410.22330
• Published • 11
RARe: Retrieval Augmented Retrieval with In-Context Examples
Paper
• 2410.20088
• Published • 4
LongReward: Improving Long-context Large Language Models with AI
Feedback
Paper
• 2410.21252
• Published • 19
TokenFormer: Rethinking Transformer Scaling with Tokenized Model
Parameters
Paper
• 2410.23168
• Published • 24
Constraint Back-translation Improves Complex Instruction Following of
Large Language Models
Paper
• 2410.24175
• Published • 18
Language Models can Self-Lengthen to Generate Long Texts
Paper
• 2410.23933
• Published • 18
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A
Gradient Perspective
Paper
• 2410.23743
• Published • 64
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long
Document Understanding
Paper
• 2411.01106
• Published • 4
Physics in Next-token Prediction
Paper
• 2411.00660
• Published • 14
GPT or BERT: why not both?
Paper
• 2410.24159
• Published • 14
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting
Rare Concepts in Foundation Models
Paper
• 2411.00743
• Published • 7
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale
Haystacks?
Paper
• 2411.05000
• Published • 22
Analyzing The Language of Visual Tokens
Paper
• 2411.05001
• Published • 24
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM
Data Contamination
Paper
• 2411.03823
• Published • 49
DELIFT: Data Efficient Language model Instruction Fine Tuning
Paper
• 2411.04425
• Published • 11
The Semantic Hub Hypothesis: Language Models Share Semantic
Representations Across Languages and Modalities
Paper
• 2411.04986
• Published • 5
Counterfactual Generation from Language Models
Paper
• 2411.07180
• Published • 5
Cut Your Losses in Large-Vocabulary Language Models
Paper
• 2411.09009
• Published • 49
Large Language Models Can Self-Improve in Long-context Reasoning
Paper
• 2411.08147
• Published • 65
Can sparse autoencoders be used to decompose and interpret steering
vectors?
Paper
• 2411.08790
• Published • 8
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding
And A Retrieval-Aware Tuning Framework
Paper
• 2411.06176
• Published • 45
Top-nσ: Not All Logits Are You Need
Paper
• 2411.07641
• Published • 24
Drowning in Documents: Consequences of Scaling Reranker Inference
Paper
• 2411.11767
• Published • 19
Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper
• 2411.14402
• Published • 47
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented
LMs
Paper
• 2411.14199
• Published • 34
Do I Know This Entity? Knowledge Awareness and Hallucinations in
Language Models
Paper
• 2411.14257
• Published • 14
Patience Is The Key to Large Language Model Reasoning
Paper
• 2411.13082
• Published • 7
Loss-to-Loss Prediction: Scaling Laws for All Datasets
Paper
• 2411.12925
• Published • 5
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context
Learning via MCTS
Paper
• 2411.18478
• Published • 37
Training Noise Token Pruning
Paper
• 2411.18092
• Published • 1
Star Attention: Efficient LLM Inference over Long Sequences
Paper
• 2411.17116
• Published • 53
Predicting Emergent Capabilities by Finetuning
Paper
• 2411.16035
• Published • 7
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS
Paper
• 2411.19655
• Published • 20
Free Process Rewards without Process Labels
Paper
• 2412.01981
• Published • 34
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's
Reasoning Capability
Paper
• 2411.19943
• Published • 62
Establishing Task Scaling Laws via Compute-Efficient Model Ladders
Paper
• 2412.04403
• Published • 2
Marco-LLM: Bridging Languages via Massive Multilingual Training for
Cross-Lingual Enhancement
Paper
• 2412.04003
• Published • 10
Paper
• 2412.04315
• Published • 19
Evaluating Language Models as Synthetic Data Generators
Paper
• 2412.03679
• Published • 47
If You Can't Use Them, Recycle Them: Optimizing Merging at Scale
Mitigates Performance Tradeoffs
Paper
• 2412.04144
• Published • 6
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
Paper
• 2412.06071
• Published • 9
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Paper
• 2412.06676
• Published • 9
Learned Compression for Compressed Learning
Paper
• 2412.09405
• Published • 13
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper
• 2412.08635
• Published • 49
Smaller Language Models Are Better Instruction Evolvers
Paper
• 2412.11231
• Published • 28
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained
Evidence within Generation
Paper
• 2412.11919
• Published • 36
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper
• 2412.11768
• Published • 43
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for
In-Context Learning in Transformers
Paper
• 2412.12276
• Published • 15
Are Your LLMs Capable of Stable Reasoning?
Paper
• 2412.13147
• Published • 93
AntiLeak-Bench: Preventing Data Contamination by Automatically
Constructing Benchmarks with Updated Real-World Knowledge
Paper
• 2412.13670
• Published • 6
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for
Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper
• 2412.13663
• Published • 162
Token-Budget-Aware LLM Reasoning
Paper
• 2412.18547
• Published • 46
GeAR: Generation Augmented Retrieval
Paper
• 2501.02772
• Published • 21
Critique Fine-Tuning: Learning to Critique is More Effective than
Learning to Imitate
Paper
• 2501.17703
• Published • 59
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper
• 2501.18585
• Published • 61
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper
• 2501.19324
• Published • 39
Transformer²: Self-adaptive LLMs
Paper
• 2501.06252
• Published • 55
Tensor Product Attention Is All You Need
Paper
• 2501.06425
• Published • 90
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs)
More Self-Confident Even When They Are Wrong
Paper
• 2501.09775
• Published • 32
Evolving Deeper LLM Thinking
Paper
• 2501.09891
• Published • 115
The Geometry of Tokens in Internal Representations of Large Language
Models
Paper
• 2501.10573
• Published • 9
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning
Paper
• 2501.12948
• Published • 447
Debate Helps Weak-to-Strong Generalization
Paper
• 2501.13124
• Published • 7
LongRoPE2: Near-Lossless LLM Context Window Scaling
Paper
• 2502.20082
• Published • 36
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the
Limits of Embedding Space Capacity
Paper
• 2502.13063
• Published • 74