tmarechaux's Collections: LLMs
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 87
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 55
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 40
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper • 2309.12307 • Published • 89
SCREWS: A Modular Framework for Reasoning with Revisions
Paper • 2309.13075 • Published • 18
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper • 2309.14509 • Published • 21
Qwen Technical Report
Paper • 2309.16609 • Published • 38
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Paper • 2309.04269 • Published • 34
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 30
Can LLMs Follow Simple Rules?
Paper • 2311.04235 • Published • 13
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
Paper • 2311.05657 • Published • 30
Lost in the Middle: How Language Models Use Long Contexts
Paper • 2307.03172 • Published • 44
Challenges and Applications of Large Language Models
Paper • 2307.10169 • Published • 51
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 246
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • 2312.07987 • Published • 41
Mixtral of Experts
Paper • 2401.04088 • Published • 160
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper • 2401.16380 • Published • 53
Generative Representational Instruction Tuning
Paper • 2402.09906 • Published • 54
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72
Can large language models explore in-context?
Paper • 2403.15371 • Published • 33
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • 2404.07143 • Published • 111
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 94
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • 2409.02795 • Published • 72
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Paper • 2411.00412 • Published • 10
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 154