• LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (arXiv:2309.12307)
• LMDX: Language Model-based Document Information Extraction and Localization (arXiv:2309.10952)
• Table-GPT: Table-tuned GPT for Diverse Table Tasks (arXiv:2310.09263)
• BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
• TEQ: Trainable Equivalent Transformation for Quantization of LLMs (arXiv:2310.10944)
• TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT (arXiv:2307.08674)
• UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition (arXiv:2308.03279)
• MultiLoRA: Democratizing LoRA for Better Multi-Task Learning (arXiv:2311.11501)
• YaRN: Efficient Context Window Extension of Large Language Models (arXiv:2309.00071)
• DocLLM: A layout-aware generative language model for multimodal document understanding (arXiv:2401.00908)
• LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (arXiv:2401.01325)
• Improving Text Embeddings with Large Language Models (arXiv:2401.00368)
• OLMo: Accelerating the Science of Language Models (arXiv:2402.00838)
• LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
• DoRA: Weight-Decomposed Low-Rank Adaptation (arXiv:2402.09353)
• LoRA+: Efficient Low Rank Adaptation of Large Models (arXiv:2402.12354)
• Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding (arXiv:2401.04398)
• A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications (arXiv:2402.07927)
• Simple and Scalable Strategies to Continually Pre-train Large Language Models (arXiv:2403.08763)
• LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders (arXiv:2404.05961)
• Effective Long-Context Scaling of Foundation Models (arXiv:2309.16039)
• LoRA Learns Less and Forgets Less (arXiv:2405.09673)
• Data Engineering for Scaling Language Models to 128K Context (arXiv:2402.10171)
• Arcee's MergeKit: A Toolkit for Merging Large Language Models (arXiv:2403.13257)
• ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (arXiv:1910.02054)
• MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (arXiv:2405.12130)
• DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (arXiv:2405.04434)
• Generative Representational Instruction Tuning (arXiv:2402.09906)
• LoRA-GA: Low-Rank Adaptation with Gradient Approximation (arXiv:2407.05000)
• Trained Transformers Learn Linear Models In-Context (arXiv:2306.09927)
• Attention Heads of Large Language Models: A Survey (arXiv:2409.03752)